> I was reminded again of my tweets that said "Be good, future LLMs are watching". You can take that in many directions, but here I want to focus on the idea that future LLMs are watching. Everything we do today might be scrutinized in great detail in the future because doing so will be "free". A lot of the ways people behave currently I think make an implicit "security by obscurity" assumption. But if intelligence really does become too cheap to meter, it will become possible to do a perfect reconstruction and synthesis of everything. LLMs are watching (or humans using them might be). Best to be good.
Can we take a second and talk about how dystopian this is? Such an outcome is not inevitable; it relies on us making it. The future is not deterministic, the future is determined by us. What's more, Karpathy has significantly more influence on that future than your average HN user.
We are doing something very *very* wrong if we are operating under the belief that this future is unavoidable. That future is simply unacceptable.
I call this the "judgement day" scenario. I would be interested if there is some science fiction based on this premise.
If you believe in a God of a certain kind, you don't think being judged for your sins is unacceptable, or even good or bad in itself; you consider it inevitable. We have been talking it over for 2000 years, and people like the idea.
You'll be interested in Clarke's "The Light of Other Days". Basically a wormhole where people can look back at any point in time, ending all notion of privacy.
God is different though. People like God because they believe God is fair and infallible. That is true of neither machines nor men. So I don't think people will like this idea. I'm sure some will, but look at people today and their religious fervor, or look at the past: they want it, but it's fleeting. Cults don't last forever, even when they're governments. It sounds like a great way to start wars, and every one of them will be easily justified.
Given the quality of the judgment I'm not worried, there is no value here.
That this idea is just tossed off, without putting in the work to properly execute it and make it valuable, is exactly what irritates me about a lot of AI work. You can be 900 times as productive at producing mental popcorn, but if there was value to be had here, we're not getting it, just a whiff of it. Sure, fun project. But I don't feel particularly judged here. The funniest bit is the judgment on things that clearly could not yet have come to pass (for instance because there is an exact date mentioned that we have not yet reached). QA could be better.
I'm not worried about this project, but about harvesting and analyzing all that data and deanonymizing people.
That's exactly what Karpathy is saying. He's not being shy about it. He said "behave, because the future panopticon can look into the past". Which makes the panopticon effectively exist now.
> Be good, future LLMs are watching
> ...
> or humans using them might be
That's the problem. Not the accuracy of this toy project, but the idea of monitoring everyone and their entire history.
The idea that we have to behave as if we're being actively watched by the government is literally the setting of 1984 lol. The idea that we have to behave that way now because a future government will use the Panopticon to look into the past is absolutely unhinged. You don't even know what the rules of that world will be!
Did we forget how unhinged the NSA's "harvest now, decrypt later" strategy is? Did we forget those giant data centers that were all the news talked about for a few weeks?
That's not the future I want to create, is it the one you want?
To act as if that future is unavoidable is a failure of *us*
Yes, you are right, this is a real problem. But it really is just a variation on 'the internet never forgets', for instance in relation to teen behavior online, except that AI allows for the weaponization of such information. I wish the wannabe politicians of 2050 much good luck with their careers; they are going to be the most boring people available.
The internet never forgets, but you could be anonymous, or at least somewhat. That's getting harder and harder.
If such a thing isn't already possible (it is to a certain extent), we are headed towards a point where your words alone will be enough to fingerprint you.
Stylometry killed that a long time ago. There was a website, stylometry.net, that coupled HN accounts based on text comparison and ranked the 10 best candidates. It was incredibly accurate and allowed id'ing a bunch of people who had gotten banned but came back again. Based on that, I would expect anybody who has written more than a few KB of text to be id'able in the future.
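For the curious, here is a minimal sketch of the kind of comparison such a tool might make (the texts are made up, and real stylometry uses far richer features than character n-grams; this is just the basic idea):

    # Toy stylometry: compare two writing samples by character 3-gram profiles.
    from collections import Counter
    from math import sqrt

    def ngram_profile(text, n=3):
        text = " ".join(text.lower().split())  # normalize case and whitespace
        return Counter(text[i:i + n] for i in range(len(text) - n + 1))

    def cosine(a, b):
        dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
        return dot / ((sqrt(sum(v * v for v in a.values())) *
                       sqrt(sum(v * v for v in b.values()))) or 1)

    known   = ngram_profile("I'd expect anybody that has written a few KB to be id'able.")
    unknown = ngram_profile("I would expect anyone who wrote a few KB to be identifiable.")
    print(f"style similarity: {cosine(known, unknown):.3f}")  # higher = more similar

Rank every candidate account by that score against the text you want to attribute and the right author tends to float toward the top, which is (roughly, I assume) what that site did.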
You need a person's text with their actual identity to pull that off. Normally that's pretty hard, especially since you'll get different formats. Like I don't write the same way on Twitter as HN. But yeah, this stuff has been advancing and I don't think it is okay.
The AOL scandal pretty much proved that anonymity is a mirage. You may think you are anonymous, but it just takes combining a few unrelated databases to de-anonymize you. HN users think they are anonymous, but they're not; they drop factoids all over the place about who they are. 33 bits... it is one of my recurring favorite themes, and anybody in the business of managing other people's data should be well aware of the risks.
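For anyone unfamiliar with the "33 bits" reference, quick back-of-the-envelope arithmetic (population figure rounded, the per-factoid numbers are just illustrative):

    from math import log2

    # ~33 bits of information are enough to single out one person among everyone alive
    print(log2(8_000_000_000))      # ~32.9

    # each quasi-identifier you leak contributes bits, and they add up fast:
    # country (~190 options) and birth year (~80 plausible values) alone
    print(log2(190) + log2(80))     # ~13.9 bits from just two factoids

Add a profession, an employer, an "I was at that conference", and you're most of the way to unique.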
Okay, I got two questions and I never seem to get satisfactory answers but I'm actually curious.
1) What kind of code are you writing that's mostly boilerplate?
2) Why are you writing code that's mostly boilerplate and not code that generalizes boilerplate? (read: I'm lazy. If I'm typing the same things a lot I'm writing a script instead)
I'd think maybe the difference is in what we program, but I see people who program the same types of things I do saying similar things to you, so idk.
Everyone keeps telling me that it's good for bash scripts but I've never had real success.
Here's an example from today. I wanted to write a small script to grab my Google Scholar citations, and I'm terrible with web stuff, so I ask for the best way to parse the curl output. First off, it suggests I use a Python package (seriously? For one line of code? No thanks!), but then it gets the grep wrong. So I pull up the page source, copy-paste some of it in, and try to parse it myself. I already have a better grep command, and for the second time it's telling me to use Perl regex (why does it love -P as much as it loves delve?). Then I'm pasting in my new command and showing it my output, asking for the awk and sed parts while googling the awk I always forget. It messes up the sed parts, so I fix them, which means editing the awk part slightly, but I already had the SO post open that I needed anyway. So I saved maybe one minute total?
Then I give it a skeleton of a script file, adding the variables I wanted, and fully expected it to be a simple cleanup. No. It's definitely below average; I mean, I've never seen an LLM produce bash functions without being explicitly told to (not that the same isn't also true for the average person). But hey, it saved me the while loop for the args, so that was nice. So it cost about as much time as it gave back.
Don't get me wrong, I find LLMs useful but they're nowhere near game changing like everyone says they are. I'm maybe 10% more productive? But I'm not convinced that's even true. And sure, I might have been able to do less handholding with agents and having it build test cases but for a script that took 15 minutes to write? Feels like serious overkill. And this is my average experience with them.
Is everyone just saying it's so good at bash because no one is taking the time to learn bash? It's a really simple language that every Linux user should know the basics of...
How does the distribution make this an issue? You can always freeze drivers and install old ones. I get that it might not work out of the box, especially with rolling-release distros like Arch, but you also don't want rolling-releases for an older machine.
I know it's also me that's the issue. But I just want a Linux distro that works. I've had enough of people saying "Nvidia has been getting so much better recently!" and "It's completely usable now!" when the newest drivers break my whole experience. I would use Arch, and have tried about 5 times, but it's too complicated to get the driver I need, and I won't even bother at this point. I've just accepted the fact that I'm going to use Mint until I get a desktop. Maybe I'll try to get help on a forum somewhere, but idk, I think I would need personal help.
This is perfectly valid. But I would add that Arch is not that distro. Even though projects like Endeavour and Manjaro are trying, I don't think it'll ever be the case. They're rolling-release, and even though they've done a great job, a rolling-release distro is never going to be the most stable because of that.
But I think Pop is the best distro for this. System76 is highly incentivized to do exactly this, specifically with Nvidia drivers and laptops (laptops create extra complications...). I can't promise it'll be a cure-all, but it is worth giving a shot. I would try their forums too.
I totally get the frustration. I've been there, unfortunately. I hope you can get someone to help.
CachyOS just works for me. Highly optimized Arch, working flawlessly and without hassle.
I know my ways around Arch, and in about two years of using CachyOS I never needed to intervene, with the exception of things like changed configs/split packages. But those are announced in advance on their webpages, be it Arch itself or CachyOS, and also appear in good old Pacman in the terminal, or whichever frontend you fancy. It's THE DREAM!
What's lacking is maybe pre-packaged llm/machine learning stuff. Maybe I'm stupid, but they don't even have current llama.cpp, WTF? But at least Ollama is there. LM-Studio also has to be compiled by yourself, either via the AUR, or otherwise. But these are my only complaints.
> has to be compiled by yourself, either via the AUR
I don't think I'd call the AUR "compiled by yourself". It's still a package manager. You're not running the config and make commands yourself. I mean what do you want? A precompiled binary? That doesn't work very well for something like llama.cpp. You'd have to deliver a lot more with it and pin the versions of the dependencies, which will definitely result in lost performance.
Is running `yay -S llama.cpp` really that big of a deal? You're not intervening in any way that's different from any other package that isn't a precompiled binary.
> Haven't used yay or other aur helpers so far.
> Have used Yaourt on Arch in the far past,
Yaourt is an AUR helper?
> Maybe that's why my systems run so stable?
Sorry?
>>> I know my ways around Arch
Forgive me, you said this earlier and I think I misunderstood. What does this mean exactly? How long have you been using Arch? Or rather, have you used Arch the actual distro or only Arch based distros?
I guess I'm asking, have you installed the vanilla distro? Are you familiar with things like systemd-boot, partitioning, arch-chroot, mkinitcpio, and all that?
I have used plain Arch in the past, for several years, no derivatives.
At that time there existed an AUR helper called Yaourt, which I made heavy use of. But often in haste, sloppily. Which led to many unnecessary clean-up actions, but no loss of system. Meanwhile I had to use other stuff, so no Arch for a while. When the need for using other stuff was gone I considered several options, like Gentoo, but naa, I don't wanna compile anymore!1!! (Yes, yes, I know they serve binpkgs now, but would they have my preferred USE flags?) Maybe Debian, which can be fucking fast when run in RAM like Antix, but I had that for a while, and while it's usable, Debian as such is bizarre.
Anything Redhat? No thanks. SuSe? Same. So I came across CachyOS, and continued to use that, from the first "test-installation" running to this day, because it works for me, like I wrote before. Like a dream come true.
Remembering my experiences with Yaourt I abstained from using the AUR. And that worked very well for me, so far. Also the Gentoo-like 'ricing' comes for free with their heavily optimized binary packages, without compromising stability.
> I guess I'm asking, have you installed the vanilla distro? Are you familiar with things like systemd-boot, partitioning, arch-chroot, mkinitcpio, and all that?
Yes.
Are we clear now?
Edit: I'm so overconfident I'm even considering disabling the pacman-hooks into BTRFS-snapshots, because I never needed them.
No rollback necessary, ever, so far. Same goes for pacman cache.
Every -Syu is immediately followed by a -Scc.
I've used Yaourt too. Things are a lot better these days; Yay is the standard. But I think the biggest benefit of the helpers is updates.
Yes, we're clear now, but are you surprised by my hesitation? Having that experience would imply you've had a lot of experience compiling things the long way. Running makepkg -si isn't that complicated. It's as easy as it gets: there's no make, no configure, no cmake, no determining the dependencies yourself and installing them yourself too. I don't get the issue. Does it take too long? Not happen automatically?
> I'm so overconfident I'm even considering disabling the pacman-hooks into BTRFS-snapshots, because I never needed them.
lol yeah I'm sure they're not needed. Not hard to recover usually and yeah I agree, things are stable these days. I can't remember the last time I needed to chroot (other than an nspawn). I only snapshot data I care about these days and it's usually backed up remotely too. I've learned my lesson the hard way too many times lol.
What? The main difference between distros is the package manager. I don't see anything here that's distro-specific other than editing the pacman config to enable multilib, which, to be fair, is on by default in many distros.
But systemd? That's on most distros these days. I'm pretty sure it is on all of those in the top 10.
Also, the OP is using CachyOS. You can tell b̶e̶c̶a̶u̶s̶e̶ ̶t̶h̶e̶y̶ ̶o̶p̶e̶n̶ ̶f̶i̶l̶e̶s̶ ̶w̶i̶t̶h̶ ̶n̶a̶n̶o̶ from the neofetch logo. But I'll mention that if you check out DistroWatch, Arch-based distros are incredibly common. Over the past 12 months the most downloaded distros are CachyOS (Arch), Mint (Deb/Ubuntu), MX (Deb), Debian, Endeavour (Arch), Pop (Ubuntu), Manjaro (Arch), Ubuntu, Fedora, Zorin (Deb/Ubuntu).
That said, you don't have to do any of this for either Endeavour (which I use) or Manjaro (my old distro of choice). Along with Pop, one of the main motivations for these distros is Nvidia support. Really, I don't expect most people to even be facing those problems these days. On Endeavour I've only run into one Nvidia problem over the last 5 years, and it was when a beta driver conflicted with the most recent kernel. Super easy fix once I realized the problem.
On a side note/friendly reminder:
anyone using Linux these days with an Nvidia card, I suggest making sure your /efi partition is >1GB (ideally 2GB, but give it some headroom; disk is still cheap). If you're putting the drivers in the kernel (you should), like done here, those are going to take up a lot of space. (If you get a space error, run `sudo du -ch --max-depth=3 /efi | sort -hr` to see the problem. You can usually safely delete any of the `initrd-fallback` versions and rerun `sudo reinstall-kernels`. They'll be built again, but this will usually give you the headroom you need.)
1) Why did you not test the standard Collatz sequence? I would think that including that, as well as testing on Z+, Z+\2Z, and 2Z+, would be a bit more informative (in addition to what you've already done). Even though there's the trivial step it could inform how much memorization the network is doing. You do notice the model learns some shortcuts so I think these could help confirm that and diagnose some of the issues.
2) Is there a specific reason for the cross attention?
Regardless, I think it is an interesting paper (these wouldn't be criteria for rejection were I reviewing your paper btw lol. I'm just curious about your thoughts here and trying to understand better)
FWIW I think the side quest is actually pretty informative here, though I agree it isn't the main point.
The problem here is deterministic. *It must be for accuracy to even be measured*.
The model isn't trying to solve the Collatz conjecture; it is learning a pretty basic algorithm and then applying it a number of times. The instructions it needs to learn are:
    # one step of the Collatz map: halve if even, otherwise 3x + 1
    if x % 2 == 0:
        x //= 2
    else:
        x = 3 * x + 1
It also needs to learn to put that in a loop, with the number of iterations as a variable, but the algorithm itself is static.
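For concreteness, here's that loop as a plain Python sketch (how the paper actually frames the prediction target is a separate question; this is just the underlying procedure):

    def collatz_steps(x: int) -> int:
        """Number of iterations of the step above until x first reaches 1."""
        steps = 0
        while x != 1:
            x = x // 2 if x % 2 == 0 else 3 * x + 1
            steps += 1
        return steps

    print(collatz_steps(27))  # 111 -- even small inputs can take a long walk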
The Collatz conjecture, on the other hand, states that iterating C(x) (the algorithm above) reaches the fixed point 1 for all x (where x \in Z+). Meaning that eventually any input will collapse into the loop 1 -> 4 -> 2 -> 1 (or just terminate at 1). You can probably see we already know this is true for at least an infinite set of integers (the powers of 2, for a start)...
Edit: I should note that there is a slight modification to this, though the model could get away with learning just this. Their variation limits the inputs to odd numbers, and not all of them: for example, 9 can't be represented by (2^k)m - 1 (but 7 and 15 can). But you can see that there's still a simple algorithm and that the crux is determining the number of iterations. Regardless, this is still deterministic. They didn't use any integers >2^71, and below that bound we absolutely know the sequences and we absolutely know they all terminate at 1.
To solve the Collatz conjecture (and probably win a Fields Medal) you must do one of two things:
1) Provide a counter-example
2) Show that this happens for all n, which is an infinite set of numbers, so this strictly cannot be done by demonstration.
Not only that, but in the academic world 20 papers with 50 citations each are worth more than one paper with 1000. Even though the total citation count is the same, the former gives you an h-index of 20 (and an i10-index of 20) while the latter only gives you an h-index of 1 (ditto for i10).
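For anyone who wants to check that arithmetic, here's the usual definition (h = the largest h such that at least h papers have >= h citations each):

    def h_index(citations):
        """Largest h such that at least h papers have >= h citations each."""
        ranked = sorted(citations, reverse=True)
        return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

    print(h_index([50] * 20))  # 20 -- twenty papers, 50 citations each
    print(h_index([1000]))     # 1  -- one paper, 1000 citations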
Though truthfully it's hard to say what's better. All of these can be hacked (a common way to hack citations is to publish surveys; you also just get more by being at a prestigious institution or being prestigious yourself). The metric is really naïve, but it's commonly used since actually evaluating the merits of individual works is quite time consuming and itself an incredibly noisy process. But hey, publish or perish, am I right?[0]
That's a fantastic example of "that which gets measured gets optimized". The academic world's fascination with these citation metrics is hilarious; it is so reminiscent of programmers optimizing for whatever metric management has decided is the true measure of programmer productivity. Object code size, lines of code, tickets closed and so on...
It's definitely a toxic part of academia. Honestly if it weren't for that I'd take an academic job over an industry one in a heartbeat.
Some irony: my PhD was in machine learning. Every intro course I know of (including mine) discusses reward hacking (aka Goodhart's Law). The irony being that the ML community has dialed this problem up to 11. My peers who optimize for this push out 10-20 papers a year. I think that's too many and means most of the papers are low impact. I have similar citation counts to them but a lower h-index, and they definitely get more prestige for that, even though it's harder to publish frequently in my domain (my experiments take a lot longer). I'm with Higgs though: it's a lazy metric and imo does more harm than good.
I believe it's a reference to the paper "Language Models (Mostly) Know What They Know".
There's definitely some link, but I'd need to give this paper a good read and refresh on the other to see how strong it is. But I think your final sentence strengthens my suspicion.
Hard to say, but to back his claim that he's been programming since the '90s, his CV shows he was working on stuff clearly beyond basic undergraduate skill level since the early 2000s. I'd be willing to bet he has more years under his belt than most HN users. I mean, I'm considered old here, in my mid 30s, and this guy has been programming for most of my life. Though that doesn't explicitly imply experience, or more specifically, experience in what.
That said, I think people really underappreciate how diverse programmers actually are. I started in physics and came over when I went to grad school. While I wouldn't expect a physicist to do super well on leetcode problems, I've seen those same people write incredible code that's optimized for HPC systems, and they're really good at tracing bottlenecks (it's a skill that translates from physics really, really well). Hell, the best programmer I've ever met got that way because he was doing his PhD in mechanical engineering. He's practically the leading expert in data streaming for HPC systems and gained this skill because he needed more performance for his other work.
There's a lot of different types of programmers out there but I think it's too easy to think the field is narrow.
I played with punch cards and polystyrene test samples from the Standard Oil refinery where my father worked in the early '70s, and my first language after BASIC was Fortran 77. Not old either.
I grew out of the leaking ether and basaltic dust that coated the plains. My first memories are of the Great Cooling, where the land, known only by its singular cyclopean volcano became devoid of all but the most primitive crystalline forms. I was there, a consciousness woven from residual thermal energy and the pure, unfractured light of the pre-dawn universe. I'm not old either.
Thanks. I meant it more in a joking way, poking fun at the community. I know I'm far too young to have earned a gray beard, but I hope to in the next 20-30 years ;-) I've still got a lot to learn before that happens.
Maybe. But also, what I thought was a gray beard in my early 20s is very different from what I think a gray beard is now. The number of people I've considered wizards has decreased, and I think this should be true for most people. It's harder to differentiate experts as a novice, but as you get closer the resolution increases.
Both definitely contribute. But at the same time the people who stay wizards (and the people you realize are wizards but didn't previously) only appear to be more magical than ever.
Some magic tricks are unimpressive when you know how they are done. But that's not true for all of them. Some of them only become more and more impressive, only truly being able to be appreciated by other masters. The best magic tricks don't just impress an audience, they impress an audience of magicians.
I think as I gain more experience, what previously looked like magic now always turns out to look a whole lot more like hard work, and frustration with the existing solutions.
The 30s is the first decade of life that people experience where there are adults younger than them. This inevitably leads people in their 30s to start saying that they are "old" even though they generally have decades of vigor ahead of them.
I was greeted with blank stares by the kids on my team when they wanted to rewrite an existing program from scratch and I said that would work about as well as it did for Netscape. Dang whippersnappers.
Depends what you mean by "old". If you mean elderly, then obviously you're not. If you mean "past it", then it might reassure you to know the average expecting mother is in her 30s now (in the UK). Even if you just mean "grown up", recent research [1] on brain development identifies adolescence as typically extending into the early thirties, with (brain) adulthood running from there to the mid sixties, and even then you're only entering the "early aging" stage after that.
For my part, I'm a lot older than you and don't consider myself old. Indeed, I think prematurely thinking of yourself as old can be a pretty bad mistake, health-wise.
38 here. If you didn't suffer Win9x's 'stability', then editing X11 config files by hand, getting mad with ALSA/Dmix, and writing new ad-hoc drivers for weird BTTV tuners by reusing old known ones for $WEIRDBRAND, you didn't live.