What would be your intuition as to which 'quality' of the LLMs this tournament then actually measures? Could we still use it as a proxy for a kind of intelligence, since they need to compensate for the fact that they are not really built to do well in a game like poker?
The tournament measures the cumulative winnings. However, those can be far from the statistical expectation due to the variance of card distribution in poker.
To establish a real winner, you need to play many games:
> As seen in the Claudico match (20), even 80,000 games may not be enough to statistically significantly separate players whose skill differs by a considerable margin [1]
It is possible to reduce the number of required games thanks to variance reduction techniques [1], but I don't think this is what the website does.
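For intuition on how bad the variance problem is, here's a toy Monte Carlo sketch. The numbers (a 5 bb/100 skill edge, ~10 bb per-hand standard deviation) are my own rough assumptions, in the ballpark usually cited for heads-up no-limit, not figures from the paper:

```python
import numpy as np

# Toy Monte Carlo: how often does the *worse* player end up ahead after N hands?
# Assumed numbers (illustrative only): true skill edge of 5 bb/100 hands,
# per-hand standard deviation of 10 big blinds.
rng = np.random.default_rng(0)

edge_per_hand = 0.05   # 5 bb/100, expressed per hand
std_per_hand = 10.0    # bb

for n_hands in (1_000, 10_000, 80_000):
    # 10,000 simulated matches of n_hands each; total winnings are roughly
    # normal with mean edge*n and standard deviation std*sqrt(n).
    totals = rng.normal(edge_per_hand * n_hands,
                        std_per_hand * np.sqrt(n_hands),
                        size=10_000)
    print(f"{n_hands} hands: worse player 'wins' {(totals < 0).mean():.1%} of matches")
```

Even at 80,000 hands the worse player still comes out ahead in a non-trivial fraction of matches, which matches the Claudico observation above.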
To answer the question ("which 'quality' of the LLMs this tournament then actually measures"): since we can't reliably tell the winner, I don't think we can make any particular claims about the LLMs.
However, it could be interesting to analyze the play from a "psychological profile" perspective of the dark triad (psychopathy / Machiavellianism / narcissism).
Essentially, these personality types have been observed to prefer certain strategies, and this can be quantified [2].
Sorry to be completely off-topic, but: I'm really reluctant to click on anything with these 'share IDs' and usually remove them from any link I share with anyone. I don't want to make it even easier for the platforms to build networks of associated accounts.
It'd be really cool if we had 'upgradable codec FPGAs' in our machines that you could just flash to the newest codec... but that'd probably be noticeably more expensive, and also not really in the interest of the manufacturers, who want to have reasons to sell new chips.
Back in ~2004, I worked on a project to define a codec virtual machine, with the goal of each file being able to reference the standard it was encoded against, along with a link to a reference decoder built for that VM. My thought was that you could compile that codec for the system you were running on and decode in software, or if a sufficient DSP or FPGA was available, target that.
While it worked, I don't think it ever left my machine. Never moved past software decoding -- I was a broke teen with no access to non-standard hardware. But the idea has stuck with me and feels more relevant than ever, with the proliferation of codecs we're seeing now.
It has the Sufficiently Smart Compiler problem baked in, but I tried to define things to be SIMD-native from the start (which could be split however it needed to be for the hardware) and I suspect it could work. Somehow.
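For anyone curious what the container side of that idea might look like, here's a rough sketch; every field name here is hypothetical, not the format I actually used back then:

```python
import hashlib
import json

# Hypothetical header for the "codec VM" idea: each file carries a reference
# to the standard it was encoded against, plus a link to (and hash of) a
# reference decoder compiled for the VM. All names are made up.
def make_header(standard_id: str, decoder_url: str, decoder_bytecode: bytes) -> bytes:
    header = {
        "standard": standard_id,            # e.g. "example-codec/1.0"
        "decoder_ref": decoder_url,         # where to fetch the reference decoder
        "decoder_sha256": hashlib.sha256(decoder_bytecode).hexdigest(),
    }
    return json.dumps(header).encode()

# A player would fetch the decoder bytecode, verify the hash, then either
# JIT-compile it for the host CPU or lower it onto a DSP/FPGA if available.
print(make_header("example-codec/1.0",
                  "https://example.invalid/decoder.vmbc",
                  b"\x00\x01\x02"))
```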
> FPGAs' in our machines that you could just flash to the newest codec
They're called GPUs... They're ASICs rather than FPGAs, but it's easy to update the driver software to handle new video codecs. The difficulty is motivating GPU manufacturers to do so... They'd rather sell you a new one with newer codec support as a feature.
A lot of the GPUs have fixed function hardware to accelerate parts of encode/decode. If the new codec is compatible, sure.
But often a new codec requires decoders to know how to work with new things that the fixed function hardware likely can't do.
Encoding might actually be different. If your encoder hardware can only do fixed block sizes and can only detect some types of motion, a driver change might be able to package the output up as the new codec. Probably not a lot of benefit other than ticking a box, but it might be useful sometimes. Especially if you, say, offload motion detection but the new codec needs different arithmetic encoding: you'd need the CPU (or general-purpose GPU compute) to do the arithmetic encoding, and presumably you'd still get a size saving over the old codec.
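Roughly the split I mean, as a toy sketch. Both stages here are simulated in software, obviously; the point is just which side of the hardware/software boundary each one sits on:

```python
import zlib

import numpy as np

# Toy illustration of the hybrid split: a stand-in for the fixed-function
# motion/residual stage (what the ASIC would do), feeding a software entropy
# stage (what the new codec needs and the old silicon can't emit).

def hw_motion_search(frame, reference, block=16):
    # Pretend fixed-function stage: zero motion vectors plus raw residuals.
    residuals = frame.astype(np.int16) - reference.astype(np.int16)
    vectors = np.zeros((frame.shape[0] // block, frame.shape[1] // block, 2),
                       dtype=np.int8)
    return vectors, residuals

def sw_entropy_encode(vectors, residuals):
    # Stand-in for the new codec's arithmetic coder, run on the CPU.
    return zlib.compress(vectors.tobytes() + residuals.tobytes())

frame = np.random.randint(0, 255, (64, 64), dtype=np.uint8)
ref = np.roll(frame, 1, axis=1)  # "previous frame", shifted by one pixel
bitstream = sw_entropy_encode(*hw_motion_search(frame, ref))
print(len(bitstream), "bytes")
```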
They’re programmable parallel compute units, and as such pretty much the conceptual opposite of ASICs.
The main point of having ASICs for video codecs these days is efficiency, not the ability to decode a stream in real time at all (even many embedded CPUs can manage that by now).
I think 'ending up with an accidental BahnCard and losing a painful amount of money because of it' might be almost a rite of passage at this point.
Happened to me as well; I had a 'youth' card for people below the age of 27. I even remembered that some cards auto-renew and checked online to see if mine would, because I wanted to make sure I wouldn't just get upgraded to the regular and much more expensive BahnCard... couldn't find a renewal date and thought I'd be fine. But apparently I didn't check thoroughly enough, and was only informed by email that I now had 200€ less and a shiny new BahnCard. I also emailed support, and also didn't get anywhere.
Later I mention this to a friend... and he says 'ah, yeah, same with me'.
I knew without clicking this would be Philosophize This.
I friggin love that podcast, and keep recommending it to friends. The only problem I have with it is that I like to listen to it while driving, but I can't stop to take notes every five minutes.
This podcast is one of my favorites to listen to while out riding my bike. Something about the cardio + his way of breaking down the core meaning behind philosophers' works is just a very edifying and enjoyable experience.
I had no idea who Byung-Chul Han was before listening to this podcast; he has a lot of interesting things to say about the current state of our capitalist society. ( https://en.wikipedia.org/wiki/Byung-Chul_Han )
I've found the YT transcripts to be severely lacking sometimes, in accuracy and features. Especially speaker identification is really useful if you want to e.g. summarize podcasts or interviews, so if this project here delivers on that then it's definitely better than the YT transcripts.
An approach I've been using recently is to rely on pyannote/tinydiarize only for the speaker_turn timestamps, but prefer the larger model (or in this case YT's autotranscript) for the actual text.
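Concretely, something like this. The model choices and the overlap heuristic are just what I'd reach for, not a known-good recipe:

```python
import whisper
from pyannote.audio import Pipeline

# Sketch of the approach above: pyannote provides only the speaker-turn
# timestamps; the (larger) Whisper model provides the actual text.
audio = "episode.wav"

asr = whisper.load_model("medium")
segments = asr.transcribe(audio)["segments"]  # each has start, end, text

diarizer = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1",
                                    use_auth_token="YOUR_HF_TOKEN")
turns = [(t.start, t.end, speaker)
         for t, _, speaker in diarizer(audio).itertracks(yield_label=True)]

for seg in segments:
    # Assign each text segment to the speaker turn it overlaps the most.
    def overlap(turn):
        start, end, _ = turn
        return max(0.0, min(end, seg["end"]) - max(start, seg["start"]))
    speaker = max(turns, key=overlap)[2] if turns else "SPEAKER_?"
    print(f"[{speaker}] {seg['text'].strip()}")
```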
YT transcripts definitely lack speaker ID. LLMs can infer speakers from context but miss nuance without proper speaker recognition.
I have been tackling this while building VideoToBe.com.
My current pipeline is: Download video -> Whisper transcription with diarization -> Replace speaker tags with AI-generated speaker IDs + human fallback.
Reliable ML speaker identification is still surprisingly hard.
For podcast summarization, speaker ID is a game-changer vs basic YT transcripts.
I’ve had some success with running them through another LLM to have it clean up the transcription errors based on the context. But this obviously does nothing for speaker identification.
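Something like this works as a starting point; the model name and prompt are just examples, not a tuned setup:

```python
from openai import OpenAI

client = OpenAI()

def clean_transcript(raw: str) -> str:
    # Ask the LLM to fix likely speech-to-text errors from context only,
    # without rewriting or adding content.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption; any capable model should do
        messages=[
            {"role": "system",
             "content": "Fix likely speech-to-text errors in this transcript "
                        "using surrounding context. Do not change the meaning "
                        "and do not add or remove sentences."},
            {"role": "user", "content": raw},
        ],
    )
    return resp.choices[0].message.content
```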
That depends on the hack. If the hack is something that is traceable to you, then the hack becomes fraud and the police will be at your door. This assumes that the likes of Russia and North Korea have decided there is more value in bitcoin remaining operational than in the one-time haul of money they could get from the fraud (which, to be fair, seems unlikely, since it is a prisoner's dilemma where the defector chooses the final round).
North Korea recently executed what I believe is the largest known theft in history, $1.5 billion in ETH stolen from the ByBit exchange. It was easily traceable to a state-run North Korean hacking group. No police at the door, and ETH only had a temporary dip.
I'd think that if NK was sitting on a $1-10 billion Bitcoin bug, they'd execute it too before it got fixed or exploited by someone else.
> If the hack is something that is traceable to you then the hack becomes fraud and the police will be at your door.
That would be somewhat ironic, given the "code is law" mentality of many blockchain proponents.
I don't doubt that many people would file police reports and lawsuits if any fundamental paradigm of blockchain cryptography were to suddenly be revealed as insecure, but I'd be following the lawsuits with a big bowl of popcorn.
I dislike bitcoin, but you gotta admit that that's a rather clever aspect of it: Anybody with the power to destroy it is better off participating in it instead.
We'll need to find our way out of that logic eventually. Scarcity in general and proof of work in particular are terrible bases for an economy. But it is a respectable foe.
It is a prisoner's dilemma where the defector controls when the final round is. If you know of a flaw, you can win more long term by not exploiting it, but if someone else exploits it, bitcoin becomes worthless. Thus, if you know of a flaw, there is pressure to exploit it first, before someone else gets the benefits of defecting and ends the game.
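You can make that pressure concrete with a toy expected-value comparison. All numbers here are invented:

```python
# H: one-time haul from exploiting the flaw now.
# m: per-round income from cooperating (bitcoin stays valuable).
# p: per-round probability that someone else exploits it first,
#    ending the game and zeroing all future income.
def ev_cooperate(m: float, p: float) -> float:
    # Expected future income: sum of m * (1-p)**t for t = 1, 2, ...
    # (a geometric series).
    return m * (1 - p) / p

H, m = 1_000.0, 10.0
for p in (0.005, 0.02):
    choice = "wait" if ev_cooperate(m, p) > H else "exploit now"
    print(f"p={p}: EV(wait)={ev_cooperate(m, p):.0f} vs H={H:.0f} -> {choice}")
```

The more likely it is that someone else also knows the flaw (higher p), the more "exploit now" dominates, which is exactly the defection pressure above.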
It depends on the flaw. For most of the attack surface that bitcoin has, your "flaw" is just an unfair advantage against the other miners, which you'd likely keep secret while you keep on mining. That's not exactly a "bitcoin becomes worthless" scenario; it's not really that different from a halving, which is a block-height-scheduled event built into the protocol.