Hacker News | looobay's comments

In the Pavel case, it involved child pornography groups on Telegram and the fact that they ignored a court order.

But I agree with you about the authoritarian tendencies in Europe (and even America), with Chat Control and other actions like what the French government just did...


really cool!


That's an awesome project! It's literally a gold mine lol. Congrats and thank you for this!


It means something that is too far outside the training data. For example, if you try to make an LLM write a program in a strange or very new language, it will struggle with non-trivial tasks.


I understand what "a new problem for an LLM" is; my question is about what in the math discussion qualifies as one.

I see references to "improvements", "optimizing" and what I would describe as "iterating over existing solutions", not something that's "new". But as I'm not well versed in maths, I was hoping that someone who considers the thread definitive proof of that, as the parent seems to, would be able to offer a dumbed-down explanation for the five-year-olds among us. :)


There was research on LLM training and distillation showing that if two models have a similar architecture (probably the case for xAI), the "master" model will distill knowledge to the other model even if it's not in the distillation data. So they probably need to train a new model from scratch.

(Sorry, I don't remember the name, but there was an example with a model liking owls to showcase this.)



If true, that's bad news for Elon Musk and xAI, because they have to start over. He's already indicated this with regard to Wikipedia: he wants to train on Grokipedia and not Wikipedia. Removing NSFW material gives him another reason.


He received money from Libya for his presidential campaign [0], he's just a criminal ex-president...

[0]: https://en.wikipedia.org/wiki/Libyan_financing_in_the_2007_F...


It's way more complicated than this.

First, this is mostly about things that happened before his election.

The tribunal ruled he did not personally benefit, and he did not directly solicit money to finance his campaign either.

However, some of his closest allies (who would become his ministers later) did the latter. The tribunal could not find any direct proof he was involved but ruled there were enough "converging indications" that he knew and did nothing to stop it.


To be fair, the probability that the short explanation "He received money from Libya for his presidential campaign" is actually the truth is very high.

There is no formal proof, but as you say (and as the judges concluded), there are enough "converging indications" to support the idea that the short explanation is true.


I'm sure the court could have gotten him on other charges, but they went with the absolutely 100% safe one rather than the other 99% safe ones.

Sarkozy and all of his billionaire media allies are already trying their hardest to undermine the credibility of the justice system at every turn with extremely dangerous rhetoric; I dread to imagine what this would have been like had they gone with ever-so-slightly-less-safe charges.


How would one differentiate Sarkozy being in the know, and one of Sarkozy's inner circle doing it and keeping him in the dark?


That is a very good question.

The short answer is you can't. But there are enough hints that he may be implicated at least as much as his collaborators.

One example is the testimony of a "smuggler" who says he delivered the dirty money twice to his collaborator and once directly to Sarkozy. Not enough: he could be lying.

Another is a write-up (in Arabic) of a meeting preparing Sarkozy's visit, which suggests there was another important subject on the agenda for his trip to Libya. This coincides with the fact that we know they talked alone (Gaddafi, Sarkozy and Guéant, without any diplomatic representatives, only translators). Not enough: maybe it was some other secret subject.

That may explain Gaddafi's famous trip to Paris (10 December 2007), which was an unexpected move given his implication in multiple airliner bombings (the UTA Flight 772 DC-10 and Pan Am Flight 103 over Lockerbie), and the very unusual "great fanfare" of the visit. https://abcnews.go.com/International/story?id=3984020 Maybe Sarkozy really did trade a welcoming trip for good contracts, but nobody believes that.

It can also explain Sarkozy's involvement in the NATO air strikes on Libya to help the rebels (which led to Gaddafi's death). Gaddafi may have asked for help in putting down the revolt, and Sarkozy couldn't justify that politically, so he did the opposite. At that moment, Libyan officials reported that he had to give the money back and that he had been financed with their money (one of the two who reported it is dead; the other is in exile, and his case is more complicated because he first supported Sarkozy in order to get extracted from Libya after being caught by the rebels). At the time, nobody trusted the Libyan representatives, as the regime was a terrorist state.

Sooo, you can't tell that he knew, but it does explain a lot.


If the justice system doesn’t know exactly why it’s putting Sarkozy in prison, he does...


This is disinformation.

The tribunal didn't rule that he didn't personally benefit. It ruled that he conspired to corrupt the leaders of Libya to steal money from the Libyan people and fund his electoral campaign. In my book, becoming president of France is certainly a "personal benefit". There is plenty of factual evidence: documents from Libya, fund transfers, secret meetings of his closest friends with Abdullah Senussi, who was sentenced to life in prison in France for orchestrating the bombing of UTA Flight 772, which resulted in 170 deaths, and who is also currently being investigated for another plane bombing.

The money he got allowed him to spend about twice the allowed amount on his campaign, giving him an unfair advantage in the election. In other words, he dealt with terrorists to potentially steal the presidential election. What Sarkozy did is extremely serious; I'd call it high treason. He got far less than he deserved.

Also, it's worth mentioning that this is his third conviction. He already received a 2-year and a 1-year sentence in other cases, both of which were upheld on appeal.


I read that the ruling mentions that they couldn't prove the money was used for the campaign, and that the conviction is all about participation in the conspiracy you mention.

To be honest, what I would want to know is if he sent us to war in Libya to hide his crimes. That would be the real evil to me.

Getting him to jail for asking someone for campaign money really gives a weird feeling in that sense.


> The tribunal ruled he did not personally benefit

The money didn't go into his pocket, but he benefited from it by being elected president (partly thanks to this illegal funding), which to this day gives him a life of money and various privileges.


Not just from Libya: he met and received money from the brother-in-law of the Libyan dictator Muammar Gaddafi.

The brother-in-law personally orchestrated the crash of a civilian airliner, killing 170 passengers.


Not only this, but he plotted to whitewash the terrorist responsible for an attack on a plane that killed more French people than the Bataclan terror attacks... This guy is despicable and deserves to be behind bars.


This was 36 years ago. He became president 18 years ago, and is only now in prison. Justice sure takes its time. I used to live on the same street as this prison; it's only a 5 km walk to the Élysée.


LLMs are compute-heavy, with attention compute that scales quadratically with the number of tokens. They are trying to compress text tokens into visual tokens with their VLM.

Maybe they would render text to an image before tokenizing to reduce the compute cost.
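
To make the quadratic point concrete, here is a rough back-of-the-envelope sketch. It assumes attention cost grows with the square of the sequence length and ignores everything else (MLP layers, the cost of the vision encoder itself). The document length and the 10x compression ratio are made-up illustrative numbers, and `attention_cost` is a hypothetical helper, not anything from the paper.

```python
def attention_cost(num_tokens: int) -> int:
    """Relative cost of full self-attention: every token attends to every token."""
    return num_tokens * num_tokens


text_tokens = 10_000      # hypothetical document length in text tokens
compression_ratio = 10    # assumed text-to-vision compression factor
vision_tokens = text_tokens // compression_ratio

# Ratio of attention cost before and after compressing the sequence.
speedup = attention_cost(text_tokens) / attention_cost(vision_tokens)
print(f"{text_tokens} text tokens vs {vision_tokens} vision tokens: "
      f"about {speedup:.0f}x less attention compute")
```

Under those assumptions a 10x shorter sequence gives roughly a 100x cut in attention compute, which is why shrinking the token count is so attractive.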


But naively, wouldn't you expect the representation of a piece of text in terms of vision tokens to be roughly the same number of bits as (or more than) the representation as textual tokens? You're changing representation, sure, but that by itself doesn't give you any compute advantages unless there is some sparsity/compressibility you can take advantage of in the domain you transform to, right?

So I guess my question is: where is the juice being squeezed from? Why does the vision token representation end up being more efficient than text tokens?


The trick is that the vision tokens are continuous valued vectors, while the text tokens are elements from a small discrete set (which are converted into continuous valued vectors by a lookup table). So, vision tokens can convey significantly more bits per token than text tokens. This allows them to pack the content of multiple text tokens into a single vision token.
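
A quick way to see the gap in capacity, as a sketch only: a discrete token drawn from a vocabulary of size V can carry at most log2(V) bits, while a continuous vector has a raw storage size of (dimensions x bits per value). The vocabulary size, embedding width and precision below are assumed numbers, and raw storage is only an upper bound on how much usable information a model can actually pack into a vector.

```python
import math


def discrete_token_bits(vocab_size: int) -> float:
    """Maximum information in one token picked from a finite vocabulary."""
    return math.log2(vocab_size)


def embedding_capacity_bits(dim: int, bits_per_value: int) -> int:
    """Raw storage size of one continuous embedding vector (an upper bound)."""
    return dim * bits_per_value


vocab_size = 2 ** 17   # assumed vocabulary of 131072 entries
dim = 4096             # assumed embedding width
bits_per_value = 16    # assumed 16-bit floats

text_bits = discrete_token_bits(vocab_size)
vector_bits = embedding_capacity_bits(dim, bits_per_value)
print(f"discrete token: at most {text_bits:.0f} bits")
print(f"continuous vector: up to {vector_bits} bits of raw storage")
```

So there is a lot of headroom in principle; how much of it a trained encoder can really use is an empirical question.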


Couldn't you do something like add a bidirectional encoder after your embedding look up table to compress your text into some smaller token-count semantic space before feeding your transformer blocks to get a similar effect, then?


Yes, you can get good compression of a long sequence of "base" text tokens into a shorter sequence of "meta" text tokens, where each meta token represents the information from multiple base tokens. But grouping a fixed number of base tokens into each meta token isn't ideal, since that won't align neatly with sensible semantic boundaries, like words, phrases, sentences, etc. So, the trick is how to decide which base tokens should be grouped into each meta token...

This sort of "dynamic chunking" of low-level information, perhaps down to the level of raw bytes, into shorter sequences of meta tokens for input to some big sequence processing model is an active area of research. E.g., one neat paper exploring this direction is: "Dynamic Chunking for End-to-End Hierarchical Sequence Modeling" [1], from one of the main guys behind Mamba and other major advances in state-space models.

[1] - https://arxiv.org/abs/2507.07955
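
A toy contrast between the two grouping strategies, treating characters as the "base" tokens. This is not the method from the paper (which learns where the boundaries go); here a hand-written rule, "start a new chunk at a space", stands in for the learned boundary predictor, and both function names are hypothetical.

```python
def fixed_chunks(tokens, size):
    """Group tokens into consecutive chunks of a fixed size."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def boundary_chunks(tokens, is_boundary):
    """Start a new chunk whenever is_boundary(token) is true."""
    chunks, current = [], []
    for tok in tokens:
        if is_boundary(tok) and current:
            chunks.append(current)
            current = []
        current.append(tok)
    if current:
        chunks.append(current)
    return chunks


base = list("a tiny example")

fixed = ["".join(c) for c in fixed_chunks(base, 4)]
dynamic = ["".join(c) for c in boundary_chunks(base, lambda ch: ch == " ")]

print(fixed)    # chunks cut through the middle of words
print(dynamic)  # chunks line up with word boundaries
```

The fixed-size version splits "tiny" and "example" across chunks, while the boundary-based one keeps each word whole, which is the alignment problem described above.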


Vision is how humans see text. So text must have built in adaptations to protect from visual noise. For example, two words that look similar must never appear in similar contexts, or else they would be conflated. Hence we can safely reduce such words to the same token. Or something like that.


That also works purely on text and it's the trick I used in my German speech recognition engine ( https://arxiv.org/abs/2206.12693 ).

"I'm studying at Oxford Univ" has basically no loss in meaning even though "University" was truncated to less than half its characters.


This is like how many CLIs accept the shortest unique version of commands.
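
For anyone who hasn't seen that behaviour, a minimal sketch of prefix resolution, assuming a made-up command list; `resolve_command` is a hypothetical helper, not the API of any particular CLI library.

```python
def resolve_command(prefix, commands):
    """Return the single command that starts with prefix, or raise ValueError."""
    if prefix in commands:
        return prefix  # an exact match always wins
    matches = [c for c in commands if c.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown command: {prefix!r}")
    raise ValueError(f"ambiguous command {prefix!r}: {sorted(matches)}")


commands = ["status", "stash", "commit"]  # hypothetical command set

print(resolve_command("c", commands))     # only "commit" starts with "c"
print(resolve_command("stat", commands))  # only "status" starts with "stat"
```

A prefix like "st" would be rejected as ambiguous here, which mirrors the point about truncation: it only works while the shortened form still picks out one candidate in context.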


Is that really factual/true?

Lots of words have multiple meanings and can mean different things even if used in the same sentence/context just from the interpretation of the person reading it.

Heck, I'd argue that most (not all) day-job conflicts are down to such differences in interpretation / miscommunications.


A text token generally represents a portion of a single word, while a vision token represents a portion of the entire page, which may include multiple words. This is where the "compression factor" comes from.

The number of bits to represent a text or vision token is the same, since they are both represented as embeddings of a fixed number of dimensions defined by the Transformer (maybe a few thousand for a large SOTA model).

Whether a vision token actually contains enough information to accurately extract (OCR) all the text data from that portion of the image is going to depend on how many pixels that vision token represents and how many words were present in that area of the image. It's just like considering images of the same page of text at different resolutions - a 1024x1024 image vs a 64x64 one, etc. As the resolution decreases so will OCR accuracy. At some point the resolution is insufficient and the words become a blurry mess and OCR accuracy suffers.

This is what DeepSeek are reporting - OCR accuracy if you try to use a single vision token to represent, say, 10 text tokens, vs 20 text tokens. The vision token may have enough resolution to represent 10 tokens well, but not enough for 20.
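
To put rough numbers on the resolution trade-off, here is a sketch that counts patch tokens for the same page rendered at different sizes. It assumes one vision token per non-overlapping 16x16 pixel patch with no further compression, and a made-up page of 640 text tokens; neither number is taken from the paper.

```python
def patch_tokens(width: int, height: int, patch: int = 16) -> int:
    """Number of non-overlapping patch x patch tiles that fit in the image."""
    return (width // patch) * (height // patch)


page_text_tokens = 640  # hypothetical amount of text on the page

for side in (1024, 512, 256, 64):
    n = patch_tokens(side, side)
    per_token = page_text_tokens / n
    print(f"{side}x{side}: {n} vision tokens, "
          f"{per_token:.2f} text tokens per vision token")
```

At the high-resolution end each vision token covers less than one text token (no compression at all), while at the low end each one would have to stand in for dozens, which is where you would expect OCR accuracy to fall apart.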


I wonder if text written using Chinese characters is more compatible with such vision-centric compression than Latin text.


I don't think that's the case. Chinese characters have the highest information entropy of all writing systems. However, Chinese characters are all independent symbols, which means that if you want the LLM to support 5000 Chinese characters, you need to put 5000 characters into the lookup table (there are no roots, prefixes, or suffixes in Chinese, so you cannot split a character into multiple reusable word pieces). As a result, you may need fewer characters to represent the same meaning compared to Latin-script languages, but LLMs may also need to activate more token embeddings.


Vision tokens are a good compression medium because with one vision token you have one vector of N elements, whereas with textual tokens you have M vectors of N elements, since one vision token represents multiple pixels (and possibly multiple words). This is why it's a good compression medium for compute.

It will never be as precise as textual tokens but it can be really good as they show in the paper.


>with one vision token you have one vector of N elements, but with textual tokens you have M vectors of N elements

Each vision token represents a 16x16 patch, but to fully cover a word you might need multiple vision tokens. So assuming that the embedding size of the vision token and text token is the same `d` (which I think has to be the case for multimodal models), then wouldn't the fair comparison be `x * d` elements for a sentence in terms of vision tokens, and `y * d` for the same sentence in terms of text tokens? I don't see how you could see a priori that x << y (especially by a factor of 10 as quoted in the paper).

That said, if I do experimentally try this by shrinking this very comment down to the smallest font size I can read it at, then seeing how many 16x16 tokens it takes, you can fit more text than I expected in each "vision token". So I can maybe buy that x is at least not greater than y. But it can't be as simple as "each vision token can cover more text", since that only enables better compression if the encoder can actually uncover some sort of redundancy within each token. (And presumably the type of redundancy it uncovers probably isn't something that "classical" compression techniques can exploit, otherwise it seems like it would have been tried by now?).


You should read the 6th page of the paper (and page 5 for the architecture breakdown). They show that they are compressing the vision tokens with convolution to keep a strong semantic understanding while keeping a small number of tokens.

But I think it's still experimental.
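
As a toy picture of how merging neighbouring patch tokens shrinks the sequence: the sketch below uses plain average pooling over a grid of token vectors. That is only a stand-in for the learned convolution in the paper, and the 8x8 grid, the 2-element vectors and the 4x4 window (a 16x reduction) are all chosen purely for illustration.

```python
def pool_tokens(grid, window):
    """Average-pool a 2D grid of token vectors over window x window blocks."""
    rows, cols = len(grid), len(grid[0])
    dim = len(grid[0][0])
    pooled = []
    for r in range(0, rows, window):
        pooled_row = []
        for c in range(0, cols, window):
            # Collect every token vector that falls inside this block.
            block = [grid[i][j]
                     for i in range(r, min(r + window, rows))
                     for j in range(c, min(c + window, cols))]
            pooled_row.append(
                [sum(vec[k] for vec in block) / len(block) for k in range(dim)]
            )
        pooled.append(pooled_row)
    return pooled


# 8x8 grid of patch tokens; each token is a tiny 2-element vector (row, col).
grid = [[[float(r), float(c)] for c in range(8)] for r in range(8)]
pooled = pool_tokens(grid, 4)

before = len(grid) * len(grid[0])
after = len(pooled) * len(pooled[0])
print(f"{before} tokens -> {after} tokens")
```

A learned convolution does the same shape arithmetic but picks its own weights, so it can decide what to keep from each block instead of just averaging.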


just a hunch but like, from something to do with Unicode?

