Hacker News: ewired's comments

It was interesting to find out that Qwen 2.5 VL can output coordinates like Sonnet 4. Or does it use a different implementation?


Both of them are "visually grounded", meaning that if you ask for the location of something in an image, they can output its exact x/y pixel coordinates! Not many models can do this, and fewer still are large enough to actually reason well through sequences of actions.



hrx looks much better, since txtar can't include, e.g., other txtar files.
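To make the txtar limitation concrete: txtar starts each file with a `-- name --` line, so a nested txtar archive's own markers get read as boundaries of the outer archive. hrx instead lets the writer pick a boundary (`<=>`, `<==>`, and so on) that no line inside the files starts with. Below is a minimal Python sketch under simplifying assumptions: the helper names are hypothetical, it is not the real `golang.org/x/tools/txtar` package or an hrx library, and it ignores archive comments, hrx directories, and the hrx spec's exact newline rules.

```python
import re

# --- Simplified txtar-style parser (hypothetical helper, not golang.org/x/tools/txtar) ---
# Any line of the form "-- NAME --" starts a new file; text before the first
# marker (the archive comment) is ignored in this sketch.
TXTAR_MARKER = re.compile(r"^-- (.+) --$")


def parse_txtar(text: str) -> dict[str, str]:
    files: dict[str, str] = {}
    current = None
    for line in text.splitlines(keepends=True):
        m = TXTAR_MARKER.match(line.rstrip("\n"))
        if m:
            current = m.group(1)
            files[current] = ""
        elif current is not None:
            files[current] += line
    return files


# --- Simplified hrx-style writer/parser (hypothetical helpers) ---
# A boundary is "<" + one or more "=" + ">". The writer picks the shortest
# boundary that no body line starts with, so a body can hold anything,
# including another archive.
HRX_BOUNDARY = re.compile(r"^(<=+>) ")


def choose_hrx_boundary(bodies: list[str]) -> str:
    n = 1
    while any(
        line.startswith("<" + "=" * n + ">")
        for body in bodies
        for line in body.splitlines()
    ):
        n += 1
    return "<" + "=" * n + ">"


def write_hrx(files: dict[str, str]) -> str:
    # Simplification: each body must be empty or end with "\n", so the next
    # boundary always starts on its own line.
    for name, body in files.items():
        if body and not body.endswith("\n"):
            raise ValueError(f"body of {name!r} must end with a newline")
    boundary = choose_hrx_boundary(list(files.values()))
    return "".join(f"{boundary} {name}\n{body}" for name, body in files.items())


def parse_hrx(text: str) -> dict[str, str]:
    m = HRX_BOUNDARY.match(text)
    if m is None:
        return {}
    marker = m.group(1) + " "  # in this sketch, the first entry fixes the boundary
    files: dict[str, str] = {}
    current = None
    for line in text.splitlines(keepends=True):
        if line.startswith(marker):
            current = line[len(marker):].rstrip("\n")
            files[current] = ""
        elif current is not None:
            files[current] += line
    return files
```

Running the nested case through each: `parse_txtar` splits an outer archive that holds an inner txtar file into two sibling entries, while `write_hrx` bumps the outer boundary to `<==>` so that `parse_hrx` returns the inner archive unchanged.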


> This method is so unreasonably effective I can't believe it works, but it's never failed me yet. Whenever you are in the throes of a cataplectic attack, lying motionless and completely helpless, focus all your energy into "finding" the tip of your index finger (either one will do).

Amazing: this is exactly the method I found independently to escape sleep paralysis, which thankfully only happens to me right before or after sleeping.


When you paste a URL into NewsBlur, it also uses several of these techniques to find the feed(s). NewsBlur is open source, so its feed-finding code could be ripped out and used as an alternative to this.
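One widely used feed-discovery technique is RSS/Atom autodiscovery: scanning a page's HTML for `<link rel="alternate">` tags whose `type` is a feed MIME type and resolving their `href` against the page URL. The following is a minimal Python sketch using only the standard library. It is not NewsBlur's code; `find_feed_links`, `FeedLinkFinder`, and `FEED_TYPES` are hypothetical names, and it skips fetching the page and fallbacks such as guessing common feed paths.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Feed MIME types this sketch accepts (an assumption; real tools may accept more).
FEED_TYPES = {"application/rss+xml", "application/atom+xml", "application/feed+json"}


class FeedLinkFinder(HTMLParser):
    """Collects absolute URLs from <link rel="alternate" type="<feed type>"> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.feeds: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "link":
            return
        a = {k: (v or "") for k, v in attrs}
        rels = a.get("rel", "").lower().split()
        if "alternate" in rels and a.get("type", "").lower() in FEED_TYPES and a.get("href"):
            self.feeds.append(urljoin(self.base_url, a["href"]))


def find_feed_links(html: str, base_url: str) -> list[str]:
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.feeds
```

Self-closing tags such as `<link ... />` are also handled, because `HTMLParser` routes them through `handle_starttag`.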

