More

baby · 2026-02-11T14:04:28 1770818668

I’ve always thought that it’s crazy how so many extensions can basically read the content of the webpages your browse. I’m wondering if the research should go further: find all extensions that have URLs backed in them or hashes (of domains?) then check what they do when you visit these URLs

qcontinuum1 · 2026-02-11T14:11:44 1770819104

Without any doubt the research could continue on this. We had many opportunities to make the scan even wider and almost certainly we would uncover more extensions. The number of leaking extensions should not be taken as definite.

There are resource constrains. Those extensions try to actively detect if you are in developer mode. Took us a while to avoid such measures and we are certain we missed many extensions due to for example usage of Docker container. Ideally you want to use env as close to the real one as possible.

Without infrastructure this doesn't scale.

The same goes for the code analysis you have proposed. There are already tools that do that (see Secure Annex). Often the extensions download remote code that is responsible for data exfiltration or the code is obfuscated multiple times. Ideally you want to run the extension in browser and inspect its code during execution.

baby · 2026-02-11T00:31:08 1770769868

Why do you want to preserve that artifact?

nihonde · 2026-02-11T01:00:21 1770771621

If you don't have a record of questions asked/answered and rationale for decisions taken, I've noticed it's easy for subsequent feature plans to clash. Maintaining a line of consistency across each feature plan is a good thing.

baby · 2026-02-11T14:08:40 1770818920

I think thats what docs/comments are for, ask your agent to add/maintain them

nihonde · 2026-02-14T05:33:00 1771047180

No functional difference whatsoever. Literally the exact same solution, but somehow dressed up to be contrarian.

baby · 2026-02-05T11:05:10 1770289510

RAG seems odd when you can just have a coding agent manage memory by managing folders. Multi agent also feels weird when you have subagents.

simonw · 2026-02-05T16:31:48 1770309108

Yeah, vector embeddings based RAG has fallen out of fashion somewhat.

It was great when LLMs had 4,000 or 8,000 token context windows and the biggest challenge was efficiently figuring out the most likely chunks of text to feed into that window to answer a question.

These days LLMS all have 100,000+ context windows, which means you don't have to be nearly as selective. They're also exceptionally good at running search tools - give them grep or rg or even `select * from t where body like ...` and they'll almost certainly be able to find the information they need after a few loops.

Vector embeddings give you fuzzy search, so "dog" also matches "puppy" - but a good LLM with a search tool will search for "dog" and then try a second search for "puppy" if the first one doesn't return the results it needs.

visarga · 2026-02-05T16:52:46 1770310366

The fundamental problem wit RAG is that it extracts only surface level features, "31+24" won't embed close to "55", while "not happy" will be close to "happy". Another issue is that embedding similarity does not indicate logical dependency, you won't retrieve the callers of a function with RAG, you need a LLM or code for that. Third issue is chunking, to embed you need to chunk, but if you chunk you exclude information that might be essential.

The best way to search I think is a coding agent with grep and file system access, and that is because the agent can adapt and explore instead of one shotting it.

I am making my own search tool based on the principle of LoD (level of detail) - any large text input can be trimmed down to about 10KB size by doing clever trimming, for example you could trim the middle of a paragraph keeping the start and end, or you could trim the middle of a large file. Then an agent can zoom in and out of a large file. It skims structure first, then drills into the relevant sections. Using it for analyzing logs, repos, zip files, long PDFs, and coding agent sessions which can run into MB size. Depending on content type we can do different types of compression for code and tree structured data. There is also a "tall narrow cut" (like cut -c -50 on a file).

The promise is - any size input fit into 10KB "glances" and the model can find things more efficiently this way without loading the whole thing.

andai · 2026-02-06T01:02:45 1770339765

>The best way to search I think is a coding agent with grep and file system access, and that is because the agent can adapt and explore instead of one shotting it.

I tried the knowledge base feature in Claude web recently, uploaded a long textbook.

The indexer crashed and the book never fully indexed, but Claude had access to some kind of VM and reverse engineered the (automatically converted) book's fileformat and used shell tools to search it for the answers to my questions.

(Again, this was the web version of Claude, not Claude Code on my computer!)

I thought that was really neat, a little silly, and a little scary.

visarga · 2026-02-05T19:01:34 1770318094

Ok 2 hours later here is the release: https://github.com/horiacristescu/nub

andai · 2026-02-06T01:11:26 1770340286

>Code retains its function signatures

Nice. In original GPT-4 days (April 2023), I made a simple coding agent that worked with GPT-4's 8K (!) context window. The original version used some kind of AST walker, but then I realized I can get basically identical result (for Python) with `grep def` and `grep class`...

Took a look at your repo though, I am impressed you put a lot of thought into this.

It's interesting that Anthropic doesn't seem to be incentivized to do anything like this. Their approach seems to be "spawn a bunch of Haikus to grep around whole codebase until one of them finds something". You'd think a few lines of code could give you an overview of the codebase before you go blindly poking around. But maybe they're optimized for massive codebases where even the skeletons end up eating huge amounts of context.

The subagents "solve" context pollution by dying. If they find something, they only tell you the parent agent where it is. If not, they tell nothing. I guess that works but it feels heavy-handed somehow.

In CC I added a startup hook that similar to yours, dumps the skeleton of current dir, files, function names etc. into context, and the "time spent poking around" drops to zero.

gervwyk · 2026-02-05T21:32:15 1770327135

This is a very cool idea. I’ve been dragging CC around very large code bases with a lot of docs and stuff. it does great but can be a swing and a miss.. have been wondering if there is a more efficient / effective way. This got me thinking. Thanks for sharing!

y1n0 · 2026-02-05T16:42:26 1770309746

Context rot is still a problem though, so maybe vector search will stick around in some form. Perhaps we will end up with a tool called `vector grep` or `vg` that handles the vectorized search independent of the agent.

rando77 · 2026-02-05T12:14:14 1770293654

I've been leaning towards multi agent because sub agent relies on the main agent having all the power and using it responsibly.

andai · 2026-02-06T01:12:06 1770340326

What does that mean?

PlatoIsADisease · 2026-02-05T13:43:49 1770299029

Interesting.

I guess RAG is faster? But I'm realizing I'm outdated now.

lxgr · 2026-02-05T13:48:31 1770299311

No, RAG is definitely preferable once your memory size grows above a few hundred lines of text (which you can just dump into the context for most current models), since you're no longer fighting context limits and needle-in-a-haystack LLM retrieval performance problems.

Aurornis · 2026-02-05T16:38:25 1770309505

> once your memory size grows above a few hundred lines of text (which you can just dump into the context for most current models)

A few hundred lines of text is nothing for current LLMs.

You can dump the entire contents of The Great Gatsby into any of the frontier LLMs and it’s only around 70K tokens. This is less than 1/3 of common context window sizes. That’s even true for models I run locally on modest hardware now.

The days of chunking everything into paragraphs or pages and building complex workflows to store embeddings, search, and rerank in a big complex pipeline are going away for many common use cases. Having LLMs use simpler tools like grep based on an array of similar search terms and then evaluating what comes up is faster in many cases and doesn’t require elaborate pipelines built around specific context lengths.

lxgr · 2026-02-05T18:27:01 1770316021

Yes, but how good will the recall performance be? Just because your prompt fits into context doesn't mean that the model won't be overwhelmed by it.

When I last tried this with some Gemini models, they couldn't reliably identify specific scenes in a 50K word novel unless I trimmed down the context to a few thousands of words.

> Having LLMs use simpler tools like grep based on an array of similar search terms and then evaluating what comes up is faster in many cases

Sure, but then you're dependent on (you or the model) picking the right phrases to search for. With embeddings, you get much better search performance.

Aurornis · 2026-02-05T18:34:49 1770316489

> Yes, but how good will the recall performance be? Just because your prompt fits into context doesn't mean that the model won't be overwhelmed by it.

With current models it's very good.

Anthropic used a needle-in-haystack example with The Great Gatsby to demonstrate the performance of their large context windows all the way back in 2023: https://www.anthropic.com/news/100k-context-windows

Everything has become even better in the nearly 3 years since then.

> Sure, but then you're dependent on (you or the model) picking the right phrases to search for. With embeddings, you get much better search performance.

How do are those embeddings generated?

You're dependent on the embedding model to generate embeddings the way you expect.

lxgr · 2026-02-05T18:52:33 1770317553

That doesn’t match my experience, both in test and actual usage scenarios.

Gemini 3 Pro fails to satisfy pretty straightforward semantic content lookup requests for PDFs longer than a hundred pages for me, for example.

Aurornis · 2026-02-05T20:02:47 1770321767

> for PDFs longer than a hundred pages for me

Your original comment that I responded to said a "few hundred lines of text", not hundred page PDFs.

MarcelOlsz · 2026-02-06T00:38:54 1770338334

It's you who misunderstood. They have one large word on each page.

rdedev · 2026-02-05T16:42:41 1770309761

I think it still has a place of your agent is part of a bigger application that you are running and you want to quickly get something in your models context for a quick turnaround

antirez · 2026-02-05T11:38:29 1770291509

Totally useless indeed.

baby · 2026-02-05T10:51:00 1770288660

Man this is rough, I spend a year with a folding phone on android and the AI integration was amazing. Just switched back to iOS and it’s just sad.

baby · 2026-02-05T10:49:06 1770288546

I’m wondering if it makes sense to distribute your architecture so that workers who do most of the heavy lifting are in hetzner, while the other stuff is in costly AWS. On the other hand this means you don’t have easy access to S3, etc.

rockwotj · 2026-02-05T10:56:26 1770288986

networking costs are so high in AWS I doubt this makes sense

mattbillenstein · 2026-02-05T20:52:27 1770324747

Depends on how data-heavy the work is. We run a bunch of gpu training jobs on other clouds with the data ending up in S3 - the extra transfer costs wrt what we save on getting the gpus from the cheapest cloud available, it makes a lot of sense.

Also, just availability of these things on AWS has been a real pain - I think every startup got a lot of credits there, so flood of people trying to then use them.

baby · 2026-02-04T22:56:55 1770245815

There is closed or moderated social media, and there is open and unmoderated social media. Twitter is the latter and it’s… really bad.

direwolf20 · 2026-02-04T22:58:52 1770245932

Twitter is heavily moderated, in bad ways. Can you say "cisgender" yet?

baby · 2026-02-05T10:46:55 1770288415

Explain how it is moderated

direwolf20 · 2026-02-05T19:27:12 1770319632

Did you read my comment?

baby · 2026-02-07T18:40:47 1770489647

Do you want me to tweet that word to prove a point?

baby · 2026-02-04T22:51:42 1770245502

I’m on the other hand, I have a million ideas and AI has allowed me to implement so many of them.

baby · 2026-01-31T07:41:51 1769845311

What scares me is that lots of American believe there is a plan and that the current administration is competent

baby · 2026-01-31T07:38:50 1769845130

Can US debt be seen as equity? We mint equity from times to times, do buy backs, allow people to tokenize it to pay with it and so on...

baby · 2026-01-31T03:35:40 1769830540

For people trying to say the "j" sound correctly, as in "jiu" (old), just say "dz", so in that example "dziu"