LLMs can never provide 100% reliability - there's a random number generator in the mix after all (reflected in the "temperature" setting).

For answering questions about documentation, the newer long context models are wildly effective in my experience. You can dump a million tokens (easily a full codebase or two for most projects) into Gemini 2.5 Pro and get great answers to almost anything.
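
As a rough sketch of that workflow using the google-generativeai Python SDK (the model id, file filter and question below are my own placeholder assumptions):

    import os
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])

    # Concatenate a whole codebase into one long-context prompt.
    chunks = []
    for root, _dirs, files in os.walk("my-project"):
        for name in files:
            if name.endswith((".py", ".md")):
                path = os.path.join(root, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    chunks.append(f"### {path}\n{f.read()}")
    context = "\n\n".join(chunks)

    # Long-context models like Gemini 2.5 Pro accept on the order of a
    # million input tokens, so the entire dump can go in one request.
    model = genai.GenerativeModel("gemini-2.5-pro")  # exact model id may differ
    response = model.generate_content(
        [context, "Question: where is the retry logic implemented?"]
    )
    print(response.text)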

There are some new anonymous preview models with 1m token limits floating around right now which I suspect may be upcoming OpenAI models. https://openrouter.ai/openrouter/optimus-alpha

I actually use LLMs for command line arguments for tools like ffmpeg all the time; I built a plugin for that: https://simonwillison.net/2024/Mar/26/llm-cmd/
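
The plugin itself works at the shell level; here's a minimal sketch of the same idea via the llm library's Python API (the model name and prompts are my own assumptions):

    import llm

    # Ask a model to turn a plain-English request into an ffmpeg invocation,
    # then print it for review rather than executing it blindly.
    model = llm.get_model("gpt-4o-mini")  # assumes a key is configured for llm
    response = model.prompt(
        "Convert input.mov to an mp4 scaled to 720p height, keeping the audio.",
        system="Reply with a single ffmpeg command and nothing else.",
    )
    print(response.text())  # review before running it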



> random number generator

But the use of randomness inside the system should not, in theory, prevent effectively full reliability - which suggests the architecture may be unfinished, as I expressed with the example of RAG. (E.g.: well trained natural minds apply checking systems to their provisional output, however it was obtained.)
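
To make that concrete, a hypothetical generate-then-check loop - every name here is made up for illustration, not an existing API:

    def reliable_answer(question, generate, validate, max_attempts=3):
        # generate: any stochastic LLM call; validate: a deterministic check
        # (run the tests, dry-run the command, verify citations against the
        # retrieved documents, etc.)
        reason = "no attempts made"
        for _ in range(max_attempts):
            draft = generate(question)    # provisional output, randomness inside
            ok, reason = validate(draft)  # deterministic check over the output
            if ok:
                return draft
        raise RuntimeError(f"no draft passed validation: {reason}")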

> newer long context models

Practical question: if the query-relevant documentation needs to be part of the input (I am not aware of a more efficient way), doesn't that massively impact the processing time? Suppose you have to interactively examine the content of a Standard Hefty Document of 1MB of text... If so, that would make local LLM use prohibitive.
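
Back-of-envelope, with throughput figures that are guesses rather than benchmarks:

    doc_bytes = 1_000_000                 # the 1MB document
    tokens = doc_bytes / 4                # ~4 chars per token for English text
    local_prefill = 200                   # assumed local prompt tokens/sec
    hosted_prefill = 5_000                # assumed hosted prompt tokens/sec
    print(tokens / local_prefill / 60)    # ~21 minutes just to read the prompt
    print(tokens / hosted_prefill)        # ~50 seconds on hosted hardware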


Longer context is definitely slower, especially for local models. Hosted models running on who knows what kind of overpowered hardware can crunch through them pretty fast though. There's also token caching available for OpenAI, Anthropic, Gemini and DeepSeek which can dramatically speed up processing of long context prompts if they've been previously cached.
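
For example, with Anthropic's Python SDK caching is opt-in via a cache_control marker on the large block (the model id and document source below are placeholders); repeat calls that reuse that prefix are processed much faster:

    import anthropic

    big_documentation_string = open("docs.md", encoding="utf-8").read()

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-3-7-sonnet-latest",  # placeholder model id
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": big_documentation_string,      # the long context to cache
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": "Where is the retry logic documented?"}],
    )
    print(response.content[0].text)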



