Well, this is timely: just yesterday I needed a tool that would easily do embeddings for me so I could do RAG. LLM seems ideal, thanks Simon!
One question: I saw a comment here that doing RAG efficiently involves some extra trickery, like chunking the documents before embedding them. In your experience, is that kind of thing necessary, or can I just pass the retrieved documents to GPT-4 and that's it for my RAG?
For an example of what I'm doing now: I bought an ESP32-Box (think of it as basically an open-source Amazon Echo) and want to ask it questions about my (Markdown) notes. What would be the easiest way to do that?
I'm still trying to figure out the answer to that question myself.
The absolute easiest approach right now is to use Claude, since it has a 100,000 token limit - so you can stuff a ton of documentation into it at once and start asking questions.
Doing RAG with smaller models requires much more cleverness, which I'm only just starting to explore.
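To make the chunk-then-retrieve idea above concrete, here's a minimal sketch of the basic RAG retrieval step: split the notes into overlapping chunks, score each chunk against the question, and stuff the best matches into a prompt. The `embed` function here is a deliberately toy bag-of-words stand-in (a real setup would call an actual embedding model, e.g. via the LLM tool's embeddings support), and all the names and parameters are illustrative, not from any particular library:

```python
import math
from collections import Counter

def chunk(text, size=200, overlap=50):
    # Split text into overlapping character windows; real pipelines
    # usually chunk on sentence, paragraph, or token boundaries instead.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    # Toy bag-of-words "embedding" standing in for a real embedding
    # model; only here to make the retrieval flow runnable end to end.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, k=2):
    # Rank chunks by similarity to the question, keep the top k.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# Hypothetical sample notes and question, just to exercise the flow.
notes = "ESP32 boxes boot from flash. " * 3 + "Markdown notes live in ~/notes. " * 3
question = "Where do my Markdown notes live?"
context = retrieve(question, chunk(notes, size=60, overlap=15))
prompt = "Answer using this context:\n" + "\n---\n".join(context) + "\nQ: " + question
```

The point of the sketch is that "the cleverness" mostly lives in how you chunk and how you embed; the final step is just concatenating the retrieved chunks into the prompt you send to GPT-4 (or any other model).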
That's fair, thanks! Do you plan to integrate the cleverness into LLM, so we can benefit from it too? I'm not sure if LLM can currently be used as a library; I've only been using it as a CLI, but it would be great if I could use it in my programs without shelling out.