Ask HN: What are you building LLM/RAG chatbots with
7 points by petervandijck on March 19, 2024 | hide | past | favorite | 9 comments
LangChain? Cohere? LLamaIndex? DIY?

Are you finding specific pros/cons with the ones that try to be a platform? As an example, we've found LangSmith's integration with LangChain super useful, even though LangChain itself has its pros and cons.



I'm mainly hacking around with my LLM CLI tool, experimenting with different combinations of embedding models and LLMs: https://til.simonwillison.net/llms/embed-paragraphs#user-con...

I really need to add a web interface to that so it's a bit more accessible to people who don't live in the terminal!
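The embed-and-retrieve loop behind this kind of experiment can be sketched in plain Python. This is a toy stand-in: the bag-of-words "embedding" below is just for illustration, and a real setup would call an actual embedding model (e.g. via the `llm` CLI or an API) instead.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding" -- swap in a real embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def most_similar(query, paragraphs):
    # Return the stored paragraph closest to the query -- the
    # retrieval step that feeds context into a RAG prompt.
    q = embed(query)
    return max(paragraphs, key=lambda p: cosine(q, embed(p)))
```

The retrieved paragraph would then be pasted into the LLM prompt as context, which is the whole trick of RAG.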


Simon! Still loving your blog posts about this stuff. Thank you for doing that.

Agreed that not everyone lives in the terminal, but you know.




I'm taking a DIY approach to RAG/function calling for a work tool. We're looking for data sovereignty, so we're probably going to self-host. To that end, I'm using Ollama to serve some models. If you want to go DIY, I'd highly recommend NexusRaven as your function-calling model.

No promises, but I'm hopeful we can open-source our work eventually.
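The core of DIY function calling is a dispatch step: the model emits a structured tool call, and your code routes it to a real Python function. A minimal sketch, assuming the model was prompted to reply with JSON of the form `{"name": ..., "arguments": {...}}` (the tool name and schema here are made up; NexusRaven itself emits Python-style call strings, so you'd adapt the parser to your model):

```python
import json

# Whitelist of tools the model is allowed to call (illustrative only).
TOOLS = {
    "get_ticket": lambda ticket_id: {"id": ticket_id, "status": "open"},
}

def dispatch(model_output: str):
    # Parse the model's JSON tool call and execute the matching function.
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]  # KeyError here means a hallucinated tool
    return fn(**call["arguments"])
```

The result of `dispatch` is then fed back to the model as the tool's output, closing the loop.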


+1. I posted the exact same question in the NexusRaven Discord yesterday, asking for an example or quick start with Ollama. Before that, I tried to hack on their NexusRaven pip client, which uses a TGI inference endpoint, and on non-langchain.py from their evaluation repo, which uses the TGI pipeline. Both attempts failed.


Why NexusRaven specifically? What has your experience been?


In my testing it seems good at function calling, including nested calls, even compared to GPT-4, since OpenAI's function definition format doesn't let you specify a return value's name and type. With Ollama it's quantized and can run on a laptop GPU. There are other options, like Functionary and fireworks.ai's function-calling models on Hugging Face, but they aren't quantized, so I couldn't test them.
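A nested call from a model like this arrives as a string such as `get_weather(get_location("1.2.3.4"))`. One hedged way to execute it safely (the tool names and data below are invented for illustration) is to parse it with `ast` and walk the tree, only running whitelisted functions:

```python
import ast

# Whitelisted tools (toy implementations for the example).
TOOLS = {
    "get_location": lambda ip: "Amsterdam" if ip == "1.2.3.4" else "unknown",
    "get_weather": lambda city: {"Amsterdam": "rain"}.get(city, "n/a"),
}

def run_call(expr: str):
    # Evaluate a (possibly nested) call string against the whitelist,
    # without ever touching eval().
    def ev(node):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            fn = TOOLS[node.func.id]  # unknown names raise KeyError
            return fn(*[ev(a) for a in node.args])
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("disallowed expression")
    return ev(ast.parse(expr, mode="eval").body)
```

Inner calls are evaluated first, so their results feed the outer call, which is exactly the nested-call behavior being tested for.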


I used LangChain and models hosted on Ollama for my latest project [1]. Since I have a GPU and Ollama is now available for Windows, I can build LLM-based applications quickly with local debugging.

[1] https://github.com/bovem/chat-with-doc
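A "chat with your doc" pipeline like this starts by splitting the document into overlapping chunks before embedding them. A minimal sketch of that chunking step (the size and overlap values are illustrative, not what the linked project uses):

```python
def chunk(text: str, size: int = 400, overlap: int = 50):
    # Split a document into overlapping character windows so that
    # context straddling a chunk boundary isn't lost at retrieval time.
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk is then embedded and stored; at question time the closest chunks are retrieved and stuffed into the prompt.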



