Hacker News

Good job!

On the limitations you wrote: ``` Similarly, we also found that the most popular Llama 2 70B model variants (even those fine-tuned for function calling) would consistently generate incorrect function calls or even hallucinate functions outside the provided schema. ```

You could use grammar-based sampling [0] to ensure that the function call is at least syntactically correct.

[0] https://github.com/ggerganov/llama.cpp/tree/master/grammars
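For a sense of what this looks like, here is a minimal GBNF sketch (the grammar format llama.cpp uses) that constrains output to a JSON object calling one of a fixed set of function names. The function names and argument shape are illustrative, not MemGPT's actual schema:

```
root   ::= "{" ws "\"function\":" ws fname "," ws "\"arguments\":" ws "{" ws "\"content\":" ws string ws "}" ws "}"
fname  ::= "\"working_context.append\"" | "\"archival_memory.insert\""
string ::= "\"" [^"]* "\""
ws     ::= [ \t\n]*
```

With a grammar like this, the sampler can only emit syntactically valid JSON, and `fname` is limited to the enumerated alternatives, so the function name itself can't be hallucinated.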



Grammar-based sampling is a great idea and a perfect fit for something like MemGPT! In our experiments using MemGPT with non-GPT-4 models, the biggest issue impacting performance was incorrect use of function parameters and function hallucination. Even large models fine-tuned on function-call data (e.g. https://huggingface.co/jondurbin/airoboros-l2-70b-2.1#agentf...) would generally output correct, parsable JSON, but the arguments or function name would be wrong. For example, the LLM might output a call to `personal_diary.add` (never specified in the preprompt) instead of the correct `working_context.append` call (explicitly specified in the preprompt) when trying to write data.
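Even without constrained sampling, this failure mode can at least be caught after generation. A minimal sketch of such a post-hoc validator, assuming a hypothetical schema with the function names mentioned above and a `content` argument (not MemGPT's actual implementation):

```python
import json

# Hypothetical schema mirroring what a preprompt might specify.
SCHEMA = {
    "working_context.append": {"required": ["content"]},
    "archival_memory.insert": {"required": ["content"]},
}

def validate_call(raw: str) -> tuple[bool, str]:
    """Return (ok, reason) for a model-generated function call string."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, f"unparsable JSON: {e}"
    name = call.get("function")
    if name not in SCHEMA:
        # Syntactically fine, but the function was never defined:
        # exactly the hallucination case described above.
        return False, f"hallucinated function: {name!r}"
    missing = [p for p in SCHEMA[name]["required"]
               if p not in call.get("arguments", {})]
    if missing:
        return False, f"missing arguments: {missing}"
    return True, "ok"
```

A call to `personal_diary.add` would parse as valid JSON but fail the `name not in SCHEMA` check, which matches the observation that the JSON itself was usually fine while the semantics were wrong.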


I don't see why you'd need to stop at grammar. If you have something like IntelliSense that can tell you all legal completions (e.g. the list of member functions of the class of the last-referenced object), then you could use the same approach to limit sampling to function names that actually exist.
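The idea above amounts to masking the vocabulary at each decoding step so that only tokens keeping the partial output a prefix of some known identifier remain legal. A toy sketch, with hypothetical function names and strings standing in for real tokenizer IDs:

```python
# Known-valid identifiers, e.g. harvested from the preprompt or an
# IntelliSense-style symbol table. Names here are illustrative.
VALID_FUNCTIONS = [
    "working_context.append",
    "working_context.replace",
    "archival_memory.insert",
]

def allowed_tokens(prefix: str, vocab: list[str]) -> list[str]:
    """Tokens t such that prefix + t is still a prefix of a valid name.

    In a real decoder this would produce a logit mask over token IDs;
    here the "tokens" are plain strings for clarity.
    """
    return [t for t in vocab
            if any(f.startswith(prefix + t) for f in VALID_FUNCTIONS)]
```

For example, after the model has emitted `working_context.`, a candidate token `add` is disallowed because no valid identifier begins with `working_context.add`, while `append` and `rep` survive the mask.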



