Similar approach to llama.cpp under the hood: they convert the JSON schema to a grammar. llama.cpp's implementation is specific to the ggml stack, but what they've built sounds similar to Outlines, which they acknowledge.
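For a sense of what that conversion produces, here's a toy sketch of a schema-to-grammar converter (my own illustration, not llama.cpp's actual converter; `schema_to_gbnf` is a hypothetical helper that only handles a flat object of string fields):

```python
# Toy schema-to-GBNF converter (hypothetical sketch, not llama.cpp's
# actual converter): handles only a flat object whose fields are strings.
def schema_to_gbnf(schema: dict) -> str:
    props = schema["properties"]
    # Emit each key as a literal, in order, with a shared "string" value rule.
    fields = ' "," '.join(f'"\\"{k}\\":" string' for k in props)
    return (
        f'root   ::= "{{" {fields} "}}"\n'
        'string ::= "\\"" [a-zA-Z0-9 ]* "\\""\n'
    )

print(schema_to_gbnf({"properties": {"name": {"type": "string"},
                                     "age":  {"type": "string"}}}))
```

Real converters cover far more of JSON Schema (nesting, arrays, enums), but the basic shape is the same: each schema construct becomes a grammar rule.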
llama.cpp's GBNF grammar support is generic, and indeed works with any model.
I can't speak for other approaches, but llama.cpp's implementation is nice in that it validates token by token as it generates, so no backtracking is ever required. The tough part is ambiguous grammars, where you can't always be sure where you are in the grammar until generation finishes: to handle those, it keeps every valid parse stack in memory at the same time. That is what makes the no-backtracking property possible, but it comes at a (sometimes significant) cost, since memory usage can be rather explosive, especially with a particularly large or poorly formed grammar. Creating a grammar that is openly hostile and crashes the inference server is not difficult.
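To make the "all parse stacks at once" point concrete, here is a minimal sketch of that style of recognizer, applied to a deliberately hostile grammar (my own illustration of the idea, not llama.cpp's code):

```python
# Minimal multi-stack recognizer (my own sketch, not llama.cpp's code)
# for an intentionally hostile grammar, in GBNF terms:
#   root ::= "a" root "b" | "a" root "c" | "a"
# Each "a" consumed leaves a different pending suffix of b's and c's,
# so the set of live parse stacks roughly doubles on every token,
# even with the stacks deduplicated in a set.
GRAMMAR = {"root": [("a", "root", "b"), ("a", "root", "c"), ("a",)]}

def advance(stacks, tok):
    """Consume one terminal, returning every parse stack still viable."""
    out = set()
    for stack in stacks:
        if not stack:
            continue  # this parse is already complete; it can't consume more
        top, rest = stack[0], stack[1:]
        if top in GRAMMAR:
            # Nonterminal on top: try each production in its place.
            for rhs in GRAMMAR[top]:
                out |= advance({rhs + rest}, tok)
        elif top == tok:
            out.add(rest)  # terminal matched; pop it
    return out

stacks = {("root",)}
for step in range(1, 16):
    stacks = advance(stacks, "a")
    print(f"after {step:2d} tokens: {len(stacks):6d} live stacks")
```

As I recall, llama.cpp's actual stacks hold positions in the compiled grammar rather than raw symbols, but the failure mode is the same: nothing in the input ever disambiguates which branch was taken, so no stack can be discarded.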
People have done a lot of work to try to address some of the more egregious cases, but the memory load can still be significant.
I'm not entirely sure what alternative approaches exist, but I'd be curious to learn how other libraries (Outlines, jsonformer) handle syntax validation.
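For comparison, my rough understanding of Outlines is that it compiles the constraint down to a regular expression, turns that into a finite-state machine, and precomputes, for each FSM state, which vocabulary tokens are allowed, so decoding carries a single integer state instead of a set of parse stacks. A hand-rolled illustration of that indexing idea (not Outlines' actual API):

```python
# Hand-rolled illustration of FSM indexing (not Outlines' actual API).
# DFA for the pattern [0-9]+ : state 0 is the start, state 1 accepts.
def dfa_step(state, ch):
    return 1 if ch.isdigit() else None  # any digit advances; anything else dies

VOCAB = ["1", "23", "4a", "x", "56"]  # toy multi-character token vocabulary

def allowed(state):
    """Token IDs whose full character sequence keeps the DFA alive."""
    ok = []
    for tok_id, tok in enumerate(VOCAB):
        s = state
        for ch in tok:
            s = dfa_step(s, ch)
            if s is None:
                break
        else:
            ok.append(tok_id)  # every character stayed inside the DFA
    return ok

# One mask per DFA state, computed once before generation starts.
MASKS = {state: allowed(state) for state in (0, 1)}
print(MASKS)  # {0: [0, 1, 4], 1: [0, 1, 4]} -> "4a" and "x" are always banned
```

At decode time you look up MASKS[state] to mask the logits, then advance the state by running the sampled token's characters through dfa_step, so memory is bounded by the number of states times the vocabulary size rather than by grammar ambiguity. jsonformer, as I understand it, sidesteps the problem differently again: it emits the fixed JSON scaffolding itself and only asks the model to fill in the value slots.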