Isn't your example showing an issue with the opposite approach, where someone gets bad output from an earlier OpenAI JSON mode that worked via training rather than by mechanically restricting output to conform to a schema?
FWIW (not too much!) I have used llama.cpp grammars to restrict output to specific formats (not JSON in particular, but an expected format) with fine-tuned phi-2 models, and I didn't hit any issues like this.
I don't intuitively see why restricting sampling to schema-matching tokens would cause the LLM to converge on tokens that are valid but make no sense...
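For reference, here's a minimal sketch of the mechanism as I understand it (all the names here are mine, not any library's actual API): a grammar engine like llama.cpp's GBNF machinery or jsonformer's schema walker exposes the set of token ids that keep the output well-formed, and the sampler masks everything else to -inf before sampling. The claimed failure mode would be that when the model's preferred token is masked out, the remaining probability mass falls onto valid-but-unlikely tokens:

```python
import numpy as np

def constrained_sample(logits: np.ndarray, valid_token_ids: set[int]) -> int:
    """Sample only from tokens the grammar/schema currently allows.

    `valid_token_ids` is assumed to come from a grammar engine that
    tracks which continuations keep the output well-formed.
    """
    # Mask out every token the grammar forbids.
    masked = np.full_like(logits, -np.inf)
    idx = list(valid_token_ids)
    masked[idx] = logits[idx]
    # Softmax over the surviving tokens, then sample.
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    return int(np.random.choice(len(logits), p=probs))

# Toy vocabulary: the model strongly prefers token 0 (logit 3.0),
# but suppose the schema only permits tokens 2 and 5.
logits = np.array([3.0, 1.0, 0.5, 2.5, 0.1, 0.4])
print(constrained_sample(logits, {2, 5}))  # always returns 2 or 5
```

As the toy example shows, the sampler is forced onto whatever valid tokens remain, even if the model assigned them little probability; whether that actually produces nonsense in practice is exactly what I'm asking about.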
Are there examples of this happening with people using e.g. jsonformer?