Hacker News | chhabraamit's comments

What's our estimate of the cost to fine-tune this?


I don't know the cost, but they supposedly did all their work in 3 weeks based on something they said in this video: https://www.youtube.com/watch?v=5_m-kN64Exc


One of the things I would want in such a custom client is the features that come with the corresponding official clients: artifacts, Python code execution, whatever ChatGPT does when the conversation exceeds the context length, etc.


How does llama.cpp’s grammar adherence work?

Does it keep validating the predicted tokens and backtrack when it’s not valid?


It's essentially an Earley parser[0]. It maintains the set of all currently valid partial parses, and zeroes out the probability of any token that isn't valid in at least one of the current potential parse trees.

There are contrived grammars you can give it that will make it use exponential memory, but in practice most real-world grammars aren't like this.

[0] https://en.wikipedia.org/wiki/Earley_parser
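To illustrate the masking step: a toy sketch, not llama.cpp's actual code. The grammar check here is a stand-in for the Earley machinery, using a trivial balanced-parentheses grammar.

```python
import math

def is_valid_prefix(s):
    # Stand-in for the grammar's incremental check: accepts any prefix
    # of a balanced-parenthesis string.
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return True

def mask_logits(prefix, logits, vocab):
    # Send to -inf the logit of every token that cannot legally extend
    # the current prefix under the grammar, so it can never be sampled.
    return [l if is_valid_prefix(prefix + tok) else -math.inf
            for l, tok in zip(logits, vocab)]

# After "()", "(" keeps its score, but ")" would unbalance the string:
print(mask_logits("()", [0.1, 0.2], ["(", ")"]))  # [0.1, -inf]
```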


Earley parsers should not need more than O(n^2) memory.


You don't even need that for JSON. JSON can be expressed using an LR(1) grammar, so you can do it in linear time and space.


Yes, the llama.cpp work supports arbitrary CFGs, not just JSON


Yup, nice idea: keep collecting logs in a flow and only emit them when there is an error. Or:

start logging into a buffer and only flush it when there is an error.
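For what it's worth, Python's standard library already has this pattern: `logging.handlers.MemoryHandler` buffers records and flushes them to a target handler only when a record at or above `flushLevel` arrives (or the buffer fills).

```python
import io
import logging
import logging.handlers

stream = io.StringIO()
target = logging.StreamHandler(stream)

# Hold up to 1000 records; flush the whole buffer to `target` only when
# a record at ERROR level or above shows up (or capacity is reached).
buffered = logging.handlers.MemoryHandler(
    capacity=1000, flushLevel=logging.ERROR, target=target)

log = logging.getLogger("buffered-demo")
log.setLevel(logging.DEBUG)
log.propagate = False
log.addHandler(buffered)

log.debug("step 1")   # buffered, nothing written yet
log.debug("step 2")
assert stream.getvalue() == ""

log.error("boom")     # triggers a flush: steps 1 and 2 come out too
assert "step 1" in stream.getvalue() and "boom" in stream.getvalue()
```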


I think this works well if you think about sampling traces, not logs.

Basically, every log message should be attached to a trace. Then, you might choose to throw away the trace data based on criteria, e.g. throw away 98% of "successful" traces, and 0% of "error" traces.

The (admittedly not particularly hard) challenge then is building the infra that knows how to essentially make one buffer per trace, and keep/discard collections of related logs as required.
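A minimal sketch of that per-trace buffering (class and method names are illustrative, not any real library's API): records accumulate per trace ID, and the whole trace is kept or dropped as a unit when it completes.

```python
import random
from collections import defaultdict

class TraceBuffer:
    """Toy per-trace log buffer: error traces are always kept,
    successful traces are sampled at `ok_keep_rate`."""

    def __init__(self, ok_keep_rate=0.02, rng=random.random):
        self.buffers = defaultdict(list)   # one buffer per trace_id
        self.ok_keep_rate = ok_keep_rate   # e.g. keep 2% of OK traces
        self.rng = rng
        self.kept = []

    def log(self, trace_id, message):
        self.buffers[trace_id].append((trace_id, message))

    def end_trace(self, trace_id, ok):
        records = self.buffers.pop(trace_id, [])
        # Keep error traces unconditionally; sample the successful ones.
        if not ok or self.rng() < self.ok_keep_rate:
            self.kept.extend(records)

buf = TraceBuffer(rng=lambda: 1.0)   # rng stub: never sample OK traces
buf.log("t1", "starting")
buf.log("t1", "failed!")
buf.log("t2", "starting")
buf.end_trace("t1", ok=False)        # kept: it errored
buf.end_trace("t2", ok=True)         # dropped: successful, unsampled
assert [m for _, m in buf.kept] == ["starting", "failed!"]
```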


It sounds nice, but also consider: 1) depending on how your app crashes, are you sure the buffer will be flushed, and 2) if logging is expensive from a performance perspective, your base performance profile may be operating under the assumption that you’re humming along not logging anything. Some errors may beget more errors and have a snowball effect.


Both solved by having a sidecar (think of it as a local ingestion point) that records everything (no waiting for a flush on error), and then does tail sampling on the spans whose status is non-OK - i.e. everything that's non-OK gets sent to Datadog, Baselime, your Grafana setup, your custom ClickHouse 100PB storage nodes. Or take your pick of any of 1000+ OpenTelemetry-compatible providers. https://opentelemetry.io/docs/concepts/sampling/#tail-sampli...

Pattern is the ~same.
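For reference, the OpenTelemetry Collector's `tail_sampling` processor can express roughly this policy (a sketch based on the contrib processor's documented options; check the linked docs for the full schema):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s            # wait for all of a trace's spans
    policies:
      - name: keep-errors         # always keep traces with errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-some-ok      # plus 2% of everything else
        type: probabilistic
        probabilistic:
          sampling_percentage: 2
```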


You're nearly there. Tail sampling on non-OK statuses.

https://opentelemetry.io/docs/concepts/sampling/#tail-sampli...


Tell me more about your search process here.


I would love it if I could run this on my TV.


Looking to extend it; tvOS is a bit technically difficult.


And how do we make that happen?


Get them dreaming about what they could build if they had the skill!

