Hacker News | chhabraamit's comments

What's our estimate of the cost to fine-tune this?


I don't know the cost, but they supposedly did all their work in 3 weeks based on something they said in this video: https://www.youtube.com/watch?v=5_m-kN64Exc


One of the things I would want in such a custom client is the features that come with the corresponding official clients: artifacts, Python code execution, whatever ChatGPT does when the conversation exceeds the context length, etc.


How does llama.cpp’s grammar adherence work?

Does it keep validating the predicted tokens and backtrack when it’s not valid?


It's essentially an Earley parser[0]. It maintains the set of all currently valid partial parses, and zeroes out the probability of any token that isn't valid in at least one of the current potential parse trees.

There are contrived grammars you can give it that will make it use exponential memory, but in practice most real-world grammars aren't like this.

[0] https://en.wikipedia.org/wiki/Earley_parser
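To illustrate the masking step: a toy sketch, not llama.cpp's actual code. The grammar check here is a stand-in for the Earley machinery, using a trivial balanced-parentheses grammar.

```python
import math

def is_valid_prefix(s):
    # Stand-in for the grammar's incremental check: accepts any prefix
    # of a balanced-parenthesis string.
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return True

def mask_logits(prefix, logits, vocab):
    # Send to -inf the logit of every token that cannot legally extend
    # the current prefix under the grammar, so it can never be sampled.
    return [l if is_valid_prefix(prefix + tok) else -math.inf
            for l, tok in zip(logits, vocab)]

# After "()", "(" keeps its score, but ")" would unbalance the string:
print(mask_logits("()", [0.1, 0.2], ["(", ")"]))  # [0.1, -inf]
```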


Earley parsers should not need more than O(n^2) memory.


You don't even need that for JSON. JSON can be expressed using an LR(1) grammar, so you can do it in linear time and space.


Yes, the llama.cpp work supports arbitrary CFGs, not just JSON


Yup, nice idea: keep collecting logs in a flow and only emit them when there is an error. Or:

start logging into a buffer and only flush it when there is an error.
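For what it's worth, Python's standard library already has this pattern: `logging.handlers.MemoryHandler` buffers records and flushes them to a target handler only when a record at or above `flushLevel` arrives (or the buffer fills).

```python
import io
import logging
import logging.handlers

stream = io.StringIO()
target = logging.StreamHandler(stream)

# Hold up to 1000 records; flush the whole buffer to `target` only when
# a record at ERROR level or above shows up (or capacity is reached).
buffered = logging.handlers.MemoryHandler(
    capacity=1000, flushLevel=logging.ERROR, target=target)

log = logging.getLogger("buffered-demo")
log.setLevel(logging.DEBUG)
log.propagate = False
log.addHandler(buffered)

log.debug("step 1")   # buffered, nothing written yet
log.debug("step 2")
assert stream.getvalue() == ""

log.error("boom")     # triggers a flush: steps 1 and 2 come out too
assert "step 1" in stream.getvalue() and "boom" in stream.getvalue()
```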


I think this works well if you think about sampling traces, not logs.

Basically, every log message should be attached to a trace. Then, you might choose to throw away the trace data based on criteria, e.g. throw away 98% of "successful" traces, and 0% of "error" traces.

The (admittedly not particularly hard) challenge then is building the infra that knows how to essentially make one buffer per trace, and keep/discard collections of related logs as required.
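A minimal sketch of that per-trace buffering (class and method names are illustrative, not any real library's API): records accumulate per trace ID, and the whole trace is kept or dropped as a unit when it completes.

```python
import random
from collections import defaultdict

class TraceBuffer:
    """Toy per-trace log buffer: error traces are always kept,
    successful traces are sampled at `ok_keep_rate`."""

    def __init__(self, ok_keep_rate=0.02, rng=random.random):
        self.buffers = defaultdict(list)   # one buffer per trace_id
        self.ok_keep_rate = ok_keep_rate   # e.g. keep 2% of OK traces
        self.rng = rng
        self.kept = []

    def log(self, trace_id, message):
        self.buffers[trace_id].append((trace_id, message))

    def end_trace(self, trace_id, ok):
        records = self.buffers.pop(trace_id, [])
        # Keep error traces unconditionally; sample the successful ones.
        if not ok or self.rng() < self.ok_keep_rate:
            self.kept.extend(records)

buf = TraceBuffer(rng=lambda: 1.0)   # rng stub: never sample OK traces
buf.log("t1", "starting")
buf.log("t1", "failed!")
buf.log("t2", "starting")
buf.end_trace("t1", ok=False)        # kept: it errored
buf.end_trace("t2", ok=True)         # dropped: successful, unsampled
assert [m for _, m in buf.kept] == ["starting", "failed!"]
```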


It sounds nice, but also consider: 1) depending on how your app crashes, are you sure the buffer will be flushed, and 2) if logging is expensive from a performance perspective, your base performance profile may be operating under the assumption that you’re humming along not logging anything. Some errors may beget more errors and have a snowball effect.


Both solved by having a sidecar (think of it as a local ingestion point) that records everything (no waiting for a flush on error), and then does tail sampling on the spans whose status is non-OK - i.e. everything that's non-OK gets sent to Datadog, Baselime, your Grafana setup, your custom ClickHouse 100PB storage nodes. Or take your pick of any of 1000+ OpenTelemetry-compatible providers. https://opentelemetry.io/docs/concepts/sampling/#tail-sampli...

Pattern is the ~same.
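For reference, the OpenTelemetry Collector's `tail_sampling` processor can express roughly this policy (a sketch based on the contrib processor's documented options; check the linked docs for the full schema):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s            # wait for all of a trace's spans
    policies:
      - name: keep-errors         # always keep traces with errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-some-ok      # plus 2% of everything else
        type: probabilistic
        probabilistic:
          sampling_percentage: 2
```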


You're nearly there. Tail sampling on non-OK statuses.

https://opentelemetry.io/docs/concepts/sampling/#tail-sampli...


Tell me more about your search process here.


I would love it if I could run this on my TV.


Looking to extend it; tvOS is a bit technically difficult.


And how do we make that happen?


Get them dreaming about what they could build if they had the skill!

