yeesian's comments

yeesian · on Nov 1, 2023

FST implementation in Rust: https://news.ycombinator.com/item?id=38096511

yeesian · on Nov 1, 2023

Relatedly (from a data structures perspective): https://news.ycombinator.com/item?id=10551280

yeesian · on Oct 31, 2023

This was posted in the past (https://news.ycombinator.com/item?id=20502032); just re-posting now in light of their potential application to LLM grammars (https://news.ycombinator.com/item?id=38082219 for context)

yeesian · on Oct 31, 2023

Relatedly:

Llama: Add grammar-based sampling - https://news.ycombinator.com/item?id=36819906 (105 comments)

ReLLM: Exact Structure for Large Language Model Completions - https://news.ycombinator.com/item?id=35829399 (13 comments)

Show HN: Structured output from LLMs without reprompting - https://news.ycombinator.com/item?id=36750083 (54 comments)

Show HN: LLMs can generate valid JSON 100% of the time - https://news.ycombinator.com/item?id=37125118 (303 comments)

A guidance language for controlling LLMs - https://news.ycombinator.com/item?id=35963936 (190 comments)

LMQL: A query language for programming (large) language models - https://news.ycombinator.com/item?id=35956484 (12 comments)
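
As I understand it, the common thread in these links is masking out any token the grammar would reject before picking the next one. Below is a minimal sketch of that idea, not the implementation used by any of the linked projects. The toy DFA, the single-character "tokens", and the `fake_scores` stand-in for a model are all hypothetical; real tokenizers emit multi-character tokens, which is where automaton/FST-style constructions become more interesting.

```python
# Toy DFA for the pattern [0-9]+ "." [0-9]+ (hypothetical grammar, for illustration only).
DIGITS = "0123456789"
EOS = "<eos>"
TRANSITIONS = {
    "start": {d: "int" for d in DIGITS},
    "int": {**{d: "int" for d in DIGITS}, ".": "dot"},
    "dot": {d: "frac" for d in DIGITS},
    "frac": {d: "frac" for d in DIGITS},
}
ACCEPTING = {"frac"}


def allowed_tokens(state):
    """Tokens the DFA can consume from `state`; EOS is allowed only in accepting states."""
    allowed = set(TRANSITIONS[state])
    if state in ACCEPTING:
        allowed.add(EOS)
    return allowed


def constrained_decode(score_fn, max_steps=10):
    """Greedy decoding where tokens the DFA would reject are masked out at each step.

    If max_steps is reached before EOS, the returned string may be an incomplete match.
    """
    state, out = "start", []
    for _ in range(max_steps):
        scores = score_fn(out)
        legal = allowed_tokens(state)
        candidates = {tok: s for tok, s in scores.items() if tok in legal}
        token = max(candidates, key=candidates.get)
        if token == EOS:
            break
        out.append(token)
        state = TRANSITIONS[state][token]
    return "".join(out)


def fake_scores(prefix):
    """Stand-in for a language model: it prefers letters, which the grammar forbids."""
    scores = {"a": 5.0, "b": 4.0, ".": 3.0, "3": 2.5, "1": 2.0, EOS: 1.0}
    # After three emitted tokens, make EOS the most attractive option.
    if len(prefix) >= 3:
        scores[EOS] = 10.0
    return scores


result = constrained_decode(fake_scores)
print(result)
```

Unconstrained greedy decoding with these scores would pick "a" first; with the mask, every emitted token keeps the output a valid prefix of the grammar.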

yeesian · on Oct 31, 2023

https://archive.ph/hlh06

yeesian · on Sept 8, 2023

https://cloud.google.com/vertex-ai/docs/start/explore-models...

yeesian · on Sept 8, 2023

https://cloud.google.com/vertex-ai/docs/start/explore-models...

yeesian · on June 16, 2023

(This is more of a link dump than a paper discussion.)

For the line of inquiry w.r.t. tensor compilers and MLIR/LLVM (linalg, polyhedral, [sparse_]tensor, etc.), I personally found the following really helpful: https://news.ycombinator.com/item?id=25545373 (links to a survey), https://github.com/merrymercy/awesome-tensor-compilers

I also have an interest in the community more widely associated with pandas/dataframe-like languages (e.g. modin/dask/ray/polars/ibis), with substrait/calcite/arrow as their choice of IR. Some links: https://github.com/modin-project/modin, https://github.com/dask/dask/issues/8980, https://news.ycombinator.com/item?id=16510610, https://news.ycombinator.com/item?id=35521785

I classify them this way because the former leans towards linear/tensor algebra, while the latter leans towards relational algebra. It isn't yet clear (to me) how well innovations in one carry over to the other (if they do), so I'm also curious to hear more about proposals for a unified language across linalg and relational alg (e.g. https://news.ycombinator.com/item?id=36349015).

I'm particularly interested in pandas precisely because it seems to be right at the intersection of both forms of algebra (and draws a strong reaction from people who are familiar/comfortable with one community and not the other). See e.g. https://datapythonista.me/blog/pandas-20-and-the-arrow-revol... and https://wesmckinney.com/blog/apache-arrow-pandas-internals/
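
As a toy illustration of why the two algebras overlap (my own sketch, not taken from any of the linked projects): a matrix product can be written either as the usual sum over an index, or as a join on the shared index followed by a group-by with a sum. The function names and the (row, col, value) encoding below are assumptions made for this example.

```python
from collections import defaultdict


def matmul_dense(a, b):
    """Linear-algebra view: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
        for i in range(n)
    ]


def to_relation(matrix):
    """Encode a matrix as (row, col, value) tuples, skipping zero entries."""
    return [
        (i, j, v)
        for i, row in enumerate(matrix)
        for j, v in enumerate(row)
        if v != 0
    ]


def matmul_relational(rel_a, rel_b):
    """Relational view: JOIN on A.col = B.row, then GROUP BY (A.row, B.col) with SUM."""
    # Index B by its row (the join key), as a hash join would.
    b_by_row = defaultdict(list)
    for k, j, v in rel_b:
        b_by_row[k].append((j, v))

    # Aggregate the products per output cell (the group-by key).
    out = defaultdict(int)
    for i, k, va in rel_a:
        for j, vb in b_by_row.get(k, []):
            out[(i, j)] += va * vb
    return dict(out)


A = [[1, 2], [0, 3]]
B = [[4, 0], [5, 6]]
dense = matmul_dense(A, B)
relational = matmul_relational(to_relation(A), to_relation(B))
print(dense)
print(relational)
```

Both versions compute the same cells; the relational one never touches the zero entry of A, which is roughly how a sparse representation falls out of the join formulation.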

yeesian · on April 20, 2023

Previous discussion: https://news.ycombinator.com/item?id=12399580

yeesian · on April 12, 2023

That's cool, thanks for sharing! Do you know how close they are to their example use cases [1]? So far I've only been able to find a tool for Calcite SQL parsing [2], but not yet the portion connecting to the Arrow C++ compute kernels.

[1]: https://substrait.io/#example-use-cases

[2]: https://substrait.io/tools/producer_tools/

hobofan · on April 12, 2023

I'd check out the Slack, which is where I've seen a few of the projects that are integrating it coordinate.

I think DuckDB is one of the projects with the best support for executing Substrait query plans. As far as I know, most other projects have forks with Substrait support (e.g. DataFusion), but nothing merged upstream yet.

I'm not sure if there are any systems where it is integrated and yields tangible benefits yet (though there is decent progress on common tooling, so that shouldn't be too far in the future).