Ohm: A library and language for building parsers, interpreters, compilers, etc.

pdubroy · on Oct 31, 2023

Hi HN, I'm one of the authors of Ohm. Happy to answer any questions you have.

We've been on here a few times before:

• Ohm – A library and language for building parsers, interpreters, compilers, etc.: https://news.ycombinator.com/item?id=26603393 (March 2021)

• Ohm - Parsing Made Easy: https://news.ycombinator.com/item?id=15491336 (Oct 2017)

You might also want to check out WebAssembly from the Ground Up, an online book we're writing that uses Ohm to teach you WebAssembly: https://wasmgroundup.com

SteveMorin · on Oct 31, 2023

Know anyone trying to use it to generate Erlang/BEAM bytecode? Or compiling Wasm to BEAM?

weatherlight · on Oct 31, 2023

ooo, what are you working on?

Trung0246 · on Nov 1, 2023

Are there any future plans to support basic non-regular language functionality, like C++ raw strings with a custom delimiter?
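
For anyone unfamiliar with the construct: a C++ raw string looks like R"delim(...)delim", and the closing marker has to repeat whatever delimiter the opening one chose. A minimal hand-written sketch of that matching, in Python to keep it dependency-free (this is not Ohm code; `match_raw_string` is a hypothetical helper and the delimiter rule is only an approximation of the C++ one):

```python
import re

# Opening marker: R" + delimiter + (
# Assumption: the delimiter is at most 16 characters and excludes
# whitespace, parentheses and backslashes (approximating the C++ rule).
OPEN = re.compile(r'R"([^\s()\\]{0,16})\(')

def match_raw_string(text, pos=0):
    """Return (contents, end_pos) for a raw string starting at pos, else None."""
    m = OPEN.match(text, pos)
    if not m:
        return None
    # The closing marker is built from the delimiter that was just read.
    # This dependence on earlier input is what a fixed grammar rule
    # cannot express on its own.
    closer = ")" + m.group(1) + '"'
    end = text.find(closer, m.end())
    if end == -1:
        return None
    return text[m.end():end], end + len(closer)
```

The key point is that `closer` is computed at parse time from the input, so the matcher needs a little state that a plain grammar rule does not carry.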

fuzzythinker · on Oct 31, 2023

Ohm is a wonderful tool. I used it to create mation-spec [0], a readable structured configuration and specification format to automate and run code. I looked hard for something like it before giving up and creating one myself with the help of Ohm. The mation-spec is the basis of an origami fold simulation language to describe and simulate origami folds. PM me if you'd like to see it before I post the simulator on HN.

[0] https://github.com/mationai/mation-spec

microflash · on Oct 31, 2023

How does this compare with Chevrotain[1]?

More specifically, can I build lexers with Ohm? Can it generate a syntax diagram from a grammar?

[1]: https://github.com/chevrotain/chevrotain

pdubroy · on Oct 31, 2023

Ohm and Chevrotain solve a similar problem, but chose quite different spots in the solution space.

Ohm is focused on being easy to use. Grammars are generally very readable, and people love our online editor (https://ohmjs.org/editor/). Performance is fine for many use cases (hobby programming languages, query and schema parsers, etc.) but not good enough for a production programming language.

Chevrotain is much more focused on performance — honestly, it blows Ohm (and some other tools like ANTLR and PEG.js) out of the water. The tradeoff is that writing grammars is a bit more complex than with Ohm, and the result is not as readable.

fjfaase · on Oct 31, 2023

Looks interesting. I understand that you can just provide the grammar as a string, which seems to imply that it is an interpreting parser. There are not many interpreting parsers. (I did implement one myself [1].)

I wonder if it can also deal with ambiguous grammars and/or if it is a backtracking parser.

It also has a single grammar for lexical and syntax description. (My interpreting parser is based on a hard-coded lexer.)

[1] https://fransfaase.github.io/ParserWorkshop/Online_inter_par...

pdubroy · on Oct 31, 2023

Ohm is based on parsing expression grammars: https://bford.info/pub/lang/peg/

The current implementation does use a tree-walk interpreter, but I'm considering creating a version that compiles to WebAssembly.
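
To make "tree-walk interpreter" concrete, here is a rough sketch of the general idea: the grammar stays around as plain data and the matcher walks it at parse time. This is in Python and is not Ohm's actual implementation or API; the tuple encoding and the names `GRAMMAR`, `match` and `parses` are made up for illustration.

```python
# Grammar as data. Each expression is a tuple:
#   ("lit", s)        literal string
#   ("seq", a, b...)  sequence
#   ("alt", a, b...)  ordered (prioritized) choice
#   ("star", e)       zero or more repetitions
#   ("ref", name)     reference to another rule
GRAMMAR = {
    "sum": ("seq", ("ref", "number"),
            ("star", ("seq", ("lit", "+"), ("ref", "number")))),
    "number": ("seq", ("ref", "digit"), ("star", ("ref", "digit"))),
    "digit": ("alt", *[("lit", d) for d in "0123456789"]),
}

def match(grammar, expr, text, pos):
    """Walk expr against text at pos; return the end position or None."""
    kind = expr[0]
    if kind == "lit":
        return pos + len(expr[1]) if text.startswith(expr[1], pos) else None
    if kind == "ref":
        return match(grammar, grammar[expr[1]], text, pos)
    if kind == "seq":
        for sub in expr[1:]:
            pos = match(grammar, sub, text, pos)
            if pos is None:
                return None
        return pos
    if kind == "alt":
        for sub in expr[1:]:
            end = match(grammar, sub, text, pos)
            if end is not None:
                return end  # first success wins
        return None
    if kind == "star":
        while True:
            end = match(grammar, expr[1], text, pos)
            if end is None or end == pos:  # stop on failure or no progress
                return pos
            pos = end
    raise ValueError(f"unknown expression kind: {kind}")

def parses(grammar, rule, text):
    """True if the rule matches the whole input."""
    return match(grammar, grammar[rule], text, 0) == len(text)
```

For example, `parses(GRAMMAR, "sum", "12+3+45")` succeeds while `parses(GRAMMAR, "sum", "12+")` fails. A compiled version would turn each rule into code ahead of time instead of re-inspecting the tuples on every call.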

catapart · on Oct 31, 2023

Oh, this is awesome!

Truth be told, I'm glad I didn't know about it when I wrote a much more simplified project (shameless plug: https://github.com/catapart/Magnit.Tokenization), because I DEFINITELY would have just used your solution, even though it's a bit overkill for those needs.

That said, after having finished what I needed, of course I started to wonder about what else I could add to it, with the main stopping force being the need to rewrite the parsing engine (regex ain't going to cut it for more complicated syntaxes). Which is one of those dev projects that linger in the back of your mind until you either see it through, or see that someone else has done it.

And, on that record, I think you've done a better job than I could ever attempt, so I'm very glad to know about this library, now! I don't have anything specifically in mind for it, but having the doors it opens available is quite nice!

souenzzo · on Oct 31, 2023

Reminded me of a Clojure library: https://github.com/Engelberg/instaparse

JonChesterfield · on Oct 31, 2023

Building an interpreter or a compiler from a grammar is an interesting idea. I can't immediately see how to go about it - the grammar would need to match on SSA or similar.

The examples have a lisp-like interpreter at https://github.com/ohmjs/ohm/blob/main/examples/simple-lisp/... which definitely uses a grammar for parsing and might use a generic AST representation.

Will have to think more - a grammar might be a worthwhile way to specify a nanopass style compiler pipeline.

pjmlp · on Oct 31, 2023

Well, that was what attribute grammars and compiler toolkits were all about a couple of decades ago: Silver, Coco/R, JavaCC, ANTLR, the Amsterdam Compiler Kit.

They usually never end up in production: while quite productive for prototyping, they aren't that good at error recovery and meaningful error messages.

FrankyHollywood · on Oct 31, 2023

This is new to me, sounds interesting!

I once used Codemod [0] to migrate an old JS codebase. Would this be a use case for Ohm as well?

[0] https://github.com/facebookarchive/codemod

bbkane · on Oct 31, 2023

Would this be a good library to build a JSON-with-comments formatter for my VSCode settings.json? I want something that sorts the JSON keys AND also moves any associated comments.

I'd also love to hear of any existing tooling that does this.

usernamesp · on Oct 31, 2023

It looks very similar to tree-sitter, but JS-only: https://tree-sitter.github.io/tree-sitter/

MrManatee · on Oct 31, 2023

For "similar to tree-sitter but js-only", Lezer might be an even closer match:

https://lezer.codemirror.net/

I have used both Ohm and Lezer - for different use cases - and have been happy with both.

If you want a parser that makes it possible for code editors to provide syntax highlighting, code folding, and other such features, Tree-sitter and Lezer work great for that use case. They are incremental, so it's possible to parse the file every time a new character is added. Also, for the editor use case it is essential that they can produce some kind of parse tree even if there are syntax errors.

I wouldn't try to build a syntax highlighter on top of Ohm. Ohm is, as the title says, meant for building parsers, interpreters, compilers, etc. And for those use cases, I think Ohm is easier to build upon than Lezer is.

usernamesp · on Oct 31, 2023

What makes Ohm better for building parsers and compilers?

MrManatee · on Oct 31, 2023

This is more of an interpreter than a compiler, but if you look at the "Arithmetic example with semantics" [1] linked on Ohm's GitHub page, you can see how Ohm can be simultaneously used for both (1) defining a language for arithmetic expressions and (2) defining how those expressions should be evaluated. I don't think Lezer even tries to help with something like this.

[1] https://jsfiddle.net/pdubroy/15k63qae/
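
The shape of that idea, sketched in Python without Ohm (so none of this is Ohm's API; the parser is a small hand-written one and the names `parse`, `apply_op`, `EVAL` and `SHOW` are made up): the parsing side only builds a generic tree of rule names, and each "operation" is a separate table of actions keyed by those names.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(src):
    src, tokens, pos = src.rstrip(), [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

# Part 1: syntax only. The result is a tree of (rule_name, children).
def parse(src):
    tokens, pos = tokenize(src), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def add_exp():
        node = mul_exp()
        while peek() in ("+", "-"):
            name = "plus" if take() == "+" else "minus"
            node = (name, [node, mul_exp()])
        return node

    def mul_exp():
        node = pri_exp()
        while peek() in ("*", "/"):
            name = "times" if take() == "*" else "divide"
            node = (name, [node, pri_exp()])
        return node

    def pri_exp():
        tok = peek()
        if tok == "(":
            take()
            node = add_exp()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            take()
            return ("paren", [node])
        if tok is not None and tok.isdigit():
            return ("number", [take()])
        raise SyntaxError(f"unexpected token: {tok!r}")

    tree = add_exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token: {tokens[pos]!r}")
    return tree

# Part 2: semantics. Each operation is a table of actions, one per rule name.
EVAL = {
    "number": lambda kids, go: int(kids[0]),
    "paren": lambda kids, go: go(kids[0]),
    "plus": lambda kids, go: go(kids[0]) + go(kids[1]),
    "minus": lambda kids, go: go(kids[0]) - go(kids[1]),
    "times": lambda kids, go: go(kids[0]) * go(kids[1]),
    "divide": lambda kids, go: go(kids[0]) / go(kids[1]),
}
SHOW = {
    "number": lambda kids, go: kids[0],
    "paren": lambda kids, go: go(kids[0]),
    "plus": lambda kids, go: f"(+ {go(kids[0])} {go(kids[1])})",
    "minus": lambda kids, go: f"(- {go(kids[0])} {go(kids[1])})",
    "times": lambda kids, go: f"(* {go(kids[0])} {go(kids[1])})",
    "divide": lambda kids, go: f"(/ {go(kids[0])} {go(kids[1])})",
}

def apply_op(operation, node):
    """Run one operation over the tree by dispatching on the rule name."""
    name, kids = node
    return operation[name](kids, lambda child: apply_op(operation, child))
```

The same tree from `parse("1 + 2 * (3 - 1)")` can be evaluated with `EVAL` or printed in prefix form with `SHOW`, without touching the parsing code. That separation is the part I find easy to build on.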

FrustratedMonky · on Oct 31, 2023

Can anybody give a quick rundown on how this is different from the parser combinators used in a lot of functional languages? I'm curious whether it is similar or, if not, what makes this better.

giraffe_lady · on Oct 31, 2023

The theory of PEGs is fundamentally different but I haven't been in either's guts recently to be a reliable explainer of them.

For practical purposes, PEGs are always linear time and can easily handle prioritized choice, with the downside that declared priority is the only way to resolve ambiguity. Backtracking is cheap but they are stateless in a very specific technical sense, so can't handle some semantic constructs. The no ambiguity thing means they only succeed or fail, no incremental parsing of incomplete grammar for example like you would want in a syntax highlighter. Some PEG libraries aren't "pure" PEGs and can get around these limitations though.
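
A small sketch of what prioritized choice means in practice, in Python with made-up combinator names (`lit`, `choice`, `seq`, `memo`), not any particular PEG library. As far as I know, the linear-time bound usually relies on packrat-style memoization, which the `memo` wrapper imitates in a simplified form:

```python
def lit(s):
    """Match the literal string s."""
    def parser(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parser

def choice(*alts):
    """Prioritized choice: try alternatives in order, keep the first success."""
    def parser(text, pos):
        for alt in alts:
            end = alt(text, pos)
            if end is not None:
                return end
        return None
    return parser

def seq(*parts):
    """Match each part in turn; fail if any part fails."""
    def parser(text, pos):
        for part in parts:
            pos = part(text, pos)
            if pos is None:
                return None
        return pos
    return parser

def memo(parser):
    """Cache results per (text, position) so repeated attempts cost nothing."""
    cache = {}
    def wrapped(text, pos):
        key = (text, pos)
        if key not in cache:
            cache[key] = parser(text, pos)
        return cache[key]
    return wrapped

def full_match(parser, text):
    return parser(text, 0) == len(text)

# Order of alternatives decides the outcome: once "a" succeeds, the choice
# is never revisited, even though "ab" would let the rest match.
short_first = seq(choice(lit("a"), lit("ab")), lit("c"))
long_first = seq(choice(lit("ab"), lit("a")), lit("c"))

# Count how often the underlying parser really runs when it is memoized.
calls = {"n": 0}
def counted_x(text, pos):
    calls["n"] += 1
    return lit("x")(text, pos)

shared = memo(counted_x)
either = choice(seq(shared, lit("1")), seq(shared, lit("2")))
```

Here `full_match(short_first, "abc")` fails while `full_match(long_first, "abc")` succeeds, which is the "declared priority is the only way to resolve ambiguity" point. Matching `either` against "x2" backtracks into the second alternative, but `counted_x` runs only once because the result at position 0 is cached.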

In real use they tend to have a very straightforward mapping between declaration and grammar. I got used to them in Janet, which has them in the standard lib instead of regex, and for that purpose they are incredible. Much easier to write and especially edit. They also work well for some languages, but there are some real-world language features that they simply can't parse.