Chroma – A General purpose syntax highlighter in Go

trishume · on Sept 24, 2017

Shameless plug for my syntax highlighting library in Rust which uses Sublime Text 3 grammars, which give much richer highlighting and semantic information than Pygments: https://github.com/trishume/syntect

Sourcegraph also wrote a server using syntect which provides an API for highlighting, which they use to power their new server, so you can use it from any language (at a cost): https://about.sourcegraph.com/blog/announcing-sourcegraph-2 https://github.com/sourcegraph/syntect_server

alecthomas · on Sept 29, 2017

Looking at the Sublime Text 3 documentation [1], I see very little functional difference to Pygments. In fact, they seem remarkably similar. Pygments models a state machine via regex matching and actions, including pushing/popping state, sub-lexers, and so on.

What is the "richer highlighting and semantic information" to which you refer?

[1] https://www.sublimetext.com/docs/3/syntax.html
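
For readers unfamiliar with the model described above, here is a minimal sketch in Go of a regex-driven state machine with push/pop actions. It is not code from Pygments, Chroma, or syntect: the `Token`, `Rule`, `rules`, and `Tokenise` names, the `#pop` action string, and the toy grammar for double-quoted strings are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Token is one highlighted span: a type label plus the matched text.
type Token struct {
	Type  string
	Value string
}

// Rule pairs an anchored regex with a token type and a stack action.
// Action is "" (stay), "#pop" (pop one state), or the name of a state to push.
type Rule struct {
	Pattern *regexp.Regexp
	Type    string
	Action  string
}

// rules is a toy grammar (hypothetical, not taken from any real lexer):
// it recognises double-quoted strings with backslash escapes.
var rules = map[string][]Rule{
	"root": {
		{regexp.MustCompile(`^"`), "String", "string"},
		{regexp.MustCompile(`^\s+`), "Whitespace", ""},
		{regexp.MustCompile(`^[^"\s]+`), "Text", ""},
	},
	"string": {
		{regexp.MustCompile(`^\\.`), "Escape", ""},
		{regexp.MustCompile(`^"`), "String", "#pop"},
		{regexp.MustCompile(`^[^"\\]+`), "String", ""},
	},
}

// Tokenise runs the state machine: at each position it tries the rules of the
// state on top of the stack, emits a token for the first rule that matches,
// and applies that rule's push/pop action.
func Tokenise(src string) []Token {
	stack := []string{"root"}
	var out []Token
	for len(src) > 0 {
		state := stack[len(stack)-1]
		matched := false
		for _, r := range rules[state] {
			m := r.Pattern.FindString(src)
			if m == "" {
				continue
			}
			out = append(out, Token{r.Type, m})
			src = src[len(m):]
			switch r.Action {
			case "":
				// Stay in the current state.
			case "#pop":
				if len(stack) > 1 {
					stack = stack[:len(stack)-1]
				}
			default:
				stack = append(stack, r.Action)
			}
			matched = true
			break
		}
		if !matched {
			// No rule applies: emit one byte as an error token and move on.
			out = append(out, Token{"Error", src[:1]})
			src = src[1:]
		}
	}
	return out
}

func main() {
	for _, t := range Tokenise(`say "hi\n" now`) {
		fmt.Printf("%-10s %q\n", t.Type, t.Value)
	}
}
```

The same `"` character is handled differently depending on which state is on top of the stack, which is the part a flat list of regexes cannot express on its own.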

chmike · on Sept 24, 2017

I'm interested in your package, but as a Go programmer. I spent my holidays writing a [colorizer](https://github.com/chmike/clrz) package in Go. I compared many colorizer packages and tried to make something original. I also wanted to support lexers faster than the regex-based ones. I had to stop at the end of the holidays. Do you have documentation on how your lexer works? Does it use regex only? I don't understand rust and don't want to learn it.

trishume · on Sept 24, 2017

Syntect's grammars are regex-based: they're based on ST3's grammars, which are an extension of the fundamental model used by TextMate/Atom/VSCode. They're stronger than just regexes, though. There's a fancy stack machine with all sorts of features layered on top that allows it to do full parsing of a lot of languages.

The more modern grammars are written with a regex style that makes heavier use of the stack machine and so only uses regexes that can be turned into a DFA. There's ongoing work to use a layer on top of Rust's super-fast DFA-based regex engine to accelerate these grammars https://github.com/trishume/syntect/pull/34.

The problem with using non-regex-based grammars is that you have to write them yourself. Syntect is something like 4000 lines of code, but the grammars it uses total around 35k lines, and that's just the included ones, not the full ecosystem of online grammars. Basically, unless you only want to support a small set of languages, a non-regex-based highlighting library is fairly infeasible for a single hobbyist.

GrayShade · on Sept 24, 2017

> There's ongoing work to use a layer on top of Rust's super-fast DFA-based regex engine to accelerate these grammars

Will it really be faster? It didn't seem so from that GitHub thread.

As a data point for anyone curious, I'm using Syntect myself in a toy project. With Oniguruma (C NFA regexes), it highlights 200 lines of Rust in 40 ms on a 10 W TDP Celeron, which is all right, but a bit slower than I expected.

trishume · on Sept 24, 2017

It'll hopefully be faster. The Rust regex engine is quite fast, but I haven't done any profiling to figure out why performance was the same in my initial test. There might be something easy to fix.

It's definitely possible to get better performance out of the underlying model, since Sublime does, but they have a custom DFA-based engine that can test regexes in parallel with captures, which Rust's regex engine can't.

thesmallestcat · on Sept 24, 2017

> I don't understand rust and don't want to learn it.

Tread lightly, for one more comment about Rust will summon the Rust Evangelism Strike Force.

blaenk · on Sept 25, 2017

The only ones that ever show are the harbingers of the purported evangelism strike force :)

chmike · on Sept 25, 2017

Sorry. You are right. I'm close to 55 years old and am starting to feel the limit on the number of neurons available. I have to use them sparingly. That is why I enjoy Go so much. This is no judgment of Rust. It's just me. :)

hnbroseph · on Sept 24, 2017

[flagged]

sctb · on Sept 26, 2017

Could you please stop posting like this and instead comment civilly and substantively?

https://news.ycombinator.com/newsguidelines.html

Tade0 · on Sept 24, 2017

One thing that I love about this language is that downloaded stuff compiles and runs even though I barely know what I'm doing.

Anyway, great job!

hnbroseph · on Sept 24, 2017

> Shameless plug for my syntax highlighting library in Rust

was this really necessary?

trishume · on Sept 25, 2017

I mean, it wasn't, but no comment is. Judging by all the upvotes people seemed to get value from it.

I think it's interesting to compare syntax highlighting approaches. If syntect was literally the same thing, but only usable in Rust, I wouldn't have commented. But syntect uses a different approach that's better for some use cases, and as demonstrated by Sourcegraph, is useable from a Go program (albeit with a cost), so is a plausible alternative to consider. The tradeoff is of course that it isn't directly in Go, and so may be slower, and also it supports fewer languages out of the box (although you could probably exceed Pygments with all online tmLanguage and sublime-syntax files).

graysonk · on Sept 24, 2017

It's a time-honored tradition to respond to any post with your Rust port of it.

ubercow · on Sept 24, 2017

I love this so much. So many times I've had to install Python just for Pygments.

A static binary will be so much easier to maintain.

josteink · on Sept 24, 2017

> A static binary will be so much easier to maintain

Repeat that mentality enough times, and I can't wait for the next Heartbleed to come out.

Distro-maintainers can't exactly be ecstatic about people using Go for more and more software.

eikenberry · on Sept 24, 2017

Go has supported shared libraries for a couple years now. I'm sure distro maintainers are taking advantage of this if they think it helpful.

conroy · on Sept 24, 2017

Perfect timing, I've been looking to add syntax highlighting to my blog. Took me about an hour to integrate it this morning. Here's a working example using the excellent blackfriday package.

https://gist.github.com/kyleconroy/a2741b9e6cf45beb3515d81ee...

alecthomas · on Sept 25, 2017

Nice! I have been meaning to look into integrating Chroma with Blackfriday. The plugin API looks really nice.

Note that lexers.Analyse() will almost always fail at the moment, as I've only written support for a couple of languages.

archgoon · on Sept 24, 2017

> and includes translators for Pygments lexers and styles.

Kudos! So often we see a translation of a tool or library to another language, but no way to leverage existing data / code bases. Nice!

nerdponx · on Sept 24, 2017

Good thing it's Pygments-compatible. There's no sense in completely reinventing the wheel for every programming language.

aorth · on Sept 24, 2017

It's also really fast:

https://mobile.twitter.com/GoHugoIO/status/91158254224715366...

nerdponx · on Sept 25, 2017

Nice. How much of the improvement is "algorithmic", versus just being written in a more efficient language?

alecthomas · on Sept 25, 2017

I would say almost zero is algorithmic, as Chroma very closely adheres to the design of Pygments.

The improvement is due to two factors, with the first being by far the biggest factor:

1. Hugo no longer has to call an external `pygmentize` tool for every highlight. This removes the overhead of the fork/exec, as well as the (not-insignificant) overhead of the Python interpreter starting up.

2. Go is generally a faster language than Python.

The caveat with 2 is that Python can spend large amounts of time in C, e.g. doing regex matching.
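
A rough way to see point 1 for yourself is to time an in-process function call against spawning a process that does nothing. The sketch below is an illustration under stated assumptions, not a measurement of Hugo, Chroma, or `pygmentize`: `average` and `highlightInProcess` are hypothetical helpers, the "highlighter" is a trivial stand-in, and the subprocess is the `true` binary found on most Unix-like systems. It therefore captures only the fork/exec part of the cost, not Python interpreter start-up.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// average runs f n times (n must be > 0) and returns the mean wall-clock
// time per call.
func average(n int, f func()) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		f()
	}
	return time.Since(start) / time.Duration(n)
}

// highlightInProcess is a hypothetical stand-in for an in-process highlighter.
func highlightInProcess(src string) string {
	return strings.ToUpper(src)
}

func main() {
	const n = 50
	src := strings.Repeat("x := 1\n", 200)

	inProc := average(n, func() { _ = highlightInProcess(src) })

	// Assumption: a `true` binary exists on PATH. It does no work, so this
	// measures process start-up cost alone.
	if _, err := exec.LookPath("true"); err != nil {
		fmt.Println("no `true` binary found; skipping subprocess timing")
		return
	}
	spawn := average(n, func() { _ = exec.Command("true").Run() })

	fmt.Printf("in-process call: %v\nspawn a process: %v\n", inProc, spawn)
}
```

Absolute numbers will vary by machine; the point of comparison is the per-call gap, which a site generator pays once for every highlighted block.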