Chroma – A general-purpose syntax highlighter in Go (github.com/alecthomas)
102 points by kiyanwang on Sept 24, 2017 | 24 comments


Shameless plug for my syntax highlighting library in Rust which uses Sublime Text 3 grammars, which give much richer highlighting and semantic information than Pygments: https://github.com/trishume/syntect

Sourcegraph also wrote a server using syntect which provides an API for highlighting, which they use to power their new server, so you can use it from any language (at a cost): https://about.sourcegraph.com/blog/announcing-sourcegraph-2 https://github.com/sourcegraph/syntect_server


Looking at the Sublime Text 3 documentation [1], I see very little functional difference to Pygments. In fact, they seem remarkably similar. Pygments models a state machine via regex matching and actions, including pushing/popping state, sub-lexers, and so on.

What is the "richer highlighting and semantic information" to which you refer?

[1] https://www.sublimetext.com/docs/3/syntax.html


I'm interested in your package, but as a Go programmer. I spent my holidays writing a [colorizer](https://github.com/chmike/clrz) package in Go. I compared many colorizer packages and tried to make something original. I also wanted to support lexers faster than the regex-based ones. I had to stop when the holidays ended. Do you have documentation on how your lexer works? Does it use regex only? I don't understand Rust and don't want to learn it.


Syntect's grammars are regex-based: they use ST3's grammar format, which is an extension of the fundamental model used by TextMate/Atom/VSCode. They're stronger than just regexes, though; there's a fancy stack machine with all sorts of features layered on top that allows it to do full parsing of a lot of languages.

The more modern grammars are written in a regex style that makes heavier use of the stack machine and so only uses regexes that can be turned into a DFA. There's ongoing work to use a layer on top of Rust's super-fast DFA-based regex engine to accelerate these grammars: https://github.com/trishume/syntect/pull/34
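To make "regexes that can be turned into a DFA" a bit more concrete, here is a rough sketch in Go. Go's `regexp` package is a different engine from Rust's, but both leave out backreferences and look-around so that matching stays linear-time. The sketch uses "Go's `regexp` accepts the pattern" as a stand-in for "automaton-friendly". The helper name `automatonFriendly` and the sample patterns are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// automatonFriendly reports whether Go's RE2-style regexp package accepts the
// pattern. This is only a proxy: it says nothing about what syntect or
// Sublime Text accept, just whether the pattern avoids features such as
// backreferences and look-around that this engine rejects.
func automatonFriendly(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`\b(if|else|for)\b`, // plain alternation with word boundaries
		`(['"]).*?\1`,       // backreference to the opening quote
		`foo(?=bar)`,        // lookahead
	}
	for _, p := range patterns {
		fmt.Printf("%-22s automaton-friendly: %v\n", p, automatonFriendly(p))
	}
}
```

Grammars that lean on the stack machine can often replace the second and third kinds of pattern with an explicit push and pop, which is presumably what makes the newer style easier to accelerate.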

The problem with using non-regex-based grammars is that you have to write them yourself. Syntect is something like 4000 lines of code but the grammars it uses total around 35k lines, and that's just the included ones, not the full ecosystem of online grammars. Basically, unless you only want to support a small set of languages, a non-regex-based highlighting library is fairly infeasible for a single hobbyist.


> There's ongoing work to use a layer on top of Rust's super-fast DFA-based regex engine to accelerate these grammars

Will it really be faster? It didn't seem so from that GitHub thread.

As a data point for anyone curious, I'm using Syntect myself in a toy project. With Oniguruma (C NFA regexes), it highlights 200 lines of Rust in 40 ms on a 10 W TDP Celeron, which is all right, but a bit slower than I expected.


It'll hopefully be faster. The Rust regex engine is quite fast; I haven't done any profiling to figure out why performance was the same in my initial test. There might be something easy to fix.

It's definitely possible to get better performance out of the underlying model, since Sublime does, but they have a custom DFA-based engine that can test regexes in parallel with captures, which Rust's regex engine can't.
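As a rough illustration of what "testing regexes in parallel with captures" buys you, here is a sketch in Go that folds several rule patterns into one alternation with named groups, so a single match attempt reports which rule fired and what it captured. This is a common workaround, not how Sublime's engine works, and it says nothing about Rust's regex crate. The rule names and the `matchAny` helper are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// combined folds three hypothetical lexer rules into one anchored pattern.
// Each alternative sits in its own named group, so one match attempt tells us
// both which rule matched and the text it matched.
var combined = regexp.MustCompile(
	`^(?:(?P<Number>\d+)|(?P<Ident>[A-Za-z_]\w*)|(?P<Space>\s+))`)

// matchAny returns the name of the rule that matches at the start of src and
// the matched text, or two empty strings if no rule matches.
func matchAny(src string) (name, text string) {
	m := combined.FindStringSubmatchIndex(src)
	if m == nil {
		return "", ""
	}
	for i, n := range combined.SubexpNames() {
		// Index 0 is the whole match and has no name; skip it.
		if n != "" && m[2*i] >= 0 {
			return n, src[m[2*i]:m[2*i+1]]
		}
	}
	return "", ""
}

func main() {
	for _, s := range []string{"foo42 bar", "42foo", "+"} {
		name, text := matchAny(s)
		fmt.Printf("%q -> rule %q, text %q\n", s, name, text)
	}
}
```

The alternative is to try each rule's regex separately at every position, which repeats work that a combined automaton does once.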


> I don't understand Rust and don't want to learn it.

Tread lightly, for one more comment about Rust will summon the Rust Evangelism Strike Force.


The only ones that ever show are the harbingers of the purported evangelism strike force :)


Sorry. You are right. I'm close to 55 years old and starting to feel the limit of the number of neurons available. I have to use them sparingly. That is why I enjoy Go so much. It's no judgment of Rust. It's just me. :)


[flagged]


Could you please stop posting like this and instead comment civilly and substantively?

https://news.ycombinator.com/newsguidelines.html


One thing that I love about this language is that downloaded stuff compiles and runs even though I barely know what I'm doing.

Anyway, great job!


> Shameless plug for my syntax highlighting library in Rust

Was this really necessary?


I mean, it wasn't, but no comment is. Judging by all the upvotes people seemed to get value from it.

I think it's interesting to compare syntax highlighting approaches. If syntect was literally the same thing, but only usable in Rust, I wouldn't have commented. But syntect uses a different approach that's better for some use cases, and as demonstrated by Sourcegraph, is usable from a Go program (albeit with a cost), so is a plausible alternative to consider. The tradeoff is of course that it isn't directly in Go, and so may be slower, and also it supports fewer languages out of the box (although you could probably exceed Pygments with all online tmLanguage and sublime-syntax files).


It's a time-honored tradition to respond to any post with your Rust port of it.


I love this so much. So many times I've had to install Python just for Pygments.

A static binary will be so much easier to maintain.


> A static binary will be so much easier to maintain

Repeat that mentality enough times, and I can't wait for the next Heartbleed to come out.

Distro maintainers can't exactly be ecstatic about people using Go for more and more software.


Go has supported shared libraries for a couple years now. I'm sure distro maintainers are taking advantage of this if they think it helpful.


Perfect timing, I've been looking to add syntax highlighting to my blog. Took me about an hour to integrate it this morning. Here's a working example using the excellent blackfriday package.

https://gist.github.com/kyleconroy/a2741b9e6cf45beb3515d81ee...


Nice! I have been meaning to look into integrating Chroma with Blackfriday. The plugin API looks really nice.

Note that lexers.Analyse() will almost always fail at the moment, as I've only written support for a couple of languages.


> and includes translators for Pygments lexers and styles.

Kudos! So often we see a translation of a tool or library to another language, but no way to leverage existing data / code bases. Nice!


Good thing it's Pygments-compatible. There's no sense in completely reinventing the wheel for every programming language.



Nice. How much of the improvement is "algorithmic", versus just being written in a more efficient language?


I would say almost zero is algorithmic, as Chroma very closely adheres to the design of Pygments.

The improvement is due to two factors, with the first being by far the biggest factor:

1. Hugo no longer has to call an external `pygmentize` tool for every highlight. This removes the overhead of the fork/exec, as well as the (not-insignificant) overhead of the Python interpreter starting up.

2. Go is generally a faster language than Python.

The caveat with 2 is that Python can spend large amounts of time in C, e.g. doing regex matching.



