
Yeah, so that's very reasonable.

In Python-land this is all pretty easy. For example, HTTP/2 is a protocol of the first kind ("variable-length data structures with length prefixes") at the framing layer, which is implemented in a Python package called hyperframe. It uses a combination of the `struct` module and bytestring operations to achieve its results. A similar approach works for the second kind as well.

Basically, in Python this is almost always much easier because struct sizing and memory allocation isn't a concern like it is in a C-like language (though even there, dynamically sized structures and pointers are your friends).

But I agree, there is a lack of good discussion about "how do I actually do this?" I'd like to elaborate on that at some point for sure, because the reality is that it's remarkably simple.



> Basically, in Python this is almost always much easier because struct sizing and memory allocation isn't a concern like it is in a C-like language (though even there, dynamically sized structures and pointers are your friends).

I definitely don't want C anywhere near parsers for untrusted data, for so many reasons, this among them.

> But I agree, there is a lack of good discussion about "how do I actually do this?" I'd like to elaborate on that at some point for sure, because the reality is that it's remarkably simple.

Perhaps it would help to have some worked examples for some additional protocols?

Would you be interested in collaborating on a Python parser for some non-trivial data structures? I have a collection of such parsers as part of BITS (https://biosbits.org/) that really need reworking to decouple them from I/O, and I suspect the result would make a good article and/or conference talk.


I am the author of one such library for the problem of writing parsers (particularly for binary protocols). The declaration of the protocol structures is separate from anything involving I/O. Not trying to push it too hard, but it is one approach: https://github.com/digidotcom/python-suitcase

There is also Construct which has a different syntax but is similar in many ways: http://construct.readthedocs.io/en/latest/index.html

Both suitcase and Construct are definitely better suited to parsing binary protocols; in my line of work, that limitation hasn't been a deal breaker. With suitcase, at least, I haven't done much work to optimize performance (mostly because if I cared, I wouldn't be using Python).
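
For anyone wondering what "separate from anything involving I/O" means in practice, here is a generic sketch. It is not suitcase's or Construct's API; the class name and the 2-byte length-prefix framing are assumptions made up for the example. The parser only accepts bytes and hands back complete messages, so the same code works with sockets, files, or test fixtures.

```python
import struct


class LengthPrefixedParser:
    """Hypothetical parser: each message is a 2-byte big-endian length
    followed by that many payload bytes. Performs no I/O itself."""

    _PREFIX = struct.Struct(">H")

    def __init__(self):
        self._buffer = bytearray()

    def receive_data(self, data):
        """Buffer new bytes and return every message that is now complete."""
        self._buffer += data
        messages = []
        while len(self._buffer) >= self._PREFIX.size:
            (length,) = self._PREFIX.unpack_from(self._buffer)
            end = self._PREFIX.size + length
            if len(self._buffer) < end:
                break  # wait for the rest of the payload
            messages.append(bytes(self._buffer[self._PREFIX.size:end]))
            del self._buffer[:end]  # discard the consumed message
        return messages


# Bytes can arrive split at arbitrary points; the parser does not care.
parser = LengthPrefixedParser()
print(parser.receive_data(b"\x00\x03ab"))    # []
print(parser.receive_data(b"c\x00\x01"))     # [b'abc']
print(parser.receive_data(b"z"))             # [b'z']
```

The caller owns the read loop (blocking, asyncio, whatever) and just feeds chunks into `receive_data`.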


Both of those look great; thanks for the pointer to them!



