
In the short term, .tex might be the most realistic option. It's not exactly the most accessible format, but a lot of people already write in it, so adopting it means less additional work.

It's plain text so you can throw parsers at it to extract text, data, and formulae. It can be automatically converted into HTML for online display, PDF for downloading, etc. It's not pretty, of course, but we already have plenty of experience parsing and extracting useful information from another widely used format that mixes semantics and presentation all over the place, so why not?
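To make the "throw parsers at it" point concrete, here is a rough Python sketch of pulling formulae out of a .tex source with nothing but a regex. The helper name `extract_formulae` and the set of environments it recognizes are my own assumptions for illustration. A real pipeline would more likely use a proper LaTeX parser, since this ignores comments, verbatim blocks, custom macros, and literal dollar signs.

```python
import re

# Hypothetical pattern: covers a few common math delimiters only.
# It does not handle $$...$$, comments, verbatim blocks, or user-defined macros.
MATH_PATTERN = re.compile(
    r"\\begin\{(equation\*?|align\*?)\}(?P<env>.*?)\\end\{\1\}"  # \begin{equation}...\end{equation}
    r"|\\\[(?P<display>.*?)\\\]"                                  # \[ ... \]
    r"|(?<!\\)\$(?P<inline>[^$]+?)(?<!\\)\$",                     # $ ... $
    re.DOTALL,
)


def extract_formulae(tex: str) -> list[str]:
    """Return the bodies of math spans found in `tex`, in document order."""
    formulae = []
    for m in MATH_PATTERN.finditer(tex):
        # Exactly one of the three named groups is set for any given match.
        candidates = (m.group("env"), m.group("display"), m.group("inline"))
        body = next(g for g in candidates if g is not None)
        formulae.append(body.strip())
    return formulae


if __name__ == "__main__":
    sample = (
        r"Energy is $E = mc^2$. Then \[ a^2 + b^2 = c^2 \] and "
        r"\begin{equation} x = 1 \end{equation}"
    )
    for formula in extract_formulae(sample):
        print(formula)
```

It's crude, but it shows that the formulae are sitting right there in the plain text, which is not something you can say about a PDF.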


