I've said it before and I'll say it again: the single most cost effective way to speed up math and physics research by a philanthropist would be taking a few million dollars and hiring a team of developers to bring LaTeX into the 21st century: Take the 100 most popular packages and fold them into the main code base (so I don't need to install conflicting packages just to get three columns!). Create a non-buggy desktop IDE/wordprocessor that works seamlessly using a traditional interface or a LyX-like interface. Merge/buy/replace Overleaf with an online service that syncs reliably with the desktop IDE. And, most important and most difficult, create a standard way for generating beautiful web pages (including mobile) from the same LaTeX++ files that can be used to generate beautiful PDFs.
The annual NSF physics budget is roughly a quarter billion dollars. This only needs to make NSF-funded physicists 1% more productive over a decade to justify spending $20M, and the entire world would benefit.
EDIT: Lots of commenters worry that this will end up like various closed-source for-profit tools like Google Docs. But for a closer analogy they should look at some of the excellent free open-source science software being produced with philanthropic funding such as Zotero. Physicists will only adopt it if it makes their lives easier (and maybe not even then...).
EDIT 2: There are some pretty strong parallels between the commenters suggesting this can already be achieved with homegrown tools and the infamous HN Dropbox comment.
Knuth wrote TeX as a labor of love. The surest way to kill everything good about it would be a few million dollars, and a team, and to turn it into fucking Google Docs for science.
Ask Knuth to name one person who has the clearest vision for the future of TeX, and give that person a stipend to work on it for the next ten years, or for life, and you might actually do some good.
Can you elaborate on what you think is bad about Google Docs? Aside from the data privacy concerns from the fact that it's owned and operated by Google (which I agree with), it seems like a fantastically usable, available, and powerful product. What do you fear would happen to LaTeX in this situation?
It's a fantastic product for locking the organizational knowledge of countless teams in the data centres of the biggest (publicly traded) enemy to privacy the US has ever seen. This would be a disaster for science, would suck the life out of the TeX ecosystem, and would be a perfect acquisition target for evil science publishing parasites (you know the names) to further strangle the scholarly discourse in the name of convenience. If the files don't live in my hard drive and aren't published to my web server, I don't want it and will fight against it.
And before someone asks, yes, I feel similarly about GitHub, and wasn't the least bit surprised when it sold itself to the self-declared enemy of open source of previous decades (now reformed, of course!).
Yes, these things are convenient; they have great UX nudging you toward what serves the interests of whoever owns them. I'm not arguing against the usability of Google Docs but against transplanting that model to a community built on entirely different principles.
OK - thanks for clarifying! I was wondering if there was some objection beyond privacy and control (which are certainly sufficient in-and-of themselves!).
> evil science publishing parasites (you know the names)
Not being part of the scientific community, I don't - but I trust you that they exist!
2 things, 1) Why do you consider Google Docs to be any different than Word? They both serve the same market, so you might as well be asking why not standardize around Word. And 2) given the intended uses of Google Docs/Word, the biggest reason not to use it for research are exactly the set of asides you mentioned.
> you might as well be asking why not standardize around Word.
The comment I was replying to specifically called out Google Docs as bad, so that was what I was referring to. I agree that they appear to have mostly-similar features (though, to me as someone who doesn't use either as a power user, Google Docs seems to be more focused on online collaboration, and Word on local editing, the output of which is then shared).
If I had a budget of a few million dollars for improving LaTeX, I'd spend 90% of it on UX research. What are people using it for? What are the biggest pain points? Which easy wins are there?
Then try to modify it as little as possible while improving things.
Or just improve HTML text rendering such that Latex isn't necessary to generate beautiful documents.
And add better support for concepts like page numbers, page headers, page footers, which while theoretically supported by HTML and CSS, aren't well supported in practice.
Given that many people now use Python for scientific purposes, and it's a straightforward language, maybe just use Jinja templates and Python packages to add extensibility to this LaTeX replacement.
You are greatly underestimating (1) the effort it took (in committees, over decades) to get CSS where it is now (and DSSSL, XSL-FO, and all the related work in this area) and (2) the vastness of the gap between where the Web platform is after all these years vs. where TeX already was before the Web even existed.
Obviously, anything that can be built by an individual is better built that way than by a team. But at some point the amount of work becomes too much for one person to finish in a reasonable amount of time (e.g., a lifetime, or before the project becomes outdated).
What I'm describing is not do-able by one person, and the result of refusing to coordinate a team around the project (and instead having a hodge-podge of tools built by various people) is the current LaTeX ecosystem mess.
Yes. And, yes, there are better end user tools, for many use cases, for creating documents to be typeset than coding them directly in TeX, but (starting with LaTeX and moving up to many more modern solutions) a lot of them are built on top of TeX (and some even on LaTeX) rather than replacing it.
I do totally understand what you mean, but I don't think you should dismiss the idea this swiftly. This is effectively a dictatorship vs democracy argument. I tend to suspect that neither solution would provide exactly what you wanted unless you were the dictator!
If you accept that premise, the decision has to be: which system would serve the most people the best? I'm inclined to think that a team with a budget of $20M would probably get the closest.
While it's possible to build a lot of things on top, maybe it'd be worth making some minor updates to the core TeX engine itself, to bring it into the world of Unicode and OpenType fonts...
Oh, there are a couple of such candidates, actually, built with very different philosophies: there's XeTeX, intended to graft Unicode and OpenType support onto TeX with as little disruption as possible, such that it "just works" with minimal effort; and there's LuaTeX, which "deconstructs" TeX to give you an infinitely-flexible typesetting toolkit that can do just about anything, if you can figure out how to wield it.
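For anyone curious, the usual entry point on the XeTeX/LuaTeX side is the fontspec package; a minimal sketch (the font name is just an example that happens to ship with TeX Live):

    % compile with xelatex or lualatex
    \documentclass{article}
    \usepackage{fontspec}            % Unicode input + OpenType font loading
    \setmainfont{TeX Gyre Pagella}   % any OpenType font available to the engine
    \begin{document}
    Naïve café, straße: typed directly, no inputenc/fontenc gymnastics.
    \end{document}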
Yes, it is ostensibly a tool from the 70's but it is also the most complete, correct, and featureful typesetting engine that exists.
TeX's downfall is the macro system. It makes it too easy for people to pile on poorly written abstractions that make working with any given document a nightmare, because each document effectively has its own DSL. But because those packages and macros grew organically by researchers solving problems for themselves, they are too practically useful to just drop. So there are lots of annoying, unintuitive quirks that you end up just having to put up with.
Considering what "modernizing" other parts of the tech stack has meant, I would expect a 21st century LaTeX to be hundreds of megabytes, require running a buggy, slow electron app, and phone home to a variety of SaaS providers when you use it.
Edit: Someone beat me to it by suggesting VSCode as an IDE.
Hundreds of megabytes would be a considerable improvement for me. I think my last installation of LaTeX and the few packages I depended on was around two gigabytes.
I hate that you have to run things multiple times to get references right. Really it should just be `pdflatex Paper` and you should be done.
Every once in a while I just get references that don't want to stick. Same with images. [ht!] and float barriers only go so far. As much as TikZ is a pain (and beautiful), I'd definitely invest more time into making image placement and references easier for the average user.
EDIT: code has a lot of wildcards before every extension but I'm not sure how to escape those on HN
I don't use things like BibTeX so I don't know how it handles that, but latexmk works perfectly for normal use cases and ensures that you never have to run it more times than you need to.
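For anyone who hasn't tried it, the usage is roughly:

    latexmk -pdf paper.tex        # reruns pdflatex (and bibtex/biber) until references settle
    latexmk -pdf -pvc paper.tex   # continuous mode: recompiles whenever the source changes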
> the single most cost effective way to speed up math and physics research by a philanthropist would be taking a few million dollars and hiring a team of developers to bring LaTeX into the 21st century
This might be a niche opinion, but I think Rmarkdown & Jupyter get us a non-negligible portion of the way there. You can use LaTeX chunks when you need equations, markdown syntax when you need formatting (IIRC this is a pain with LaTeX), and executed code chunks to (e.g.) create and insert a graph. (One advantage of Rmarkdown is that you can have as many languages in a single document as knitr can handle, each its own chunk [0]).
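A rough sketch of what that looks like in an .Rmd file (the chunk and the plot call are just placeholders):

    ## Results

    The error shrinks like $O(n^{-1/2})$, i.e.

    $$\hat{\theta}_n - \theta = O_p(n^{-1/2}).$$

    ```{r error-plot, echo=FALSE}
    # this chunk runs at knit time and the figure lands in the output
    n <- 1:100
    plot(n, 1/sqrt(n), type = "l", xlab = "n", ylab = "error bound")
    ```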
Neither is all the way there yet, but, as Yihui Xie puts it [1]:
> I have a dream that one day all students and researchers will forget what "formatting a paper" even means. I have a dream that one day journals and grad schools no longer have style guides. I have a dream that one day no missing $ is inserted, and \hbox is never overfull.
I think you're absolutely right.
But people seem to value a specific flavour of aesthetics over easy-to-understand and reproducible research.
I guess that's maybe partly caused by the publication pressure: once you start to pump out papers for the sake of pumping out papers, there is little incentive to go beyond the superficial.
I have a dream that one day we'll get a language for typesetting that isn't a markup language.
Rather, this language would be designed to come after writing, in the same way that flagship Desktop Publishing software (InDesign, Publisher) make the assumption that layout happens after writing. (Sure, you can write inside the layout software, but you're not supposed to. They're assuming e.g. a journalist–editor–layout–typesetter pipeline.) You'd have a "layout file" (written in the typesetting DSL) that has separate "non-typeset rich-text" files linked to it via file/URI references (just like Desktop Publishing apps do.)
The person doing layout would never modify the linked source files; if they wanted to e.g. change certain character sequences into filigrees, that'd be something stored entirely as an annotation in the layout file, not as markup in the source file.
In that world, scientists just write the "non-typeset rich text" (for which Rmarkdown / Jupyter would be a great fit); and then it's the job of science journals to produce the "layout + typesetting annotations."
And hey, then journals would actually have a reason to exist! They would be to science as physical magazines are to journalism at this point: a place to find the articles you can already read online, but typeset much better.
It's not as straightforward. Converting a document from one column to two often means that some equations have to be intelligently broken up across lines. There have been times when I have defined new variables just so I would not constantly have to break the two-column layout for long equations.
Figure sizes are also a function of document fonts and size.
I'm not saying you shouldn't be able to modify the way the source appears, even going so far as saying "there should be a new variable here." I'm just saying that those modifications should live in the layout document, rather than in the source itself.
Think of it like an archivist restoring a painting. First, they add a (removable) clear-resin barrier layer; and then they do restoration work on top of that barrier. In essence, rather than the paint they're adding becoming part of the painting, it instead becomes part of a "wrapper" for the painting, which can always be "unwrapped" back to the source.
Except this analogy isn't perfect, because you can only override a painting point-by-point; whereas a layout document can "losslessly" cut a linked source up into chunks and put each chunk in a different place, repeat chunks, insert things between chunks, etc.
A better analogy, actually, might be a DAW project. The input audio samples are immutable; the DAW project takes samples from the inputs, applies changes to those samples, and then sequences the results.
And note that a large part of the point of this workflow (in desktop publishing, at least) is that the upstream can continue iterating on the linked document as the downstream lays it out. Those changes may break the downstream layout, but they usually don't, because the downstream layout has usually been defined by constraints.
For example, in a newspaper article, if the source-text of the article gets more wordy, then up to a point, the layout engine just packs those same paragraphs tighter on the page it already decided they appear on, while keeping the paragraphs that appear on the next page spaced as before, and not allowing any lines from the paragraphs on page A to migrate to the flow-box on page B. But at a certain point, it switches strategies, moves an entire paragraph from A to B, and then relaxes the spacing of A back down. At that point, the layout may have been broken (because now B might be either too long to fit its flow-box, or unreadably tightly-spaced.) But that happens surprisingly rarely, compared to constraint-based reflow "just working" to absorb linked-document changes. (And note that either the writer or the layout designer can aid this process by inserting explicit page/section breaks that add weights to these constraints, to keep reflow from flapping lengths around.)
-----
Note that everything I'm talking about is already how a very domain-specific kind of Desktop Publishing already works: "engraving", a.k.a. sheet-music publishing. The score itself (the thing that contains information that could be converted to MIDI) and the engraving details (the way to present the score to a human) are kept separate in most engraving-tool formats.
> Note that everything I'm talking about is already how a very domain-specific kind of Desktop Publishing already works: "engraving", a.k.a. sheet-music publishing.
You may be amused to learn that there's a TeX for sheet music -- Lilypond [1]. I don't believe it's commonly used in the industry, but it can generate decent-looking output.
That sounds a lot like EPub, where you have html and CSS.
It's an interesting idea, but a counterpoint would be:
Why all this work for a dying medium? Paper is on its way out, and there are so many different screen formats (including E-Ink) that dynamic layout is the way to go.
> That sounds a lot like EPub, where you have html and CSS.
Eh, kinda. CSS requires the assistance of the HTML, though; HTML is "the boss" in the relationship, in that it gets to define where markup boundaries are, declare what CSS classes and identifiers apply to what elements, etc. So "I write the HTML, you lay it out with CSS" isn't really a valid pipelined workflow, if you expect to do anything fancy with your layout.
In GUI desktop-publishing programs, the layout document is "the boss." It's as if CSS could target its rules by byte-slices, rather than needing HTML to hand it ready-made selectors. You can take arbitrary bits and pieces of the linked source, style them how you please, and throw them onto parts of pages as you please.
Consider: in GUI desktop-publishing programs, a "pull quote" is not a second copy of the text in the source. It's just a second reference to the same text-slice of source-text, within the layout.
> Why all this work for a dying medium? Paper is on its way out.
Layout/typesetting isn't just for books. Posters need layout. Billboards need layout. Business cards need layout. None of these are dying out in the least.
Heck, even for screen media, text within videos (e.g. title cards, ad copy, etc.) has static layout, and so benefits from layout/typesetting.
There will always be a place for tools that allow someone to efficiently describe (or hint) the best way to show text (on a screen, or on a page) for best readability — or best impact.
And even leaving aside the practical uses of text layout, I should note that kinetic typography is also an artform, done for its own sake—and one that often requires the assistance of a layout-constraint engine to achieve it. Do you think you can write something like House of Leaves purely using design tools like Adobe Illustrator? ;)
Not at all. No matter how much power you have over HTML elements using flexbox, HTML still has the ultimate say over how the document is split into HTML elements.
In a desktop-publishing program, you can just highlight some text in the linked source, and make it bold, and the layout file will remember that that part of that paragraph should now be bold (using a targeting heuristic that will mostly work unless the paragraph is severely changed.)
LaTeX is surprisingly terrible at producing accessible documents. There are various systems and packages which make heroic efforts to make accessible output, but you have to use them from the start, as they work like you suggest -- they reimplement (or at least heavily modify) various LaTeX packages to make their outputs accessible.
I've had a blind PhD student and the standard of using LaTeX to produce PDFs for academic papers has been the single biggest block in his progress.
I assumed that if ctrl-F works, then screen readers also work. Can you share a bit of information about what tends to go wrong, and/or how document authors can fix it?
Almost no documents properly support ctrl-F. You'll get at best single word search, but you won't find coherent sentences.
There are arbitrary line-breaks caused by various styling and layout packages (most columns are broken).
There are even publishers that purposefully put garbage into the PDF's text layer as a braindead form of "piracy protection".
PDF and TeX are broken beyond repair in terms of accessibility. The former because it allows the text and image to be split, and very few people care about text as long as it looks alright. And the latter because its entire document representation model is based around computation. Knuth purposefully made it Turing complete in a misguided attempt at timelessness (no need for a replacement if it's infinitely extensible, right?), which means there is no proper way to view a TeX document as "just text data".
I've worked in a library digitising documents, and whenever we encountered PDFs we'd just OCR the damn thing.
We need to throw the thing out, swallow the bitter pill of suboptimal kerning, and replace it with HTML ASAP.
> PDF and TeX are broken beyond repair in terms of accessibility.
All that needs to be done is to have an accessible text section of the PDF; then TeX can just include the source of the document, minus the TeX commands, in the PDF, and PDF readers can make the screen reader work. The text could also just be stored per page, in case blind readers need to tell a sighted person where they found a particular item.
> But it simply doesn't work, because people can't be bothered to get the text right,
That's hardly "broken beyond repair." I think with accessibility in formats which represent fully typeset written documents (as opposed to pre-typesetting markup, like LaTeX), it's always going to be an issue, because they're mainly produced for sighted readers in the first place.
We should simply start shaming programs which export to PDF wrongly, it's not really a difficult thing to get right.
"The way a tool is used is the way that is encouraged by the tool."
Knuth is one of the most perfectionist computer scientists out there, and he still got it wrong.
The market economics are simply not in the favour of people doing extra work, and no amount of shaming will get that to change.
The alternative needs to be designed with accessibility builtin, and with decidability/markup as a conscious design choice.
We have such a format, HTML. The chances to get academics to use HTML with the coming open access wave are much better than getting them to rewrite all of their TeX templates.
I disagree; HTML would be even worse for text-flow if you rendered a PDF directly to HTML. The text-flow of HTML is not guaranteed, in the same way that the PDF text section is not guaranteed to be correct.
It's more about the social economics. But if you can find me a tool that does PDF export wrong, I'll fix it myself.
> The market economics are simply not in the favour of people doing extra work,
Most of the tools are open-source. Does Word or any of the closed source tools do a bad job of this? That would be the main stopping point I guess.
It's still super trivial for them to implement, and shaming them will get their PR team evaluating the risk of not implementing such a feature. You just have to be successful at shaming them.
An HTML file also doesn't represent a document on paper, and most people writing papers wouldn't switch to an electronic-only format at the moment. Each page is carefully typeset and designed; HTML is not at this level even today.
Single column layouts often work semi-acceptably for text only, when not purposefully tampered with. However, if you try to select the text on page 2, you'll see that the layout engine didn't actually place the formula into its own section. Not only is it broken, with the sum signs missing, it is also jumbled up with the text below it, resulting in weird - text - broken formula - text - broken formula - mumbo jumbo. This would make it extremely hard, if not impossible, to properly follow the paper with a screen reader.
The figures on page 6 also produce weird artefacts in the text layer. In HTML, for example, you'd simply present the alt-text to the person with the screen reader, but TeX doesn't have such a feature in most vector drawing libs afaik.
The table on page 8 is also not correct in terms of the text layer. If you select it you can see that you get the top half of the leftmost column followed by the rightmost column, followed by the rest in some order.
If I may also direct your attention to this gem on page 9:
"Then at least half of
allnodeshaveh ̄=lgn =lgn−1(seethepaperforthedefinitionofh ̄),and 2√
thus a potential of Ω( lg n). Therefore, the total heap potential would be
lg n). Conjecture: the same construction works for the fancy potential √
√
function, which would give a bound of Θ(n · 4 lg lg n)."
No need to go on, that's already more than I bargained for, thank you so much for taking the time to respond :)
I suspect we're using different PDF viewers; I'm using the one that comes with my browser.
The results are somewhat less disastrous than you describe, but still bad. (I'm seeing maybe half the problems you mentioned.) I'm a bit curious which viewer you're using.
I can describe what happens when I copy/paste the stuff you mentioned on pages 2/8/etc., and if you're interested I will, but if you're not interested let me ask a slightly different question:
Rather than try to get the screen reader to make sense of the final PDF, would it be easier to just download the original page source from arxiv and let the screen reader deal with that?
I was using the built-in viewer of Chrome, I think; it could also have been Safari.
Using the original TeX source for the screen reader is significantly hindered by the fact that TeX isn't a markup language but a programming language.
TeX is Turing-complete by design, making it infinitely extensible, in order to avoid Knuth ever having to re-typeset his books. After all, if it's a programming language and a new system, problem, style, etc. comes along, you can just write a program that deals with it.
But this has horrible effects on renderability: in order to know what the final document should look like, you need to run it; no way around that, thanks to Rice's theorem.
LaTeX users also generate a lot of their figures with TeX itself, write their own styling or bibliography rules, and write their own custom graphics rendering libraries.
The easiest path is to just give up, render the entire PDF to a 300dpi lossless image, and throw it into an OCR engine.
These things contain a buttload of heuristics to generate structure and meaningful text based on visuals, and since we know that humans explicitly (and often only) care about those in TeX documents...
It's pretty darn sad, because in many ways TeX is holding scientific advances back, by eating its own children.
My guess is that if scientific papers had branched off of plaintext tools like troff, we'd probably publish papers as machine readable semantically annotated knowledge-graphs by now.
I just tried in Chrome; at least for page 2 it actually did a bit better than my browser: it actually managed to preserve the line breaks! (I don't have Safari installed.)
(Also, if you look closely, the summation signs are not gone, they are replaced by the letter P. Which is not helpful I admit.)
Do you think there's any place here for education/advocacy? For instance, everyone who makes web pages knows to provide alt text for images.
If there was a standard package that everyone knew they had to include or else it breaks everything from ctrl-F to copy/paste to screen readers, presumably people would use it, right?
I'm less interested in speculating what would have been if troff had "won", (though it is indeed fun to speculate), and more interested in how to fix the mess we're in now, so that 10 years in the future, blind people have better choices than OCR.
(Though OCR is still an improvement over the best option in the 1980s I bet. Though I wasn't around so just guessing.)
"Do you think there's any place here for education/advocacy? For instance, everyone who makes web pages knows to provide alt text for images."
That's not a culture thing; it works because of the <img alt> mechanism. You won't get people who write TeX to change all of their workflows, even if such a culture existed.
"If there was a standard package that everyone knew they had to include or else it breaks everything from ctrl-F to copy/paste to screen readers, presumably people would use it, right?"
Then it would still be nigh impossible because TeX commands, like all programming languages, compose rather poorly.
It would be a herculean effort to produce a kinda but not really TeX that is both accessible with a focus on semantics, yet still compatible with the billions of lines of LaTeX/TeX out there.
"I'm less interested in speculating what would have been if troff had "won", (though it is indeed fun to speculate), and more interested in how to fix the mess we're in now, so that 10 years in the future, blind people have better choices than OCR."
Boycott LaTeX/TeX and PDF everywhere you can.
Whenever you publish a paper, also publish it in markdown/html.
Publish in open-access journals like PeerJ (https://peerj.com/), which convert all of their papers to HTML in addition to PDF.
Consider publishing papers in alternative forms like nextjournal.com.
We need to get our priorities straight in academia :/.
This entire "but latex produces such beautiful documents", "I'm working towards getting into the most prestigious journal" culture of snobbery and vanity needs to stop. We need to go back to caring about the content, not the presentation, something TeX ironically was meant to do.
So basically, give up on LaTeX to PDF as the primary workflow. Either convert LaTeX to semantic HTML instead of to PDF (probably doable for simple cases... \emph to <em>, \section to <h2>, \begin{tabular} to <table>, and so on), or better yet just author HTML without going through LaTeX at all. If needed, convert the semantic HTML to PDF as well, for printing or whatever, and then the PDF might end up sane.
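Pandoc already does a rough version of that mapping today, as long as the LaTeX stays within the subset it understands; something like:

    pandoc paper.tex --standalone --mathjax -o paper.html   # LaTeX -> (mostly) semantic HTML
    pandoc paper.md --standalone -o paper.pdf               # or author in Markdown; PDF goes via a LaTeX engine by default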
Fine, fair enough. If fixing TeX is hopeless, then so be it. I assumed it just needed a few small tweaks, maybe combined with slightly cleverer PDF viewers. Guess I was wrong.
But then what should people use for math? I suppose there's MathJax, which seems to have put some thought into accessibility.
There's still a problem though. I can't help but notice that the journal you linked to is a biology journal. In some math/CS circles which are TeX's "home turf", TeX is far more entrenched to the point where I'm not sure such things even exist. For instance, arxiv sort of supports HTML, but not really:
PeerJ actually has quite a few publications; one of them is CS ;) https://peerj.com/computer-science/ (the dropdown on the top left allows you to switch between them).
I think MathJax is certainly a step in the right direction, they even support rendering to MathML.
But I agree that there is a certain lack there in terms of full semantic representations. MathJax is more accessible than TeX but it's still describing visual layout, instead of semantic meaning.
Pushing HTML to arxiv is also a step in the right direction.
I think the most important thing we can do is not be complacent with the state of the art.
We need to go back to an age of computing where we didn't think we had it all figured out. We need to experiment, and not be afraid to take a step back in some aspects, like layout and kerning, in exchange for other advances like semantic representations and knowledge representation.
re: semantics vs visual layout of math... Wikipedia says OpenMath is a thing, but... that only solves half the problem. Once you have a format that encodes what you want, someone has to actually write it.
Like, if someone writes x^{-1} and f^{-1}, it's hard for a computer to figure out that the first one means "the number you get when you divide 1 by x", whereas the second one means "the function you get when you compute the inverse of f".
And if the author can't be bothered to slow down and say which is which, then the reader will have to guess.
re: HTML to arxiv: not ready for prime time, if you actually follow that link.
re: kerning: TeX's advantage here is not fundamental, I think. Just need a good font, as far as I know. (Actually that's not far; I know almost nothing here.)
re: layout: CSS is finally getting good at this from what I hear.
re: talk: looks familiar; maybe I should re-watch it.
I used LyX to write a pretty significant paper back when it was on 1.6; a decade later, LyX is now on 2.3. I just installed it and opened my old research paper. Obviously, I haven't done a thorough test, but from 5 minutes of poking around there's a huge improvement in appearance, documentation, and usability!
There's a similar story for the electronics design software package KiCAD - I played with that a bit in school as well, when it was more like a group of separate software packages duct-taped together with some import/export scripts. The enthusiasts weren't bothered by the lack of polish and seemed to actually enjoy it. Recently, though, CERN has injected a bunch of money, programming time, and real-world engineers' time into the project, and it really shows.
I definitely agree that the NSF (or any major LaTeX-using organization) injecting a few million dollars into LyX would be an incredible boon to human productivity. I do feel that it would be better for an organization with internal consumers of the software to undertake this effort than for an outside team to just start with a blank canvas.
The latter too frequently ends up with the primary users being developers, where bug reports or feature requests will be responded to with "Just compile with -D FEATURE_ENABLE" or "For the syntax of that menu item, read the block comment and source code of the foo.cpp/bar() method" - basically, where TeX is right now. Most users of most software will install it with the Windows .exe, a few will use the .dmg/.deb/.rpm or their package manager: a vanishingly small subset are compiling it from source. And if you expect that your users should be able to read C++, your user base is going to be very small.
Writing the paper is the easy part, after spending months researching past literature, coming up with models and hypotheses, and performing experiments.
I really doubt Latex is the bottleneck for doing research.
I admit I've been lucky with deadlines. The shortest span has been 3 weeks, but that was one time where most of the work was already done, and only then we looked for suitable conferences to submit the paper.
You referred to it as "the single most cost effective way to speed up math and physics research". I don't think it's so surprising that people are interpreting that to mean you think LaTeX is a bottleneck in the research process.
I guess you intended emphasis on the "cost effective" part of the statement, but not everyone will read it like that (I didn't).
I wish I could upvote this a thousand times and then some.
Already in the '80s, Jeff Bezos refused to use LaTeX because of how absurdly difficult it was to use. I can't believe it's been 40 years and there still isn't a better tool to write papers with.
It took me less than an hour to create a Google Docs template that looks roughly as good as a standard Latex document. The reason Latex looks beautiful is due to a set of wisely chosen defaults, nothing more.
Yet the focus of Google Docs clearly isn't scientific papers. I don't imagine it would take more than a few passionate employees at Google to expand Docs to a powerhouse of scientific writing.
I agree beautiful defaults help LaTeX, all the more so because I always use someone else's well defined template. But my main issues with Docs/Word are not just "look"; rather:
- smart figure/caption/table handling (no orphaned captions, no figures hanging out in the middle of the page)
- easy, label-based cross referencing (not selecting each from a popup interface)
- text-based equation input (both of these are sketched below)
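For anyone who hasn't used LaTeX, a minimal sketch of those last two points (graphicx assumed; file names are placeholders):

    \begin{equation}\label{eq:energy}
      E = mc^2
    \end{equation}
    As shown in Equation~\ref{eq:energy} and Figure~\ref{fig:setup}, ...

    \begin{figure}
      \includegraphics[width=\linewidth]{setup.png}
      \caption{Experimental setup.}\label{fig:setup}
    \end{figure}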
All that being said, I still have an elaborate Word template I use for proposal writing. Because often I have to pass it off to others (who may not know TeX) to contribute, and sometimes I need to quickly bastardize my nice formatting to squeeze into hard page limits.
ShareLatex / Overleaf is very good, but I'd love to see slightly more ease of use as OP suggests. I'd like to see a switchable WYSIWYG / code editor, so laypeople can make simple formatting changes visually without needing to know the right commands, and it propagates correctly into the underlying LaTeX.
Totally agree with your points. Latex remains the best tool for many tasks.
But I don't think LaTeX does any of the things you described particularly well either. Better than Google Docs and the like, sure, but I think it's possible to do better.
I truly think Google Docs could be so much more than it currently is.
They for example built a brilliant speech-to-text feature that is only available on desktop. Mobile is the one place I could imagine using such a feature. Imagine writing your documents with speech while hiking deep in the woods. Feeling the fresh air with a view of a beautiful brook as Google transcribes your thoughts. That's how I want to write!
Why on earth put so much effort into such a brilliant feature, and then have it only be available to users already sitting in front of a keyboard!?
Apologies for the tangent, it was where my mind wanted me to go!
Just read your rant/blogpost. You didn't give too many examples of what's wrong with LaTeX, but you did mention putting two images next to each other. Isn't this just:
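(Something like the following, assuming the subcaption package; file names are placeholders.)

    \begin{figure}
      \begin{subfigure}{0.48\textwidth}
        \includegraphics[width=\linewidth]{left.png}
        \caption{First image.}\label{fig:left}
      \end{subfigure}
      \hfill
      \begin{subfigure}{0.48\textwidth}
        \includegraphics[width=\linewidth]{right.png}
        \caption{Second image.}\label{fig:right}
      \end{subfigure}
      \caption{Both images side by side.}\label{fig:both}
    \end{figure}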
If the images are not related to each other, but you want to place them on top of the page together for stylistic reasons, you probably want a simple minipage environment with two figure environments inside.
You can reference either the whole figure by using the label inside the figure environment, or you can reference the subfigures using labels inside the subfigure environments.
My thoughts exactly. I'm currently working on a paper together with an older professor who barely manages to use Overleaf, and this is how I make it work. I keep my copy of the paper on my personal remote git repo. On my local workspace, I keep a clone of that repo with an additional branch tracking the Overleaf version. Then it's just a matter of merging branches whenever any of us make changes. She gets to enjoy the convenience of using Overleaf and I get to keep using VSCode and all my extensions without hassle.
I work in a lab which publishes in both CS and biology journals. Biologists require Word. Most of our people hate it more than LaTeX, since in LaTeX it is at least difficult to break the template accidentally. Often the writing starts in GDocs, then gets moved to LaTeX, then converted to Word and cleaned up manually.
I'd love Word if it had that one feature: disable direct formatting, so that formatting can only be changed through styles.
BTW Latex with embedded Markdown is a rather pleasant combo after you have it set up properly.
Word has that feature. Type "Protect Document" into the search bar, then select "limit formatting to a selection of styles" in the options toolbar that shows up on the right hand.
A lot of journals and conferences accept Word files (and provide MS Word templates). My dad published tens of papers per year in biology and he doesn't even know what LaTeX is. However, for Maths/Physics/CS, a lot of people just find LaTeX to be faster overall (me included; I began using LaTeX simply because I was too tired of typesetting my equations in MS Word).
I have an alternative view: The fastest way to speed up research would be to fund an open source Wolfram software suite with feature parity. For a lot of hard tech you need good multiphysics and CFD simulations, both of which suffer from a severe lack of open source tooling. Most of the proprietary engineering software can be cloned rapidly once a solid foundation is built. Even though Julia and Python are making progress, in terms of ergonomics and productivity it is still hard to beat the offerings by Wolfram.
(If somebody is looking for a project for Mozilla's "fixing the internet" incubator, maybe give this a shot.)
The Python ecosystem is not even close. I recently switched from Mathematica to Python, and I am going mad from juggling half a dozen packages with overlapping features.
Really wish somebody would make a meta package consisting of numpy, scipy, sympy, sage that removes conflicts and overlaps, and renames all functions with the same naming scheme.
I am surprised nobody here mentioned TeXmacs. It has nothing to do with TeX or Emacs, but it's inspired by them. Basically the concept is structural, graph-based document editing software, similar to Tioga/Cedar by Xerox back in the 1980s, but unfortunately research on and improvement of tools for editing documents seems to have stagnated a lot after the 90s. I think the idea of an easy-to-use WYSIWYG editor for scientific document editing needs to become popular now to significantly increase scientist productivity. Remember that the web was invented to increase scientist productivity, and the rest is history.
On a related note, personally I think Overleaf or Google Docs is missing the point, or the pain points. Even though both of them enable offline editing, they are really cloud-first applications. The better approach is to have something like TeXmacs (a native, desktop-first application) and make a seamless integration with synchronization and versioning to the cloud. With proper use of synchronization and versioning technology like rsync and Git, together with seamless connectivity tools and protocols, for example WireGuard VPN, WebDAV, or even the latest SMB over QUIC approach, I really think this is very feasible. The CONCEPT is similar to the useful and successful Watcom product, now the SQL Anywhere database editing application.
I'd say, make sure it's backwards compatible with the old way of doing it because there's a lot of scientists who have been using it for a looong time and don't like change. You have to remember, for a scientist, a tool is just a tool, not all of them are going to invest a lot of time in learning a whole new process.
The old way doesn't have to change. It still exists. Make a wrapper GUI that makes it far more approachable. Syncing and other functions could be handled by a library that works with either one.
Oh, it is a lot of effort. Just pouring in a lot of money and hoping that we can get past the warts of LaTeX (without losing the best parts) is unreasonable.
Even now, other libraries use BibTeX, which would be the easiest to replace with some more standardized format (JSON or YAML, so it is easy to load from the web directly).
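Just to illustrate, a typical @article entry could map to something like this (the field names are only a sketch; CSL JSON, which Zotero and pandoc already understand, is an existing format along these lines):

    {
      "type": "article",
      "id": "doe2020example",
      "author": ["Doe, Jane", "Roe, Richard"],
      "title": "An Example Paper",
      "journal": "Journal of Placeholder Studies",
      "year": 2020,
      "pages": "1-10"
    }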
No need (edit: except maybe for collaborative editing). Use TeXLive and install the updated copy every year (it has annual releases, and you simply delete the old install directory).
If you're not on Linux, spin up an Ubuntu VM and install TeXLive.
(Strictly speaking you don't even need a GUI on the machine where the PDF gets compiled. I add 'scp output.pdf mydesktop:.' to my Makefile so the file is transferred to my desktop every time I run make. On the desktop the PDF is already open on the big screen, and if your PDF reader is good, it'll update the opened PDF every time the file on disk gets updated.)
I never use Overleaf, and I had no problem finishing up my thesis in record time. LaTeX wasn't even an issue.
edit: Overleaf will be helpful for collaborative editing though. I personally would use github/gitlab instead if I ever have to do collaboration.
Markdown is not even 10% of the way toward a full document typesetting system, so you might as well throw it away and create a new codebase. The only thing you're really suggesting then is making it backward-compatible with the markdown syntax.
Is markdown anything other than syntax? But yes, quite a big part (mostly pagination) is missing, which will need to be added.
I still think this is easier than merging the commonly used LaTeX packages into one code base and then building a proper IDE and a method to output to HTML.
Much better to start with markdown, extend KaTeX to support whatever you still need for math notation, and then work on making pandoc output to PDF (or equivalent) natively.
One of the bigger problems is bridging the gap between unpaginated HTML and paginated PDF. However, if you start from the LaTeX side you'll need to deal with decades of technical debt as well.
Actually, the monolithic TeX has been done. It's called ConTeXt. (Every couple of years I try for a few hours to install it under Windows but then give up.) Sometimes I manage to install it and then give up because of the introductory texts.
Using a scripting language to format text always seemed insane to me. People love LaTeX but it's nowhere near as intuitive as writing formulas down on a piece of paper. Having two panes, one for the output seems also very hacky and unintuitive.
Instant feedback in one view, typing at the speed of thought should be the goal, and if it means a new paradigm I'm all for it. Imagine it was something you could actually program with; those gross inline math functions would suddenly be so much more parsable.
Remember that scene in The Matrix when he says "I don't even see the code anymore"? After a short while (about a couple of hundred pages of typing LaTeX math for me), the code is mostly as intuitive as the visual formulas. At least for equations that aren't too long.
Yes for IDE. Yes for HTML+ output (though what exactly the "+" entails is a tricky thing; HTML+CSS won't suffice). Not sure about merging the 100 most popular packages. Most likely their conflicts cannot be resolved in a backwards-compatible way, and then you have the standard "101 standards" outcome and old tex files start rotting.
Yea, I really meant merging in the functionality of the most popular packages, and keeping as much of the syntax as possible without compatibility issues. I agree with you about those risks.
Imo papers of the future should be authored in semantic HTML, which can be rendered to PDF, but most importantly be reflowed and read on any digital device, also by people with disabilities, and indexed/searched.
The capabilities are all there - MathML, CSS for print layout, <figure>, SVG, highlight.js, LaTeX.css, ...
It's the right platform, but it's the wrong authoring experience for people who might not be able to code at all, much less understand the nuances of CSS and how it interacts with the DOM. People using LaTeX don't want to code something, they want to write something and just have requirements that go beyond the typical use-case for "writing something". A layer is needed on top.
Maybe that could take the form of a single JS and/or CSS file that gets dropped into an HTML page and then the author can create their documents with nothing but special tag attributes/classes. But likely it would need to be even more abstracted than that.
Here's a good attempt that's being made. It's an extended version of Markdown that covers charts, equations, etc.: https://casual-effects.com/markdeep/
I disagree (or maybe I don't fully understand your comment). In my (limited; I'm not a front-end dev) experience, you lay out an HTML page by hand, but I do not want that with a research paper.
I want to write my paper and let LaTeX lay it out for me. Of course I'll have to wrestle with an unruly table or figure now and again but for the most part, the layout and typesetting of the document is handled for me.
I wrote some documentation in HTML and wanted to export it as PDF. The main blocker: an index with page numbers. CSS has them, but support for them is mostly non-existent.
Luckily Knuth himself has this on his radar, as this "earthshaking announcement" by him shows: https://youtu.be/eKaI78K_rgA
(Won't spoil it by giving more details)
Interesting. When I did some internships at Los Alamos (LANL) my lab neighbor was a theoretical physicist. He used Word and referred to LaTeX as the Fortran of word processors.
(The link goes to a thread on beautiful scientific communication (especially figures/diagrams), so it would have been nice to discuss that rather than discuss LaTeX. But as the conversation here seems to be mostly about the latter…)
What a lot of people don't realize about LaTeX is that much of its awkwardness comes from it being a product of two very strong personalities (both Turing Award winners for unrelated work!) pulling it in two opposing directions:
• Donald Knuth, who designed TeX as a tool for typesetting primarily, allowing a meticulous author complete control over the appearance of his pages, and attempting to capture in a program (or at least making possible) the highest standards of typography developed over the centuries since the invention of printing.(†)
• Leslie Lamport, who wrote LaTeX as a macro package running in TeX, trying to hide complexity from the author and making things as convenient as possible, out of a (probably correct) belief that authors should not care about the appearance and instead only focus on the content! (https://lamport.azurewebsites.net/pubs/document-production.p...)
The result is that with LaTeX, things look “easy” superficially, but everything is implemented in TeX macros (never intended as a full-fledged programming language), so things break in mysterious ways. Not to mention the additional layers of complexity from zillions of users writing their own "packages" for everything. The error messages don't make sense to users because they often come from the bowels of TeX, and I suspect that many users even just ignore(!) warnings about overfull/underfull boxes.
Conversely, if you look at one of Knuth's own documents written in plain TeX, everything is startlingly simple: for example there is no automatic equation numbering or cross-references; for Equation 5 you just write "5" in the equation and refer to it as "(5)" later, none of that stuff with "\eqref{eq:foo}" or whatever. If macros are used, they tend to be custom for the document, rather than elaborate general-purpose ones. Consider giving plain TeX a try (a great book is A Beginner's Book of TeX by Seroul and Levy); everything makes sense, you feel in power, all the error messages are clear (often the same error messages! but now they apply to something you actually wrote/intended), and if nothing else, you'll end up with an appreciation of what LaTeX is/does.
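For the curious, the contrast looks roughly like this (plain TeX on top, LaTeX with amsmath below):

    % plain TeX: you pick the number yourself and type it again when citing
    $$ e^{i\pi} + 1 = 0 \eqno(5) $$
    ... as we saw in (5) ...

    % LaTeX + amsmath: the number and the reference are managed for you
    \begin{equation}\label{eq:euler}
      e^{i\pi} + 1 = 0
    \end{equation}
    ... as we saw in \eqref{eq:euler} ...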
(†): good paragraphs, avoiding widows and orphans and hyphenation on consecutive lines and loose/tight lines adjacent to each other, all of that. By far the longest chapter in The TeXbook, the manual on how to use TeX, is called “Fine Points of Mathematics Typing” and teaches the careful reader about aspects of typography that TeX does not handle automatically — and this is not even counting such advice in other chapters such as ties for avoiding line breaks in “psychologically bad” places. See for example http://www.rtznet.nl/zink/comparison.pdf (linked from http://www.rtznet.nl/zink/latex.php?lang=en) comparing plain-text paragraphs in Word and InDesign and TeX. If you look at the “bad” typesetting that Knuth rejected as painful (https://tex.stackexchange.com/a/367133), I think you get the idea :-)
Many good points. I would just point out that plain TeX itself is a format, a set of macros on top of TeX—just not as elaborate as LaTeX. So I’m not sure there is that much conflict between the two visions.
Oh yes definitely: some of the “sins” of LaTeX (doing too much with TeX macros, etc) do start with Knuth himself (though he seems to have been doing it for fun rather than prescribing it for general use), and some are in plain TeX too. But though I wouldn't call it conflict, I think there is a difference (of focus at least), which comes across in the way they are used: Knuth does all his writing, editing, and polishing with pencil on paper, and only uses the computer to finally type it up for typesetting and fine-tuning the appearance. (Also, he tends to use not plain TeX necessarily but take ideas from it and create a separate “format” for other documents.)
On the other hand, LaTeX is clearly designed with the vision of being a complete “document preparation system” for authors from the moment they first turn their thoughts into words. The former approach I guess is feasible today if you do most of your “document” stuff in another place (if not pencil and paper, then maybe a plain text editor or Markdown or…) and use TeX only at the end, for typesetting.
Ahh, now that you put it that way, yes, I think you are exactly right. For me, the real reason I started to use LaTeX had nothing to do with document formatting, which I had already figured out how to do with plain TeX (writing a 200-page thesis in it). It was the automatic handling of equation numbers and other labels.
I have such a love-hate relationship with Latex.
I use it almost every day.
I love how easy it is to write beautiful math, and how easy it is to organize large documents well. It's fast (in general).
And, thank god, there is lots of information online.
But boy do I hate the whole semi-broken macro-based system.
The package management feels so outdated.
Errors are hard to decipher. Package documentation is there but takes days and days if you actually want to read it.
And then you use Tikz. Same result. Beautiful neat graphs. Horrible obscure macro based system with impossible-to-understand error messages.
Gosh yeah. Imagine writing a 150+ page thesis in Word. Would give me nightmares.
EDIT: To complement: I think it's obviously possible, lots of people do it. But I feel I would spend a lot of time on "keeping the structure right" as I edit it. Feels more straightforward with Latex. And then Math is kind of hard to write efficiently, but that's field specific.
I only needed a single dose of 'Word has decided to move every single image, table and textbox all on top of each other at the very beginning of the document for no apparent reason' before I vowed never to use Word for anything important or complicated.
I wonder about that. I don't know Word well enough at all to accomplish the same things I do in LaTeX. But I know it can do a lot of those things.
I assume it would be similar if I knew as much about Word as I do about LaTeX. But there's no way of knowing, as I am not going to learn Word without a much better reason than "I wonder if it's actually terrible to write 150 pages when you know how to use it."
While Word has evolved to do many of the things that were exclusive to LaTeX at one point, the problem is that Word is not extensible. When you hit a wall in Word, that’s it — you have no options. In LaTeX you can pretty much always figure out how to do what you want to do, even if it takes learning a fair bit about the packages, and if you don’t want to spend that kind of time you can almost always find relevant info on stackoverflow.
Either that changed, or you are technically incorrect. Word is heavily extensible through OLE. You can probably even embed a live ConvNet or a 3D physical simulation in a document. Hell, that was mentioned in Windows 95 book I read as a kid. They used to call it "multimedia".
Start by splitting it up into parts, each document not to contain more than 5-10 pages, then enable "limit formatting to a selection of styles" (anyone who disables this is to suffer the ordeal of the boats). Once everything is the way you want it, copy everything to a new place, add a master document, and set the styles as you want. Then export to PDF, publish, or whatever you want.
If you want it truly beautiful, use a DTP and don't format the word documents at all.
I've done a 70-page proposal for a computer vision study in Word once (plenty of images). I don't have nightmares about it anymore because I'll never open it again, and I'll never make that stupid decision again (even if the other side requires it!).
I've edited some ~100-page regulations (so very few pictures and tables) in Word files too. It's not a nightmare because OpenOffice works quite well with those.
The main advantage of LaTeX is that the format is more-or-less transparent. I say "more or less" because the last time I claimed it is transparent, someone showed me their .tex files and, sure enough, I could not understand them; but so far this has been the only time I could not understand a .tex file (this was someone who used macros to decline nouns, among other things; it's not a common use case).
In general, if you see a .tex and cannot compile it, you can look inside and get a good idea of what the file is about in 5 minutes; a bit more time and you can essentially read it by hand. I don't know of any other semi-popular text format except well-written html+css (which is insufficient for science) that supports this. The cleanest systems will go obsolete and become unsupported; if I want my writings to stay around for 200 years, my best bet is to write them in a format that does not strictly require any decoder. And LaTeX, as commonly practiced, is such a format.
There are other TeX dialects, too. AMSTeX bid to become its own dialect, for a while, but I think what's mainly happened is that people who want its functionalities use them as a package (or packages) on top of LaTeX. ConTeXt is the big non-LaTeX dialect of which I'm aware.
I feel similarly about LaTeX. I love how easily I can write math in it, how neat the result looks, and how easily I can structure larger documents.
But I hate the syntax, that seemingly simple tasks have arcane syntax or are hard to accomplish, and mostly that it's damn slow to compile. I was really surprised that you consider it fast; that's really interesting to me!
There are LaTeX distros, but just a few of them. Most people on linux use the TeXLive distro. I recently downloaded the latest version from its home (https://www.tug.org/texlive/), and I think it took something like 6GB on my disk. Probably half of that is font files. TeXLive is in (for example) the Ubuntu repositories, but they are usually way behind the release version. I’ve heard there is a distribution for Windows, and probably for MacOS.
The most common Windows version is probably MiKTeX, which comes with its own package manager that by default downloads packages on use. TeXLive is often most easily used with the full scheme, installing every package, but there is a separate package manager as well, which is often not included because distros want you to use the OS package manager.
I really enjoyed looking at the examples of old books and manuscripts. But Matplotlib is only the go-to tool for graphs if you are using Python (and some Python users prefer something else). It’s not a universal tool like [La]TeX. The closest thing to that in this sphere would be gnuplot.
TeX appeared when I was in graduate school. Before it did, we were using troff, or whatever it was called, which produced hideous output. TeX was a revelation. That’s why everyone switched to it, despite the hardships endured when trying to get it to do what you wanted. The output looked like a beautiful, hand-set book. There was nothing else that came anywhere close. I wrote my thesis in plain TeX, using a Mac Plus ($1400 with a deep student discount). When LaTeX appeared, the selling point was not formatting, but automatic handling of labels and references. You could add a numbered equation in the middle of your document, without having to re-number, and track down references to, the hundred equations that came after it. Glorious.
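For those who haven't seen it, the mechanism is just \label and \ref (the label name here is made up); renumbering and cross-references update on recompile:

    \documentclass{article}
    \usepackage{amsmath}
    \begin{document}
    \begin{equation}\label{eq:euler}
      e^{i\pi} + 1 = 0
    \end{equation}
    Insert a new numbered equation above, recompile, and
    Eq.~\eqref{eq:euler} still points at the right number.
    \end{document}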
Before LaTeX? The previous generation, my professors, wrote things out by hand and gave the result to a secretary. Those with terrible handwriting might type the text, leaving spaces for the math, which they added with a pen. They didn't waste time formatting the paper; that was someone else's job. TeX created a generation of typographical obsessives, including me. Whatever the secretary did wasn't good enough.
I always wondered the opposite: how can researchers get any work done when they have to formalize it using LaTeX?
I've tried writing a book using LaTeX, and it's been nothing but a miserable experience. I've been coding for 15 years, and tweaking my Linux machine for longer. I know what it means to face rough edges.
But LaTeX is another level of shenanigans.
As far as I understand, researchers use LaTeX because of the seamless formula integration and the automation of references.
Network effects have a lot to do with it. Conferences will give you style files that only work with LaTeX, and some require the .tex sources when you submit your article.
What I don't understand is why LaTeX is so cumbersome, complicated, unintuitive, ugly and slow. Is it simply technical debt from choices made at a time when computers were different and there were different constraints? Is it just the nature of the problem, and every tool will feel the same? Is it the result of evolution without direction? Was literate programming a bad design decision?
If you're an actual mathematician writing pages and pages of equations, LaTeX is not complicated, unintuitive, ugly, or slow. It's actually extremely well designed for addressing what is a very difficult problem: translating mathematics into ASCII. For how difficult that problem is, LaTeX is quite ingenious.
I think a lot of the would-be LaTeX disruptors lack enough experience actually writing mathematics. TeX worked because Donald Knuth is both a great programmer AND a good mathematician (and Leslie Lamport's LaTeX built on that foundation). When LaTeX gets replaced, it'll need to be replaced by a person (or team) with similar cross-disciplinary strengths.
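To make the "translating mathematics into ASCII" point concrete: the source reads close to how you would say the formula out loud. A standard textbook identity, not from anyone's paper:

    \documentclass{article}
    \usepackage{amsmath}
    \begin{document}
    \[
      \int_{0}^{\infty} e^{-x^{2}}\,dx = \frac{\sqrt{\pi}}{2}
    \]
    \end{document}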
Are you specifically talking about the math syntax within LaTeX, or LaTeX commands in general? Because I would disagree on the latter. (But I'm a computer scientist, not a mathematician.) Ugliness is probably hard to discuss, as it's just a matter of taste.
I personally consider compile times measured in seconds for "simple" documents to be slow (though there's the question of whether that's an inherent problem of the space).
As someone who did both math courses and programming courses, I found LaTeX to be awesome for writing equations. It felt so natural that I would, more often than not, solve equations in LaTeX directly.
However, when writing stuff for my comp-sci courses, man, what a chore. Pseudo-code was a PITA, as was getting decent tables of results where I wanted them. So for a lot of those I threw my hands in the air and finished the paper in Word.
It might have been me - I tried to learn as I went along - but it certainly was not as easy as the math.
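For anyone hitting the same wall, the booktabs package is usually the least painful route to a decent results table; a minimal sketch with made-up numbers:

    \documentclass{article}
    \usepackage{booktabs}
    \begin{document}
    \begin{table}[htbp]
      \centering
      \caption{Hypothetical benchmark results.}
      \begin{tabular}{lrr}
        \toprule
        Method   & Accuracy (\%) & Time (s) \\
        \midrule
        Baseline & 87.2          & 1.4      \\
        Ours     & 91.5          & 2.1      \\
        \bottomrule
      \end{tabular}
    \end{table}
    \end{document}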
In a rewritten-from-scratch LaTeX, I would only make minor changes to the syntax. I don't think you can have another syntax that is overall better; compromises will be made somewhere or other.
Compile time, of course, can be fixed by reimplementing the whole compiler in a modern language.
But what if you're a computer scientist or an economist who needs to write graphs, charts, tables, code, etc.? Last time I tried to strike out a column it required some weird TikZ shenanigans.
Agreed, LaTeX is painful here. Tables and inline code snippets are a genuine shortcoming of LaTeX.
On the other hand:
* For graphs and charts, I'm not sure how it could be done much better in a text-based programming language (see the pgfplots sketch after this list).
* Even for tables you have to have some sympathy because remember, the cells in the table can (and occasionally do) contain very complicated mathematical contents. Tables are hard, as many a frontend dev will assure you.
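On the graphs point: pgfplots is about as good as a text-based description of a chart gets; a minimal sketch (labels and function chosen arbitrarily):

    \documentclass{standalone}
    \usepackage{pgfplots}
    \pgfplotsset{compat=newest}
    \begin{document}
    \begin{tikzpicture}
      \begin{axis}[xlabel={$x$}, ylabel={$\sin x$}, domain=0:6.28, samples=100]
        % pgfmath trig functions take degrees, hence deg(x)
        \addplot[thick] {sin(deg(x))};
      \end{axis}
    \end{tikzpicture}
    \end{document}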
Hey everyone! Thread author here. Happy to see so many people taking an interest in beautiful scientific communication!
If anyone has other examples of beautiful typography or graphing/figures throughout history, I'd love to see them! I've been replying to the original thread on twitter with some favorites:
https://twitter.com/iraphas13/status/1262489387767480322?s=2...
He also comes from the pre-PowerPoint age of presenting with handwritten acetate transparencies - and still does, AFAIK. Many of his slides have been captured for the infowebs.
Some of his original papers from the 1960s were not published at the time, but circulated as samizdat facsimiles of his handwritten notes, until later transcribed by professors or their students, then published in book collections:
Penrose is famous for his visual imagination, which seems to ground many of his insights. Here is a paper where he invents a visual notation for tensors and operators:
I feel like there is too much love/worship of LaTeX around here, despite concerns like it not being a good master format and its mixing of logical and physical formatting.
But I do think that Markdown/AsciiDoc is a viable system for very many tasks.
I read an advanced Czech physics book from 1991, just after the Velvet Revolution. It was written on a typewriter, with the charts and formulas hand-drawn, then Xeroxed to a final count of 500 copies. You work with what you've got.
In university I used to write literally everything in LaTeX. Papers, homework, notes, class notes during lectures, personal stuff - everything. All my math/CS professors gave us assignments/exams/everything in documents obviously written in LaTeX (some of my profs were even kind enough to give us the .tex source).
I was taking a Geography class as a social science elective. We had a paper assignment, and my professor was talking about how it's important to submit homework in a specific format and that they'd give us a sample .docx file. At the time I didn't have any word processor on my computer, so I quickly asked the professor if it was OK to submit a PDF in the same format, since I use LaTeX to do my homework. She was like, "What is LaTeX?" I'm not a native speaker, so I thought I had just mispronounced it; I described it to her, and she was like, "No, I don't know that; you should write your paper in Word." I was stunned. I asked her how geographers submit their research papers, and she said, "We just use Word."
> Since version 3, TeX has used an idiosyncratic version numbering system, where updates have been indicated by adding an extra digit at the end of the decimal, so that the version number asymptotically approaches π. This is a reflection of the fact that TeX is now very stable, and only minor updates are anticipated.
Does this story have to be linked to such a shitty ad-filled website? I mean, there are ads right smack in the middle of the figures and pictures of importance.
And speaking of typesetting after LaTeX, I recommend giving Distill for R Markdown - "scientific and technical writing, native to the web" (https://rstudio.github.io/distill/) - a try.
It gives you references with mouseover previews, the ability to embed interactive D3.js charts, etc. Right now I use it for an internal deep-learning report.
A lot of the classic papers by Maxwell, Newton, Faraday etc. are available online. They look great but you can only imagine what a pain typesetting these used to be.
In the late 80s I remember looking at many SIGARCH papers that were printed on crappy dot matrix printers. I thought this was kind of awesome - it's a combination of "who cares about the font when my paper has a good CPU architectural innovation" and "with my new microcomputer there is no need to pay for professional typesetting".
I am surprised that the LaTeX name didn't end up in a heated argument between Don Knuth and Leslie Lamport, like the infamous GNU/Linux vs Linux dispute between Stallman and Torvalds.
Maybe scientists are more civilized, maybe it helped that LaTeX already has TeX embedded in its name.
In retrospect, Lignux might not have been a bad name. The 'g' would be silent of course.
A couple years back, I read Niklaus Wirth's "Algorithms + Data Structures = Programs", published by Prentice Hall in 1976.
It is a truly beautiful book. Cover, typesetting, illustrations. It just feels compact, clean, and elegant in a way that marvelously reinforces Wirth's own aesthetic. They really don't make them like they used to.
I think org-mode is the best of all these worlds. I can break out into LaTeX where I need to, or export to LaTeX. I can export to HTML5 and CSS3 with custom templates, and I can run code in blocks. I just wish other editors would work on Org support, because it feels like Jupyter will eat a lot of the cake that org-mode should be getting.
It was an interesting read, as I never thought that this might be a problem in the first place. It seems that to really care about (or know) LaTeX you have to be a hard-core scientist working in academia. I was wondering why it's mentioned so often on HN. I think research-specific typography can only be advanced by someone who experiences these problems firsthand.
I used LaTeX a lot in college, but I'm gonna be honest - if I could do the work in Microsoft Word, I sure as hell would use Word. Doing math in Word with its built-in math editor was incredibly fast, though at the expense of beauty, structure, and flexibility.