
That's interesting, because PDF is an inherently un-streamable format. Reading one requires seeking within the file (which is why command-line PDF tools can't take the PDF on STDIN, or fake it by copying STDIN to a temp file first).
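For anyone curious, the temp-file workaround looks roughly like this in Python. This is a minimal sketch, not what any particular tool does; `make_seekable` is a name made up for illustration.

```python
import shutil
import tempfile


def make_seekable(stream):
    """Return a seekable binary file with the same contents as `stream`.

    Hypothetical helper: if the input (e.g. a pipe) cannot seek, spool it
    into an anonymous temporary file and hand that back instead.
    """
    if stream.seekable():
        return stream
    spool = tempfile.TemporaryFile()   # opened in binary read/write mode
    shutil.copyfileobj(stream, spool)  # drain the pipe into the temp file
    spool.seek(0)                      # rewind so the caller starts at byte 0
    return spool
```

You would call it as `make_seekable(sys.stdin.buffer)`. The cost is that the whole input has to arrive before parsing can start, which is exactly the non-streaming behavior being discussed.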

I wonder how Google got around that problem.



Actually, PDF files can be optimized for streaming: http://acroeng.adobe.com/wp/?page_id=27

It's covered in Annex F of the PDF spec: https://wwwimages2.adobe.com/content/dam/Adobe/en/devnet/pdf...


"Linearized" PDFs have extra info up front to allow streaming and random access (via range requests).
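A rough sketch of how a client might use this, in Python. It relies on the spec's requirement that the linearization dictionary fits in the first 1024 bytes; the function names are made up, and a substring check is only a heuristic, not a real parser.

```python
def looks_linearized(head: bytes) -> bool:
    """Heuristic: a linearized PDF declares /Linearized in its first 1024 bytes.

    `head` is whatever prefix of the file has been fetched so far.
    """
    return head.startswith(b"%PDF-") and b"/Linearized" in head[:1024]


def range_header(start: int, length: int) -> dict:
    """Build an HTTP Range header asking for `length` bytes from offset `start`."""
    # HTTP byte ranges are inclusive on both ends, hence the -1.
    return {"Range": f"bytes={start}-{start + length - 1}"}
```

So a viewer could fetch the first kilobyte with `range_header(0, 1024)`, check `looks_linearized`, and then use further range requests for whichever page the user jumps to.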


You could try to progressively render the file, pausing when a required forward reference isn't within the data you have downloaded so far.
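Something like this, as a toy model in Python. It assumes the download arrives front to back and that you already know which byte spans each page needs; both the data layout and the function names are assumptions for illustration, not how any real viewer is known to work.

```python
from typing import Iterable, List, Optional, Tuple

Span = Tuple[int, int]  # (byte offset, length) of one object a page needs


def can_render(needed: Iterable[Span], bytes_received: int) -> bool:
    """True if every span the page needs lies inside the downloaded prefix."""
    return all(offset + length <= bytes_received for offset, length in needed)


def first_blocked_page(pages: List[List[Span]], bytes_received: int) -> Optional[int]:
    """Index of the first page that must wait for more data, or None."""
    for index, needed in enumerate(pages):
        if not can_render(needed, bytes_received):
            return index  # rendering pauses here until the download catches up
    return None
```

A page with a forward reference near the end of the file blocks until the prefix reaches it, even if later pages are already renderable.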


That seems to be what they do, in my experience.


Well, I imagine that they download the file in the background, and if the viewer seeks to a non-downloaded part of the file, then it just blocks with "Loading" until that part is downloaded.


If I remember correctly, the PDF spec says files should be read starting from the end: the cross-reference table, which lists the byte offset of every object, sits at the end of the file.
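For reference, the last lines of a PDF are the `startxref` keyword, the byte offset of the cross-reference section, and `%%EOF`. A minimal Python sketch of pulling that offset out of the file's tail (the function name is made up, and real files have edge cases this ignores):

```python
import re


def find_startxref(tail: bytes) -> int:
    """Return the offset that follows the last `startxref` keyword in `tail`.

    `tail` is assumed to be the final chunk of the file, e.g. its last 1 KB.
    """
    pos = tail.rfind(b"startxref")
    if pos == -1:
        raise ValueError("no startxref keyword found")
    # The offset is a decimal integer on the line after the keyword.
    match = re.match(rb"\s*(\d+)", tail[pos + len(b"startxref"):])
    if match is None:
        raise ValueError("startxref is not followed by an offset")
    return int(match.group(1))
```

This is why a reader needs the end of the file before it can locate anything else, unless the file is linearized.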


engineers gonna engineer



