Hacker News

Strangely, the linked marketing text repeatedly brings up OCR errors (I counted at least four separate instances), which is odd, because a visual RAG system suffers from precisely the same problem. It is a strange thing to harp on.

If OCR has trouble with varying fonts and text, there is no reason to believe that using embeddings instead is immune to the same problem.



I’m confused. Wouldn’t the LLM be able to read the text more accurately than traditional OCR, by inferring from training what the text should say rather than relying only on what it looks like? I would expect it to be less prone to typographic interpretation errors than a more traditional mechanical algorithm.


Modern OCR uses machine learning, including ViTs and precisely the same models and techniques used in the linked solution. If they were comparing against OCR from 2002, sure. But they're comparing against modern OCR systems that generate text representations of documents using the latest machine learning advances and massive models (along with transformer-based textual context inference), while their own solution uses precisely the same stack. It's a weird thing for them to keep harping on.

Their solution is precisely as subject to textual ambiguities as the OCR solutions they compare against.
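The shared-stack point can be illustrated with a toy sketch (all glyphs and the flatten-the-pixels "encoder" here are hypothetical stand-ins, not any vendor's actual pipeline): both an OCR classifier and a visual-embedding retriever start from the same pixels, so a font ambiguity degrades both in the same way.

```python
# Toy sketch: an OCR classifier and a visual-embedding retriever both
# consume the same pixel input, so a glyph ambiguity hurts both equally.
# The glyph bitmaps and "encoder" below are illustrative, not real.

def pixels(bitmap):
    """Flatten an ASCII-art glyph into a 0/1 feature vector."""
    return [1.0 if ch == "#" else 0.0 for row in bitmap for ch in row]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(a) * norm(b))

# Two visually similar glyphs: letter 'O' and digit '0', one pixel apart.
GLYPH_LETTER_O = ["###", "#.#", "#.#", "#.#", "###"]
GLYPH_DIGIT_0  = ["###", "#.#", "###", "#.#", "###"]  # filled centre bar

emb_o = pixels(GLYPH_LETTER_O)  # what a ViT-style encoder would see
emb_0 = pixels(GLYPH_DIGIT_0)

sim = cosine(emb_o, emb_0)
print(f"similarity between 'O' and '0' features: {sim:.3f}")

# OCR path: nearest-template classification has a tiny margin here, so a
# noisy scan flips between 'O' and '0' easily. Embedding path: a query
# rendered with either glyph retrieves documents containing the other
# just as readily. Same pixels in, same ambiguity out, for both.
```

The point is not that embeddings are worse, only that the ambiguity lives in the pixels, so moving from an OCR text layer to a visual embedding index does not make it disappear.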



