Hacker News | arvind_k's comments

At Zipphy, I worked on solving similar problems in on-prem environments — building an OCR + NLP + CV pipeline to generate spatial layouts and classify documents at scale.

One persistent challenge was generalizing across “wild” PDFs, especially multi-page tables.

Your mention of agentic OCR correction and semantic chunking really caught my attention. I’m curious — how did you architect those to stay consistent across diverse layouts without relying on massive rule sets?


Thanks Mike, will take a look at Lumen.


d1vyank,

Can we connect offline and discuss?

Thanks, Arvind 94230096(five)(four)


yeah WIP


Thanks Ethan, will try to incorporate your feedback.


Simple document modelling and classification platform.


no ads in any form


I agree, the browser shouldn't contain ads, but an adblocker is also a basically essential extension that many of us use.

Myself included.

It's basically impossible not to these days.


Nowadays I refuse to fix anyone's computer without installing ad blockers in all installed browsers.

I'd consider it professionally irresponsible to let a non-technical user loose on the web without a good ad blocker. It's just too dangerous out there.


And then you're left with advertorials which are harder to detect.


"JavaScript: The Good Parts" is worth a read.

