Hey, Tabula maintainer here. tabula-java only works with "vector" PDFs. That is,...

mgm__ · on March 10, 2021

I had some success last year integrating Tesseract OCR and OpenCV with Tabula (compiled to JavaScript). The goal was to build a Google Docs addon for importing PDF tables without requiring a backend. Happy to get in touch to figure out how I could contribute the work back to Tabula (if that makes sense).

Here is a GIF of table detection for a scanned PDF doc (the first run is slower as it requires fetching the OpenCV.js bundle): https://lh3.googleusercontent.com/-OobUBBtnydg/X6Vn_Ls3juI/A...

Here's a demo of the addon running outside of Google Docs: https://pdftableutil.possiblenull.com/app/