> The only domain where Tesseract is competitive is for perfect "black text on white paper", it gives pretty poor performance when dealing with colored, distorted text, or even strong page structure effects (tables, etc.).

I wouldn't be surprised if their data set is bigger than the one stock Tesseract ships with, but part of any OCR pipeline is preprocessing the images before recognition.
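To make the preprocessing point concrete, here is a rough sketch of one common step: converting a colored image to grayscale and binarizing it with Otsu's threshold, so the recognizer sees something closer to "black text on white paper". This is only an illustration in plain Python, not what any particular product does. The function names (`to_gray`, `otsu_threshold`, `binarize`) and the nested-list image representation are made up for this example, and it assumes dark text on a lighter background.

```python
def to_gray(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of 8-bit luminance values.

    Uses the ITU-R BT.601 luma weights; rounding is to the nearest integer.
    """
    return [
        [int(0.299 * r + 0.587 * g + 0.114 * b + 0.5) for (r, g, b) in row]
        for row in rgb_rows
    ]


def otsu_threshold(gray_rows):
    """Return the threshold t (0-255) that maximizes between-class variance.

    Pixels <= t form the dark class, pixels > t form the light class.
    """
    hist = [0] * 256
    for row in gray_rows:
        for p in row:
            hist[p] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue          # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break             # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(rgb_rows):
    """Return rows of 0 (black) / 255 (white) pixels.

    Assumption: text is darker than the background. Light-on-dark input
    would need to be inverted afterwards.
    """
    gray = to_gray(rgb_rows)
    t = otsu_threshold(gray)
    return [[255 if p > t else 0 for p in row] for row in gray]
```

Feeding `binarize` an image with dark blue strokes on a yellow background should give black strokes on white, which could then be handed to the OCR engine. Real pipelines typically add deskewing, denoising, and rescaling on top of this, and a global threshold like Otsu's tends to struggle with uneven lighting, where an adaptive threshold is the usual substitute.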