Hacker News | teapowered's comments

Apache Tika Server is very easy to set up - it can be configured to use tesseract for OCR.
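
For anyone who wants to try it, here is a rough sketch of sending a file to a running Tika Server from Python. It assumes the server is listening locally on its default port 9998 and that the `/tika` endpoint accepts a `PUT` with the raw file bytes; the `X-Tika-OCRLanguage` header is how I understand the Tesseract language gets selected per request, so check it against your Tika version. The function names and the file name `scan.pdf` are made up for the example.

```python
import urllib.request

# Assumption: a Tika Server instance running locally on its default port.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(data: bytes, ocr_language: str = "eng",
                       base_url: str = TIKA_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT request asking Tika for plain text."""
    headers = {
        "Accept": "text/plain",              # ask for extracted text, not XHTML
        "X-Tika-OCRLanguage": ocr_language,  # assumed per-request Tesseract language
    }
    return urllib.request.Request(base_url, data=data, headers=headers, method="PUT")


def extract_text(path: str, ocr_language: str = "eng") -> str:
    """Send the file at `path` to Tika and return the extracted text."""
    with open(path, "rb") as f:
        request = build_tika_request(f.read(), ocr_language)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical input file; requires the server to be up.
    print(extract_text("scan.pdf"))
```

Splitting the request building from the network call makes it easy to inspect what will be sent before pointing it at a real server.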


Came here to mention Tika. I just set up a small POC with the 'full' Tika Docker container, which bundles OCR by default (with... 5 languages? English, Spanish, etc.).

I parsed a PDF and when looking at the output, I noticed 'united stotes of america' was in the text. Didn't make any sense... Digging further, I saw that it had also parsed the images in the PDF, and one of them was some govt logo with bad artifacting. It did indeed read more like 'stotes' than 'states'.

Edit: That said, the OP asked about tables. I haven't tested any table extraction with Tika (not something I need right now). Is Tika's table support any good? Does it even exist? Seems like it might not really matter for many Tika use cases (but I might be missing something obvious!)



crypto is the only game where the person who burns the most coal wins


They don't. Your best bet is probably Azul: https://www.azul.com/downloads/zulu/zulu-windows/



If you have two red lego bricks and two yellow lego bricks, do you not have two reds and two yellows? You definitely have a couple of bricks of each colour.


Demo site looks to be down? At least it's not rendering for me on Firefox Nightly on Android.


It's about targeted advertising - arguing with your spouse? Next ad break we show you adverts for lawyers.


I don't think anyone actually wants to do that. I work in advertising with video and the people I've talked to don't appear to think this is a good idea.

