My Factual Musings

Digitised text

Building a Simple OCR Application with Tesseract

For anyone who is blind and does text processing, the bane is coming across inaccessible content. This post looks at how to work with digitised images and scanned PDFs from the terminal. Along the way, we will develop a rudimentary program that converts PDF files into plain text.