kevincleppe.com

Open Source OCR Solutions

Accessibility work often means finding hacky, roundabout solutions to common problems that have unique twists. One of those common problems is remediating poorly scanned pages from a book that students need to read. Optical Character Recognition (OCR) can make this work a bit easier: we can extract the text from the PDF, put it into a Word document, fix it there (Word docs are MUCH easier to fix than PDFs), and then, if we want, export that file as a fully tagged and accessible PDF. That can cut what would have been potentially DAYS of remediation down to just a minute or two! But a poorly scanned page can severely throw off this workflow. Here is an example of such a page:

An example of a poorly scanned page, provided by Cornell University

That's not the worst scanned page in the world, but you can imagine how much worse it could be. Coffee stains, handwritten notes, bad lighting, the curve of the page near the binding, blurriness, and a million other things can make OCRing a page a nightmare. Thankfully, OCR technology has gotten really good, and I have a few examples of solutions you can use right now or with minimal setup.

Snipping Tool OCR

The easiest option, and a tool you have access to right now (assuming you are on an up-to-date Windows device), is the Snipping Tool. Just open it up, select the area you want to OCR, click the "Text Actions" button, and choose "Copy all Text." A picture of that is below, followed by the resulting text.

The Snipping Tool OCR option, with the Text option circled.

I have a link here to a text file of the text extracted from the above image. In short, it gets the majority of it correct, but because the scan is not the best, some words are wrong. "Close" becomes "Closc," for example. That's a small mistake, but over the course of a longer document, and with even worse scans, these mistakes add up, and so does the remediation time. The goal here is to reduce the amount of time spent remediating without sacrificing quality.
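One way to catch slips like "Closc" before they pile up is to run the OCR output against a word list and flag anything that isn't on it. The sketch below is only an illustration using Python's standard library: the function name `flag_suspect_words`, the tiny `KNOWN_WORDS` set, and the sample sentence are all made up for this example, and a real run would load a full dictionary file instead.

```python
import difflib
import re

# Hypothetical mini word list, for illustration only. A real run would
# load a full dictionary file instead.
KNOWN_WORDS = {"please", "close", "the", "door", "behind", "you"}


def flag_suspect_words(text, known_words=KNOWN_WORDS, cutoff=0.75):
    """Return (word, suggestions) pairs for words missing from known_words."""
    vocabulary = sorted(known_words)
    suspects = []
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        if lower in known_words:
            continue
        # Offer up to three similar known words as possible corrections.
        suggestions = difflib.get_close_matches(
            lower, vocabulary, n=3, cutoff=cutoff
        )
        suspects.append((word, suggestions))
    return suspects


if __name__ == "__main__":
    # Made-up sample line containing the "Closc" error mentioned above.
    sample = "Please Closc the door behind you"
    for word, suggestions in flag_suspect_words(sample):
        print(f"{word}: possible corrections {suggestions}")
```

This only flags words for a human to review. Names, jargon, and anything else missing from the word list will be flagged too, so it speeds up proofreading rather than replacing it.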

ChatGPT, Claude, Perplexity, and other AI solutions

I'm including these despite the "Open Source" title, as they all have free tiers and you are likely already familiar with at least some of them. These AIs have a lot of really cool features, and that includes OCR capabilities. Literally just upload the image and say "Please OCR this file and put it into a Markdown file, thank you" (saying please and thank you improves the results, it's a fact). I have a link here to a text file of the Claude OCR'ed version of this image. For some reason, ChatGPT really struggled with this image, but Claude was able to get it in one shot. Always be sure to double-check the accuracy of the text!
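If you have OCR output from two different tools for the same page, one quick way to double-check accuracy is to diff them word by word and only review the places where they disagree. Here is a minimal sketch of that idea using Python's standard `difflib` module. The function name `ocr_disagreements` and the sample strings are assumptions for illustration, not the actual output from any of the tools above.

```python
import difflib


def ocr_disagreements(text_a, text_b):
    """Return (words_from_a, words_from_b) pairs where two OCR outputs differ."""
    words_a = text_a.split()
    words_b = text_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    differences = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":
            # Keep the differing runs of words from each version side by side.
            differences.append(
                (" ".join(words_a[a1:a2]), " ".join(words_b[b1:b2]))
            )
    return differences


if __name__ == "__main__":
    # Made-up sample lines standing in for two tools' output.
    first_tool = "Please Closc the door behind you"
    second_tool = "Please Close the door behind you"
    for left, right in ocr_disagreements(first_tool, second_tool):
        print(f"first tool: {left!r}  |  second tool: {right!r}")
```

Where the two outputs agree, they can still both be wrong, so this narrows down where to look rather than proving the text is correct.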