Doing OCR on Linux/Mac
Posted by Tariq • Monday, July 27. 2009 • Category: Programming, Tidbits
Yesterday somebody gave me a USB key with ~1000 JPEGs on it. Each JPEG was a scanned page (ugh), and the task was to find some useful information about topic X. Each image was about 1-2 MB, and I needed to do something useful with them quickly. So what follows is a quick walkthrough of how to do Optical Character Recognition (OCR, which means taking silly image files and ripping out any text found in them) on Linux or, in my case, a Mac.
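As a rough sketch of the batch step, here is a small bash loop that OCRs every JPEG in a directory and then greps the recognized text for a keyword. It assumes Tesseract 3.x or newer, which reads JPEG directly. Older releases needed TIFF input, which is where a converter such as ImageMagick would come in. The function names `ocr_dir` and `search_ocr` are hypothetical and not taken from the original post.

```bash
#!/usr/bin/env bash
# Minimal sketch: OCR each scanned JPEG with Tesseract, then search the text.
# ASSUMPTION: Tesseract 3.x+ (reads JPEG directly); older versions wanted TIFF.
# ocr_dir and search_ocr are hypothetical helper names for illustration.

# ocr_dir SRC_DIR OUT_DIR
# Writes OUT_DIR/<name>.txt for every SRC_DIR/<name>.jpg or .jpeg (any case).
ocr_dir() {
    local src="$1" out="$2" img base saved
    command -v tesseract >/dev/null 2>&1 || { echo "tesseract not found" >&2; return 1; }
    mkdir -p "$out"
    # Remember the caller's glob settings so they can be restored afterwards.
    saved="$(shopt -p nullglob nocaseglob)" || true
    # nullglob: skip the loop if nothing matches; nocaseglob: match .JPG too.
    shopt -s nullglob nocaseglob
    for img in "$src"/*.jpg "$src"/*.jpeg; do
        base="$(basename "${img%.*}")"
        # tesseract appends ".txt" to the output base name on its own.
        tesseract "$img" "$out/$base" >/dev/null 2>&1 || echo "OCR failed: $img" >&2
    done
    eval "$saved"
}

# search_ocr OUT_DIR PATTERN
# Prints the .txt files (one per scanned page) whose OCR text matches
# PATTERN, ignoring case.
search_ocr() {
    grep -ril --include='*.txt' -- "$2" "$1"
}
```

Usage would look something like `ocr_dir /path/to/scans ./ocr_text` followed by `search_ocr ./ocr_text "topic X"`, where both paths are placeholders. Keeping one text file per page means a grep hit points straight back to the image it came from.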