PDF files can contain searchable, editable, read-out-loudable text (that's an article for another week!). However, not all PDF files contain 'real' text. If the PDF file was converted from an image, such as a PC fax file or a screen capture, the PDF will contain pictures of text instead of text. Our eyes and brains can certainly read the text, but a computer cannot.
Cataloguing, editing, searching and reading aloud are not possible when the text is actually an image, unless you first run the PDF through some sort of Optical Character Recognition (OCR) process.
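If you have a folder full of PDFs and want a quick guess at which ones are pictures of text, a rough check along these lines might help. This is only a sketch under some assumptions: the function names are made up for illustration, and the test simply looks for font and image markers in the raw file bytes. PDFs that store their objects in compressed streams can hide those markers, so treat the answer as a hint, not a verdict.

```python
import re


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Guess whether raw PDF bytes hold page images but no real text.

    Heuristic (an assumption, not a guarantee): real text needs a font,
    so a file that mentions image objects but never mentions /Font is
    probably a scan that still needs OCR.
    """
    has_font = b"/Font" in pdf_bytes
    has_image = re.search(rb"/Subtype\s*/Image", pdf_bytes) is not None
    return has_image and not has_font


def file_looks_image_only(path: str) -> bool:
    """Convenience wrapper (hypothetical name) that reads a file from disk."""
    with open(path, "rb") as handle:
        return looks_image_only(handle.read())
```

For example, `file_looks_image_only("fax.pdf")` (a hypothetical file name) returning `True` suggests the file is a candidate for OCR. A file that has already been through OCR will normally contain fonts, so it should come back `False`.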
Back in the day, I owned a utility called OmniPage Pro that miraculously turned images of text into editable text. The product is still available today, but you might be able to save some money if you need to call upon OCR technology because, believe it or not, Adobe has included document OCR in Acrobat 9 Professional.
With an image-based PDF file open, choose Document > OCR Text Recognition > Recognize Text Using OCR.
You will be presented with a dialog box where you can specify a desired page range. You can optionally click the Edit button and select OCR settings.
The PDF Output Style options let you decide in advance how your OCR-processed file will be displayed. Try selecting ClearScan from the drop-down menu to give the text in your processed PDF a cleaner, less scanned look. This can also greatly reduce the processed file's size, since it will replace the original image of text with actual text. Don't worry about any non-text items such as graphics in the original file. Acrobat's OCR engine will likely understand that they are pictures and just leave them alone.
Once your file has been processed, you can click in the Find toolbar or use Acrobat's Search feature to locate desired words and phrases.
Acrobat is loaded with gems like this. I am constantly hearing the words "I never knew that!" from seasoned Acrobat users. Come sign up for a class and see what tools and features are waiting to be discovered and used to increase your productivity... and marketability!
Great post!!!!
Posted by: retouching | July 01, 2009 at 04:56 AM
The website is at: http://www.goodocr.com/
Posted by: folha | August 22, 2010 at 09:46 PM
The online application Free OCR converts the contents of an image file to a text output format, though Microsoft Word is not currently supported.
Posted by: James | October 16, 2010 at 05:54 AM
You have two options: purchase commercial OCR software, install it and do the OCR job yourself, or just go to an OCR website like goodocr.com, upload your image and wait for the result. The latter will certainly save you a lot of time if you only do this occasionally.
Posted by: Gates | October 17, 2010 at 08:50 AM