Posted by Mark Yingling on Sep 18, 2015 11:33:25 AM

Did you know that you can make your PDFs searchable so that they are easier to find (that is, CAN be found)?


A quick mini-tutorial about search. There are essentially two ways that documents are indexed and tagged so that they can be found – keyword or full-text. Keywords are (or should be) part of your tagging system and are used to describe the document: customer/patient name, illness, reason for visit, etc. Full-text search means that every word within a document is indexed so that a search query will look for matching words within the body of a document. These two methods are usually combined for stronger results.


We tell you that so you’ll understand why searchable PDF is worth doing. If you rely on keywords and tagging (manual or automated) to generate the metadata you use to find PDF documents, any document that is tagged (described) incorrectly effectively becomes invisible to search. If you were to accidentally tag a document as “chocolate” when it was actually about “peanut butter”*, a search on “peanut butter” would not return it. However, had the document been full-text indexed, it would show in the search results.
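
To make the difference concrete, here is a minimal Python sketch of the two approaches. It is an illustration only, not how any particular document management product works: the document ids, tags, body text, and function names are all made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into individual words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_keyword_index(documents):
    """Index each document only by the words in its tags."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for tag in doc["tags"]:
            for word in tokenize(tag):
                index[word].add(doc_id)
    return index


def build_fulltext_index(documents):
    """Index each document by every word that appears in its body."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for word in tokenize(doc["body"]):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that match every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical documents: memo-1 is about peanut butter but was
# tagged "chocolate" by mistake.
documents = {
    "memo-1": {"tags": ["chocolate"], "body": "Notes on our peanut butter supplier."},
    "memo-2": {"tags": ["chocolate"], "body": "Chocolate order for the spring catalog."},
}

keyword_index = build_keyword_index(documents)
fulltext_index = build_fulltext_index(documents)

print(search(keyword_index, "peanut butter"))   # set()
print(search(fulltext_index, "peanut butter"))  # {'memo-1'}
```

The mis-tagged memo never shows up in a keyword search for “peanut butter,” but the full-text index finds it because the words are in the body. Combining the two result sets is what gives you the stronger results mentioned above.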


Good search set-up means that you should have at least three keywords per document. Even so, occasionally documents are tagged incorrectly. You don’t want to try to remember when you created a document and then have to scroll through your folder system looking for it!


Search Your PDFs

You may not know this, but PDF stands for “Portable Document Format.” It was created by Adobe to replicate the performance of paper in an electronic environment, so that a page created as PDF would look the same regardless of the program used to view it. (Adobe no longer owns the standard; it is now an open standard, ISO 32000, which is why so many companies offer PDF products.) For document review, it’s great: most of us view at least one PDF a day in our work life. When you scan a document to PDF, however, the document is essentially a black box to search because each page is saved as an image. Back to our chocolate and peanut butter example: unless you add metadata to the document at this point, it’s going to be really hard to find a few days (much less weeks or months) later.


To make your PDF searchable (and, more importantly, findable), you need to OCR it. OCR (optical character recognition) is software that recognizes machine-printed characters in an image and turns them into text, which can then be full-text indexed.


You can OCR an existing PDF or apply OCR when scanning a document on your copier or scanner to PDF.
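
For the first option, OCRing a PDF you already have, here is a rough Python sketch. It assumes the open-source OCRmyPDF command-line tool is installed, which is our choice for illustration and not one of the products named in this post. The file names and function names are made up.

```python
import shutil
import subprocess


def build_ocr_command(source_pdf, output_pdf, language="eng"):
    """Build the OCRmyPDF command line for one file.

    --skip-text leaves pages that already contain text untouched,
    and -l sets the language the OCR engine should expect.
    """
    return ["ocrmypdf", "--skip-text", "-l", language, source_pdf, output_pdf]


def make_searchable(source_pdf, output_pdf, language="eng"):
    """Run OCRmyPDF on one scanned PDF (assumes the tool is installed)."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    subprocess.run(build_ocr_command(source_pdf, output_pdf, language), check=True)
```

Calling make_searchable("scan.pdf", "scan-searchable.pdf") would write a copy of the scan with a text layer added, which a full-text indexer can then read. Like the desktop tools mentioned below, a one-file-at-a-time script like this is for small jobs, not large-scale conversion.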


If you have a lot of PDF documents (or even if you’re scanning to TIFF), you should learn how to best OCR those documents so that you can easily find them in the future.


In a future post, we’ll go into detail on how you can make sure you’ll be able to find your PDF documents again. If you’re in a huge rush, Nuance and ABBYY both have inexpensive tools for PDF conversion. (Note: you do NOT want to use these for large-scale conversions on your desktop PC; they do not scale.) For personal or low-volume business use, there are also apps that can OCR PDFs; Google Drive and Evernote are two well-known ones for personal productivity.


Be sure to follow us to receive notification of our “how to” post coming up.


Using your copier or scanner as the first step, we can help you create a solution for finding all of the documents you create, for both employees and customers.


