Over the past few weeks, we’ve mentioned document scanning and document imaging a few times while discussing the value of your copier/multifunction printer. Sometimes, it’s hard to picture how technology can work for you without a solid understanding of the process. This post will briefly outline the document capture process. Before we begin, a caveat: capture looks simple in what it does, but behind that façade of simplicity (there really is an app for that) sits a complex, technically sophisticated set of technologies. In larger organizations and at high volumes, capture becomes a complex task, albeit one with a proven ROI. Research from AIIM indicates that most capture installations report ROI in fewer than 12 months, often in less than half a year.
As a reminder, here’s a quick list of benefits to the document scanning process:
- Reduce filing/storage costs
- Reduce distribution costs
- Protect/control information
- Improve access to information
- Comply with regulatory requirements
- Improve customer service
So what are the steps in capture?
Document Capture and Data Capture – Two Aspects
Capture consists of document capture and data capture, and these are two different technology processes. Document capture is, historically, the conversion of a paper document into an electronic representation of that document (most often in PDF, JPEG, or TIFF format). I say historically because most capture software can now also import and convert electronic files (Word documents, spreadsheets). Data capture extracts data from a business form. One of the easiest examples of data capture is a credit card application (which we all seem to receive weekly, at least!). Should you apply for a new card, your name, address, etc. are received and captured – only the data is captured, not an electronic image of the entire form. An invoice is another good example. Given that invoices are all similar, data capture software can be “trained” to look at certain areas of a form – the upper right for a customer reference number, for example – and match the invoice data to that customer, which will start a workflow for payment.
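To make the invoice example concrete, here is a minimal Python sketch of the "look at a certain area of the form" idea. It assumes recognition has already turned the page into words with positions. The `Word` type, the zone coordinates, the `CUST-1042` reference format, and the customer table are all hypothetical and not taken from any particular capture product.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    x: float  # horizontal position, 0.0 (left edge) to 1.0 (right edge)
    y: float  # vertical position, 0.0 (top edge) to 1.0 (bottom edge)

# Hypothetical zone covering the upper right of the page:
# (x_min, y_min, x_max, y_max) in the same 0.0-1.0 units.
UPPER_RIGHT = (0.6, 0.0, 1.0, 0.2)

# Hypothetical customer master data, keyed by reference number.
CUSTOMERS = {"CUST-1042": "Acme Supply Co."}

def words_in_zone(words, zone):
    """Keep only the words whose position falls inside the zone."""
    x_min, y_min, x_max, y_max = zone
    return [w for w in words if x_min <= w.x <= x_max and y_min <= w.y <= y_max]

def find_customer(words):
    """Return (reference, customer name) if a known reference is in the zone."""
    for w in words_in_zone(words, UPPER_RIGHT):
        if w.text in CUSTOMERS:
            return w.text, CUSTOMERS[w.text]
    return None  # no match: route the invoice to a person instead

page = [
    Word("INVOICE", 0.10, 0.05),
    Word("CUST-1042", 0.80, 0.08),
    Word("Total:", 0.70, 0.90),
    Word("$250.00", 0.85, 0.90),
]
print(find_customer(page))  # ('CUST-1042', 'Acme Supply Co.')
```

A real product would do far more (multiple templates, fuzzy matching, confidence scores), but the core step is the same: restrict attention to a zone, then match what is found there against known data.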
Before scanning paper documents, you need to prep them. Remove staples, paper clips, and sticky notes (or tape the notes down so they can be scanned too); repair torn pages; and sort into batches (not always necessary, depending on how much scanning you have to do). Prep is tedious, but a stray paper clip or staple could stop your copier from working and really slow you down.
Conversion – Capture
You can place a paper document, singly or in batches, on a scanning device. Or an electronic (born digital) document can be ingested. Either way, the document will be imaged or, if it’s a form, relevant data can be extracted from it (forms can also be imaged, depending on the business need).
Documents can be captured via:
- Fax – Image quality is usually lower here, which could lower recognition accuracy.
- Camera phone – You have a scanner in your pocket with your phone camera and easily downloadable software.
- Copier/Multifunction Printer – From desktop to large-volume, nearly all have scanning functionality now.
- Scanner – Various speeds and models are available depending on daily scanning volume and business need, from desktop speeds of 10 pages per minute to 120 pages per minute and higher.
- Checks and microfilm – There are specialized scanners for both types of documents.
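The scanner speeds mentioned above translate directly into time at the device. The short Python sketch below shows the arithmetic. The daily volume of 3,000 pages is a made-up figure for illustration, and the estimate covers feed time only (no prep, reloading the feeder, or rescans).

```python
import math

def scan_minutes(pages, pages_per_minute):
    """Minutes of pure feed time, rounded up to a whole minute."""
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    return math.ceil(pages / pages_per_minute)

daily_pages = 3000  # hypothetical daily volume

for ppm in (10, 120):
    print(f"{ppm} ppm: {scan_minutes(daily_pages, ppm)} minutes")
# 10 ppm: 300 minutes
# 120 ppm: 25 minutes
```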
Once captured, the documents will go through some or all of the following steps: document imaging, forms processing, image cleanup, quality control, and recognition.
The images will be saved as one of a number of formats: TIFF (Tagged Image File Format), JPEG, PDF, or GIF (Graphics Interchange Format).
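Each of these formats starts with a recognizable byte signature, which is one way software can tell what it has been handed regardless of the file extension. The Python sketch below checks the first few bytes of a file. The function name and the sample data are illustrative only, and a production system would validate much more than the header.

```python
# Leading bytes that identify each format.
SIGNATURES = [
    (b"II*\x00", "TIFF"),      # TIFF, little-endian byte order
    (b"MM\x00*", "TIFF"),      # TIFF, big-endian byte order
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]

def detect_format(header: bytes) -> str:
    """Guess the file format from the first bytes of the file."""
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return "unknown"

print(detect_format(b"%PDF-1.7\n"))           # PDF
print(detect_format(b"\xff\xd8\xff\xe0JFIF"))  # JPEG
print(detect_format(b"plain text"))           # unknown
```

To use it on a real file, read the first few bytes in binary mode, for example `open(path, "rb").read(8)`, and pass them in.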
For forms, the data and/or the entire form can be captured, depending on the needs of your business. Data captured from a form can be seamlessly entered into the correct database.
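As a rough illustration of captured form data landing in a database, the Python sketch below writes one record to SQLite using the standard library. The table name, column names, and sample values are assumptions made for the example. The `image_path` column is left empty to represent the case where only the data, not the form image, is kept.

```python
import sqlite3

def store_form_data(conn, fields):
    """Insert one captured form record into a (hypothetical) table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applications "
        "(name TEXT, address TEXT, image_path TEXT)"
    )
    # Named placeholders keep captured text from being treated as SQL.
    conn.execute(
        "INSERT INTO applications (name, address, image_path) "
        "VALUES (:name, :address, :image_path)",
        fields,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database for the demo
store_form_data(
    conn,
    {"name": "Pat Example", "address": "1 Main St", "image_path": None},
)
rows = conn.execute(
    "SELECT name, address, image_path FROM applications"
).fetchall()
print(rows)  # [('Pat Example', '1 Main St', None)]
```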
Image enhancement features of many software products (some features are even embedded in the hardware on devices today) increase the quality of the scanned documents. Common features include deskew, despeckle, crop, rotate, blank and double-page detection.
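Two of these features, despeckle and blank-page detection, are easy to sketch on a tiny black-and-white image. The Python below represents a page as rows of 0 (white) and 1 (dark) pixels. It is a simplified illustration: the "no dark neighbors" rule and the 1% threshold are assumptions, and real products use more sophisticated filters.

```python
def despeckle(image):
    """Return a copy with isolated dark pixels (no dark neighbors) cleared."""
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1:
                continue
            # Count dark pixels among the up-to-eight surrounding pixels.
            neighbors = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        neighbors += image[rr][cc]
            if neighbors == 0:
                cleaned[r][c] = 0
    return cleaned

def is_blank(image, max_dark_ratio=0.01):
    """Treat a page as blank if almost none of its pixels are dark."""
    total = sum(len(row) for row in image)
    dark = sum(sum(row) for row in image)
    return dark / total <= max_dark_ratio

speck_only = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
print(is_blank(speck_only))             # False: the speck counts as content
print(is_blank(despeckle(speck_only)))  # True: blank once the speck is removed
```

The example also shows why the order of cleanup steps matters: running despeckle first keeps a single speck of dust from making an empty page look like it has content.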
Double-check images for accuracy. In key-from-image, data can be validated by a second operator or via automated processes like database lookups. Bad images are flagged and rescanned.
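The two validation approaches described here, a second operator and a database lookup, can be sketched in a few lines of Python. The field names, the customer IDs, and the rule that any disagreement flags the field are all assumptions for illustration.

```python
def compare_keying(first, second):
    """Return the field names where two operators' entries disagree."""
    fields = first.keys() | second.keys()
    return sorted(f for f in fields if first.get(f) != second.get(f))

def validate(first, second, known_ids):
    """Flag fields that fail double keying or the customer lookup."""
    flagged = set(compare_keying(first, second))
    if first.get("customer_id") not in known_ids:
        flagged.add("customer_id")
    return sorted(flagged)

KNOWN_CUSTOMER_IDS = {"C-100", "C-200"}  # hypothetical lookup table

operator_a = {"customer_id": "C-100", "amount": "250.00"}
operator_b = {"customer_id": "C-100", "amount": "260.00"}

print(validate(operator_a, operator_b, KNOWN_CUSTOMER_IDS))  # ['amount']
print(validate(operator_a, operator_a, KNOWN_CUSTOMER_IDS))  # []
```

An empty list means the record passes. Anything else goes back for review against the image.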
Recognition is an important step for indexing each image.
- OCR (optical character recognition) – Recognizes machine-printed characters.
- Zonal – Used where only specific fields on a form are required.
- Full-text – Free form document conversion allowing search on all words in the document.
- ICR (intelligent character recognition) – For hand-printed characters.
- OMR (optical mark recognition) – Recognizes check boxes, filled-in bubbles, etc.
- Barcodes – Read and extract information from a pre-printed barcode.
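Of the recognition types above, OMR is the simplest to illustrate: a check box counts as marked when enough of its area is dark. The Python sketch below uses a tiny grid of 0 (white) and 1 (dark) pixels. The zone coordinates and the 50% threshold are assumptions chosen for the example.

```python
def fill_ratio(image, zone):
    """Fraction of dark pixels inside a zone.

    zone is (top, left, bottom, right); bottom and right are exclusive.
    """
    top, left, bottom, right = zone
    cells = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
    return sum(cells) / len(cells)

def read_marks(image, zones, threshold=0.5):
    """Report each named zone as marked (True) or unmarked (False)."""
    return {
        name: fill_ratio(image, zone) >= threshold
        for name, zone in zones.items()
    }

form = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
zones = {"yes": (0, 0, 2, 2), "no": (0, 4, 2, 6)}  # hypothetical check boxes

print(read_marks(form, zones))  # {'yes': True, 'no': False}
```

The stray dark pixel inside the "no" box fills only a quarter of it, so it stays below the threshold and the box reads as unmarked.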
If you ever want to find your documents again, indexing is not optional! The index can be full text or key fields, though a combination of both is best. There are a number of ways to index.
- Key from index fields (document type, date, customer name, etc.) – A data entry person manually indexes documents.
- Auto-indexing with barcodes – By storing form information on a barcode before scanning a batch of documents, certain index values can be automatically populated.
- Zone OCR – Also automatic; index values are read from defined zones on the page.
- Ingest from other applications – Email, word processing, etc.; metadata from the document (subject line, sender, etc.) become the index fields.
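The last option, ingesting from another application, can be sketched with Python's standard `email` module: the message headers become key index fields, and the words in the body provide simple full-text terms. The sample message, the choice of fields, and the tokenizing rule are all assumptions for illustration.

```python
import re
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical message as it might arrive from a mail system.
RAW = """\
From: Pat Example <pat@example.com>
To: ap@example.com
Subject: Invoice 1042
Date: Mon, 06 Jan 2020 09:30:00 +0000

Please find invoice 1042 attached.
"""

def index_fields(raw_message):
    """Key fields taken straight from the message metadata."""
    msg = message_from_string(raw_message)
    return {
        "sender": msg["From"],
        "subject": msg["Subject"],
        "date": parsedate_to_datetime(msg["Date"]).date().isoformat(),
    }

def full_text_terms(raw_message):
    """Unique lowercase words from the body, for full-text search."""
    body = message_from_string(raw_message).get_payload()
    return sorted(set(re.findall(r"[a-z0-9]+", body.lower())))

print(index_fields(RAW))
# {'sender': 'Pat Example <pat@example.com>', 'subject': 'Invoice 1042', 'date': '2020-01-06'}
print(full_text_terms(RAW))
# ['1042', 'attached', 'find', 'invoice', 'please']
```

Storing both outputs with the document gives the combination recommended above: key fields for precise lookups and full text for everything else.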
Once captured, your business documents are quickly available to anyone in your organization, at any time (based on appropriate permissions, of course).