An optical scanner geared to office documents rather than photographs. Also called “office scanners,” “enterprise scanners” and “business scanners,” desktop models have automatic document feeders and scan roughly 15 to 100 pages per minute. Such units are rated in pages per minute (ppm) or, if both sides of the page are scanned simultaneously, in impressions per minute (ipm). “Personal document scanners” are low-speed, portable models that accept one sheet of paper at a time.

Images or Text

The output of document scanners may remain as images within a document management system, but many applications require that the characters on the pages be converted into machine-readable text, such as ASCII, for manipulation in word processing and other programs. Most document scanners come bundled with optical character recognition (OCR) software that runs on the computer and converts the character images into text.

Scanning or digitizing paper documents for storage places different demands on the equipment than scanning pictures for reproduction. While documents can be scanned on general-purpose scanners, the work is done more efficiently on dedicated document scanners.

When scanning large quantities of documents, speed and paper handling are very important, but the scan resolution will normally be much lower than that needed for good reproduction of pictures.
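As a rough illustration of why feeder speed dominates large jobs, the following Python sketch estimates feed time from a rated speed. The function names and the 10,000-sheet batch are hypothetical. The estimate assumes the scanner sustains its rated speed with no pauses for reloading or jams, and it assumes the common convention that a duplex scanner captures two impressions (sides) per sheet.

```python
def impressions_per_minute(pages_per_minute: float, duplex: bool) -> float:
    """Sides captured per minute; assumes duplex means two sides per sheet."""
    return pages_per_minute * (2 if duplex else 1)


def batch_minutes(sheets: int, pages_per_minute: float) -> float:
    """Minutes to feed a batch at the rated speed (idealized, no stoppages)."""
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    return sheets / pages_per_minute


if __name__ == "__main__":
    batch = 10_000  # hypothetical archive of 10,000 sheets
    for ppm in (20, 60, 100):
        print(ppm, "ppm:", batch_minutes(batch, ppm), "minutes,",
              impressions_per_minute(ppm, duplex=True), "ipm in duplex")
```

Under these assumptions the same batch takes five times as long at 20 ppm as at 100 ppm, before any operator time is counted.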

Document scanners have document feeders, usually larger than those found on some copiers and all-purpose scanners. Scans are made at high speed, perhaps 20 to 150 pages per minute, often in grayscale, although many scanners support color. Many scanners can scan both sides of double-sided originals (duplex operation). Sophisticated document scanners have firmware or software that cleans up scans of text as they are produced, eliminating accidental marks and sharpening type; this would be unacceptable for photographic work, where marks cannot reliably be distinguished from desired fine detail. The files are compressed as they are created.
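The cleanup of accidental marks can be pictured with a deliberately simple despeckle filter. This is only a sketch under stated assumptions: the page is a bilevel grid of 0 (white) and 1 (black), an accidental mark is taken to be a black pixel with no black neighbours, and the function name `despeckle` is hypothetical. It does not describe the algorithm of any particular scanner, which is likely to be far more elaborate.

```python
def despeckle(page):
    """Return a copy of a bilevel page with isolated black pixels removed.

    page: list of equal-length rows, each a list of 0 (white) or 1 (black).
    Assumption: a black pixel with no black pixel among its 8 neighbours
    is treated as an accidental mark.
    """
    height = len(page)
    width = len(page[0]) if height else 0
    cleaned = [row[:] for row in page]  # leave the input untouched
    for y in range(height):
        for x in range(width):
            if page[y][x] != 1:
                continue
            black_neighbours = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        black_neighbours += page[ny][nx]
            if black_neighbours == 0:
                cleaned[y][x] = 0  # erase the isolated speck
    return cleaned


if __name__ == "__main__":
    sample = [
        [0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],  # lone speck
        [0, 0, 0, 1, 1],  # part of a solid stroke
        [0, 0, 0, 1, 1],
    ]
    for row in despeckle(sample):
        print(row)
```

The same rule applied to a photograph would erase single dark pixels that may be genuine fine detail, which is why such cleanup suits text only.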

The resolution used is usually 150 to 300 dpi, although the hardware may be capable of somewhat higher resolution. This produces images of text that are good enough to read and to process with optical character recognition (OCR), without the storage demands of higher-resolution images.

Document scans are often processed using OCR technology to create editable and searchable files. Most scanners use ISIS or TWAIN device drivers to scan documents into TIFF format so that the scanned pages can be fed into a document management system that handles their archiving and retrieval. Lossy JPEG compression, which is very efficient for pictures, is undesirable for text documents because slanted straight edges take on a jagged appearance. It is also unnecessary, since solid black (or other colored) text on a light background compresses well with lossless formats.

A specialized form of document scanning is book scanning. Technical difficulties arise because books are usually bound and are sometimes fragile and irreplaceable, but some manufacturers have developed specialized machinery to deal with this. Robotic mechanisms are often used to automate page turning and scanning.

The amount of data generated by a scanner can be very large: an uncompressed 24-bit image of a 23 × 28 cm (9″ × 11″) page, slightly larger than A4 paper, scanned at 600 dpi is about 100 megabytes, all of which must be transferred and stored. Recent scanners can generate this volume of data in a matter of seconds, making a fast connection desirable.
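The 100-megabyte figure follows directly from the page size, resolution and bit depth. The short Python sketch below reproduces it; the function name is hypothetical, and the calculation ignores file headers and any row padding a real image format might add.

```python
def uncompressed_bytes(width_in: float, height_in: float,
                       dpi: int, bits_per_pixel: int) -> int:
    """Raw image size in bytes for a page of the given size and scan settings."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # 9 x 11 inch page, 600 dpi, 24-bit color: 5400 x 6600 pixels, 3 bytes each
    print(uncompressed_bytes(9, 11, 600, 24))  # 106920000 bytes, about 107 MB
    # The same page at a typical document setting: 300 dpi, 8-bit grayscale
    print(uncompressed_bytes(9, 11, 300, 8))   # 8910000 bytes, about 9 MB
```

Halving the resolution cuts the pixel count to a quarter, and dropping from 24-bit color to 8-bit grayscale cuts it by a further factor of three, which is why document scans at 150 to 300 dpi are so much cheaper to store than photographic scans.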

Although a scanner needs no software beyond a scanning utility, many scanners come bundled with additional software. Typically, in addition to the scanning utility, an image-editing application (such as Photoshop) and optical character recognition (OCR) software are supplied. OCR software converts graphical images of text into standard text that can be edited using common word-processing and text-editing software; accuracy is rarely perfect.


