Optical character recognition, usually abbreviated to OCR, is the

mechanical or electronic translation of scanned images of handwritten,

typewritten or printed text into machine-encoded text. It is widely used

to convert books and documents into electronic files, to computerize a

record-keeping system in an office, or to publish the text on a website.

OCR makes it possible to edit the text, search for a word or phrase, store

it more compactly, display or print a copy free of scanning artifacts, and

apply techniques such as machine translation, text-to-speech and text

mining to it. OCR is a field of research in pattern recognition, artificial

intelligence and computer vision.

OCR systems require calibration to read a specific font; early versions

needed to be programmed with images of each character, and worked

on one font at a time. "Intelligent" systems with a high degree of

recognition accuracy for most fonts are now common. Some systems are

capable of reproducing formatted output that closely approximates the

original scanned page including images, columns and other non-textual

components.
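
The early approach described above, in which a system is programmed with an image of each character and works on one font at a time, can be illustrated with a small template-matching sketch. This is only an illustration under stated assumptions, not a reconstruction of any historical machine: the 3x5 bitmaps, the TEMPLATES table, the function names and the max_mismatches threshold are all hypothetical.

```python
# Hypothetical 3x5 bitmaps for a single font; "#" is ink, "." is background.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
}


def mismatches(glyph, template):
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph, templates=TEMPLATES, max_mismatches=2):
    """Return the closest template's character, or None if none is close enough.

    max_mismatches is an assumed tolerance for scanning noise.
    """
    best_char, best_score = None, None
    for char, template in templates.items():
        score = mismatches(glyph, template)
        if best_score is None or score < best_score:
            best_char, best_score = char, score
    if best_score is not None and best_score <= max_mismatches:
        return best_char
    return None


# A "T" with one stray ink pixel in the fourth row.
noisy_t = ["###", ".#.", ".#.", "##.", ".#."]
print(recognize(noisy_t))
```

Because every glyph is compared against stored images of one font, a scheme like this has to be given a new set of templates for each additional font, which is the limitation that later omni-font systems were designed to remove.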

In 1929 Gustav Tauschek obtained a patent on OCR in Germany, followed

by Handel, who obtained a US patent on OCR in 1933 (U.S. Patent

1,915,993). In 1935 Tauschek was also granted a US patent on his

method (U.S. Patent 2,026,329). Tauschek's machine was a mechanical

device that used templates and a photodetector.

In 1949 RCA engineers worked on the first primitive computer-type OCR

to help blind people for the US Veterans Administration. Rather than

stopping at the conversion of printed characters to machine language,

their device converted them to machine language and then spoke the

letters. It proved far too expensive and was not pursued after testing.[1]

In 1950, David H. Shepard, a cryptanalyst at the Armed Forces Security

Agency in the United States, addressed the problem of converting

printed messages into machine language for computer processing and

built a machine to do this, reported in the Washington Daily News on 27

April 1951 and in the New York Times on 26 December 1953 after his U.S.

Patent 2,663,758 was issued. Shepard then founded Intelligent Machines

Research Corporation (IMR), which went on to deliver the world's first

OCR systems used in commercial operation.

The first commercial system was installed at the Reader's Digest in 1955.

The second system was sold to the Standard Oil Company for reading

credit card imprints for billing purposes. Other systems sold by IMR

during the late 1950s included a bill stub reader to the Ohio Bell

Telephone Company and a page scanner to the United States Air Force

for reading typewritten messages and transmitting them by teletype. IBM and

others were later licensed on Shepard's OCR patents.

In about 1965 Reader's Digest and RCA collaborated to build an OCR

document reader designed to digitise the serial numbers on Reader's

Digest coupons returned from advertisements. The documents were

printed by an RCA drum printer using the OCR-A font.

The reader was connected directly to an RCA 301 computer (one of the

first solid state computers). This reader was followed by a specialised

document reader installed at TWA, where it processed airline ticket

stock. The readers processed documents at a rate of 1,500 per minute

and checked each document, rejecting those they could not process

correctly. The product became part of the RCA product line as a reader

designed to process "turnaround documents" such as the utility and

insurance bills returned with payments.

The United States Postal Service has been using OCR machines to sort

mail since 1965 based on technology devised primarily by the prolific

inventor Jacob Rabinow. The first use of OCR in Europe was by the British

General Post Office (GPO). In 1965 it began planning an entire banking

system, the National Giro, using OCR technology, a process that

revolutionized bill payment systems in the UK. Canada Post has been

using OCR systems since 1971. OCR systems read the

name and address of the addressee at the first mechanised sorting

center, and print a routing bar code on the envelope based on the postal

code. To avoid confusion with the human-readable address field, which

can be located anywhere on the letter, the bar code is printed in a special

ink (orange in visible light) that is clearly visible under ultraviolet light.

Envelopes may then be processed with equipment based on simple

barcode readers.

In 1974 Ray Kurzweil started the company Kurzweil Computer Products,

Inc. and led development of the first omni-font optical character

recognition system — a computer program capable of recognizing text

printed in any normal font. He decided that the best application of this

technology would be to create a reading machine for the blind, which

would allow blind people to have a computer read text to them out loud.

This device required the invention of two enabling technologies — the

CCD flatbed scanner and the text-to-speech synthesizer. On January 13,

1976, the finished product was unveiled during a

widely reported news conference headed by Kurzweil and the leaders of

the National Federation of the Blind.

In 1978 Kurzweil Computer Products began selling a commercial version

of the optical character recognition computer program. LexisNexis was

one of the first customers, and bought the program to upload paper legal

and news documents onto its nascent online databases. Two years later,

Kurzweil sold his company to Xerox, which had an interest in further

commercializing paper-to-computer text conversion. Kurzweil Computer

Products became a subsidiary of Xerox known as Scansoft, now Nuance

Communications.