Optical Character Recognition

Research in OCR and related fields such as document analysis or forms recognition is performed both at corporate research labs and in academic institutions, with occasionally formal, but more frequently informal, collaboration between the two. An example of such informal collaboration is the Bay Area OCR Interest Group. The announcement on the termination of the NIST METTREC effort provides an interesting perspective on the collaborative environment in OCR.

The major commercial vendors of shrinkwrap OCR solutions used to be Caere Corp. and Xerox Corp., but the recent acquisition of Caere by ScanSoft has effectively eliminated competition on the desktop.

The high-end market is much more fragmented, with vendors ranging from IBM Corp. to BBN Technologies and re Recognition AG.

Low-density languages are largely missing from the picture, with the exception of Creole and an interesting collaborative effort on Tibetan (web page now defunct). A good grab-bag of links is maintained by Jim Bilderback. The Yahoo list includes many of the smaller players, but not the ones listed above.

The major centers of academic research include CEDAR at SUNY Buffalo, CENPARMI at Concordia University, ISL at University of Washington, and PRIP at Michigan State University.

Several major conferences such as ICASSP, ICDAR, ICPR, and SPIE have significant OCR sessions. For handwriting recognition the main venue is IWFHR.

Significant OCR databases are available from the National Institute of Standards and Technology, CEDAR, and ETL, the Electrotechnical Laboratory of Japan. A very limited amount is available in the University of California, Irvine collection of machine learning tasks.

A brief overview of the field, with some references, can be found in the OCR article for the Oxford International Encyclopedia of Linguistics.

We now have a few more open datasets and open competitions at the upcoming ICDAR.

The above lists are not complete -- if you wish to add pointers to your institutional or personal pages please contact András Kornai.

Hindi font samples were added for the DARPA TIDES surprise language exercise.

Last updated June 5 2003
