The high-end market is much more fragmented, with vendors ranging from IBM Corp. to BBN Technologies and re Recognition AG.
Low-density languages are largely missing from the picture, with the exception of Creole and an interesting collaborative effort on Tibetan (web page now defunct). A good grab-bag of links is maintained by Jim Bilderback. The Yahoo list has a lot of the smaller players, but not the ones listed above.
The major centers of academic research include CEDAR at SUNY Buffalo, CENPARMI at Concordia University, ISL at the University of Washington, and PRIP at Michigan State University. Several major conferences such as ICASSP, ICDAR, ICPR, and SPIE have significant OCR sessions. For handwriting recognition the main venue is IWFHR.
Significant OCR databases are available from the National Institute of Standards and Technology, CEDAR, and ETL, the Electrotechnical Laboratory of Japan. A very limited amount of data is available in the University of California, Irvine collection of machine learning tasks.
A brief overview of the field, with some references, can be found in the OCR article for the Oxford International Encyclopedia of Linguistics.
We now have a few more open datasets and open competitions at the upcoming ICDAR.
The above lists are not complete -- if you wish to add pointers to your institutional or personal pages, please contact András Kornai.
Hindi font samples added for the DARPA TIDES surprise language exercise.
Last updated June 5, 2003