Date: Thu, 2 Apr 98 14:11:15 EST
From: Mike Garris x2928
Subject: METTREC status

Dear METTREC Request For Comment (RFC) respondent:

I wish to thank you again for taking the time to respond to the METTREC RFC. As you will read below, your comments were very helpful. Many of you asked to be kept informed on the progress of the project. I'm sending this email to inform you that the METTREC project at NIST will be coming to a close. An objective explanation for this decision is provided below. Weighing heavily was the general lack of interest and motivation for participation in this type of evaluation conference.

Deliverables for this project will include the software tools we have developed along with a database containing images and text of the 1994 Federal Register. The software will be placed in the public domain and the data will be published and distributed as a NIST Special Database.

I welcome your feedback and comments with regard to the information provided below, and on the project in general.

Sincerely,
Mike Garris
mgarris@nist.gov

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

REASONS FOR CLOSING DOWN THE METTREC PROJECT

1. RECOGNIZING AND USING METADATA
---------------------------------

Our primary interest in METTREC has been to pursue the use of automatically recognized metadata and measure its impact on information retrieval. Based on our experience over the past year, we have observed:

   a.) An organized "OCR research community" no longer exists. There remains only a small number of commercial vendors competing in a "shrinking" market.

   b.) Very little research has been developed into technology tools for automatically detecting metadata in legacy paper documents that can be used for IR.

   c.) There is little motivation for OCR participation.

   d.) No one in the IR community is actively researching the use of metadata.
       It is acknowledged that metadata is interesting and might be useful, but no one is actually trying to exploit it. As an example, Donna Harman recently was a keynote speaker at an IR workshop in Pittsburgh. She and her colleague explicitly raised the issue of what can/should be done with metadata. No one from the audience responded, and when they sought out individuals on the side regarding this topic, no one had any response. The NLP&IR Group at NIST is beginning to do research in these areas.

Conclusion: Metadata cannot be readily detected with existing OCR technology, and the IR community is not prepared to address the use of metadata. Therefore, an OCR/IR metadata evaluation conference is not practical.

2. CONFERENCE PARTICIPATION
---------------------------

The response to the METTREC Request For Comment (RFC) has been minimal. To date, only 22 responses have been received from more than 1000 RFCs distributed. Of the 10 who responded as being "potentially able" to participate: 5 are OCR respondents (3 of whom are questionable), 4 are IR respondents (2 of whom are questionable), and there was 1 metadata participant. A compilation of these responses has been included at the end of this report.

Conclusion: The level of interest and the ability of people to participate are too low to continue with an evaluation conference.

3. IMPACT
---------

NIST is mandated with the task of developing technology that will impact private industry, and (while the need may be great) the above results point to Government as the only significant end user. Without a scientific focus on metadata or any significant commercial activity, NIST management has decided to discontinue long-term effort in this area. NIST will, in the short term, publish and document the data and tools that have been developed on this project. These resources will be published and distributed as a NIST Special Database for research purposes.

SUMMARY
-------

1. OCR is a post-competitive technology, and research of metadata for IR use has yet to be developed.

2. If the Government is the sole user of a technology, then the Government should expect to fully fund the development and evaluation costs of the technology.