The invention relates to a method for ascertaining the error types of incorrect reading results produced by an OCR reader, for text units that have a standard content structure and are subdivided into distinguishable sections, using true reference data. The reference data for each incorrectly read text unit are used to automatically look up that text unit, together with its associated sections, in a dictionary of text units in which each searchable entry contains a text unit subdivided into individual, distinct sections. The reading result data are likewise used to search the dictionary for a text unit with its associated sections. The sections found are then compared pair by pair with the corresponding reference sections, and each incorrect reading result is classified into stipulated error classes on the basis of the discrepancies found in this pair-by-pair comparison.
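The lookup, pair-by-pair comparison and classification steps described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: each text unit is assumed to be a mapping from section names to section contents, the dictionary is assumed to be searched by exact key match, and the error class names (`wrong_<section>`, `multiple_sections_wrong`, and so on) are hypothetical, since the source only says the classes are "stipulated".

```python
from typing import Optional

# Hypothetical model: a text unit maps section names to section contents,
# e.g. {"street": "Main St", "house_no": "12", "city": "Springfield"}.
TextUnit = dict[str, str]


def lookup(dictionary: dict[str, TextUnit], key: str) -> Optional[TextUnit]:
    """Find the sectioned text unit for a search key (assumed exact-match search)."""
    return dictionary.get(key)


def differing_sections(found: TextUnit, reference: TextUnit) -> list[str]:
    """Compare corresponding sections pair by pair and list those that disagree."""
    names = list(reference) + [n for n in found if n not in reference]
    return [n for n in names if found.get(n) != reference.get(n)]


def classify(read_key: str, reference_key: str, dictionary: dict[str, TextUnit]) -> str:
    """Assign one of a set of illustrative (hypothetical) error classes."""
    reference = lookup(dictionary, reference_key)
    found = lookup(dictionary, read_key)
    if reference is None:
        return "reference_not_in_dictionary"
    if found is None:
        return "reading_not_in_dictionary"
    diffs = differing_sections(found, reference)
    if not diffs:
        return "no_section_discrepancy"
    if len(diffs) == 1:
        return f"wrong_{diffs[0]}"  # exactly one section was misread
    return "multiple_sections_wrong"


# Illustrative dictionary; keys and contents are made up for the example.
dictionary = {
    "A1": {"street": "Main St", "house_no": "12", "city": "Springfield"},
    "A2": {"street": "Main St", "house_no": "21", "city": "Springfield"},
    "A3": {"street": "Oak Ave", "house_no": "5", "city": "Shelbyville"},
}
print(classify("A2", "A1", dictionary))  # wrong_house_no
print(classify("A3", "A1", dictionary))  # multiple_sections_wrong
```

Naming a class after the single differing section keeps the classes directly interpretable; a real system would substitute whatever error classes it stipulates.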
Legal claims defining the scope of protection, as filed with the USPTO.
1. A method for ascertaining error types for incorrect reading results from an OCR reader for text units which have a standard content structure and are subdivided into distinguishable sections, using true reference data, characterized in that the reference data for the respective incorrectly read text unit are used for automatically ascertaining the respective text unit with the associated sections in a dictionary for the text units which contains a text unit, subdivided into individual, distinct sections, for each searchable entry, in that the reading result data are used to search the dictionary for a text unit with associated sections, the sections found with the respective corresponding reference sections are compared pair by pair, and in that the respective incorrect reading result is classified into stipulated error classes on the basis of the discrepancies ascertained in the pair by pair comparison.
2. The method as claimed in claim 1, characterized in that the error types are ascertained from a random reading sample.
3. The method as claimed in claim 1, characterized in that the error classes ascertained are evaluated on a statistical basis.
4. The method as claimed in claim 1, characterized in that the text units are addresses.
5. The method as claimed in claim 4, characterized in that each entry in the address dictionary contains an address code.
6. The method as claimed in claim 5, characterized in that the automatic reading involves production of an address code which is compared with an address code produced from the manual input data, with a reading error being diagnosed if there is no match.
7. The method as claimed in claim 5, characterized in that the address code as reading result data is used to search the address dictionary for an address with the various address elements as sections of the text unit for comparison with the address elements from the manual input.
8. The method as claimed in claim 1, characterized in that the true reference data for each text unit which is to be read are produced by manual input.
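For addresses (claims 4 to 8), the code comparison that diagnoses a reading error and the statistical evaluation of error classes over a random sample (claims 2 and 3) might look like the sketch below. It is a hedged illustration, not the claimed method itself: it assumes that the address code for the manual input is obtained by an exact reverse lookup in the address dictionary and that error classes are named after the differing address elements. The functions `code_for`, `classify`, `evaluate_sample` and `error_shares` and all sample data are hypothetical.

```python
import random
from collections import Counter
from typing import Optional

# Hypothetical model: an address maps element names to contents.
Address = dict[str, str]


def code_for(elements: Address, dictionary: dict[str, Address]) -> Optional[str]:
    """Produce an address code from manually entered elements (assumed exact reverse lookup)."""
    for code, entry in dictionary.items():
        if entry == elements:
            return code
    return None


def classify(ocr_code: str, manual: Address, dictionary: dict[str, Address]) -> str:
    """Look up the OCR address code and compare its elements with the manual input pair by pair."""
    found = dictionary.get(ocr_code)
    if found is None:
        return "code_not_in_dictionary"
    diffs = [name for name in manual if found.get(name) != manual[name]]
    if len(diffs) == 1:
        return f"wrong_{diffs[0]}"
    return "multiple_elements_wrong" if diffs else "no_element_discrepancy"


def evaluate_sample(records: list[tuple[str, Address]],
                    dictionary: dict[str, Address],
                    sample_size: int, seed: int = 0) -> Counter:
    """Tally error classes over a random sample of (ocr_code, manual_elements) records."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    tally: Counter = Counter()
    for ocr_code, manual in sample:
        # A reading error is diagnosed only when the two address codes differ.
        if ocr_code == code_for(manual, dictionary):
            continue
        tally[classify(ocr_code, manual, dictionary)] += 1
    return tally


def error_shares(tally: Counter) -> dict[str, float]:
    """Express each error class as a fraction of all diagnosed reading errors."""
    total = sum(tally.values())
    return {cls: n / total for cls, n in tally.items()} if total else {}


# Illustrative address dictionary and reading records (made-up data).
dictionary = {
    "10001": {"street": "Main St", "house_no": "12", "city": "Springfield"},
    "10002": {"street": "Main St", "house_no": "21", "city": "Springfield"},
    "20001": {"street": "Oak Ave", "house_no": "5", "city": "Shelbyville"},
}
records = [
    ("10001", dictionary["10001"]),  # codes match: no reading error
    ("10002", dictionary["10001"]),  # house number misread
    ("99999", dictionary["20001"]),  # OCR produced an unknown code
    ("20001", dictionary["10002"]),  # every element differs
]
tally = evaluate_sample(records, dictionary, sample_size=len(records))
print(sorted(tally.items()))
# [('code_not_in_dictionary', 1), ('multiple_elements_wrong', 1), ('wrong_house_no', 1)]
```

Seeding `random.Random` makes the sample reproducible, which helps when the same reading sample has to be re-evaluated after changes to the OCR reader.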
March 13, 2002
August 9, 2005