US-6345119

Handwritten character recognition apparatus and method using a clustering algorithm

Published: February 5, 2002
Assignee: not available in USPTO data we have
Inventors: not available in USPTO data we have
Technical Abstract

For a plurality of handwritten characters extracted from an input image, a character category is first determined for each character by a character recognition process. Second, a clustering process determines the similarity of character forms among the extracted characters, and, based on that result, the character categories determined by the first character recognition process are modified.

Patent Claims
25 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A character recognition apparatus recognizing a handwritten character in image data in a document, comprising: a same writer entry area extracting unit extracting an area to which characters are written by a same writer from the image data in the document; a feature vector computing unit computing a feature vector of each character in all characters in the area extracted by said same writer entry area extracting unit; a character recognition unit determining a character category of each character based on a distance between the feature vector of each character computed by said feature vector computing unit and a reference vector entered in a dictionary; a clustering unit recognizing each feature vector to be a cluster, and performing a clustering process by sequentially integrating clusters whose distance is the closest; a comparing unit comparing a clustering result from the clustering unit with the character recognition result from the character recognition unit and outputting a compared result; a character recognition result amending unit amending the character category of a first integrated cluster to the character category of a second integrated cluster using the compared result when the integrated clusters are different from each other and outputting an amended character recognition result indicative thereof; and a clustering result storage unit storing the clustering result from the clustering unit, wherein the character recognition result amending unit amends the clustering result stored in the clustering result storage unit to the amended character recognition result.
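
Claim 1's clustering unit starts from one cluster per feature vector and repeatedly integrates the two closest clusters. The following is a minimal Python sketch of that agglomerative loop, not the patented implementation. It assumes Euclidean distance between cluster centroids (the claim names neither a metric nor a linkage) and uses a fixed target cluster count as a stand-in stopping rule; `centroid` and `agglomerate` are hypothetical names.

```python
import math
from typing import List


def centroid(vectors: List[List[float]]) -> List[float]:
    """Mean feature vector of a cluster."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def agglomerate(vectors: List[List[float]], n_clusters: int) -> List[List[int]]:
    """Treat every feature vector as its own cluster, then repeatedly merge the
    closest pair (centroid linkage, an assumption) until n_clusters remain.
    Returns each cluster as a list of indices into `vectors`."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > max(n_clusters, 1):
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            ci = centroid([vectors[k] for k in clusters[i]])
            for j in range(i + 1, len(clusters)):
                cj = centroid([vectors[k] for k in clusters[j]])
                d = math.dist(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # integrate cluster j into cluster i
        del clusters[j]                  # j > i, so index i is unaffected
    return clusters
```

For the four points `[[0, 0], [0, 1], [10, 10], [10, 11]]` and a target of two clusters, the loop merges the two nearby pairs and returns `[[0, 1], [2, 3]]`. The pairwise search is deliberately naive and only suited to small examples.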

2. The character recognition apparatus according to claim 1, wherein said character recognition result amending unit comprises: a cluster integrating unit comparing a distance between clusters in all categories based on a process result from said clustering unit, and integrating the clusters indicating a between-cluster distance smaller than a predetermined threshold; a within-cluster character category determining unit determining whether or not the character categories of the clusters integrated by said cluster integrating unit are different based on the all character recognition result from said character recognition unit; and a character category amending unit amending the character category of a first cluster containing a smaller number of elements to the character category of a second cluster when said within-cluster character category determining unit determines that the integrated clusters are different from each other.
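
Claim 2 amends categories by integrating clusters from all categories whose between-cluster distance is below a threshold and relabeling the smaller member of a mixed pair. A sketch under stated assumptions: Euclidean distance between centroids, one greedy merge per pass, and the larger cluster's category applied to every member of the merged cluster. `Cluster` and `integrate_and_amend` are illustrative names, not taken from the patent.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    category: str               # character category assigned by the recognizer
    vectors: List[List[float]]  # feature vectors of the member characters

    def centroid(self) -> List[float]:
        n = len(self.vectors)
        return [sum(v[i] for v in self.vectors) / n for i in range(len(self.vectors[0]))]


def integrate_and_amend(clusters: List[Cluster], threshold: float) -> List[Cluster]:
    """Merge pairs of clusters whose centroid distance is below `threshold`.
    When the merged clusters carry different categories, the merged cluster
    takes the category of the one with more elements."""
    clusters = list(clusters)
    changed = True
    while changed:
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                if math.dist(a.centroid(), b.centroid()) < threshold:
                    larger = a if len(a.vectors) >= len(b.vectors) else b
                    clusters[i] = Cluster(larger.category, a.vectors + b.vectors)
                    del clusters[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan after every merge
    return clusters
```

With a three-element cluster labeled "1", a one-element cluster labeled "7" lying close to it, and a distant cluster labeled "4", a threshold of 2.0 absorbs the lone "7" into the "1" cluster and leaves "4" alone.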

3. The character recognition apparatus according to claim 1, wherein said clustering unit comprises: a hierarchical clustering unit sequentially integrating characters similar to each other in feature vector as a cluster by performing a hierarchical clustering process on the feature vectors in a character category; and a clustering stop unit stopping the hierarchical clustering process when a number of clusters in a category reaches a predetermined value after performing the hierarchical clustering process by said hierarchical clustering unit.

4. The character recognition apparatus according to claim 1, wherein said clustering unit comprises: a hierarchical clustering unit sequentially integrating characters similar to each other in feature vector as a cluster by performing a hierarchical clustering process on the feature vectors in a character category; and a clustering stop unit stopping the hierarchical clustering process when a between-cluster distance at a time of cluster integration is equal to or larger than a predetermined threshold in the hierarchical clustering process performed by said hierarchical clustering unit.

5. The character recognition apparatus according to claim 1, wherein said clustering unit comprises: a hierarchical clustering unit sequentially integrating characters similar to each other in feature vector as a cluster by performing a hierarchical clustering process on the feature vectors in a character category; and a clustering stop unit stopping the hierarchical clustering process when an increase ratio of a between-cluster distance at a time of cluster integration is equal to or larger than a predetermined threshold in the hierarchical clustering process performed by said hierarchical clustering unit.
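
Claims 3 to 5 differ only in the rule that stops the hierarchical merging: a target cluster count, an absolute merge-distance threshold, or a threshold on how sharply the merge distance grows. Assuming the merge distances have already been recorded in the order the merges happen, each rule reduces to counting how many merges to keep. The sketch is illustrative; in particular, defining the "increase ratio" as the current merge distance divided by the previous one is an assumption, since the claim does not define it. The function names are hypothetical.

```python
from typing import List


def merges_by_count(n_items: int, target_clusters: int) -> int:
    """Claim 3 style: each merge removes one cluster, so stop after n - k merges."""
    return max(0, n_items - target_clusters)


def merges_by_distance(merge_distances: List[float], threshold: float) -> int:
    """Claim 4 style: stop at the first merge whose distance reaches `threshold`."""
    count = 0
    for d in merge_distances:
        if d >= threshold:
            break
        count += 1
    return count


def merges_by_ratio(merge_distances: List[float], max_ratio: float) -> int:
    """Claim 5 style: stop at the first merge whose distance is at least
    `max_ratio` times the previous merge distance (ratio definition assumed).
    A previous distance of zero gives no usable ratio, so that merge is kept."""
    count = 0
    prev = None
    for d in merge_distances:
        if prev is not None and prev > 0 and d / prev >= max_ratio:
            break
        count += 1
        prev = d
    return count
```

For merge distances `[1.0, 1.2, 1.5, 9.0]`, both a distance threshold of 5.0 and a ratio threshold of 3.0 stop before the large jump to 9.0, keeping three merges.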

6. A character recognition apparatus for recognizing a character category of a number of input characters input by a same writer, comprising: a computing unit computing a feature vector of an input character; a recognition unit computing a distance value between the feature vector computed by said computing unit and a reference vector of each character category entered in a dictionary, recognizing the character category indicating a smallest distance value as the character category of the input character, and rejecting the input character as unrecognizable when a value of distance from the recognized character category is not equal to or not smaller than a predetermined value; a clustering unit generating one or more clusters for each character category by clustering the feature vectors of all input characters computed by said computing unit each character category recognized by said recognition unit based on a value of distance between feature vectors calculated by said computing unit and sequentially integrating clusters whose distance is the closest; an amending unit amending the character category of an integrated cluster being different from other integrated clusters in character category based on a process result from said clustering unit, the feature vector computed by said computing unit, the character category recognized by said recognition unit, and the value of distance computed by said recognition unit between the feature vector computed by said computing unit and the reference vector of each character category and outputting an amended result indicative thereof; and a clustering result storage unit storing the process result from the clustering unit, wherein the amending unit amends the process result stored in the clustering result storage unit to the amended result.
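
Claim 6's recognition unit is a nearest-reference classifier with a reject option. The sketch assumes one reference vector per category and Euclidean distance, and reads the claim's reject condition as "reject when the smallest distance is equal to or larger than the threshold"; `recognize` and its return tuple are hypothetical. The nearest category is still returned for rejected characters so that later clustering steps can use it.

```python
import math
from typing import Dict, List, Optional, Tuple


def recognize(feature: List[float],
              dictionary: Dict[str, List[float]],
              reject_threshold: float) -> Tuple[Optional[str], str, float]:
    """Return (accepted_category_or_None, nearest_category, nearest_distance).

    The category whose reference vector is closest wins; if even that distance
    is at or above `reject_threshold`, the character is rejected (None)."""
    nearest, best_d = min(
        ((cat, math.dist(feature, ref)) for cat, ref in dictionary.items()),
        key=lambda item: item[1],
    )
    accepted = nearest if best_d < reject_threshold else None
    return accepted, nearest, best_d
```

A feature vector near the "0" reference is accepted as "0"; one roughly equidistant from both references and far from each is rejected, although "0" is still reported as its nearest category.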

7. The character recognition apparatus according to claim 6, wherein said amending unit comprises: an extracting unit extracting a cluster which can be misrecognized with a high possibility from the clusters generated by said clustering unit; a specifying unit specifying a cluster in another character category which is closest to the extracted cluster, where a total number of characters in the specified cluster is more than a threshold; and a changing unit changing a recognized character category of an input character, which belongs to the cluster extracted by said extracting unit and comprising a distance value equal to or larger than a predetermined value, into a character category that the specified cluster belongs to if the distance value between the cluster extracted by said extracting unit and the cluster specified by said specifying unit is less than the predetermined value.

8. The character recognition apparatus according to claim 7, wherein said amending unit further comprises a second changing unit changing the input character belonging to the cluster extracted by said extracting unit into a rejected character when values of distance between the cluster extracted by said extracting unit and all clusters of all other character categories are equal to or larger than the predetermined value.
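
Claims 7 and 8 together describe what happens to a suspect cluster: find the closest cluster of another category that has more than a minimum number of characters; if it is near enough, move the suspect cluster's low-confidence characters (those whose recognizer distance is at or above a threshold) to that category, and if every other category's clusters are far away, reject the suspect characters. A minimal sketch, assuming Euclidean centroid distances and a per-character recognizer distance supplied by the caller. The claims appear to use one "predetermined value" for both tests; the sketch keeps two separate thresholds. All names and parameters are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Member = Tuple[List[float], float]  # (feature vector, recognizer distance)


def centroid(vectors: List[List[float]]) -> List[float]:
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def amend_suspect_cluster(suspect_cat: str,
                          suspect: List[Member],
                          others: List[Tuple[str, List[List[float]]]],
                          min_size: int,
                          cluster_threshold: float,
                          recog_threshold: float) -> List[Optional[str]]:
    """Return the new label of each member of a suspect cluster (None = rejected)."""
    c = centroid([v for v, _ in suspect])
    # (distance, category, size) for every cluster of a different category.
    candidates = [(math.dist(c, centroid(vs)), cat, len(vs))
                  for cat, vs in others if cat != suspect_cat]
    if candidates and all(d >= cluster_threshold for d, _, _ in candidates):
        return [None] * len(suspect)  # claim 8: far from every other category
    big = [cand for cand in candidates if cand[2] > min_size]
    if not big:
        return [suspect_cat] * len(suspect)
    d, cat, _ = min(big)  # closest sufficiently large cluster
    if d >= cluster_threshold:
        return [suspect_cat] * len(suspect)
    # claim 7: relabel only the members the recognizer was unsure about
    return [cat if rd >= recog_threshold else suspect_cat for _, rd in suspect]
```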

9. The character recognition apparatus according to claim 6, wherein said amending unit comprises: a specifying unit specifying a cluster closest to the input character rejected by said recognition unit and belonging to another character category using the feature vector computed by said computing unit; and a changing unit changing an input character rejected by said recognition unit and indicating a value of distance from the recognized character category equal to or smaller than a predetermined value into a recognizable character and setting the character category of the cluster specified by said specifying unit as the recognized character category of the changed input character when a value of distance between the input character rejected by said recognition unit and the cluster specified by said specifying unit is equal to or smaller than the predetermined value.
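
Claim 9 gives rejected characters a second chance by comparing them with the clusters built from accepted characters. The sketch simplifies the claim: it searches the cluster centroids of every category (the claim restricts the search to another category and also checks the original recognizer distance), and it assumes Euclidean distance. `recover_rejected` is a hypothetical helper.

```python
import math
from typing import Dict, List, Optional


def recover_rejected(feature: List[float],
                     cluster_centroids: Dict[str, List[List[float]]],
                     threshold: float) -> Optional[str]:
    """Give a rejected character the category of its nearest cluster centroid if
    that centroid lies within `threshold`; otherwise keep it rejected (None)."""
    best_cat, best_d = None, math.inf
    for cat, centroids in cluster_centroids.items():
        for c in centroids:
            d = math.dist(feature, c)
            if d < best_d:
                best_cat, best_d = cat, d
    return best_cat if best_d <= threshold else None
```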

10. The character recognition apparatus according to claim 7, wherein said extracting unit extracts a cluster containing a number of input characters equal to or smaller than a predetermined value as a cluster which can be misrecognized with a high possibility.

11. The character recognition apparatus according to claim 7, wherein said extracting unit extracts a cluster, as a cluster which can be misrecognized with a high possibility, containing a number of input characters equal to or smaller than a multiple of a predetermined constant of a total number of input characters of the character category to which the cluster belongs.

12. The character recognition apparatus according to claim 7, wherein said extracting unit extracts, as a cluster which can be misrecognized with a high possibility, a cluster containing a number of input characters equal to or smaller than a predetermined value and indicating a value of distance between a character category of the cluster and other clusters equal to or larger than a predetermined value.

13. The character recognition apparatus according to claim 7, wherein said extracting unit extracts a cluster, as a cluster which can be misrecognized with a high possibility, containing a number of input characters equal to or smaller than a multiple of a predetermined constant of a total number of input characters of the character category to which the cluster belongs, and equal to or larger than a predetermined value of distance from another cluster of the character category.
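
Claims 10 to 13 give alternative tests for a cluster that "can be misrecognized with a high possibility": an absolute size cap, a size cap relative to the category's character count, and either of these combined with a minimum distance to the category's other clusters. One way to express all four in a single predicate, assuming the caller supplies the distance to the nearest other cluster of the same category (`math.inf` when there is none); `is_suspect` and its parameter names are hypothetical.

```python
from typing import Optional


def is_suspect(size: int,
               category_total: int,
               nearest_same_category_distance: float,
               max_size: Optional[int] = None,
               max_fraction: Optional[float] = None,
               min_isolation: Optional[float] = None) -> bool:
    """Flag a cluster as likely misrecognized.

    max_size      -> claims 10 and 12: absolute cap on cluster size
    max_fraction  -> claims 11 and 13: cap relative to the category's total
    min_isolation -> claims 12 and 13: also require distance from the
                     category's other clusters to be at least this value"""
    if max_size is None and max_fraction is None:
        raise ValueError("give max_size and/or max_fraction")
    small = True
    if max_size is not None:
        small = small and size <= max_size
    if max_fraction is not None:
        small = small and size <= max_fraction * category_total
    if min_isolation is not None:
        return small and nearest_same_category_distance >= min_isolation
    return small
```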

14. The character recognition apparatus according to claim 7, wherein said specifying unit preliminarily determines another character category to be specified depending on the character category of a specification character category.

15. The character recognition apparatus according to claim 7, wherein said specifying unit limits another character category to be specified to a high-order character category whose distance value is less than or equal to a predetermined threshold computed by said recognition unit.
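
Claims 14 and 15 narrow the set of other categories in which a closer cluster is searched: either a table fixed in advance for each recognized category, or only the higher-ranked recognizer candidates whose distance is within a threshold. The sketch assumes the recognizer exposes its distance to every category; the confusion table `CONFUSABLE` and the function `candidate_categories` are invented for illustration and do not come from the patent.

```python
from typing import Dict, List, Optional, Set

# Hypothetical confusion table (claim 14 style); not taken from the patent.
CONFUSABLE: Dict[str, Set[str]] = {"1": {"7", "l"}, "7": {"1"}, "0": {"6", "O"}}


def candidate_categories(recognized: str,
                         recognizer_distances: Dict[str, float],
                         threshold: Optional[float] = None,
                         confusable: Optional[Dict[str, Set[str]]] = None) -> List[str]:
    """Other categories worth searching for a closer cluster, nearest first."""
    ranked = sorted(recognizer_distances, key=recognizer_distances.get)
    out = [c for c in ranked if c != recognized]
    if confusable is not None:
        # claim 14 style: a fixed list per recognized category
        allowed = confusable.get(recognized, set())
        out = [c for c in out if c in allowed]
    if threshold is not None:
        # claim 15 style: only high-ranked candidates within the threshold
        out = [c for c in out if recognizer_distances[c] <= threshold]
    return out
```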

16. The character recognition apparatus according to claim 6, wherein said clustering unit first generates a cluster by performing a clustering process in a hierarchical clustering method, and then performs the clustering process in a non-hierarchical clustering method using the cluster as an initial state.
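
Claim 16 seeds a non-hierarchical clustering pass with the result of a hierarchical one. The claim does not name the non-hierarchical method; k-means is assumed here because it accepts an initial partition naturally. The sketch refines integer cluster labels produced by any hierarchical step, using Euclidean distance; `kmeans_refine` is a hypothetical name.

```python
import math
from typing import List


def kmeans_refine(vectors: List[List[float]], labels: List[int], max_iter: int = 20) -> List[int]:
    """Non-hierarchical pass (k-means, assumed) seeded with a hierarchical result.
    labels[i] is the initial cluster index of vectors[i]."""
    k = max(labels) + 1
    dim = len(vectors[0])
    for _ in range(max_iter):
        # Recompute each centroid from the current assignment.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for v, lab in zip(vectors, labels):
            counts[lab] += 1
            for i, x in enumerate(v):
                sums[lab][i] += x
        centroids = [[s / counts[c] for s in sums[c]] if counts[c] else None
                     for c in range(k)]
        # Reassign every vector to its nearest non-empty centroid.
        new_labels = [min((c for c in range(k) if centroids[c] is not None),
                          key=lambda c: math.dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignment is stable
        labels = new_labels
    return labels
```

If the hierarchical labels place the point `[0.5, 0.5]` with a distant pair, one reassignment moves it to the cluster near the origin, and the next pass confirms the labels are stable.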

17. A character recognition apparatus comprising: a recognition unit recognizing a category of a character extracted from an input image; a clustering unit generating at least one cluster from characters extracted from the input image by comparing the characters extracted from the input image with each other; a cluster extracting unit extracting a cluster containing a number of elements smaller than a predetermined value from the cluster generated by said clustering unit; a between-cluster distance computing unit computing a between-cluster distance between a first cluster belonging to a first category extracted by said cluster extracting unit and a second cluster belonging to a second category generated by said clustering unit; a category amending unit integrating the first cluster into the second cluster when a value of the between-cluster distance between the first cluster and the second cluster is equal to or smaller than a predetermined value, and amending, from the first category to the second category, the category belonging to the first cluster and outputting an amended result indicative thereof; and a clustering result storage unit storing the cluster generated by the clustering unit, wherein the category amending unit amends the cluster stored in the clustering result storage unit to the amended result.

18. The character recognition apparatus according to claim 17, further comprising: a rejecting unit rejecting a character belonging to a third cluster by extracting the third cluster whose between-cluster distance is larger than or equal to a predetermined threshold from another cluster in a same category; and a rejection amending unit computing distance values between the character rejected by said rejecting unit and the clusters generated by said clustering unit, integrating the character rejected by said rejecting unit into a fourth cluster, wherein the fourth cluster is approximate to the character rejected by said rejecting unit, and amending the category of the character rejected by said rejecting unit to a third category to which the fourth cluster belongs.

19. The character recognition apparatus according to claim 17, wherein said character is a character handwritten by a same writer.

20. A character recognition apparatus comprising: a recognition unit recognizing a category of a character extracted from an input image; a clustering unit generating at least one cluster from characters extracted from the input image by comparing the characters extracted from the input image with each other; a cluster extracting unit extracting a cluster containing a number of elements smaller than a predetermined value from the cluster generated by said clustering unit; a between-cluster distance computing unit computing a between-cluster distance between a first cluster belonging to a first category and containing a number of elements equal to or smaller than a predetermined value and a second cluster belonging to a second category generated by said clustering unit; a recognition reliability obtaining unit obtaining recognition reliability of a character belonging to the first cluster from said recognition unit when a value of the between-cluster distance between the first cluster and the second cluster is equal to or smaller than a predetermined value; a category amending unit amending, from the first category to the second category, the category of the character belonging to the first cluster comprising the recognition reliability equal to or smaller than a predetermined value and outputting an amended result indicative thereof; and a clustering result storage unit storing the cluster generated by the clustering unit, wherein the category amending unit amends the cluster stored in the clustering result storage unit to the amended result.
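
Claim 20 adds a recognition-reliability check before amending: when a small first cluster lies close to a second cluster of another category, only its characters with low reliability are moved to the second category. The claim does not define reliability; this sketch assumes the margin between the two smallest reference distances (a small margin meaning an ambiguous decision), Euclidean centroid distance between clusters, and that the caller has already verified the first cluster is small. All names are hypothetical.

```python
import math
from typing import Dict, List


def reliability(feature: List[float], dictionary: Dict[str, List[float]]) -> float:
    """Hypothetical reliability: gap between the two smallest reference distances."""
    d = sorted(math.dist(feature, ref) for ref in dictionary.values())
    return d[1] - d[0] if len(d) > 1 else math.inf


def centroid(vectors: List[List[float]]) -> List[float]:
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def amend_by_reliability(first_cat: str, first: List[List[float]],
                         second_cat: str, second: List[List[float]],
                         dictionary: Dict[str, List[float]],
                         cluster_threshold: float,
                         reliability_threshold: float) -> List[str]:
    """If the first cluster is within `cluster_threshold` of the second, move its
    members whose reliability is at or below `reliability_threshold` to the
    second category; all other members keep the first category."""
    if math.dist(centroid(first), centroid(second)) > cluster_threshold:
        return [first_cat] * len(first)
    return [second_cat if reliability(v, dictionary) <= reliability_threshold else first_cat
            for v in first]
```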

21. A method of recognizing a character, comprising: recognizing a category of a character extracted from an input image; generating at least one cluster from characters extracted from the input image by comparing the characters extracted from the input image with each other; extracting a cluster containing elements equal to or smaller than a predetermined value from the cluster generated; computing a between-cluster distance between a first cluster belonging to a first category and containing elements equal to or smaller than a predetermined value and a second cluster belonging to a second category; integrating the first cluster into the second cluster when a value of the between-cluster distance between the first cluster and the second cluster is equal to or smaller than a predetermined value; amending from the first category to the second category the category belonging to the first cluster and outputting an amended result indicative thereof; storing the cluster generated; and amending the stored cluster generated to the amended result.

22. A method of recognizing a character comprising: recognizing a category of a character extracted from an input image; generating at least one cluster from characters extracted from the input image by comparing the characters extracted from the input image with each other; extracting a cluster containing elements equal to or smaller than a predetermined value from the cluster generated; computing a between-cluster distance between a first cluster belonging to a first category and containing elements equal to or smaller than a predetermined value and a second cluster belonging to a second category; obtaining recognition reliability of a character belonging to the first cluster when a value of the between-cluster distance between the first cluster and the second cluster is equal to or smaller than a predetermined value; amending, from the first category to the second category, the category of the character belonging to the first cluster comprising the recognition reliability equal to or smaller than a predetermined value and outputting an amended result indicative thereof; storing the cluster generated; and amending the stored cluster generated to the amended result.

23. A computer-readable medium used to direct a computer to perform: recognizing a category of a character extracted from an input image; generating at least one cluster from characters extracted from the input image by comparing the characters extracted from the input image with each other; extracting a cluster containing elements equal to or smaller than a predetermined value from the cluster generated; computing a between-cluster distance between a first cluster belonging to a first category and containing elements equal to or smaller than a predetermined value and a second cluster belonging to a second category; integrating the first cluster into the second cluster when a value of the between-cluster distance between the first cluster and the second cluster is equal to or smaller than a predetermined value; amending from the first category to the second category the category belonging to the first cluster and outputting an amended result indicative thereof; storing the cluster generated; and amending the stored cluster generated to the amended result.

24. A computer-readable medium used to direct a computer to perform: recognizing a category of a character extracted from an input image; generating at least one cluster from characters extracted from the input image by comparing the characters extracted from the input image with each other; extracting a cluster containing elements equal to or smaller than a predetermined value from the cluster generated; computing a between-cluster distance between a first cluster belonging to a first category and containing elements equal to or smaller than a predetermined value and a second cluster belonging to a second category; obtaining recognition reliability of a character belonging to the first cluster when a value of the between-cluster distance between the first cluster and the second cluster is equal to or smaller than a predetermined value; amending, from the first category to the second category, the category of the character belonging to the first cluster comprising the recognition reliability equal to or smaller than a predetermined value and outputting an amended result indicative thereof; storing the cluster generated; and amending the stored cluster generated to the amended result.

25. The method of recognizing a character according to claim 21, further comprising: extracting a third cluster whose between-cluster distance is larger than or equal to a predetermined threshold from another cluster in a same category; rejecting a character belonging to the third cluster; computing distance values between the character rejected and the clusters generated; integrating the character rejected into a fourth cluster, wherein the fourth cluster is approximate to the character rejected; and amending a category of the character rejected to a third category to which the fourth cluster belongs.

Patent Metadata

Filing Date

February 18, 1997

Publication Date

February 5, 2002

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Handwritten character recognition apparatus and method using a clustering algorithm” (US-6345119). https://patentable.app/patents/US-6345119

© 2026 Patentable. All rights reserved.

Patentable is a research and drafting-assistant tool, not a law firm, and does not provide legal advice. Documents we generate are drafts for review by a licensed patent attorney.