A system for separating text and drawings in a digital ink file (e.g., a handwritten digital ink file). A stroke analyzer classifies single strokes that have been input by a user as “text” or “unknown.” The stroke analyzer utilizes a trainable classifier, such as a support vector machine. A grouping component is provided that groups text strokes in an attempt to form text objects (e.g., words, characters, or letters). The grouping component also groups unknown strokes in an attempt to form objects (e.g., shapes, drawings, or even text). A trainable classifier, such as a support vector machine, evaluates the grouped strokes to determine if they are objects.
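As an illustration of the grouping component described above, the following is a minimal sketch, not the patented implementation. It assumes a stroke is a list of `(x, y)` points and merges strokes whose bounding boxes lie within a distance threshold and whose heights are comparable. The function names and the default values of `max_gap` and `max_height_ratio` are hypothetical, chosen only for the example.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def bounding_box(stroke: Stroke) -> Box:
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    return min(xs), min(ys), max(xs), max(ys)


def box_gap(a: Box, b: Box) -> float:
    """Euclidean gap between two axis-aligned boxes (0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5


def similar_height(a: Box, b: Box, max_ratio: float) -> bool:
    """True if the taller box is at most max_ratio times the shorter one."""
    ha, hb = a[3] - a[1], b[3] - b[1]
    lo, hi = min(ha, hb), max(ha, hb)
    if lo == 0:
        return hi == 0
    return hi <= max_ratio * lo


def group_strokes(strokes: List[Stroke],
                  max_gap: float = 10.0,          # hypothetical threshold
                  max_height_ratio: float = 3.0   # hypothetical threshold
                  ) -> List[List[int]]:
    """Return groups of stroke indices, merged with a simple union-find."""
    boxes = [bounding_box(s) for s in strokes]
    parent = list(range(len(strokes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if (box_gap(boxes[i], boxes[j]) <= max_gap
                    and similar_height(boxes[i], boxes[j], max_height_ratio)):
                parent[find(j)] = find(i)

    groups: Dict[int, List[int]] = {}
    for i in range(len(strokes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

In a fuller pipeline, each resulting group would then be passed to a classifier that decides whether the group forms an object; that step is omitted here.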
Legal claims defining the scope of protection, as filed with the USPTO.
1. A computer readable medium having computer-executable instructions, comprising: accessing a plurality of stroke samples, the stroke samples representing more than one class, wherein at least one class represented is a text class and at least one class represented is a drawing class; extracting curvature features of each of the strokes for each class; and using the curvature features, training a support vector machine to classify strokes for each class, wherein the curvature features of a stroke comprise a discrete curvature stroke, the discrete curvature being defined using a difference between angles determined in accordance with points along the stroke.
2. The computer readable medium of claim 1, wherein the curvature features of a stroke comprise a tangent histogram of the stroke.
3. The computer readable medium of claim 1, further comprising grouping some of the strokes of the plurality of strokes based upon a relative height threshold of the plurality of strokes.
4. The computer readable medium of claim 1, further comprising grouping some of the strokes of the plurality of strokes based upon a relative aspect ratio of the plurality of strokes.
5. A computer readable medium having computer-executable instructions, comprising: accessing a digital ink file having at least one stroke therein; extracting curvature features of the at least one stroke; based upon an analysis of the curvature features, determining whether the at least one stroke is text by evaluating the stroke with a support vector machine; and based upon the curvature features, determining whether the at least one stroke is classified as an unknown stroke.
6. The computer readable medium of claim 5, wherein the curvature features comprise the discrete curvature of the stroke.
7. The computer readable medium of claim 5, further comprising: accessing a plurality of strokes in the digital ink file, and grouping some of the strokes of the plurality of strokes based upon local characteristics of the plurality of strokes to form grouped strokes.
8. The computer readable medium of claim 7, wherein the grouped strokes are grouped based upon spatial information regarding the plurality of strokes.
9. The computer readable medium of claim 8, wherein the spatial information comprises a distance threshold between strokes in the subset of the plurality of strokes.
10. The computer readable medium of claim 7, wherein the grouped strokes are grouped based upon a relative height threshold of the strokes.
11. The computer readable medium of claim 7, wherein the grouped strokes are grouped based upon a relative aspect ratio of the strokes.
12. The computer readable medium of claim 7, wherein the grouped strokes are grouped based upon a normalized height of at least some of the plurality of strokes.
13. The computer readable medium of claim 7, wherein the grouped strokes are grouped based upon a threshold distance between the strokes.
14. A computer readable medium having computer-executable instructions, comprising: accessing a digital ink file having at least one stroke therein; extracting the tangent histogram of the at least one stroke; based upon an analysis of the tangent histogram, determining whether the at least one stroke is text; and based upon the tangent histogram, determining whether the at least one stroke is classified as an unknown stroke.
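The two curvature features named in the claims can be sketched as follows. This is a hedged illustration under stated assumptions, not the patent's own code: a stroke is taken to be a list of `(x, y)` points, the discrete curvature at an interior point is taken to be the wrapped difference between the direction angles of the two adjacent segments, and the tangent histogram is taken to be a normalized histogram of segment directions. The function names and the default of 8 bins are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def segment_angles(stroke: Stroke) -> List[float]:
    """Direction angle of each non-degenerate segment, in (-pi, pi]."""
    return [
        math.atan2(y1 - y0, x1 - x0)
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:])
        if (x0, y0) != (x1, y1)  # skip repeated points
    ]


def discrete_curvature(stroke: Stroke) -> List[float]:
    """Difference between consecutive segment angles, wrapped to [-pi, pi)."""
    angles = segment_angles(stroke)
    curvature = []
    for a, b in zip(angles, angles[1:]):
        d = (b - a + math.pi) % (2 * math.pi) - math.pi
        curvature.append(d)
    return curvature


def tangent_histogram(stroke: Stroke, bins: int = 8) -> List[float]:
    """Normalized histogram of segment directions (bin count is an assumption)."""
    hist = [0.0] * bins
    for a in segment_angles(stroke):
        idx = int((a + math.pi) / (2 * math.pi) * bins) % bins
        hist[idx] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

A feature vector built from these values (for example, the histogram concatenated with summary statistics of the curvature) could then be passed to a support vector machine; how the features are combined is left open here because the claims do not specify it.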
June 28, 2001
November 20, 2007