US-8515958

System and method for providing a classification suggestion for concepts

Published: August 20, 2013

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

A system and method for providing a classification suggestion for concepts is provided. A corpus of concepts including reference concepts each associated with a classification and uncoded concepts are maintained. A cluster of uncoded concepts and reference concepts is provided. A neighborhood of reference concepts in the cluster is determined for at least one of the uncoded concepts. A classification of the neighborhood is determined using a classifier. The classification of the neighborhood is suggested as a classification for the at least one uncoded concept.

Patent Claims

20 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method for providing a classification suggestion for concepts, comprising the steps of: maintaining a corpus of reference concepts, each associated with a classification; obtaining uncoded concepts, each comprising a collection of one or more nouns and noun phrases with a common semantic meaning that are extracted from one or more documents; generating a cluster of the uncoded concepts and at least one of the reference concepts; determining a neighborhood of one or more reference concepts in the cluster for at least one of the uncoded concepts; determining a classification for the neighborhood of the reference concepts using a classifier; suggesting the classification of the neighborhood as a classification for the at least one uncoded concept; receiving a classification code for the at least one uncoded concept from a reviewer; identifying a discordance between the received classification code for the at least one uncoded concept and the suggested classification code for the at least one uncoded concept when the received classification code is different from the suggested classification code; assigning an identifier to each of the received classification code and the suggested classification code; and displaying the at least one uncoded concept with the identifier for the received classification code and the identifier for the suggested classification code, wherein the steps are performed by a suitably programmed computer.

Plain English Translation

A computer-implemented method suggests classifications for concepts extracted from documents. It maintains a corpus of "reference concepts" (concepts that already carry a classification) and obtains "uncoded concepts" (concepts not yet classified), where each uncoded concept is a collection of one or more nouns and noun phrases that share a common semantic meaning. The method generates a cluster containing the uncoded concepts and at least one reference concept. For at least one uncoded concept, it determines a "neighborhood" of one or more reference concepts within the cluster, uses a classifier to determine a classification for that neighborhood, and suggests that classification for the uncoded concept. The method then receives a classification code from a reviewer, identifies a discordance when the reviewer's code differs from the suggested one, assigns an identifier to each of the two codes, and displays the uncoded concept with both identifiers.
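The reviewer-feedback steps at the end of this claim can be illustrated with a small sketch. It is an illustration under stated assumptions, not the patented implementation: the `Review` dataclass, the `flag_discordance` function, and the identifier strings `"R"` and `"S"` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    concept: str          # the uncoded concept under review
    suggested_code: str   # classification suggested from the neighborhood
    received_code: str    # classification code entered by the reviewer


def flag_discordance(review: Review) -> Optional[dict]:
    """Return a display record when the two codes differ, else None."""
    if review.received_code == review.suggested_code:
        return None  # codes agree, so there is no discordance to show
    return {
        "concept": review.concept,
        "identifiers": {
            "R": review.received_code,   # hypothetical identifier: reviewer's code
            "S": review.suggested_code,  # hypothetical identifier: suggested code
        },
    }


record = flag_discordance(Review("engine mount", "privileged", "responsive"))
print(record)
```

Returning `None` when the codes agree keeps the display step simple: only records that carry both identifiers need to be rendered as discordant.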

Claim 2

Original Legal Text

2. The method according to claim 1 , further comprising: applying the suggested classification to the one or more documents associated with the at least one uncoded concept.

Plain English Translation

The method for suggesting classifications for text concepts, as described previously, further includes applying the suggested classification to the one or more documents associated with the uncoded concept. Those documents are thereby categorized according to the classification suggested for the concept.

Claim 3

Original Legal Text

3. The method according to claim 1 , further comprising: determining a distance metric based on the similarity of each reference concept in the neighborhood to the at least one uncoded concept; and assigning the classification of the reference concept in the neighborhood with the closest distance metric as the classification of the neighborhood.

Plain English Translation

In the method for suggesting classifications for text concepts, as described previously, the classification of the neighborhood is determined by calculating a similarity score ("distance metric") between each reference concept in the neighborhood and the uncoded concept. The reference concept with the *closest* distance metric (highest similarity) dictates the classification assigned to the neighborhood, which is then suggested for the uncoded concept.
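A minimal sketch of this minimum-distance rule follows. It assumes the neighborhood is already available as `(classification, distance)` pairs with smaller distances meaning greater similarity; the function name and the sample labels are hypothetical.

```python
def minimum_distance_classification(neighborhood):
    """neighborhood: list of (classification, distance) pairs, one per reference concept."""
    if not neighborhood:
        raise ValueError("neighborhood must contain at least one reference concept")
    # The single closest reference concept decides the neighborhood's classification.
    classification, _ = min(neighborhood, key=lambda pair: pair[1])
    return classification


neighborhood = [("privileged", 0.42), ("responsive", 0.10), ("privileged", 0.35)]
print(minimum_distance_classification(neighborhood))  # responsive
```

Because only one reference concept decides the outcome, this rule is sensitive to a single outlying neighbor.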

Claim 4

Original Legal Text

4. The method according to claim 1 , further comprising: determining a distance metric based on the similarity of each reference concept in the neighborhood to the at least one uncoded concept; summing the distance metrics of the reference concepts associated with the same classification; averaging the sums of the distance metrics in each classification; and assigning the classification of the reference concepts in the neighborhood with the closest average distance metric as the classification of the neighborhood.

Plain English Translation

In the method for suggesting classifications for text concepts, as described previously, a similarity score ("distance metric") is calculated between each reference concept in the neighborhood and the uncoded concept. These distances are grouped by classification. For each classification, the distances are summed, and then averaged. The classification with the *lowest average* distance (highest average similarity) to the uncoded concept is chosen as the classification for the neighborhood, which is then suggested for the uncoded concept.
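The sum-then-average step can be sketched as below, assuming the neighborhood is given as `(classification, distance)` pairs. The function name and sample data are hypothetical, and the claim does not say how ties are broken.

```python
from collections import defaultdict


def minimum_average_distance_classification(neighborhood):
    """neighborhood: list of (classification, distance) pairs, one per reference concept."""
    if not neighborhood:
        raise ValueError("neighborhood must contain at least one reference concept")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for classification, distance in neighborhood:
        sums[classification] += distance   # sum distances per classification
        counts[classification] += 1
    # Average each classification's summed distance over its member count.
    averages = {c: sums[c] / counts[c] for c in sums}
    return min(averages, key=averages.get)


neighborhood = [
    ("privileged", 0.2), ("privileged", 0.4),
    ("responsive", 0.1), ("responsive", 0.9),
]
print(minimum_average_distance_classification(neighborhood))  # privileged
```

In this sample the single closest reference concept is "responsive" (0.1), yet "privileged" wins on average distance (about 0.3 versus 0.5), which shows how this rule can differ from a closest-single-concept rule.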

Claim 5

Original Legal Text

5. The method according to claim 1 , further comprising: calculating a vote for each reference concept in the neighborhood; and assigning the classification of the reference concepts in the neighborhood with the highest calculated vote total as the classification of the neighborhood.

Plain English Translation

In the method for suggesting classifications for text concepts, as described previously, each reference concept in the neighborhood receives a "vote." The classification of reference concepts with the *highest total vote count* is assigned as the classification of the neighborhood, which is then suggested as the classification for the uncoded concept.
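One simple reading of this voting rule is sketched below. It assumes each reference concept in the neighborhood casts exactly one vote for its own classification; the claim does not specify how a vote is calculated or how ties are resolved, so both choices here are assumptions, and the function name is hypothetical.

```python
from collections import Counter


def maximum_count_classification(neighborhood_labels):
    """neighborhood_labels: classification of each reference concept in the neighborhood."""
    if not neighborhood_labels:
        raise ValueError("neighborhood must contain at least one reference concept")
    votes = Counter(neighborhood_labels)  # one vote per reference concept (assumption)
    classification, _ = votes.most_common(1)[0]
    return classification


labels = ["privileged", "responsive", "privileged"]
print(maximum_count_classification(labels))  # privileged
```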

Claim 6

Original Legal Text

6. The method according to claim 1 , further comprising: calculating a vote for each reference concept in the neighborhood; determining a distance metric based on the similarity of each reference concept in the neighborhood to the at least one uncoded concept; differentially weighing the votes based on the distance metric; and assigning the classification of the reference concepts in the neighborhood with the highest differentially weighted vote total as the classification of the neighborhood.

Plain English Translation

In the method for suggesting classifications for text concepts, as described previously, each reference concept in the neighborhood receives a vote and a similarity score ("distance metric") measuring its similarity to the uncoded concept. The votes are then adjusted (weighted) based on these distances, giving more weight to closer (more similar) concepts. The classification of reference concepts with the *highest weighted vote total* is assigned as the classification of the neighborhood, which is then suggested for the uncoded concept.
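A sketch of distance-weighted voting follows. The claim says only that votes are weighted differentially by distance; the inverse-distance weight `1 / (distance + eps)` used here is one common choice and is an assumption, as are the function name and the `eps` parameter.

```python
from collections import defaultdict


def distance_weighted_classification(neighborhood, eps=1e-9):
    """neighborhood: list of (classification, distance) pairs, one per reference concept."""
    if not neighborhood:
        raise ValueError("neighborhood must contain at least one reference concept")
    totals = defaultdict(float)
    for classification, distance in neighborhood:
        # Closer reference concepts contribute larger votes (assumed weighting).
        totals[classification] += 1.0 / (distance + eps)
    return max(totals, key=totals.get)


neighborhood = [("privileged", 0.5), ("privileged", 0.5), ("responsive", 0.1)]
print(distance_weighted_classification(neighborhood))  # responsive
```

In this sample an unweighted count would favor "privileged" (two votes to one), but the single nearby "responsive" concept outweighs the two more distant ones.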

Claim 7

Original Legal Text

7. The method according to claim 1 , further comprising: providing a confidence level of the suggested classification.

Plain English Translation

The method for suggesting classifications for text concepts, as described previously, also provides a "confidence level" indicating the reliability of the suggested classification. This represents a measure of how certain the system is about the accuracy of its suggestion.

Claim 8

Original Legal Text

8. The method according to claim 7 , further comprising: displaying the confidence level only when above a confidence level threshold.

Plain English Translation

In the method that provides a confidence level, as described previously, the confidence level of the suggested classification is displayed only when it is above a confidence level threshold. The claim restricts display of the confidence level; it does not state that the suggestion itself is withheld.
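The thresholded display in claim 8 can be sketched as follows. The claim does not say how confidence is computed or what the threshold is, so the `0.8` default, the percentage formatting, and the function name are assumptions; the sketch also assumes the suggestion is still shown when the confidence level is hidden, since the claim text limits only the display of the confidence level.

```python
def render_suggestion(classification: str, confidence: float, threshold: float = 0.8) -> str:
    """Build the display text for a suggestion."""
    if confidence > threshold:
        # Confidence is shown only when it is above the threshold.
        return f"{classification} (confidence {confidence:.0%})"
    return classification


print(render_suggestion("responsive", 0.92))  # responsive (confidence 92%)
print(render_suggestion("responsive", 0.55))  # responsive
```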

Claim 9

Original Legal Text

9. The method according to claim 1 , wherein the neighborhood is determined based on one of inclusion, injection, and nearest neighbor.

Plain English Translation

In the method for suggesting classifications for text concepts, as described previously, the neighborhood of reference concepts is determined using one of three named techniques: *inclusion*, *injection*, or *nearest neighbor*. The claim names these techniques but does not define them.
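Of the three techniques, nearest neighbor is the most straightforward to illustrate. The sketch below assumes each concept is represented as a non-zero numeric vector and uses cosine distance with a fixed neighborhood size `k`; none of these details come from the claim, and the function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero and equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest_neighbor_neighborhood(uncoded_vector, references, k=2):
    """references: list of (classification, vector) pairs for reference concepts.

    Returns the k closest reference concepts as (classification, distance) pairs.
    """
    scored = [(label, cosine_distance(uncoded_vector, vec)) for label, vec in references]
    return sorted(scored, key=lambda pair: pair[1])[:k]


references = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
for label, distance in nearest_neighbor_neighborhood([1.0, 0.0], references):
    print(label, round(distance, 3))
```

The returned `(classification, distance)` pairs are in a shape that a distance-based or vote-based classifier could consume directly.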

Claim 10

Original Legal Text

10. The method according to claim 1 , wherein the classifier is one of minimum distance, minimum average distance, maximum counts, and distance weighted maximum count.

Plain English Translation

In the method for suggesting classifications for text concepts, as described previously, the classifier used to determine the classification of the neighborhood is one of the following: *minimum distance* (closest single concept), *minimum average distance* (closest average distance), *maximum counts* (most frequent classification), or *distance weighted maximum count* (most frequent, weighted by distance).

Claim 11

Original Legal Text

11. A system for providing a classification suggestion for concepts, comprising: a database to store a corpus of reference concepts, each associated with a classification and uncoded concepts, each comprising a collection of one or more nouns and noun phrases with a common semantic meaning that are extracted from one or more documents; a clustering engine to generate a cluster of uncoded concepts and one or more of the reference concepts; and a processor to execute modules, comprising: a neighborhood module to determine a neighborhood of one or more reference concepts in the cluster for at least one of the uncoded concepts; a classification module to determine a classification for the neighborhood of the reference concepts using a classifier; a suggestion module to suggest the classification of the neighborhood as a classification for the at least one uncoded concept; a receipt module to receive a classification code for the at least one uncoded concept from a reviewer; a discordance module to identify a discordance between the received classification code for the at least one uncoded concept and the suggested classification code for the at least one uncoded concept when the received classification code is different from the suggested classification code; an identifier module to assign an identifier to each of the received classification code and the suggested classification code; and a display module to display the at least one uncoded concept with the identifier for the received classification code and the identifier for the suggested classification code.

Plain English Translation

A system for suggesting classifications for text concepts includes a database storing "reference concepts", each associated with a classification, and "uncoded concepts", each a collection of nouns and noun phrases with a common semantic meaning extracted from documents. A "clustering engine" generates a cluster of uncoded concepts and one or more reference concepts. A "neighborhood module" determines a neighborhood of reference concepts in the cluster for at least one uncoded concept. A "classification module" determines a classification for this neighborhood using a classifier. A "suggestion module" proposes this classification for the uncoded concept. A "receipt module" receives a classification code for the uncoded concept from a human reviewer. A "discordance module" identifies a discordance when the reviewer's code differs from the suggested one. An "identifier module" assigns an identifier to the received classification code and to the suggested classification code. A "display module" then presents the uncoded concept along with both identifiers.

Claim 12

Original Legal Text

12. The system according to claim 11 , further comprising: a marking module to apply the suggested classification to the one or more documents associated with the at least one uncoded concept.

Plain English Translation

The system for suggesting classifications for text concepts, as described previously, further includes a "marking module" that applies the suggested classification to the one or more documents associated with the uncoded concept, thus categorizing those documents.

Claim 13

Original Legal Text

13. The system according to claim 11 , further comprising: a distance module to determine a distance metric based on the similarity of each reference concept in the neighborhood to the at least one uncoded concept; and an assign module to assign the classification of the reference concept in the neighborhood with the closest distance metric as the classification of the neighborhood.

Plain English Translation

The system for suggesting classifications for text concepts, as described previously, includes a "distance module" that calculates a similarity score ("distance metric") between each reference concept in the neighborhood and the uncoded concept. An "assign module" assigns the classification of the *closest* reference concept (highest similarity) to the neighborhood, which is then suggested for the uncoded concept.

Claim 14

Original Legal Text

14. The system according to claim 11 , further comprising: a distance module to determine a distance metric based on the similarity of each reference concept in the neighborhood to the at least one uncoded concept; a calculation module to sum the distance metrics of the reference concepts associated with the same classification and to average the sums of the distance metrics in each classification; and an assign module to assign the classification of the reference concepts in the neighborhood with the closest average distance metric as the classification of the neighborhood.

Plain English Translation

The system for suggesting classifications for text concepts, as described previously, includes a "distance module" that calculates a similarity score ("distance metric") between each reference concept in the neighborhood and the uncoded concept. A "calculation module" sums the distances within each classification and then averages them. An "assign module" assigns to the neighborhood the classification with the *lowest average distance* (highest average similarity) to the uncoded concept; that classification is then suggested for the uncoded concept.

Claim 15

Original Legal Text

15. The system according to claim 11 , further comprising: a vote module to calculate a vote for each reference concept in the neighborhood; and an assign module to assign the classification of the reference concepts in the neighborhood with the highest calculated vote total as the classification of the neighborhood.

Plain English Translation

The system for suggesting classifications for text concepts, as described previously, includes a "vote module" that calculates a "vote" for each reference concept in the neighborhood. An "assign module" then assigns the classification with the *highest total vote count* to the neighborhood, which is then suggested for the uncoded concept.

Claim 16

Original Legal Text

16. The system according to claim 11 , further comprising: a vote module to calculate a vote for each reference concept in the neighborhood; a distance module to determine a distance metric based on the similarity of each reference concept in the neighborhood to the at least one uncoded concept; a weight module to differentially weigh the votes based on the distance metric; and an assign module to assign the classification of the reference concepts in the neighborhood with the highest differentially weighted vote total as the classification of the neighborhood.

Plain English Translation

The system for suggesting classifications for text concepts, as described previously, includes a "vote module" that calculates a "vote" for each reference concept. A "distance module" calculates a similarity score ("distance metric") between each reference concept and the uncoded concept. A "weight module" adjusts the votes based on these distances, giving more weight to closer concepts. An "assign module" then assigns the classification with the *highest weighted vote total* to the neighborhood, which is then suggested for the uncoded concept.

Claim 17

Original Legal Text

17. The system according to claim 11 , further comprising: a confidence module to provide a confidence level of the suggested classification.

Plain English Translation

The system for suggesting classifications for text concepts, as described previously, includes a "confidence module" that provides a "confidence level" indicating the reliability of the suggested classification, measuring the system's certainty.

Claim 18

Original Legal Text

18. The system according to claim 17 , further comprising a display to display the confidence level only when above a confidence level threshold.

Plain English Translation

The system that includes a confidence module, as described previously, also includes a display that shows the confidence level of the suggested classification *only* when it is above a confidence level threshold. The claim restricts display of the confidence level; it does not state that the suggestion itself is withheld.

Claim 19

Original Legal Text

19. The system according to claim 11 , wherein the neighborhood is determined based on one of inclusion, injection, and nearest neighbor.

Plain English Translation

In the system for suggesting classifications for text concepts, as described previously, the neighborhood of reference concepts is determined using one of three named techniques: *inclusion*, *injection*, or *nearest neighbor*. The claim names these techniques but does not define them.

Claim 20

Original Legal Text

20. The system according to claim 11 , wherein the classifier is one of minimum distance, minimum average distance, maximum counts, and distance weighted maximum count.

Plain English Translation

In the system for suggesting classifications for text concepts, as described previously, the classifier used to classify the neighborhood is one of: *minimum distance* (closest single concept), *minimum average distance* (closest average distance), *maximum counts* (most frequent classification), or *distance weighted maximum count* (most frequent, weighted by distance).

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06N G06F

Patent Metadata

Filing Date

July 27, 2010

Publication Date

August 20, 2013
