A system and method for providing a classification suggestion for concepts is provided. A corpus of concepts, including uncoded concepts and reference concepts that are each associated with a classification, is maintained. A cluster of uncoded concepts and reference concepts is provided. For at least one of the uncoded concepts, a neighborhood of reference concepts in the cluster is determined. A classification of the neighborhood is determined using a classifier and is suggested as a classification for the at least one uncoded concept.
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A method for providing a classification suggestion for concepts, comprising the steps of: maintaining a corpus of reference concepts, each associated with a classification; obtaining uncoded concepts, each comprising a collection of one or more nouns and noun phrases with a common semantic meaning that are extracted from one or more documents; generating a cluster of the uncoded concepts and at least one of the reference concepts; determining a neighborhood of one or more reference concepts in the cluster for at least one of the uncoded concepts; determining a classification for the neighborhood of the reference concepts using a classifier; suggesting the classification of the neighborhood as a classification for the at least one uncoded concept; receiving a classification code for the at least one uncoded concept from a reviewer; identifying a discordance between the received classification code for the at least one uncoded concept and the suggested classification code for the at least one uncoded concept when the received classification code is different from the suggested classification code; assigning an identifier to each of the received classification code and the suggested classification code; and displaying the at least one uncoded concept with the identifier for the received classification code and the identifier for the suggested classification code, wherein the steps are performed by a suitably programmed computer.
A method, performed by a suitably programmed computer, suggests classifications for concepts extracted from documents. Each concept is a collection of one or more nouns and noun phrases with a common semantic meaning. The method maintains a corpus of "reference concepts," each already associated with a classification, and obtains "uncoded concepts" that still need one. It clusters the uncoded concepts together with at least one reference concept. For an uncoded concept, it determines a "neighborhood" of reference concepts within that cluster, uses a classifier to determine a classification for the neighborhood, and suggests that classification for the uncoded concept. When a reviewer then assigns a classification code, the method identifies a discordance if the reviewer's code differs from the suggested code, assigns an identifier to each of the two codes, and displays the uncoded concept with both identifiers.
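The reviewer-feedback steps can be sketched as follows. This is a minimal Python illustration, not the patented implementation: the `ReviewRecord` type, the `review` and `display` helpers, the example labels, and the `suggested:`/`reviewer:` identifier scheme are all hypothetical, since the claim only requires that each code receive some identifier.

```python
from dataclasses import dataclass


@dataclass
class ReviewRecord:
    concept: str
    suggested_code: str
    received_code: str
    discordant: bool
    suggested_id: str
    received_id: str


def review(concept: str, suggested_code: str, received_code: str) -> ReviewRecord:
    # A discordance exists when the reviewer's code differs from the suggestion.
    discordant = received_code != suggested_code
    # Hypothetical identifier scheme: prefix each code with its source.
    return ReviewRecord(
        concept=concept,
        suggested_code=suggested_code,
        received_code=received_code,
        discordant=discordant,
        suggested_id=f"suggested:{suggested_code}",
        received_id=f"reviewer:{received_code}",
    )


def display(record: ReviewRecord) -> str:
    # Show the concept with both identifiers; flag discordant records.
    flag = " [DISCORDANT]" if record.discordant else ""
    return f"{record.concept}: {record.received_id} | {record.suggested_id}{flag}"
```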
2. The method according to claim 1, further comprising: applying the suggested classification to the one or more documents associated with the at least one uncoded concept.
The method for suggesting classifications for text concepts, as described previously, further includes applying the suggested classification to the documents from which the uncoded concept was extracted, so that those documents are classified according to the concept's suggested classification.
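A minimal sketch of this propagation step, assuming each concept is mapped to the IDs of the documents it was extracted from. The `apply_to_documents` function and the choice to collect every inherited code in a set are hypothetical; the claim does not say how a document that inherits several codes is resolved.

```python
from collections import defaultdict


def apply_to_documents(concept_docs, suggestions):
    """Propagate each concept's suggested classification to its source documents.

    concept_docs: concept -> iterable of IDs of documents the concept came from
    suggestions:  concept -> suggested classification
    Returns document ID -> set of classifications inherited from its concepts.
    """
    doc_codes = defaultdict(set)
    for concept, code in suggestions.items():
        for doc_id in concept_docs.get(concept, ()):
            doc_codes[doc_id].add(code)
    return dict(doc_codes)
```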
3. The method according to claim 1, further comprising: determining a distance metric based on the similarity of each reference concept in the neighborhood to the at least one uncoded concept; and assigning the classification of the reference concept in the neighborhood with the closest distance metric as the classification of the neighborhood.
In the method for suggesting classifications for text concepts, as described previously, the classification of the neighborhood is determined by calculating a similarity score ("distance metric") between each reference concept in the neighborhood and the uncoded concept. The reference concept with the *closest* distance metric (highest similarity) dictates the classification assigned to the neighborhood, which is then suggested for the uncoded concept.
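The minimum distance classifier can be sketched as follows. This assumes, for illustration only, that concepts are represented as numeric term vectors and that the distance metric is cosine distance (1 minus cosine similarity); the claim requires only a distance metric based on similarity. The function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    # 1 - cosine similarity; assumes non-zero vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def minimum_distance(uncoded, neighborhood):
    """neighborhood: list of (classification, vector) reference concepts.

    Returns the classification of the single closest reference concept.
    """
    label, _ = min(neighborhood, key=lambda ref: cosine_distance(uncoded, ref[1]))
    return label
```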
4. The method according to claim 1, further comprising: determining a distance metric based on the similarity of each reference concept in the neighborhood to the at least one uncoded concept; summing the distance metrics of the reference concepts associated with the same classification; averaging the sums of the distance metrics in each classification; and assigning the classification of the reference concepts in the neighborhood with the closest average distance metric as the classification of the neighborhood.
In the method for suggesting classifications for text concepts, as described previously, a similarity score ("distance metric") is calculated between each reference concept in the neighborhood and the uncoded concept. These distances are grouped by classification. For each classification, the distances are summed, and then averaged. The classification with the *lowest average* distance (highest average similarity) to the uncoded concept is chosen as the classification for the neighborhood, which is then suggested for the uncoded concept.
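A sketch of the minimum average distance classifier, assuming the distance from each neighborhood reference concept to the uncoded concept has already been computed and is passed in as hypothetical `(classification, distance)` pairs.

```python
from collections import defaultdict


def minimum_average_distance(scored_neighbors):
    """scored_neighbors: list of (classification, distance) pairs.

    Returns the classification whose reference concepts have the lowest
    average distance to the uncoded concept.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, dist in scored_neighbors:
        sums[label] += dist  # sum the distances per classification
        counts[label] += 1
    averages = {label: sums[label] / counts[label] for label in sums}
    return min(averages, key=averages.get)
```

For example, with neighbors `[("responsive", 0.2), ("responsive", 0.6), ("non-responsive", 0.3), ("non-responsive", 0.35)]` the closest single reference concept is "responsive" (0.2), yet "non-responsive" wins on average distance (0.325 versus 0.4). This is how the classifier can disagree with the minimum distance classifier.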
5. The method according to claim 1, further comprising: calculating a vote for each reference concept in the neighborhood; and assigning the classification of the reference concepts in the neighborhood with the highest calculated vote total as the classification of the neighborhood.
In the method for suggesting classifications for text concepts, as described previously, each reference concept in the neighborhood receives a "vote." The classification of reference concepts with the *highest total vote count* is assigned as the classification of the neighborhood, which is then suggested as the classification for the uncoded concept.
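A sketch of the maximum count classifier, assuming (hypothetically) that each reference concept casts exactly one vote for its own classification; the claim leaves the vote calculation open.

```python
from collections import Counter


def maximum_count(neighbor_labels):
    """neighbor_labels: classification of each reference concept in the neighborhood.

    Each reference concept casts one vote; ties go to the classification
    encountered first.
    """
    votes = Counter(neighbor_labels)
    label, _ = votes.most_common(1)[0]
    return label
```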
6. The method according to claim 1, further comprising: calculating a vote for each reference concept in the neighborhood; determining a distance metric based on the similarity of each reference concept in the neighborhood to the at least one uncoded concept; differentially weighing the votes based on the distance metric; and assigning the classification of the reference concepts in the neighborhood with the highest differentially weighted vote total as the classification of the neighborhood.
In the method for suggesting classifications for text concepts, as described previously, each reference concept in the neighborhood receives a vote and a similarity score ("distance metric") measuring its similarity to the uncoded concept. The votes are then adjusted (weighted) based on these distances, giving more weight to closer (more similar) concepts. The classification of reference concepts with the *highest weighted vote total* is assigned as the classification of the neighborhood, which is then suggested for the uncoded concept.
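The distance weighted variant can be sketched by weighting each vote by the inverse of its distance. The inverse-distance weight, the small `eps` guard against division by zero, and the function name are assumptions; the claim only requires that votes be weighted differentially by distance.

```python
from collections import defaultdict


def distance_weighted_max_count(scored_neighbors, eps=1e-9):
    """scored_neighbors: list of (classification, distance) pairs.

    Each reference concept votes for its classification with weight
    1 / distance, so closer concepts count for more.
    """
    totals = defaultdict(float)
    for label, dist in scored_neighbors:
        totals[label] += 1.0 / (dist + eps)
    return max(totals, key=totals.get)
```

With neighbors `[("responsive", 0.1), ("non-responsive", 0.5), ("non-responsive", 0.6)]`, the single close "responsive" vote (weight about 10) outweighs the two farther "non-responsive" votes (about 2 + 1.67), so the result differs from an unweighted count.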
7. The method according to claim 1, further comprising: providing a confidence level of the suggested classification.
The method for suggesting classifications for text concepts, as described previously, also provides a "confidence level" indicating the reliability of the suggested classification. This represents a measure of how certain the system is about the accuracy of its suggestion.
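The claim does not define how the confidence level is computed. One plausible measure, assumed here purely for illustration, is the winning classification's share of the total distance-weighted vote; `suggest_with_confidence` is a hypothetical name.

```python
from collections import defaultdict


def suggest_with_confidence(scored_neighbors, eps=1e-9):
    """scored_neighbors: list of (classification, distance) pairs.

    Returns (suggested classification, confidence in [0, 1]), where the
    confidence is the winner's share of the inverse-distance-weighted votes.
    """
    totals = defaultdict(float)
    for label, dist in scored_neighbors:
        totals[label] += 1.0 / (dist + eps)
    best = max(totals, key=totals.get)
    confidence = totals[best] / sum(totals.values())
    return best, confidence
```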
8. The method according to claim 7, further comprising: displaying the confidence level only when above a confidence level threshold.
In the method for suggesting classifications for text concepts, as described previously, the confidence level of the suggested classification is shown to the user only if it exceeds a predefined threshold, so low confidence levels are not displayed.
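A sketch of the threshold check; the `confidence_label` helper and its default threshold of 0.75 are hypothetical. "Above" is read here as strictly greater than the threshold.

```python
from typing import Optional


def confidence_label(confidence: float, threshold: float = 0.75) -> Optional[str]:
    """Return a display string for the confidence level, or None to hide it.

    The default threshold is an assumption; the claim only requires some threshold.
    """
    if confidence > threshold:
        return f"{confidence:.0%}"
    return None
```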
9. The method according to claim 1, wherein the neighborhood is determined based on one of inclusion, injection, and nearest neighbor.
In the method for suggesting classifications for text concepts, as described previously, the neighborhood of reference concepts is determined using one of the following methods: *inclusion* (the reference concepts are clustered together with the uncoded concepts), *injection* (reference concepts are injected into existing clusters of uncoded concepts), or *nearest neighbor* (the reference concepts closest to the uncoded concept are selected).
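Of the three options, nearest neighbor is the simplest to sketch: keep the k reference concepts in the cluster that are closest to the uncoded concept. Euclidean distance (via `math.dist`, Python 3.8+), the default `k=3`, and the function name are assumptions made for brevity; any distance metric could be substituted.

```python
import heapq
import math


def nearest_neighbor_neighborhood(uncoded, cluster_refs, k=3):
    """cluster_refs: list of (classification, vector) reference concepts in the cluster.

    Returns the k reference concepts closest to the uncoded concept, nearest first.
    """
    return heapq.nsmallest(k, cluster_refs, key=lambda ref: math.dist(uncoded, ref[1]))
```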
10. The method according to claim 1, wherein the classifier is one of minimum distance, minimum average distance, maximum counts, and distance weighted maximum count.
In the method for suggesting classifications for text concepts, as described previously, the classifier used to determine the classification of the neighborhood is one of the following: *minimum distance* (closest single concept), *minimum average distance* (closest average distance), *maximum counts* (most frequent classification), or *distance weighted maximum count* (most frequent, weighted by distance).
11. A system for providing a classification suggestion for concepts, comprising: a database to store a corpus of reference concepts, each associated with a classification and uncoded concepts, each comprising a collection of one or more nouns and noun phrases with a common semantic meaning that are extracted from one or more documents; a clustering engine to generate a cluster of uncoded concepts and one or more of the reference concepts; and a processor to execute modules, comprising: a neighborhood module to determine a neighborhood of one or more reference concepts in the cluster for at least one of the uncoded concepts; a classification module to determine a classification for the neighborhood of the reference concepts using a classifier; a suggestion module to suggest the classification of the neighborhood as a classification for the at least one uncoded concept; a receipt module to receive a classification code for the at least one uncoded concept from a reviewer; a discordance module to identify a discordance between the received classification code for the at least one uncoded concept and the suggested classification code for the at least one uncoded concept when the received classification code is different from the suggested classification code; an identifier module to assign an identifier to each of the received classification code and the suggested classification code; and a display module to display the at least one uncoded concept with the identifier for the received classification code and the identifier for the suggested classification code.
A system for suggesting classifications for text concepts includes a database storing a corpus of pre-classified "reference concepts" and unclassified "uncoded concepts" (collections of nouns and noun phrases with a common semantic meaning, extracted from documents). A "clustering engine" groups uncoded concepts together with one or more reference concepts. A processor executes the remaining modules. A "neighborhood module" finds a neighborhood of reference concepts in the cluster for an uncoded concept. A "classification module" determines a classification for this neighborhood using a classifier. A "suggestion module" proposes this classification for the uncoded concept. A "receipt module" receives the classification code assigned by a human reviewer. A "discordance module" identifies a discordance when the reviewer's code differs from the suggested code. An "identifier module" assigns an identifier to each of the two codes, and a "display module" presents the uncoded concept with both identifiers, highlighting any conflict between the classifications.
12. The system according to claim 11, further comprising: a marking module to apply the suggested classification to the one or more documents associated with the at least one uncoded concept.
The system for suggesting classifications for text concepts, as described previously, further includes a "marking module" that applies the suggested classification to the documents from which the uncoded concept was extracted, thereby classifying those documents.
13. The system according to claim 11, further comprising: a distance module to determine a distance metric based on the similarity of each reference concept in the neighborhood to the at least one uncoded concept; and an assign module to assign the classification of the reference concept in the neighborhood with the closest distance metric as the classification of the neighborhood.
The system for suggesting classifications for text concepts, as described previously, includes a "distance module" that calculates a similarity score ("distance metric") between each reference concept in the neighborhood and the uncoded concept. An "assign module" assigns the classification of the *closest* reference concept (highest similarity) to the neighborhood, which is then suggested for the uncoded concept.
14. The system according to claim 11, further comprising: a distance module to determine a distance metric based on the similarity of each reference concept in the neighborhood to the at least one uncoded concept; a calculation module to sum the distance metrics of the reference concepts associated with the same classification and to average the sums of the distance metrics in each classification; and an assign module to assign the classification of the reference concepts in the neighborhood with the closest average distance metric as the classification of the neighborhood.
The system for suggesting classifications for text concepts, as described previously, includes a "distance module" that calculates a similarity score ("distance metric") between each reference concept in the neighborhood and the uncoded concept. A "calculation module" sums the distances within each classification and then averages them. An "assign module" assigns to the neighborhood the classification whose reference concepts have the *lowest average distance* (highest average similarity) to the uncoded concept, and that classification is then suggested for the uncoded concept.
15. The system according to claim 11, further comprising: a vote module to calculate a vote for each reference concept in the neighborhood; and an assign module to assign the classification of the reference concepts in the neighborhood with the highest calculated vote total as the classification of the neighborhood.
The system for suggesting classifications for text concepts, as described previously, includes a "vote module" that calculates a "vote" for each reference concept in the neighborhood. An "assign module" then assigns the classification with the *highest total vote count* to the neighborhood, which is then suggested for the uncoded concept.
16. The system according to claim 11, further comprising: a vote module to calculate a vote for each reference concept in the neighborhood; a distance module to determine a distance metric based on the similarity of each reference concept in the neighborhood to the at least one uncoded concept; a weight module to differentially weigh the votes based on the distance metric; and an assign module to assign the classification of the reference concepts in the neighborhood with the highest differentially weighted vote total as the classification of the neighborhood.
The system for suggesting classifications for text concepts, as described previously, includes a "vote module" that calculates a "vote" for each reference concept. A "distance module" calculates a similarity score ("distance metric") between each reference concept and the uncoded concept. A "weight module" adjusts the votes based on these distances, giving more weight to closer concepts. An "assign module" then assigns the classification with the *highest weighted vote total* to the neighborhood, which is then suggested for the uncoded concept.
17. The system according to claim 11, further comprising: a confidence module to provide a confidence level of the suggested classification.
The system for suggesting classifications for text concepts, as described previously, includes a "confidence module" that provides a "confidence level" indicating the reliability of the suggested classification, measuring the system's certainty.
18. The system according to claim 17, further comprising a display to display the confidence level only when above a confidence level threshold.
The system for suggesting classifications for text concepts, as described previously, includes a display that shows the confidence level of the suggested classification *only* when it exceeds a predefined threshold, so low confidence levels are not displayed.
19. The system according to claim 11, wherein the neighborhood is determined based on one of inclusion, injection, and nearest neighbor.
In the system for suggesting classifications for text concepts, as described previously, the neighborhood of reference concepts is determined using one of these methods: *inclusion* (the reference concepts are clustered together with the uncoded concepts), *injection* (reference concepts are injected into existing clusters of uncoded concepts), or *nearest neighbor* (the reference concepts closest to the uncoded concept are selected).
20. The system according to claim 11, wherein the classifier is one of minimum distance, minimum average distance, maximum counts, and distance weighted maximum count.
In the system for suggesting classifications for text concepts, as described previously, the classifier used to classify the neighborhood is one of: *minimum distance* (closest single concept), *minimum average distance* (closest average distance), *maximum counts* (most frequent classification), or *distance weighted maximum count* (most frequent, weighted by distance).
July 27, 2010
August 20, 2013