Legal claims defining the scope of protection, as filed with the USPTO.
1. A computer-implemented system for clustering similar documents, comprising: concepts for a set of documents; an occurrence module to determine occurrence frequencies of each concept in the document set; a distance module to calculate an inner product quantifying a similarity for each of the documents in the set with one or more clusters of documents based on the occurrence frequencies of the concepts; a map module to map each document to each of the document clusters based on the inner product, to identify those documents with the smallest inner products as most relevant to a theme, and to generate a matrix as a representation of the document and cluster mappings; and a processor to execute the modules.
2. A system according to claim 1, further comprising: an extraction module to extract terms from the documents; a concept determination module to generate the concepts from a subset of the extracted terms that satisfy threshold conditions for occurrences.
3. A system according to claim 1, further comprising: a database record for each concept; and a database to store the database records.
4. A system according to claim 1, further comprising: a relevance module to determine those concepts that are most relevant; and a summary module to summarize the most relevant concepts in a matrix that maps the concepts to clusters of documents.
5. A system according to claim 1, further comprising: a lexicon generation module to build a lexicon of the concepts by mapping individual occurrences of each concept within one or more of the documents.
6. A system according to claim 5, further comprising: a frequency table generation module to generate a frequency table from the lexicon by removing terms that occur only once in the documents and ordering the concepts by decreasing order.
7. A system according to claim 6, further comprising: a histogram generation module to build from the frequency table, a histogram comprising a visualization of the frequencies of occurrences of the concepts extracted from each document.
8. A system according to claim 1, further comprising: an occurrence module to determine a total frequency occurrence for the concepts by mapping each of the concepts across all the documents in the set.
9. A system according to claim 8, further comprising: a concept graph generation module to generate a concept graph based on the total frequency occurrence for each concept by mapping the concepts in order of descending frequency of occurrence within a number of documents that reference that concept.
10. A system according to claim 1, further comprising: a similarity module to quantify the similarity by comparing frequency occurrences for each of the concepts with concept weightings for each cluster.
11. A method for clustering similar documents, comprising the steps of: identifying concepts for a set of documents; determining occurrence frequencies of each concept in the document set; calculating an inner product quantifying a similarity for each of the documents in the set with one or more clusters of documents based on the occurrence frequencies of the concepts; mapping each document to each of the document clusters based on the inner product; identifying those documents with the smallest inner products as most relevant to a theme; and generating a matrix as a representation of the document and cluster mappings, wherein the steps are executed by a suitably-programmed computer.
12. A method according to claim 11, further comprising: extracting terms from the documents; generating the concepts from a subset of the extracted terms that satisfy threshold conditions for occurrences.
13. A method according to claim 11, further comprising: generating a database record for each concept; and storing the database records.
14. A method according to claim 11, further comprising: determining those concepts that are most relevant; and summarizing the most relevant concepts in a matrix that maps the concepts to clusters of documents.
15. A method according to claim 11, further comprising: building a lexicon of the concepts by mapping individual occurrences of each concept within one or more of the documents.
16. A method according to claim 15, further comprising: generating a frequency table from the lexicon by removing terms that occur only once in the documents and ordering the concepts by decreasing order.
17. A method according to claim 16, further comprising: building from the frequency table, a histogram comprising a visualization of the frequencies of occurrences of the concepts extracted from each document.
18. A method according to claim 11, further comprising: determining a total frequency occurrence for the concepts by mapping each of the concepts across all the documents in the set.
19. A method according to claim 18, further comprising: generating a concept graph based on the total frequency occurrence for each concept by mapping the concepts in order of descending frequency of occurrence within a number of documents that reference that concept.
20. A method according to claim 11, further comprising: quantifying the similarity by comparing frequency occurrences for each concept with concept weightings for each of the clusters.
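Claims 2, 5 to 7, 11, 12, and 15 to 17 describe a concrete text-processing pipeline. As an illustration only, the following Python sketch walks through term extraction, a lexicon of individual occurrences, a frequency table that drops terms occurring only once and sorts by decreasing frequency, threshold-based concept selection, per-document occurrence frequencies, and a plain-text histogram. The regular-expression tokenizer, the (doc_id, position) lexicon format, the threshold values, and every function name are assumptions made for the example; the claims do not specify any of them.

```python
import re
from collections import Counter, defaultdict

# Assumption: a "term" is a run of lowercase ASCII letters. The claims do not say
# how terms are extracted, so this tokenizer is purely illustrative.
TOKEN_RE = re.compile(r"[a-z]+")


def extract_terms(text):
    """Extract terms from one document (cf. claims 2 and 12)."""
    return TOKEN_RE.findall(text.lower())


def build_lexicon(docs):
    """Map each term to its individual occurrences as (doc_id, position) pairs (cf. claims 5 and 15)."""
    lexicon = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, term in enumerate(extract_terms(text)):
            lexicon[term].append((doc_id, pos))
    return dict(lexicon)


def frequency_table(lexicon):
    """Drop terms that occur only once, then sort by decreasing frequency (cf. claims 6 and 16)."""
    counts = {term: len(occ) for term, occ in lexicon.items() if len(occ) > 1}
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def select_concepts(table, min_count=2, max_count=None):
    """Keep terms whose counts fall inside hypothetical occurrence thresholds (cf. claims 2 and 12)."""
    return [t for t, c in table
            if c >= min_count and (max_count is None or c <= max_count)]


def occurrence_frequencies(docs, concepts):
    """Per-document occurrence counts of each selected concept (cf. claims 1 and 11)."""
    wanted = set(concepts)
    return {doc_id: Counter(t for t in extract_terms(text) if t in wanted)
            for doc_id, text in docs.items()}


def text_histogram(freqs):
    """Plain-text bar chart of concept counts per document (cf. claims 7 and 17)."""
    lines = []
    for doc_id, counts in freqs.items():
        lines.append(doc_id)
        for concept, n in counts.most_common():
            lines.append(f"  {concept:<10} {'#' * n}")
    return "\n".join(lines)


# Hypothetical sample documents.
docs = {
    "d1": "Patent claims define scope. Claims recite modules.",
    "d2": "Clustering groups documents; clustering uses concepts.",
    "d3": "Concepts and claims appear in documents.",
}
lexicon = build_lexicon(docs)
table = frequency_table(lexicon)
concepts = select_concepts(table)
freqs = occurrence_frequencies(docs, concepts)
print(table)
print(text_histogram(freqs))
```

With this sample, every word that appears once (such as "patent" or "groups") is dropped from the frequency table, leaving "claims", "clustering", "concepts", and "documents" as the candidate concepts.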
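The core of claims 1, 10, 11, and 20 is an inner product between a document's concept occurrence frequencies and a cluster's concept weightings, collected into a matrix that maps every document to every cluster. The Python sketch below assumes the cluster weightings are already available as plain dictionaries, since the claims do not say how they are derived. All function names, the sample weights, and the top-n relevance rule used for the concept-to-cluster matrix of claims 4 and 14 are hypothetical. The claims identify the documents with the smallest inner products as most relevant to a theme, so `rank_for_theme` uses that literal ordering by default. For non-negative frequency vectors, however, a larger inner product means more shared concept weight, so the sketch also exposes a `smallest_first` flag.

```python
from collections import Counter


def inner_product(doc_freqs, cluster_weights):
    """Sum of (concept frequency x cluster weighting) over a document's concepts (cf. claims 1, 10, 11, 20)."""
    return sum(n * cluster_weights.get(c, 0.0) for c, n in doc_freqs.items())


def document_cluster_matrix(doc_freqs, clusters):
    """Map every document to every cluster: rows are documents, columns are clusters."""
    doc_ids = sorted(doc_freqs)
    cluster_ids = sorted(clusters)
    matrix = [[inner_product(doc_freqs[d], clusters[k]) for k in cluster_ids]
              for d in doc_ids]
    return doc_ids, cluster_ids, matrix


def rank_for_theme(doc_ids, cluster_ids, matrix, theme, smallest_first=True):
    """Order documents for one cluster; smallest_first=True follows the claim wording literally."""
    col = cluster_ids.index(theme)
    scores = [(row[col], d) for d, row in zip(doc_ids, matrix)]
    scores.sort(reverse=not smallest_first)
    return [d for _, d in scores]


def concept_cluster_matrix(clusters, top_n=1):
    """Keep each cluster's top_n weighted concepts (hypothetical relevance rule)
    and map those concepts to clusters (cf. claims 4 and 14)."""
    cluster_ids = sorted(clusters)
    keep = set()
    for k in cluster_ids:
        weights = clusters[k]
        keep.update(sorted(weights, key=lambda c: (-weights[c], c))[:top_n])
    concepts = sorted(keep)
    return concepts, [[clusters[k].get(c, 0.0) for k in cluster_ids] for c in concepts]


# Hypothetical per-document concept frequencies.
doc_freqs = {
    "d1": Counter({"claims": 2}),
    "d2": Counter({"clustering": 2, "documents": 1, "concepts": 1}),
    "d3": Counter({"concepts": 1, "claims": 1, "documents": 1}),
}
# Hypothetical cluster concept weightings.
clusters = {
    "legal": {"claims": 0.75, "documents": 0.25},
    "text-mining": {"clustering": 0.5, "concepts": 0.5},
}
doc_ids, cluster_ids, matrix = document_cluster_matrix(doc_freqs, clusters)
print(matrix)
print(rank_for_theme(doc_ids, cluster_ids, matrix, "legal"))
print(concept_cluster_matrix(clusters))
```

With the sample data the matrix is `[[1.5, 0.0], [0.25, 1.5], [1.0, 0.5]]`. The literal ordering for "legal" is `d2, d3, d1`, while `smallest_first=False` gives `d1, d3, d2`.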
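Claims 8, 9, 18, and 19 add a total frequency of occurrence for each concept across the whole document set and a concept graph ordered by descending frequency. The Python sketch below shows one reading of that wording. It sorts concepts by total occurrences and reports, next to each one, how many documents reference it. It then renders the result as a text bar chart. The per-document input format, the tie-breaking by concept name, and the function names are assumptions for illustration.

```python
from collections import Counter


def total_frequencies(doc_freqs):
    """Total occurrences of each concept across all documents (cf. claims 8 and 18)."""
    total = Counter()
    for counts in doc_freqs.values():
        total.update(counts)  # mapping argument: adds the counts
    return total


def document_frequencies(doc_freqs):
    """Number of documents that reference each concept at least once."""
    df = Counter()
    for counts in doc_freqs.values():
        df.update(c for c, n in counts.items() if n > 0)  # iterable: counts each concept once per document
    return df


def concept_graph(doc_freqs):
    """(concept, total occurrences, referencing documents), in descending order of total
    occurrences with ties broken by name (one reading of claims 9 and 19)."""
    total = total_frequencies(doc_freqs)
    df = document_frequencies(doc_freqs)
    return [(c, total[c], df[c]) for c in sorted(total, key=lambda c: (-total[c], c))]


def render(graph):
    """Plain-text rendering of the concept graph."""
    return "\n".join(f"{c:<10} {'#' * n} ({d} docs)" for c, n, d in graph)


# Hypothetical per-document concept frequencies.
doc_freqs = {
    "d1": Counter({"claims": 2}),
    "d2": Counter({"clustering": 2, "documents": 1, "concepts": 1}),
    "d3": Counter({"concepts": 1, "claims": 1, "documents": 1}),
}
graph = concept_graph(doc_freqs)
print(render(graph))
```

Here "claims" leads with three occurrences spread over two documents, while "clustering" has two occurrences that both come from a single document.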
May 13, 2014