Legal claims defining the scope of protection, as filed with the USPTO.
1. A computer-implemented method comprising: identifying a plurality of non-person-name classification terms from a group of digital resources, each of the plurality of classification terms used to group the group of digital resources into a plurality of classification clusters, each particular classification cluster distinguished from other classification clusters based on the presence or absence of a classification term; grouping the group of digital resources into the plurality of classification clusters, each digital resource associated with a particular classification cluster based upon the inclusion in the digital resource of at least one of the plurality of classification terms; identifying, by operation of a computer, person names appearing in the digital resources associated with each classification cluster; determining a score of a probabilistic determination that two or more words are related to the particular person name; eliminating storing false positives in a name index by storing each particular name in the name index when the score of the probabilistic determination for the two or more words exceeds a defined threshold; identifying, by operation of a computer, at least two or more persons corresponding to a particular person name, each person identified based upon at least one particular common or related classification term relating particular classification clusters, where the particular classification clusters are distinct from any other classification clusters associated with any other person identified for the particular person name; identifying, by operation of a computer, groupings of digital resources associated with each person identified based on the digital resources associated with the related particular classification clusters associated with the respective person; and in response to receiving a search query related to a particular person name, initiating a display of a disambiguated listing of the at least two or more persons corresponding to the particular person name, each person listing comprising a nested listing of multiple digital resources associated with the respective person.
2. The method of claim 1 wherein each digital resource in the set of digital resources includes the particular person name.
3. The method of claim 1 wherein the search results interface includes: a first listing corresponding to the particular person name for a first person corresponding to the particular person name; and a second listing corresponding to the particular person name for a second person corresponding to the particular person name.
4. The method of claim 3 further comprising: receiving an input selecting the first person; and presenting a search results listing of digital resources associated with the first person in response to the input, wherein the listing of digital resources includes hyperlinks to each of the digital resources in the listing.
5. The method of claim 1 further comprising: selecting, by operation of a computer system, the classification terms from terms in the set of digital resources, including selecting the classification terms according to a relative uniqueness of each classification term in the set of digital resources.
6. The method of claim 1 wherein each classification cluster is identified based on an identification of relationships between classification terms in the set of digital resources.
7. The method of claim 1 wherein identifying person names associated with each classification cluster includes identifying a person name in a digital resource within a predetermined proximity of a classification term corresponding to the classification cluster.
8. The method of claim 1 wherein the set of classification terms for at least one of the clusters includes only a single classification term.
9. The method of claim 1 wherein identifying a person further comprises assigning a score to each occurrence of the person name in proximity to a classification term, wherein higher scores are assigned to occurrences involving the classification term in closer proximity to the person name.
10. The method of claim 1 wherein the person names are identified based on expected characteristics of person names.
11. A system comprising: a search engine operable to identify a plurality of digital resources satisfying a search query related to a particular person name; one or more computers including one or more computer storage devices storing instructions for causing the one or more computers to: identify a plurality of non-person-name classification terms from a group of digital resources, each of the plurality of classification terms used to group the group of digital resources into a plurality of classification clusters, each particular classification cluster distinguished from other classification clusters based on the presence or absence of a classification term; group the group of digital resources into the plurality of classification clusters, each digital resource associated with a particular classification cluster based upon the inclusion in the digital resource of at least one of the plurality of classification terms; identify person names appearing in the digital resources associated with each classification cluster; determine a score of a probabilistic determination that two or more words are related to the particular person name; eliminate storing false positives in a name index by storing each particular name in the name index when the score of the probabilistic determination for the two or more words exceeds a defined threshold; identify at least two or more persons corresponding to a particular person name, each person identified based upon at least one particular common or related classification term relating particular classification clusters, where the particular classification clusters are distinct from any other classification clusters associated with any other person identified for the particular person name; identify groupings of digital resources associated with each person identified based on the digital resources associated with the related particular classification clusters associated with the respective person; and in response to receiving a search query related to a particular person name, initiate a display of a disambiguated listing of the at least two or more persons corresponding to the particular person name, each person listing comprising a nested listing of multiple digital resources associated with the respective person.
12. The system of claim 11 wherein the display includes an indication that at least two persons have been identified.
13. The system of claim 11 wherein the display is adapted for presentation on a user interface of a user device communicating with the one or more computers over a wide area network.
14. The system of claim 11 further comprising one or more databases storing a term index including associations between classification terms identified in digital resources.
15. A tangible, non-transitory computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: identifying a plurality of non-person-name classification terms from a group of digital resources, each of the plurality of classification terms used to group the group of digital resources into a plurality of classification clusters, each particular classification cluster distinguished from other classification clusters based on the presence or absence of a classification term; grouping the group of digital resources into the plurality of classification clusters, each digital resource associated with a particular classification cluster based upon the inclusion in the digital resource of at least one of the plurality of classification terms; identifying person names appearing in the digital resources associated with each classification cluster; determining a score of a probabilistic determination that two or more words are related to the particular person name; eliminating storing false positives in a name index by storing each particular name in the name index when the score of the probabilistic determination for the two or more words exceeds a defined threshold; identifying at least two or more persons corresponding to a particular person name, each person identified based upon at least one particular common or related classification term relating particular classification clusters, where the particular classification clusters are distinct from any other classification clusters associated with any other person identified for the particular person name; identifying groupings of digital resources associated with each person identified based on the digital resources associated with the related particular classification clusters associated with the respective person; and in response to receiving a search query related to a particular person name, initiating a display of a disambiguated listing of the at least two or more persons corresponding to the particular person name, each person listing comprising a nested listing of multiple digital resources associated with the respective person.
16. The method of claim 1 further comprising: storing each classification term in a record of an index, each record including identification of at least one digital resource that includes the classification term; using the index to identify, in digital resources including the particular person name, at least a first classification term and a second classification term; using the index to identify a first cluster of digital resources in the digital resources including both the particular person name and the first classification term; and using the index to identify a second cluster of digital resources in the digital resources including both the particular person name and the second classification term.
17. The method of claim 16 wherein identifying a person is based at least in part on whether the particular person name appears within a predefined proximity of the first classification term in a digital resource.
18. The method of claim 1 wherein identifying classification terms in a digital resource includes identifying words in at least one predefined category of words.
19. The method of claim 18 wherein the at least one predefined category of words includes at least one of email address, URL, geographical name, or title.
20. The method of claim 1 wherein each digital resource includes text.
21. The method of claim 1 further comprising identifying a subset of the digital resources satisfying a search query.
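As an informal illustration only, and not part of the claims or the patentee's implementation, the term index of claim 16, the proximity scoring of claim 9, and the per-person grouping of claim 1 might be sketched as follows. All function names, the sample resources, the 1/(1 + distance) scoring formula, and the min_score threshold are assumptions made for this sketch. It also simplifies heavily: each classification term forms its own cluster and is treated as one candidate person, with no merging of related clusters and no probabilistic name detection.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_term_index(resources, classification_terms):
    """Map each classification term to the ids of resources that contain it."""
    index = defaultdict(set)
    for rid, text in resources.items():
        tokens = set(tokenize(text))
        for term in classification_terms:
            if term in tokens:
                index[term].add(rid)
    return index


def proximity_score(text, name, term):
    """Return 1 / (1 + distance) for the closest name/term pair, else 0.0.

    Distance is measured in tokens from the first word of the name to the
    term, so closer occurrences receive higher scores (assumed formula).
    """
    tokens = tokenize(text)
    name_tokens = tokenize(name)
    n = len(name_tokens)
    name_pos = [i for i in range(len(tokens) - n + 1)
                if tokens[i:i + n] == name_tokens]
    term_pos = [i for i, tok in enumerate(tokens) if tok == term]
    if not name_pos or not term_pos:
        return 0.0
    distance = min(abs(p - q) for p in name_pos for q in term_pos)
    return 1.0 / (1 + distance)


def disambiguate(resources, name, classification_terms, min_score=0.1):
    """Group resources mentioning `name` by nearby classification term.

    Simplification: one cluster per term, one candidate person per cluster.
    """
    index = build_term_index(resources, classification_terms)
    persons = {}
    for term, rids in index.items():
        hits = sorted(rid for rid in rids
                      if proximity_score(resources[rid], name, term) >= min_score)
        if hits:
            persons[term] = hits
    return persons


# Hypothetical sample data, invented for this sketch.
resources = {
    "doc1": "John Smith, professor of physics at the university",
    "doc2": "Physics lecture notes by John Smith",
    "doc3": "Guitarist John Smith released a new album",
}
persons = disambiguate(resources, "John Smith", ["physics", "guitarist"])
for label, docs in persons.items():
    print(label, docs)
```

Each key of the returned dictionary stands in for one disambiguated person, and its value is the nested listing of resources for that person. A fuller treatment would merge clusters whose classification terms are related before assigning them to a person.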
January 26, 2016