Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating descriptions of relationships between entities. In one aspect, a method includes identifying one or more related entities for a particular entity based at least in part on commonalities between the particular entity and the one or more related entities, sorting the commonalities according to a measure of uniqueness of each of the commonalities, and identifying a subset of the commonalities having a measure of uniqueness above a lower measure of uniqueness threshold. The identified subset of commonalities can include one or more commonalities. One or more commonalities can be selected from the subset of commonalities as indicative of a relationship to the particular entity, and a description of the relationship can be identified based on the selected one or more commonalities.
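The flow summarized in the abstract (sort commonalities by uniqueness, keep those above a lower threshold, select from the remainder, and return associated text) can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `Commonality` class, the `shared_by` field, the `describe_relationship` function, and the choice of `1 / shared_by` as the uniqueness measure are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commonality:
    text: str       # text associated with the commonality, used as the description
    shared_by: int  # hypothetical count of related entities sharing it (must be >= 1)


def uniqueness(c: Commonality) -> float:
    # Assumption for this sketch: the fewer entities share a commonality,
    # the more unique it is.
    return 1.0 / c.shared_by


def describe_relationship(commonalities, lower_threshold, max_selected=1):
    # Sort from most unique to least unique.
    ranked = sorted(commonalities, key=uniqueness, reverse=True)
    # Keep only commonalities above the lower uniqueness threshold.
    subset = [c for c in ranked if uniqueness(c) > lower_threshold]
    # Select the most unique survivors and return their associated text.
    return [c.text for c in subset[:max_selected]]


example = [
    Commonality("both are restaurants", shared_by=50),
    Commonality("both serve wood-fired pizza", shared_by=4),
    Commonality("both are in the same city", shared_by=200),
]

print(describe_relationship(example, lower_threshold=0.05))
```

With these made-up counts, only the pizza commonality clears the threshold, so it alone is offered as the description of the relationship.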
Legal claims defining the scope of protection, as filed with the USPTO.
1. A method performed by data processing apparatus, the method comprising: identifying one or more related entities for a particular entity based at least in part on (A) a plurality of commonalities between the particular entity and the one or more related entities and (B) a plurality of categories of potential commonalities, with each category of potential commonalities including a respective plurality of commonalities between the particular entity and one or more related entities, comprising: sorting the plurality of commonalities according to a measure of uniqueness of each of the plurality of commonalities; selecting a particular category of potential commonalities, wherein the plurality of commonalities includes the respective plurality of commonalities for the particular category of potential commonalities, including: ranking the plurality of categories of potential commonalities according to a contribution of the respective plurality of commonalities for each category of potential commonalities to a relatedness of the one or more related entities to the particular entity; and selecting the particular category of potential commonalities based on the ranking; identifying a subset of the sorted plurality of commonalities having a measure of uniqueness above a lower measure of uniqueness threshold, wherein the identified subset of commonalities includes one or more commonalities; selecting one or more commonalities from the subset of commonalities as indicative of a relationship to the particular entity; and identifying a description of the relationship based on the selected one or more commonalities.
2. The method of claim 1 wherein ranking the plurality of categories of potential commonalities according to the contribution of the respective plurality of commonalities to the relatedness of the one or more related entities to the particular entity includes weighting each category of potential commonalities according to predetermined category weights.
3. The method of claim 2 wherein the predetermined category weights are determined by: receiving ratings of a relatedness of entities; calculating a plurality of similarity scores for the entities, with each similarity score corresponding to each of the plurality of categories of potential commonalities; and performing a linear regression analysis using the plurality of similarity scores and the ratings of the relatedness of the entities to calculate the category weights.
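The weight calculation recited in claim 3 can be illustrated with an ordinary least-squares fit. The sketch below is one possible reading, under stated assumptions: each row holds one similarity score per category for a rated pair of entities, there is no intercept term, and the function name `fit_category_weights` and the sample numbers are hypothetical.

```python
def fit_category_weights(similarity_rows, ratings):
    """Return weights w minimizing sum((row . w - rating) ** 2)."""
    n = len(similarity_rows[0])
    # Normal equations A w = b, with A = X^T X and b = X^T y.
    a = [[sum(r[i] * r[j] for r in similarity_rows) for j in range(n)]
         for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(similarity_rows, ratings))
         for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("similarity scores are collinear; weights are not unique")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][k] * w[k] for k in range(i + 1, n))
        w[i] = (b[i] - s) / a[i][i]
    return w


# Made-up data: two categories, four rated entity pairs.
rows = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.8, 0.2]]
ratings = [2.0, 0.5, 1.25, 1.7]
weights = fit_category_weights(rows, ratings)
print(weights)
```

The sample ratings were constructed from weights of 2.0 and 0.5, so the fit recovers approximately those values. A production system would more likely rely on a numerical library than on hand-written elimination.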
4. The method of claim 1 wherein the measure of uniqueness of each of the plurality of commonalities includes a quantity of related entities that share a commonality in the plurality of commonalities.
5. The method of claim 4 wherein identifying a subset of the sorted plurality of commonalities includes: calculating an average measure of uniqueness for the sorted plurality of commonalities; and identifying a commonality having a measure of uniqueness within about a standard deviation of the average measure of uniqueness.
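The filtering step of claim 5 can be sketched using the quantity-based measure of claim 4. This is an illustration only: the function name `within_one_std`, the use of the population standard deviation, the inclusive comparison, and the sample counts are assumptions, and the claim's "about a standard deviation" is treated here as exactly one.

```python
from statistics import mean, pstdev


def within_one_std(counts):
    """Return commonalities whose measure lies within one standard
    deviation of the average measure.

    `counts` maps a commonality to the quantity of related entities
    that share it.
    """
    values = list(counts.values())
    avg = mean(values)
    spread = pstdev(values)  # population standard deviation (an assumption)
    return {name for name, v in counts.items() if abs(v - avg) <= spread}


counts = {
    "same cuisine": 10,
    "same neighborhood": 12,
    "same price range": 11,
    "both are businesses": 40,
}
kept = within_one_std(counts)
print(sorted(kept))
```

In this made-up data the average is 18.25 and the standard deviation is roughly 12.6, so the very common "both are businesses" commonality falls outside the band and is dropped.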
6. The method of claim 1 wherein the measure of uniqueness of each of the plurality of commonalities relates to a frequency of description of the commonality in a corpus of resources.
7. The method of claim 1 wherein selecting one or more commonalities from the subset of commonalities as indicative of a relationship to the particular entity includes at least one of: identifying commonalities of potential interest to a user; identifying commonalities associated with trusted information sources; or selecting a plurality of commonalities, with each selected commonality corresponding to a different category of potential commonalities.
8. The method of claim 1 wherein the plurality of commonalities is selected for a user based on prior interactions by the user.
9. The method of claim 1 wherein identifying a description of the relationship based on the selected one or more commonalities includes identifying text associated with the selected one or more commonalities.
10. The method of claim 1 further comprising displaying the description of the relationship in a user interface in association with an identifier of the particular entity.
11. The method of claim 1 wherein the plurality of commonalities are selected from one or more categories of potential commonalities including: references to the particular entity and one or more related entities in a common web page; references to the particular entity and one or more related entities by a particular content author; identification of the particular entity and one or more related entities in a common web browsing session; one or more common categories associated with the particular entity and one or more related entities; one or more common attributes associated with the particular entity and one or more related entities; one or more common terms identified as representative of the particular entity and one or more related entities; an association of the particular entity and one or more related entities within a hierarchical entity structure; common sentiment phrases extracted from documents associated with the particular entity and documents associated with one or more related entities; an association of a waypoint for the particular entity and waypoints for one or more related entities with a user-defined map; or an identification of a web page associated with the particular entity and webpages associated with the one or more related entities as similar web pages.
12. The method of claim 1, further comprising: obtaining a user-provided search query that identifies the particular entity; and responsive to the user-provided search query: presenting, to a user, information identifying (i) a related entity in the one or more related entities and (ii) a description of a relationship between the related entity and the particular entity.
13. The method of claim 12 wherein an entity is one of a place, a business, a geographical location, an organization, or a person.
14. A non-transitory computer storage medium encoded with a computer program, the program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform operations including: receiving an identification of one or more related entities for a particular entity, wherein the related entities are identified based at least in part on a plurality of commonalities between the particular entity and the one or more related entities; determining a relative contribution for each of a plurality of categories of commonalities to a level of relatedness between the particular entity and the one or more related entities; eliminating at least a portion of the plurality of commonalities to generate a subset of commonalities based, at least in part, on (A) a frequency of occurrence of the commonalities in a corpus of resources above a threshold frequency level and (B) a relatively low contribution of one or more categories of commonalities to the level of relatedness; selecting one or more commonalities from the subset of commonalities as indicative of a relationship to the particular entity; and identifying a description of the relationship based on the selected one or more commonalities.
15. The computer storage medium of claim 14 wherein the threshold frequency level is based on an average frequency of occurrence of the plurality of commonalities.
16. The computer storage medium of claim 14 wherein selecting one or more commonalities from the subset of commonalities as indicative of a relationship to the particular entity is performed for a particular user based at least in part on data identifying prior interactions by the user with one or more web documents.
17. The computer storage medium of claim 14 wherein the operations further include: obtaining a user-provided search query that identifies the particular entity; and responsive to the user-provided search query: presenting, to a user, information identifying (i) a related entity in the one or more related entities and (ii) a description of a relationship between the related entity and the particular entity.
18. The system of claim 17 wherein an entity is one of a place, a business, a geographical location, an organization, or a person.
19. A system comprising: one or more related entity identification servers adapted to identify related entities based on a plurality of commonalities between a first entity and a second entity; one or more processing servers adapted to identify descriptions of relationships between related entities based on one or more candidate commonalities by: identifying one of a plurality of categories of commonalities as providing a greater relative contribution to a relatedness of the first entity and the second entity based on a weighted similarity score for each of the plurality of categories of commonalities, wherein the weighted similarity score for each category of commonalities is based on a combination of a similarity score calculated using commonalities in the category of commonalities and a weighting corresponding to a predetermined level of contribution of the category of commonalities to the relatedness of related entities; eliminating a subset of the commonalities from the plurality of commonalities as candidate commonalities based on an insufficient level of uniqueness of the commonalities in the subset of commonalities; selecting one or more of the commonalities that remain after eliminating a subset of the commonalities as indicative of a basis for a relationship between the first entity and the second entity; and identifying a description of the relationship between the first entity and the second entity based on the one or more selected commonalities.
20. The system of claim 19 wherein the one or more related entity identification servers are adapted to identify related entities by calculating a Jaccard index for the first entity and the second entity.
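The Jaccard index named in claim 20 is the size of the intersection of two sets divided by the size of their union. A minimal sketch follows, assuming each entity is represented by a set of features; the function name `jaccard_index`, the feature values, and the convention of returning 0.0 when both sets are empty are assumptions made for the example.

```python
def jaccard_index(a, b):
    """Return |A intersect B| / |A union B| for two collections of features."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets score 0.0.
        return 0.0
    return len(a & b) / len(union)


first_entity = {"pizza", "italian", "downtown"}
second_entity = {"pizza", "italian", "uptown", "delivery"}
score = jaccard_index(first_entity, second_entity)
print(score)
```

Here the two hypothetical entities share two of five distinct features, giving an index of 0.4.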
21. The system of claim 19 further comprising one or more initialization servers adapted to: receive ratings indicating a level of relatedness of a sample set of entities; calculate a similarity score for each pair of entities in the sample set of entities, wherein each similarity score corresponds to a different one of the plurality of categories of commonalities and the similarity score is calculated using commonalities for the pair of entities in the corresponding category of commonalities; and perform a linear regression analysis to calculate the weighting for each category of commonalities based on the received ratings and the calculated similarity scores.
22. The system of claim 19 wherein the one or more processing servers are further adapted to obtain a user-provided search query that identifies the particular entity; and responsive to the user-provided search query: present, to a user, information identifying (i) a related entity in the one or more related entities and (ii) a description of a relationship between the related entity and the particular entity.
March 14, 2013
August 25, 2015