Some embodiments provide a program that identifies an entity having an entity attribute specified in a document. The program receives, from each method of several methods, a set of candidate identity attributes, each of which is for identifying a particular entity having the entity attribute specified in the document. Each method generates its set of candidate identity attributes based on the entity attribute specified in the document. The program calculates a score for each candidate identity attribute in the sets of candidate identity attributes. Based on the sets of scores, the program identifies, from the sets of candidate identity attributes, an identity attribute that identifies the entity having the entity attribute specified in the document.
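As a rough illustration of the scoring step in claim 1, with the optional relevancy, normalization, and confidence factors of claims 7 to 10, the Python sketch below scores each candidate by how often its tokens recur across all processes' candidate sets. Everything beyond the claim language is an assumption: the `Candidate` structure, lower-cased whitespace tokenization, averaging token values into a base score, dividing each relevancy by the per-process maximum as the normalization factor, and multiplying the factors together. The claims do not say how the factors are combined.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical candidate identity attribute reported by one process."""
    text: str                # e.g. a title such as "Chief Executive Officer"
    relevancy: float = 1.0   # process-specific relevancy score (claim 7), assumed >= 0
    confidence: float = 1.0  # assumed probability the candidate is correct (claim 10)


def tokenize(text: str) -> list[str]:
    # Assumption: simple lower-cased whitespace tokenization.
    return text.lower().split()


def score_candidates(candidate_sets: list[list[Candidate]]) -> list[tuple[Candidate, float]]:
    # Token count: number of instances of each token across ALL candidate sets (claim 1).
    counts = Counter(tok for cands in candidate_sets for c in cands for tok in tokenize(c.text))
    scored = []
    for cands in candidate_sets:
        # Assumed normalization factor (claim 9): divide by the largest relevancy
        # in this process's set so its relevancies fall in [0, 1].
        max_rel = max((c.relevancy for c in cands), default=0.0)
        for c in cands:
            tokens = tokenize(c.text)
            if not tokens:
                continue
            # Each token's value is its cross-set count; the base score is their mean.
            base = sum(counts[t] for t in tokens) / len(tokens)
            norm_rel = c.relevancy / max_rel if max_rel > 0 else 0.0
            scored.append((c, base * norm_rel * c.confidence))
    return scored


def identify_attribute(candidate_sets: list[list[Candidate]]) -> Optional[str]:
    # Pick the highest-scoring candidate; ties go to the first one encountered.
    scored = score_candidates(candidate_sets)
    if not scored:
        return None
    best, _ = max(scored, key=lambda pair: pair[1])
    return best.text
```

A candidate that several processes agree on accumulates high token counts, so the cross-set token count acts as a vote among processes; the relevancy and confidence factors then let a more trusted process outweigh a less trusted one.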
Legal claims defining the scope of protection, as filed with the USPTO.
1. A method for identifying an entity having an entity attribute in a document, the method comprising: receiving, from each process of a plurality of processes, a corresponding set of candidate identity attributes that are each for identifying a particular entity having said entity attribute specified in the document, wherein each process of the plurality of processes generates the corresponding set of candidate identity attributes based on the entity attribute specified in the document; calculating a score for each candidate identity attribute in the sets of candidate identity attributes, wherein calculating the score for a particular candidate identity attribute comprises (1) identifying a set of tokens in the particular candidate identity attribute, (2) assigning a value to each token in the set of tokens based on a token count that represents a number of instances of the token across the sets of candidate identity attributes, and (3) calculating the score based on the assigned values; and identifying, based on the sets of scores calculated for the candidate identity attributes, an identity attribute from the sets of candidate identity attributes that identifies the entity having said entity attribute specified in the document.
2. The method of claim 1, wherein the entity is a name of a person.
3. The method of claim 1, wherein the entity attribute is a name of the person.
4. The method of claim 3, wherein the identity attribute is a title designated to the person.
5. The method of claim 3, wherein the identity attribute is a company with which the person is affiliated.
6. The method of claim 1, wherein the set of candidate identity attributes is a first set of candidate identity attributes, wherein the score is a first score, wherein the identity attribute is a first identity attribute, the method further comprising: receiving, from each process of the plurality of processes, a second set of candidate identity attributes that are each for identifying a particular entity having said entity attribute specified in the document, wherein each process of the plurality of processes generates the corresponding set of candidate identity attributes based on the entity attribute specified in the document; calculating, for each second set of candidate identity attributes, a second score for each candidate identity attribute in the second set of candidate identity attributes; and identifying, based on the second scores calculated for the second set of candidate identity attributes, a second identity attribute from the second sets of candidate identity attributes that identifies the entity having said entity attribute specified in the document.
7. The method of claim 1, wherein each process in the plurality of processes further generates a relevancy score for each candidate identity attribute, the relevancy score representing a degree of correctness that the particular entity identified by the candidate identity attribute is the entity.
8. The method of claim 7, wherein calculating the score for the particular candidate identity attribute further comprises calculating the score based on the particular candidate identity attribute's relevancy score.
9. The method of claim 7, wherein calculating the score for the particular candidate identity attribute further comprises calculating the score based on a normalization factor for converting the relevancy score to a particular range of values.
10. The method of claim 1, wherein calculating the score for the particular candidate identity attribute further comprises calculating the score based on a confidence factor that represents a probability that the particular candidate identity attribute correctly identifies the entity having said entity attribute specified in the document.
11. The method of claim 1, wherein the process in the plurality of processes is a first process, wherein a second process in the plurality of processes generates a set of candidate identity attributes by a query to an entity database.
12. The method of claim 1, wherein the process in the plurality of processes is a first process, wherein a second process in the plurality of processes comprises a service that generates a set of candidate identity attributes by performing lexical analysis on the document.
13. The method of claim 2, wherein calculating the relevance scores comprises: processing the plurality of candidate identity attribute sets based on the first candidate identity attribute of each candidate identity attribute set to identify a subset of the plurality of candidate identity attribute sets; and processing only the subset of the plurality of candidate identity attribute sets based on the second candidate identity attribute of each candidate identity attribute set.
14. A method for identifying a set of identity attributes for determining the identity of an entity, the method comprising: identifying a particular entity that occurs more often than other entities in a set of documents; identifying a plurality of candidate identity attribute sets by analyzing the particular entity and at least one document in the set of documents using a plurality of different processes that each identifies (i) a set of candidate identities corresponding to the particular entity and (ii) a candidate identity attribute set for each identified candidate identity, wherein at least one of the different processes analyzes a stored plurality of identities to identify candidate identities having the particular entity and that are related to an entity to which the at least one document is also related; for each candidate identity attribute set of the plurality of candidate identity attribute sets, calculating a relevance score for each candidate identity attribute in the set that measures a level of correspondence between the particular entity and the candidate identity attribute; and identifying, based on the relevance scores calculated for the candidate identity attributes of the different candidate identity attribute sets, a particular candidate identity attribute set for a particular identity that corresponds to the particular entity.
15. The method of claim 14, wherein identifying the plurality of candidate identity attribute sets comprises identifying a candidate identity attribute set based on a lexical analysis of the at least one document.
16. The method of claim 14 further comprising calculating a normalization factor that converts a particular relevance score calculated for a particular candidate identity attribute to a particular range of values.
17. The method of claim 14, wherein calculating a particular relevance score for a particular candidate identity attribute for a particular candidate identity comprises calculating a confidence factor that represents a probability that the particular candidate identity correctly identifies the entity referred to by the particular entity in the set of documents.
18. The method of claim 14, wherein each candidate identity attribute set comprises a first candidate identity attribute of a first type and a second candidate identity attribute of a second type.
19. The method of claim 18, wherein the particular entity is an entity of a person, the first type is a title of the person, and the second type is a company with which the person is affiliated.
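The second independent claim (claim 14), together with the two-stage processing of claim 13 and the title and company attribute types of claims 18 and 19, can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the patented implementation: `AttributeSet`, `StoredIdentity`, `most_frequent_entity`, `candidates_from_store`, `relevance`, and `select_attribute_set` are hypothetical names; entity mentions are assumed to be already extracted from each document; "related to an entity to which the document is also related" is approximated as an exact company match; and relevance is approximated as the fraction of candidate sets that agree on a value.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeSet:
    """Hypothetical candidate identity attribute set (claims 18-19)."""
    title: str    # first type: a title of the person
    company: str  # second type: a company the person is affiliated with


@dataclass(frozen=True)
class StoredIdentity:
    """Hypothetical record in a stored plurality of identities."""
    name: str
    attributes: AttributeSet


def most_frequent_entity(mentions_per_doc: list[list[str]]) -> Optional[str]:
    # Claim 14: the entity that occurs more often than other entities in the documents.
    # Assumes entity mentions have already been extracted from each document.
    counts = Counter(m for mentions in mentions_per_doc for m in mentions)
    return counts.most_common(1)[0][0] if counts else None


def candidates_from_store(store: list[StoredIdentity], entity: str,
                          related_company: str) -> list[AttributeSet]:
    # One possible process from claim 14: stored identities that have the particular
    # entity and are related to the organization the document is related to.
    # Assumption: "related" means the stored company matches exactly.
    return [s.attributes for s in store
            if s.name == entity and s.attributes.company == related_company]


def relevance(value: str, all_values: list[str]) -> float:
    # Assumed relevance measure: fraction of candidate sets that agree on this value.
    return all_values.count(value) / len(all_values)


def select_attribute_set(candidates: list[AttributeSet]) -> Optional[AttributeSet]:
    if not candidates:
        return None
    # Stage 1 (claim 13): score every set on its first attribute, keep the best subset.
    titles = [c.title for c in candidates]
    title_scores = [relevance(c.title, titles) for c in candidates]
    best = max(title_scores)
    subset = [c for c, s in zip(candidates, title_scores) if s == best]
    # Stage 2: only that subset is scored on the second attribute.
    companies = [c.company for c in subset]
    return max(subset, key=lambda c: relevance(c.company, companies))
```

Filtering on the first attribute before scoring the second means the company comparison runs only over sets that already agree on the title, which is one way to read the "processing only the subset" language of claim 13.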
May 4, 2018
March 31, 2020