Hybrid System for Named Entity Resolution

Published: February 12, 2013

Assignee: not available in the USPTO data we have

Inventors: Caroline Brun, Maud Ehrmann, Guillaume Jacquet

Patent Claims

21 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for named entity resolution comprising: providing a stored global distribution space comprising triples, each triple having the form w1.R.w2, where w1 and w2 are lexical units, and R is a syntactic relation between the lexical units w1 and w2, at least some of the lexical units being named entities; with a computing device, parsing an input text string to identify a context in which an identified named entity of the input text string is used, the context including a lexical unit which is in an identified syntactic relation with the identified named entity; comparing the identified context with a plurality of stored contexts, each stored context comprising a respective lexical unit which is in an identified syntactic relation with another named entity and in which the other named entity in the stored context is associated with a class of named entity, the named entity class being selected from a plurality of classes, at least one of the plurality of classes corresponding to a metonymic use of a respective other named entity, the comparing comprising: from the stored global distribution space, identifying a sub-space comprising a subset of the triples, each of the triples in the subset comprising a stored context w1.R or R.w2 and a lexical unit w2 or w1, respectively, in which the stored context w1.R or R.w2 is one that is also found in a triple with the identified named entity in the global distribution space and the lexical unit for at least some of the triples in the subspace comprises another named entity; for each of the plurality of stored contexts, computing a distance between the identified context and the stored context, the distance being computed as a function of a difference between a frequency of occurrence, in a distribution space derived from a training corpus, of the identified named entity in the identified context and a frequency of occurrence, in the distribution space, of the other named entity in the stored context; computing a score for each of the plurality of named entity classes based on the computed distances; and assigning a named entity class from the plurality of named entity classes to the identified named entity based on at least one of the identified context and the scores.

2. A method for named entity resolution comprising: with a computing device, parsing an input text string to identify a context in which an identified named entity of the input text string is used; comparing the identified context with a plurality of stored contexts in which the named entity in the stored context is associated with a class of named entity, the named entity class being selected from a plurality of classes, at least one of the plurality of classes corresponding to a metonymic use of a respective named entity; and assigning a named entity class from the plurality of named entity classes to the identified named entity based on at least one of the identified context and the comparison, wherein the comparing comprises: from a stored global distribution space comprising triples, each triple comprising a lexical unit and a context in which the lexical unit is found in a training corpus, each triple having the form w1.R.w2, where w1 and w2 are lexical units, and R is a syntactic relation between the lexical units w1 and w2, at least some of the lexical units being named entities, identifying a sub-space comprising a subset of the triples, each of the triples in the subset comprising a stored context w1.R or R.w2 and a lexical unit w2 or w1, respectively, the stored context w1.R or R.w2 being one that is also found in a triple with the identified named entity in the global distribution space; for triples in the sub-space, determining whether the named entity in the stored context is associated with a class of named entity selected from the plurality of classes, and if so, assigning the class to the named entity; for each of the plurality of stored contexts, computing a distance between the identified context and the stored context, the computing of the distance being computed for stored contexts in the sub-space; computing a score for each of the plurality of named entity classes based on the computed distances; and assigning one of the named entity classes to the identified named entity, based on the computed scores.

3. The method of claim 1, wherein the parsing includes applying a set of dependency rules to the input text string, each of the dependency rules specifying a syntactic relation between a first lexical unit based on a named entity and a second lexical unit, the rule being satisfied when the relation is present in the input text string.

4. The method of claim 3, wherein the context is based on the syntactic relation and the named entity.

5. The method of claim 1, wherein the class is selected from a set of classes including a literal class and at least one metonymic class.

6. The method of claim 5, wherein the at least one metonymic class comprises at least one location-specific metonymic class and at least one organization-specific metonymic class.

7. The method of claim 6, wherein the at least one organization-specific metonymic class comprises at least two organization-specific classes selected from the group consisting of: a class in which an organization name stands for its members; a class in which the organization name refers to an event associated with the organization; a class in which the organization name refers to its products; a class in which the organization name stands for the facility that houses the organization; a class in which the organization name is used as an index indicating its value; a class in which the name is used as a string; and a class in which the organization name refers to a representation; the organization-specific classes optionally further including an additional class for all other types of organization-specific metonymy not otherwise covered.

8. The method of claim 6, wherein the at least one location-specific metonymic class comprises at least two location-specific classes selected from the group consisting of: a class in which a location name stands for persons or an organization associated with it; a class in which the location name stands for an event that happened there; a class in which the location name stands for a product developed there; a class in which the location name is used as a reference to another name; a class in which the location name refers to a representation; the location-specific classes optionally further including an additional class for all other types of location-specific metonymy not otherwise covered.

9. The method of claim 1, wherein the parsing includes assigning a preliminary class to the named entity selected from a set of preliminary classes including a literal class, at least one metonymic class, and an unknown class.

10. The method of claim 9, wherein when the preliminary class assigned is an unknown class, the assigning of the named entity class from the plurality of named entity classes is based on the comparison.

11. The method of claim 1, further comprising annotating a document in which the text string occurs in accordance with the assigned class.

12. A computer program product comprising a non-transitory recording medium encoding instructions which, when executed on a computer, perform the method of claim 1.

13. A hybrid system for named entity resolution comprising: memory which stores: a symbolic component for identifying a context in which an identified named entity of an input text string is used; a data structure which stores a subset of triples identified from a global distribution space comprising triples, each triple in the global distribution space having the form w1.R.w2 where w1 and w2 are lexical units, and R is a syntactic relation between the lexical units w1 and w2, at least some of the lexical units being named entities, each of the triples in the subset comprising a stored context w1.R or R.w2 and a respective lexical unit w2 or w1, in which the stored context w1.R or R.w2 is one that is also found in a triple with the identified named entity in the global distribution space and the lexical unit for at least some of the triples in the subset of triples comprises another named entity; a distribution component for computing a distance between the identified context in which the named entity is being used and another context in which the named entity is used in a known metonymic sense, the distance being computed as a function of a difference between a frequency of occurrence, in the distribution space, of the identified named entity in the identified context and a frequency of occurrence, in the distribution space, of another named entity in the stored context; and a processor which implements the symbolic component and distribution component; the system assigning a class to the identified named entity, based on at least one of the identified context and the computed distance.

14. The system of claim 13, wherein the symbolic component comprises a parser which applies a set of dependency rules to the input text string, each of the dependency rules specifying a syntactic relation between a first lexical unit based on a named entity and a second lexical unit, the rule being satisfied when the relation is present in the input text string.

15. The system of claim 14, wherein the context is based on the relation and the named entity.

16. The system of claim 13, wherein the class is selected from a set of classes including a literal class and at least one metonymic class.

17. The system of claim 16, wherein the at least one metonymic class comprises at least one location-specific metonymic class and at least one organization-specific metonymic class.

18. The system of claim 13, wherein the symbolic component assigns a class to the named entity selected from a set of classes including a literal class, at least one metonymic class, and an unknown class.

19. The system of claim 18, wherein when the class assigned by the symbolic component is an unknown class, the class assigned by the system to the identified named entity is based on the computed distance.

20. The system of claim 13, wherein the distribution component assigns a score to the named entity based on the computed distance between the context in which the named entity is being used and another context in which the named entity is used in a known metonymic sense.

21. A method for document annotation comprising: providing a stored global distribution space comprising triples, each triple having the form w1.R.w2, where w1 and w2 are lexical units, and R is a syntactic relation between the lexical units w1 and w2, at least some of the lexical units being named entities; inputting, to a computer system, a document comprising at least one text string; with a processor of the computer system: parsing the text string to identify a context in which an identified named entity of the text string is used; comparing the identified context with at least one stored context in which the named entity in the stored context is associated with a class of named entity, the named entity class being selected from a plurality of classes, at least one of the plurality of classes corresponding to a metonymic use of a respective named entity, the comparing comprising: from the stored global distribution space, identifying a sub-space comprising a subset of the triples, each of the triples in the subset comprising a stored context w1.R or R.w2 and a lexical unit w2 or w1, respectively, in which the stored context w1.R or R.w2 is one that is also found in a triple with the identified named entity in the global distribution space and the lexical unit for at least some of the triples in the subspace comprises another named entity; for each of the plurality of stored contexts, computing a distance between the identified context and the stored context, the computing of the distance being computed for stored contexts in the sub-space; and assigning a named entity class from the plurality of named entity classes to the identified named entity based on at least one of the identified context and the computed distances; and annotating the document based on the assigned class.
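
The distributional steps recited in claims 1, 2, and 21 (building contexts from w1.R.w2 triples, identifying the sub-space, computing distances, and scoring classes) can be illustrated with a short Python sketch. It is not the patented implementation. The toy triples, the class labels, the use of relative frequency, and the inverse-distance scoring are all hypothetical choices made for illustration, since the claims only require that the distance be a function of a difference between frequencies of occurrence and that the scores be based on the distances.

```python
from collections import Counter, defaultdict

# Hypothetical toy global distribution space: counts of (w1, R, w2) triples.
TRIPLES = Counter({
    ("sign", "SUBJ", "Albania"): 2,
    ("sign", "SUBJ", "France"): 3,
    ("sign", "SUBJ", "Kosovo"): 1,
    ("visit", "OBJ", "Albania"): 4,
    ("visit", "OBJ", "France"): 5,
    ("visit", "OBJ", "Kosovo"): 3,
})

# Hypothetical annotations: (named entity, stored context) -> class label.
LABELED = {
    ("Albania", "sign.SUBJ"): "place-for-people",
    ("France", "sign.SUBJ"): "place-for-people",
    ("Albania", "visit.OBJ"): "literal",
    ("France", "visit.OBJ"): "literal",
}


def build_space(triples):
    """Map each lexical unit to a Counter of the contexts it occurs in."""
    space = defaultdict(Counter)
    for (w1, rel, w2), count in triples.items():
        space[w2][f"{w1}.{rel}"] += count  # w2 occurs in context "w1.R"
        space[w1][f"{rel}.{w2}"] += count  # w1 occurs in context "R.w2"
    return space


def rel_freq(space, unit, context):
    """Relative frequency of `unit` in `context` (an assumed frequency measure)."""
    total = sum(space[unit].values())
    return space[unit][context] / total if total else 0.0


def sub_space(space, entity):
    """(unit, context) pairs whose context also occurs with `entity`."""
    shared = set(space.get(entity, ()))
    return {(unit, ctx)
            for unit, ctxs in space.items() if unit != entity
            for ctx in ctxs if ctx in shared}


def class_scores(space, entity, identified_context, labeled):
    """Score each class from the distances to labeled stored contexts."""
    scores = defaultdict(float)
    f_entity = rel_freq(space, entity, identified_context)
    for unit, ctx in sub_space(space, entity):
        label = labeled.get((unit, ctx))
        if label is None:
            continue  # stored context carries no class annotation
        distance = abs(f_entity - rel_freq(space, unit, ctx))
        scores[label] += 1.0 / (1.0 + distance)  # assumed: closer counts more
    return dict(scores)


space = build_space(TRIPLES)
scores = class_scores(space, "Kosovo", "sign.SUBJ", LABELED)
best = max(scores, key=scores.get)
print(best, scores)
```

With this toy data, the stored contexts labeled as metonymic lie closer to the identified context of "Kosovo" than the literal ones, so the metonymic class receives the higher score. A different distance or scoring function would fit the claim language equally well.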
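
The hybrid control flow of claims 9 to 11 and 18 to 19, followed by the annotation step of claim 21, can be sketched in the same hedged spirit. The rule table, the class labels, the score dictionary, and the inline `<ne>` tag format below are hypothetical, as the claims do not specify any of them. The sketch shows only that the symbolic component's preliminary class is kept when it is known, and that the distributional scores decide when the preliminary class is unknown.

```python
from typing import Dict, Tuple

# Hypothetical dependency rules: (governor lemma, relation) -> preliminary class.
RULES: Dict[Tuple[str, str], str] = {
    ("visit", "OBJ"): "literal",
    ("decide", "SUBJ"): "place-for-people",
}


def symbolic_class(governor: str, relation: str) -> str:
    """Return the preliminary class from the rules, or 'unknown' if none fires."""
    return RULES.get((governor, relation), "unknown")


def resolve(governor: str, relation: str, scores: Dict[str, float]) -> str:
    """Keep a known preliminary class; otherwise use the distributional scores."""
    preliminary = symbolic_class(governor, relation)
    if preliminary != "unknown":
        return preliminary
    if not scores:
        return "unknown"  # neither component can decide
    return max(scores, key=scores.get)


def annotate(text: str, entity: str, label: str) -> str:
    """Wrap the first mention of `entity` in an assumed inline tag format."""
    return text.replace(entity, f'<ne class="{label}">{entity}</ne>', 1)


# Hypothetical per-class scores, as a distributional component might produce.
example_scores = {"place-for-people": 1.8, "literal": 1.4}

label = resolve("sign", "SUBJ", example_scores)
print(annotate("Kosovo signed the agreement.", "Kosovo", label))
```

Here no rule covers the context ("sign", "SUBJ"), so the preliminary class is unknown and the highest-scoring class is used for the annotation.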

Patent Metadata

Filing Date

Unknown

Publication Date

February 12, 2013

Inventors

Caroline Brun

Maud Ehrmann

Guillaume Jacquet
