Method and Apparatus for Identifying Synonyms and Using Synonyms to Search

Published: January 19, 2016

Assignee: not available in the USPTO data we have

Inventors: Jing Dong, Fei Xing, Ning Guo, Lei Hou, Qin Zhang

Patent Claims

16 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer implemented method for identifying synonyms, the method comprising:
obtaining, by a server, a first word and a second word, each of the first word and the second word including at least one term;
determining that a shortest edit distance between the first word and the second word is less than or equal to an edit distance threshold;
determining whether both of the first word and the second word exist in a preset knowledge database;
in response to determining at least the first word does not exist in the preset knowledge database, segmenting the first word to obtain one or more terms included in the first word;
determining whether the one or more terms after segmentation exist in the preset knowledge database; and
searching, in response to determining that the one or more terms after segmentation exist in the preset knowledge database, a smallest granularity type with a highest weight value for each of the one or more terms in the preset knowledge database;
finding, in response to determining that both of the first word and the second word exist in the preset knowledge database, the smallest granularity type with the highest weight value for each of the first word and the second word in the preset knowledge database; and
determining whether the first word and second word have a same smallest granularity type with a highest weight value including,
determining that the first word and the second word are synonyms, in response to determining that the first word and the second word have the same smallest granularity type with the highest weight value; and
determining that the two words are non-synonyms, in response to determining that the first word and the second word do not have the same smallest granularity type with the highest weight value.
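The decision flow recited in claim 1 can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: `edit_distance` uses character-level Levenshtein distance, `best_type` stands in for the preset knowledge database and is assumed to already map each entry to its smallest granularity type with the highest weight value, whitespace splitting stands in for segmentation, and a segmented word is assumed to be described by the tuple of its terms' types.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance using a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def resolve_types(word, best_type):
    """Return the type(s) describing a word, or None if they cannot be found.

    If the whole word is in the database, its type is used directly.
    Otherwise the word is segmented and every term must be in the database.
    """
    if word in best_type:
        return (best_type[word],)
    terms = word.split()  # stand-in segmenter (assumption)
    if terms and all(term in best_type for term in terms):
        return tuple(best_type[term] for term in terms)
    return None


def are_synonyms(first, second, best_type, max_distance=2):
    """Apply the edit-distance gate, then compare the resolved types."""
    if edit_distance(first, second) > max_distance:
        return False
    first_types = resolve_types(first, best_type)
    second_types = resolve_types(second, best_type)
    return first_types is not None and first_types == second_types


# Hypothetical, pre-resolved knowledge database: entry -> best type.
best_type = {
    "tshirt": "top", "t-shirt": "top", "red": "color",
    "apple": "fruit", "ample": "adjective", "trousers": "bottom",
}

print(are_synonyms("tshirt", "t-shirt", best_type))          # same type
print(are_synonyms("apple", "ample", best_type))             # different types
print(are_synonyms("red tshirt", "red t-shirt", best_type))  # via segmentation
```

The edit-distance check runs first because it is cheap and discards most unrelated pairs before any database lookup is needed.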

2. The method as recited in claim 1, wherein the obtaining the first word and the second word, each of the first word and the second word including at least one term, comprises: determining a threshold of a number of terms included in the first word and the second word; and determining that the number of terms in each of the first word and the second word is smaller than the threshold.

3. The method as recited in claim 1, wherein the obtaining the first word and the second word, each of the first word and the second word including at least one term, comprises: determining a threshold of an appearance frequency of the first word and the second word; and determining that the appearance frequency in each of the first word and the second word is higher than the threshold.
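Claims 2 and 3 each describe a filter on candidate words: one on the number of terms, one on appearance frequency. The sketch below applies both filters together, which is an illustrative choice, since the claims recite them separately. The function name, the parameter names, and the use of whitespace to count terms are all assumptions.

```python
from collections import Counter


def select_candidates(queries, max_terms=4, min_frequency=2):
    """Keep queries with fewer than max_terms terms that appear more than
    min_frequency times (both comparisons are strict, as in the claims)."""
    counts = Counter(queries)
    return sorted(
        query for query, seen in counts.items()
        if len(query.split()) < max_terms and seen > min_frequency
    )


queries = (
    ["red tshirt"] * 3
    + ["red t-shirt"] * 3
    + ["cheap red cotton tshirt xl"] * 5   # too many terms
    + ["rare query"]                       # too infrequent
)
print(select_candidates(queries))
```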

4. The method as recited in claim 1, wherein the first word and the second word are from a query log of a search engine.

5. The method as recited in claim 4, wherein the obtaining the first word and the second word, each of the first word and the second word including at least one term, comprises: obtaining the query log of the search engine; determining a threshold of a ranking of queries in the query log; selecting a plurality of queries with rankings higher than the threshold; and obtaining the first word and the second word from the plurality of queries.
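One way to read claim 5 is sketched below. The claim does not say how queries are ranked, so this example assumes ranking by frequency in the log and treats "rankings higher than the threshold" as membership in the `top_k` most frequent queries. The names `top_queries` and `candidate_pairs` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def top_queries(query_log, top_k):
    """Return the top_k most frequent queries, most frequent first."""
    return [query for query, _ in Counter(query_log).most_common(top_k)]


def candidate_pairs(query_log, top_k):
    """Yield every unordered (first word, second word) pair of top queries."""
    return combinations(top_queries(query_log, top_k), 2)


log = ["tshirt"] * 3 + ["t-shirt"] * 2 + ["wool socks"]
print(top_queries(log, 2))
print(list(candidate_pairs(log, 2)))
```

Each pair produced this way would then go through the edit-distance and knowledge-database checks of claim 1.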

6. The method as recited in claim 1, further comprising: determining that the shortest edit distance between the first word and the second word is higher than the edit distance threshold; and determining that the first word and the second word are non-synonyms.

7. The method as recited in claim 1, wherein the preset knowledge database comprises: one or more terms and concepts, each term or concept corresponding to at least one type, each type corresponding to the term or concept having a respective weight value.

8. The method as recited in claim 7, wherein the finding the smallest granularity type with the highest weight value for each word in the preset knowledge database comprises: searching a term or a concept corresponding to each of the first word and the second word in the preset knowledge database; and finding the smallest granularity type with the highest weight value for each of the first word and the second word according to the at least one type corresponding to the term or concept and relevant weight value.
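The knowledge database of claims 7 and 8 might be modeled as follows. This is a sketch under stated assumptions: the claims do not define how granularity is encoded, so the hypothetical `TypeEntry` class stores it as an integer where a smaller number means a finer type, and the phrase "smallest granularity type with the highest weight value" is read as "among the finest types, the one with the largest weight".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeEntry:
    name: str
    granularity: int  # smaller means finer (assumed encoding)
    weight: float


def smallest_granularity_type(word, knowledge_db):
    """Return the name of the finest type with the highest weight,
    or None when the word is not in the database."""
    entries = knowledge_db.get(word)
    if not entries:
        return None
    finest = min(entry.granularity for entry in entries)
    candidates = [entry for entry in entries if entry.granularity == finest]
    return max(candidates, key=lambda entry: entry.weight).name


# Hypothetical database: each term or concept maps to weighted types.
knowledge_db = {
    "apple": [
        TypeEntry("food", granularity=3, weight=0.9),
        TypeEntry("fruit", granularity=1, weight=0.7),
        TypeEntry("phone brand", granularity=1, weight=0.3),
    ],
    "pear": [TypeEntry("fruit", granularity=1, weight=0.8)],
}

print(smallest_granularity_type("apple", knowledge_db))
print(smallest_granularity_type("pear", knowledge_db))
```

Under this reading, the coarse type "food" is ignored despite its high weight, because a finer type exists for the same entry.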

9. The method as recited in claim 1, further comprising: in response to determining that the first word and the second word are synonyms, saving such identified synonyms in a synonym database.

10. The method as recited in claim 9, further comprising: receiving, by a search engine, a query request from a user, the query request including a query term to be searched; searching, by the search engine, the query term in the synonym database to find a synonym of the query term; conducting, by the search engine, a search by using the query term and the synonym of the query term; and returning, by the search engine, a result including both the query term and the synonym of the query term to the user.
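Claims 9 and 10 describe saving identified synonyms and using them to expand a search. The following sketch assumes an in-memory dictionary as the synonym database and plain substring matching as the search. Both are illustrative stand-ins, and all function names are hypothetical.

```python
def save_synonyms(synonym_db, first, second):
    """Record the pair in both directions so lookup works from either word."""
    synonym_db.setdefault(first, set()).add(second)
    synonym_db.setdefault(second, set()).add(first)


def expand_query(term, synonym_db):
    """Return the query term followed by its known synonyms."""
    return [term] + sorted(synonym_db.get(term, set()) - {term})


def search(term, synonym_db, documents):
    """Return documents matching the query term or any of its synonyms."""
    terms = expand_query(term, synonym_db)
    return [doc for doc in documents if any(t in doc for t in terms)]


synonym_db = {}
save_synonyms(synonym_db, "tshirt", "t-shirt")

documents = ["blue t-shirt, size M", "tshirt with logo", "wool socks"]
print(expand_query("tshirt", synonym_db))
print(search("tshirt", synonym_db, documents))
```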

11. An apparatus for identifying synonyms, the apparatus comprising:
a processor;
a memory device communicatively coupled with the processor; and
a server storing:
a retrieval unit that obtains a first word and a second word, each of the first word and the second word including at least one term;
a first determination unit that determines that a shortest edit distance between the first word and the second word is less than or equal to an edit distance threshold;
a second determination unit that determines whether both of the first word and the second word exist in a preset knowledge database;
a query unit that finds a smallest granularity type with a highest weight value for each of the first word and the second word in the preset knowledge database, in response to determining that both of the first word and the second word exist in the preset knowledge database;
a segmentation unit that segments the first word to obtain one or more terms included in the first word and informs the second determination unit;
wherein the second determination unit further determines if all of the one or more terms after segmentation exist in the preset knowledge database, informs the query unit; and determines if not all of the one or more terms after segmentation exist in the preset knowledge database, informs the segmentation unit, in response to determining that at least the first word does not exist in the preset knowledge database; and
a third determination unit that determines that the first word and the second word are synonyms when the first word and the second word have a same smallest granularity type with a highest weight value, and that the first word and the second word are non-synonyms when the first word and the second word do not have the same smallest granularity type with the highest weight value.

12. The apparatus as recited in claim 11, wherein the preset knowledge database comprises: one or more terms and concepts, each term or concept corresponding to at least one type, each type corresponding to the term or concept having a weight value.

13. The apparatus as recited in claim 11, wherein the apparatus is a server or a search engine.

14. The apparatus as recited in claim 11, the apparatus further comprising: a retrieval unit that receives a query request from a user, the query request including a term to be searched; a synonym searching unit that finds a synonym of the term by searching the term in a synonym database; a search unit that conducts a search by using the term and the synonym of the term; and a return unit that returns a search result to the user.

15. One or more non-transitory computer-readable storage media having stored thereon computer executable units that are executable to perform actions comprising:
obtaining a query log of a search engine;
determining a threshold of a ranking of queries in the query log;
selecting a plurality of queries with rankings higher than the threshold;
obtaining a first word and a second word from the plurality of queries;
determining that a shortest edit distance between the first word and the second word is less than or equal to an edit distance threshold;
determining whether both of the first word and the second word exist in a preset knowledge database;
in response to determining at least the first word does not exist in the preset knowledge database, segmenting the first word to obtain one or more terms; and
determining whether the one or more terms after segmentation exist in the preset knowledge database, in response to determining that the one or more terms after segmentation exist in the preset knowledge database, searching a smallest granularity type with a highest weight value for each of the one or more terms in the preset knowledge database;
in response to determining that both of the first word and the second word exist in the preset knowledge database, finding the smallest granularity type with the highest weight value for each of the first word and the second word in the preset knowledge database;
determining whether the two words have a same smallest granularity type with a highest weight value;
in response to determining that the first word and the second word have the same smallest granularity type with the highest weight value, determining that the first word and the second word are synonyms; and
in response to determining that the first word and the second word do not have the same smallest granularity type with the highest weight value, determining that the two words are non-synonyms.

16. The one or more computer-readable storage media as recited in claim 15, further comprising after determining that the first word and the second word have the same smallest granularity type with the highest weight value, determining whether a term in the first word or the second word respectively is changeable without changing a meaning of the first word or the second word respectively, in response to determining that the term in the first word or the second word is changeable, further determining that the first word and the second word are synonyms; and in response to determining that the term in the first word or the second word is not changeable, further determining that the first word and the second word are non-synonyms.

Patent Metadata

Filing Date

Unknown

Publication Date

January 19, 2016

Inventors

Jing Dong

Fei Xing

Ning Guo

Lei Hou

Qin Zhang
