US-8812303

Multi-language relevance-based indexing and search

Published: August 19, 2014
Assignee: not available in USPTO data we have
Inventors: not available in USPTO data we have
Technical Abstract

Indexing and querying in multiple languages is accomplished using an ordered chain of filters and/or other such components. When receiving information to be indexed or for a query, the information can be tokenized and typed based at least in part on the language of each token. The character types can be adjusted if appropriate for the languages, and the tokens can be further segmented using a dictionary for the respective language types. Once appropriate tokens are determined, relevant synonyms in each appropriate language can be determined and typed accordingly. If necessary the case of the tokens and synonyms can be adjusted and further segmented based on punctuation. The terms and synonyms then can be used as part of the index or as part of the search query to include other terms or phrases based on relevance to the original information.
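The ordered chain of filters described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Token` class, the `guess_lang` script test, the `"en"`/`"ja"` tags, and the two filters shown are all assumptions chosen for the example, and only a few of the stages named in the abstract are included.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Token:
    text: str
    lang: str  # hypothetical language tag, e.g. "en" or "ja"


# A filter takes the current token list and returns a new one.
Filter = Callable[[List[Token]], List[Token]]


def guess_lang(text: str) -> str:
    """Crude script test (an assumption): CJK or kana characters mean "ja"."""
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith(("CJK UNIFIED", "HIRAGANA", "KATAKANA")):
            return "ja"
    return "en"


def tokenize(raw: str) -> List[Token]:
    """Split on whitespace and type each token by its script."""
    return [Token(t, guess_lang(t)) for t in raw.split()]


def lowercase(tokens: List[Token]) -> List[Token]:
    """Case adjustment stage."""
    return [Token(t.text.lower(), t.lang) for t in tokens]


def split_on_punctuation(tokens: List[Token]) -> List[Token]:
    """Drop punctuation and split a token where it occurred."""
    out: List[Token] = []
    for t in tokens:
        for part in re.split(r"[^\w]+", t.text):
            if part:
                out.append(Token(part, t.lang))
    return out


def run_chain(raw: str, filters: List[Filter]) -> List[Token]:
    """Apply each filter in order; the order of the list is the order of the chain."""
    tokens = tokenize(raw)
    for f in filters:
        tokens = f(tokens)
    return tokens


CHAIN: List[Filter] = [lowercase, split_on_punctuation]
```

Under these assumptions, `run_chain("Wi-Fi 東京 Router", CHAIN)` yields four tokens: `wi` and `fi` typed `"en"`, `東京` typed `"ja"`, and `router` typed `"en"`. The same chain could serve both indexing and querying, since it only depends on the raw input string.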

Patent Claims
20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer-implemented method, comprising: under the control of one or more computer systems configured with executable instructions, segmenting information; determining a language type for the segmented information; searching a language dictionary for synonyms of the contents of each information segment in at least one language type; and storing the synonyms and contents of each information segment.

2. The computer-implemented method of claim 1, further comprising: receiving the segmented information, the segmented information including a plurality of characters; and removing spaces between the plurality of characters.

3. The computer-implemented method of claim 2, wherein the information is segmented based at least in part of prior positions of the spaces within the information.

4. The computer-implemented method of claim 1, further comprising: converting any full space characters to half space characters before searching the language dictionary for the determined language type.

5. The computer-implemented method of claim 1, further comprising: further segmenting the information into at least one additional segment when contents of the information segment correspond to separate entries in the language dictionary.

6. The computer-implemented method of claim 1, further comprising: converting a character type for at least information segment.

7. The computer-implemented method of claim 6, wherein the character type is one of a full shape type and a half shape type.

8. The computer-implemented method of claim 1, wherein the information is one of information for a data object to be indexed and information corresponding to a received query.

9. The computer-implemented method of claim 1, further comprising: converting contents of at least one token to one of a lowercase and an uppercase format.

10. A method according to claim 1, further comprising: removing punctuation from at least one token after searching for synonyms; and segmenting the token into at least one additional token based on the removed punctuation.

11. The computer-implemented method of claim 1, further comprising: determining whether to perform a one way or a two way test for synonyms for the segmented information.

12. The computer-implemented method of claim 1, wherein steps of the method are performed by an ordered chain of filters.
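Two of the steps recited in the method claims, converting full-width characters to half-width (claims 4, 6 and 7) and further segmenting a segment whose contents match separate dictionary entries (claim 5), can be illustrated with the sketch below. The claims do not specify an algorithm, so the use of Unicode NFKC normalization and greedy longest-match segmentation here are assumptions made for illustration, and the function names and sample dictionary are hypothetical.

```python
import unicodedata
from typing import List, Set


def to_half_width(text: str) -> str:
    """Fold full-width forms to their ordinary counterparts.

    Assumption: NFKC normalization stands in for the claimed conversion.
    It maps full-width Latin letters, digits, and the ideographic space
    (U+3000) to ASCII. Note that it also maps half-width katakana to
    full-width katakana, which may or may not be wanted.
    """
    return unicodedata.normalize("NFKC", text)


def segment(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy longest-match segmentation against a language dictionary.

    Characters that start no dictionary entry fall back to
    single-character segments.
    """
    longest = max((len(word) for word in dictionary), default=1)
    out: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one matches.
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                out.append(piece)
                i += size
                break
    return out


# Hypothetical dictionary entries for the "ja" language type.
JA_DICTIONARY = {"東京", "タワー"}
```

With this sketch, `to_half_width("ＴＯＫＹＯ　１２３")` returns `"TOKYO 123"`, and `segment("東京タワー", JA_DICTIONARY)` returns `["東京", "タワー"]`. Greedy matching is the simplest choice and can pick a poor split when dictionary entries overlap; a production segmenter would likely score alternative splits.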

13. A system, comprising: a processor; and a memory including instructions that, when executed by the processor, cause the processor to: segment information; determine a language type for the segmented information; search a language dictionary for synonyms of the contents of each information segment in at least one language type; and store the synonyms and contents of each information segment.

14. The system of claim 13, wherein the instructions that, when executed by the processor, further cause the system to: convert a character type for each information segment.

15. The system of claim 13, wherein the instructions that, when executed by the processor, further cause the computing device to: determine a language type for each synonym and each information segment.

16. The system of claim 13, wherein the memory device further includes instructions that, when executed by the processor, cause the processor to: remove punctuation from each information segment; and further segment the information segment into at least one additional segment where punctuation is removed.

17. A non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor, cause a computing device to: segment information; determine a language type for the segmented information; search a language dictionary for synonyms of the contents of each information segment in at least one language type; and store the synonyms and contents of each information segment.

18. The non-transitory computer-readable storage medium of claim 17, wherein the instructions, when executed by the at least one processor, further cause the computing device to: further segment the information into at least one additional segment when contents of the information segment correspond to separate entries in the language dictionary.

19. The non-transitory computer-readable storage medium of claim 17, wherein the instructions, when executed by the at least one processor, further cause the computing device to: determine a language type for each synonym and each information segment.

20. The non-transitory computer-readable storage medium of claim 17, wherein the instructions, when executed by the at least one processor, further cause the computing device to: receive the information, the information including a plurality of characters; and remove spaces between the plurality of characters.
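The synonym search and storage steps common to the independent claims, together with the language typing of each synonym (claims 15 and 19), might look like the sketch below. The claims do not define the "one way" and "two way" tests of claim 11; the reading used here (one way follows only the listed direction, two way also accepts the reverse direction) is an assumption, as are the `SYNONYMS` table, the function names, and the in-memory index.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

Term = Tuple[str, str]  # (text, language type)

# Hypothetical synonym table: each term maps to synonyms typed by language.
SYNONYMS: Dict[Term, List[Term]] = {
    ("laptop", "en"): [("notebook", "en"), ("ノートパソコン", "ja")],
    ("notebook", "en"): [("laptop", "en")],
}


def find_synonyms(term: str, lang: str, two_way: bool = False) -> List[Term]:
    """Look up synonyms for a typed term.

    One way: only entries listed under the term itself.
    Two way (assumed meaning): also terms that list this term as a synonym.
    """
    key = (term, lang)
    found = list(SYNONYMS.get(key, []))
    if two_way:
        for source, targets in SYNONYMS.items():
            if key in targets and source not in found:
                found.append(source)
    return found


def index_segments(
    doc_id: str,
    segments: List[Term],
    index: DefaultDict[Term, Set[str]],
    two_way: bool = False,
) -> None:
    """Store each segment and its synonyms as keys pointing at the document."""
    for term, lang in segments:
        index[(term, lang)].add(doc_id)
        for synonym in find_synonyms(term, lang, two_way):
            index[synonym].add(doc_id)


index: DefaultDict[Term, Set[str]] = defaultdict(set)
index_segments("doc1", [("laptop", "en")], index)
```

After the call above, `index` holds three keys, `("laptop", "en")`, `("notebook", "en")`, and `("ノートパソコン", "ja")`, each pointing at `"doc1"`, so a query in either language can reach the document. A one-way lookup of `ノートパソコン` finds nothing in this table, while a two-way lookup finds `laptop`.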

Patent Metadata

Filing Date: January 17, 2013

Publication Date: August 19, 2014

Citation & reuse

Cite as: Patentable. “Multi-language relevance-based indexing and search” (US-8812303). https://patentable.app/patents/US-8812303