9396183

System and Method for Building Diverse Language Models

Published: July 19, 2016
Assignee: not available in the USPTO data we have

Patent Claims
20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method comprising: establishing a website visitation policy according to a previous crawling cycle and vocabulary gaps in a language model, wherein the website visitation policy identifies, according to a pattern of links, a likelihood of web pages to have information capable of filling the vocabulary gaps, and wherein the website visitation policy comprises a crawling schedule according to perplexity of the web pages with respect to the language model; crawling, via a processor, the web-pages according to the crawling schedule, to yield new vocabulary words; and generating a diverse language model according to the language model and the new vocabulary words.
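A minimal sketch of the perplexity-ordered crawl in claim 1, not the patented implementation. It assumes an add-one-smoothed unigram model stands in for "the language model", that sample text for each page is already available from a previous crawling cycle, and that whitespace tokenization is adequate. The function names unigram_perplexity, crawl_schedule, and new_vocabulary are hypothetical.

```python
import math
from collections import Counter


def unigram_perplexity(tokens, counts, vocab_size):
    """Perplexity of tokens under an add-one-smoothed unigram model (assumed model)."""
    if not tokens:
        return 0.0
    total = sum(counts.values())
    log_prob = 0.0
    for tok in tokens:
        # The extra +1 in the denominator reserves mass for unseen words.
        p = (counts.get(tok, 0) + 1) / (total + vocab_size + 1)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))


def crawl_schedule(pages, counts):
    """Order URLs so that pages with the highest perplexity are visited first."""
    vocab_size = len(counts)
    scored = [
        (unigram_perplexity(text.split(), counts, vocab_size), url)
        for url, text in pages.items()
    ]
    return [url for _, url in sorted(scored, reverse=True)]


def new_vocabulary(pages, schedule, counts):
    """Collect words, in crawl order, that the current model has never seen."""
    seen = set(counts)
    found = []
    for url in schedule:
        for tok in pages[url].split():
            if tok not in seen:
                seen.add(tok)
                found.append(tok)
    return found


# Toy data: word counts of the existing model and sampled page text.
counts = Counter("the cat sat on the mat".split())
pages = {
    "a": "the cat sat",
    "b": "quantum chromodynamics lattice",
}
schedule = crawl_schedule(pages, counts)
words = new_vocabulary(pages, schedule, counts)
```

In this toy run, page "b" is scheduled first because none of its words appear in the model, and its three words become the new vocabulary used to build the diverse language model.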

2. The method of claim 1, further comprising recognizing received speech with the diverse language model.

3. The method of claim 1, wherein the diverse language model is generated by modifying the language model.

4. The method of claim 1, wherein the likelihood of the web pages is further according to an information theoretic measure.

5. The method of claim 4, wherein the web pages have high perplexity values over the language model from a previous cycle.

6. The method of claim 1, further comprising updating the website visitation policy for the crawling once a specified number of pages is crawled.

7. The method of claim 6, wherein updating the website visitation policy is according to an expected perplexity value of novelty regions.

8. The method of claim 7, wherein the expected perplexity value of the novelty regions is determined by evaluating links to the web page.
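Claims 6 through 8 can be illustrated with a rough sketch under stated assumptions. Here a "novelty region" is approximated by a hypothetical link pattern (host plus first path segment), its expected perplexity is the mean perplexity of pages already crawled under that pattern, and the policy is refreshed after every update_every observed pages. The class VisitationPolicy and the helper link_pattern are illustrative names, not taken from the patent.

```python
import heapq
from collections import defaultdict
from urllib.parse import urlparse


def link_pattern(url):
    """Hypothetical link pattern: host plus the first path segment."""
    parts = urlparse(url)
    first_segment = parts.path.strip("/").split("/")[0]
    return f"{parts.netloc}/{first_segment}"


class VisitationPolicy:
    def __init__(self, update_every=2, default=1.0):
        self.update_every = update_every  # pages to crawl between policy updates
        self.default = default            # prior for patterns with no observations
        self.sums = defaultdict(float)
        self.hits = defaultdict(int)
        self.pending = []                 # observations since the last update
        self.frontier = []                # heap of (-expected_perplexity, url)

    def expected_perplexity(self, url):
        pattern = link_pattern(url)
        if self.hits[pattern] == 0:
            return self.default
        return self.sums[pattern] / self.hits[pattern]

    def add_link(self, url):
        heapq.heappush(self.frontier, (-self.expected_perplexity(url), url))

    def next_url(self):
        return heapq.heappop(self.frontier)[1]

    def observe(self, url, perplexity):
        """Record a crawled page; update the policy once enough pages are seen."""
        self.pending.append((url, perplexity))
        if len(self.pending) >= self.update_every:
            self._update()

    def _update(self):
        for url, perplexity in self.pending:
            pattern = link_pattern(url)
            self.sums[pattern] += perplexity
            self.hits[pattern] += 1
        self.pending.clear()
        # Re-score every queued link against the refreshed estimates.
        self.frontier = [(-self.expected_perplexity(u), u) for _, u in self.frontier]
        heapq.heapify(self.frontier)


policy = VisitationPolicy(update_every=2, default=100.0)
policy.add_link("http://x.org/news/3")
policy.add_link("http://x.org/forum/9")
policy.observe("http://x.org/news/1", 900.0)   # no update yet
before = policy.expected_perplexity("http://x.org/news/3")
policy.observe("http://x.org/forum/2", 50.0)   # second page triggers the update
after = policy.expected_perplexity("http://x.org/news/3")
first = policy.next_url()
```

Batching the update keeps the frontier stable between refreshes; a real crawler would likely use richer link features than a single path segment.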

9. The method of claim 1, further comprising merging a set of language models.

10. A system comprising: a processor; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform instructions comprising: establishing a website visitation policy according to a previous crawling cycle and vocabulary gaps in a language model, wherein the website visitation policy identifies, according to a pattern of links, a likelihood of web pages to have information capable of filling the vocabulary gaps, and wherein the website visitation policy comprises a crawling schedule according to perplexity of the web pages with respect to the language model; crawling the web-pages according to the crawling schedule, to yield new vocabulary words; and generating a diverse language model according to the language model and the new vocabulary words.

11. The system of claim 10, the computer-readable storage medium having additional instructions stored which, when executed by the processor, cause the processor to perform operations comprising recognizing received speech with the diverse language model.

12. The system of claim 10, wherein the diverse language model is generated by modifying the language model.

13. The system of claim 10, wherein the likelihood of the web pages is further according to an information theoretic measure.

14. The system of claim 13, wherein the web pages have high perplexity values over the language model from a previous cycle.

15. The system of claim 10, the computer-readable storage medium having additional instructions stored which, when executed by the processor, cause the processor to perform operations comprising updating the website visitation policy for the crawling once a specified number of pages is crawled.

16. The system of claim 15, wherein updating the website visitation policy is according to an expected perplexity value of novelty regions.

17. The system of claim 16, wherein the expected perplexity value of the novelty regions is determined by evaluating links to the web page.

18. The system of claim 10, the computer-readable storage medium having additional instructions stored which, when executed by the processor, cause the processor to perform operations comprising merging a set of language models.

19. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising: establishing a website visitation policy according to a previous crawling cycle and vocabulary gaps in a language model, wherein the website visitation policy identifies, according to a pattern of links, a likelihood of web pages to have information capable of filling the vocabulary gaps, and wherein the website visitation policy comprises a crawling schedule according to perplexity of the web pages with respect to the language model; crawling the web-pages according to the crawling schedule, to yield new vocabulary words; and generating a diverse language model according to the language model and the new vocabulary words.

20. The computer-readable storage device of claim 19, having additional instructions stored which, when executed by the computing device, cause the computing device to perform operations comprising recognizing received speech with the diverse language model.

Patent Metadata

Filing Date

Unknown

Publication Date

July 19, 2016

Inventors

Luciano De Andrade BARBOSA
Srinivas BANGALORE

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “SYSTEM AND METHOD FOR BUILDING DIVERSE LANGUAGE MODELS” (9396183). https://patentable.app/patents/9396183
