{"schema_version":"1.0","canonical_url":"https://patentable.app/patents/US-11520982","patent":{"patent_number":"US-11520982","title":"Generating corpus for training and validating machine learning model for natural language processing","assignee":null,"inventors":[],"filing_date":"2019-09-27T00:00:00.000Z","publication_date":"2022-12-06T00:00:00.000Z","cpc_codes":["G06F","G06F","G06N","G06N","G06N","G06F","G06N","G06N"],"num_claims":19,"abstract":"A method may include generating, based a context-free grammar, a sample forming a corpus. The context-free grammar may include production rules for replacing a first nonterminal symbol with a second nonterminal symbol and/or a terminal symbol. The sample may be generated by rewriting recursively a first text string to form a second text string associated with the sample. The first text string may be rewritten by applying the production rules to replace nonterminal symbols included in the first text string until no nonterminal symbols remain in the first text string. A machine learning model may be trained, based on the corpus, to process a natural language. Related methods and articles of manufacture are also disclosed."},"analysis":{"summary":null,"layman_explanation":null,"technical_analysis":null,"business_analysis":null,"faqs":null,"topics":[],"tech_cluster":null},"seo":{"title":"Generating corpus for training and validating machine learning model for natural language processing","description":"A method may include generating, based a context-free grammar, a sample forming a corpus. The context-free grammar may include production rules for replacing a first nonterminal symbol with a second n","keywords":[]},"attribution":{"source":"Patentable","source_url":"https://patentable.app","canonical_url":"https://patentable.app/patents/US-11520982","license":"CC-BY-4.0-like","license_terms":"AI-generated analysis on this page (summary, layman_explanation, technical_analysis, business_analysis, faqs) may be reused with attribution and a visible link back to the canonical URL above. Patent abstracts, claims, and bibliographic data are USPTO public domain.","required_link":"https://patentable.app/patents/US-11520982","citation_suggestion":"Patentable. \"Generating corpus for training and validating machine learning model for natural language processing\" (US-11520982). https://patentable.app/patents/US-11520982","copyright_holder":"Nomic Interactive Technology LLC"},"links":{"html":"https://patentable.app/patents/US-11520982","json":"https://patentable.app/api/llm-context/US-11520982","site":"https://patentable.app","llms_txt":"https://patentable.app/llms.txt"},"generated_at":"2026-05-31T07:00:00.146Z"}