{"schema_version":"1.0","canonical_url":"https://patentable.app/patents/US-10540444","patent":{"patent_number":"US-10540444","title":"Text mining a dataset of electronic documents to discover terms of interest","assignee":null,"inventors":[],"filing_date":"2017-06-20T00:00:00.000Z","publication_date":"2020-01-21T00:00:00.000Z","cpc_codes":["G06F","G06F","G06F","G06F","G06F","G06N","G06N","G06N"],"num_claims":27,"abstract":"A method is provided for analyzing and interpreting a dataset composed of electronic documents including free-form text. The method includes text mining the documents for terms of interest, including receiving a set of seed nouns as input to an iterative process an iteration of which includes searching for multiword terms having seed nouns as their head words, at least some of which define a training set of a machine learning algorithm used to identify additional multiword terms at least some of which have nouns outside the set of seed nouns as their head words. The iteration also includes adding the nouns outside the set of seed nouns to the set and thereby identifying a new set of seed nouns for a next iteration. The method includes unifying terms of interest to produce normalized terms of interest for application to generate features of the documents for data analytics performed thereon."},"analysis":{"summary":null,"layman_explanation":null,"technical_analysis":null,"business_analysis":null,"faqs":null,"topics":[],"tech_cluster":null},"seo":{"title":"Text mining a dataset of electronic documents to discover terms of interest","description":"A method is provided for analyzing and interpreting a dataset composed of electronic documents including free-form text. The method includes text mining the documents for terms of interest, including ","keywords":[]},"attribution":{"source":"Patentable","source_url":"https://patentable.app","canonical_url":"https://patentable.app/patents/US-10540444","license":"CC-BY-4.0-like","license_terms":"AI-generated analysis on this page (summary, layman_explanation, technical_analysis, business_analysis, faqs) may be reused with attribution and a visible link back to the canonical URL above. Patent abstracts, claims, and bibliographic data are USPTO public domain.","required_link":"https://patentable.app/patents/US-10540444","citation_suggestion":"Patentable. \"Text mining a dataset of electronic documents to discover terms of interest\" (US-10540444). https://patentable.app/patents/US-10540444","copyright_holder":"Nomic Interactive Technology LLC"},"links":{"html":"https://patentable.app/patents/US-10540444","json":"https://patentable.app/api/llm-context/US-10540444","site":"https://patentable.app","llms_txt":"https://patentable.app/llms.txt"},"generated_at":"2026-05-30T13:23:39.404Z"}