{"schema_version":"1.0","canonical_url":"https://patentable.app/patents/US-11947909","patent":{"patent_number":"US-11947909","title":"Training a language detection model for language autodetection from non-character sub-token signals","assignee":null,"inventors":[],"filing_date":"2023-04-17T00:00:00.000Z","publication_date":"2024-04-02T00:00:00.000Z","cpc_codes":["G06F"],"num_claims":20,"abstract":"In non-limiting examples of the present disclosure, systems, methods and devices for determining a language of a text string are presented. A language detection model may be maintained. The language detection model may comprise identities and weights for initial and final consonants, identities and weights for prefixes and suffixes, and identities and weights for vowel sequences, where each identity is derived from a training corpus. The weights may correspond to a frequency of a text unit in the corpus. A text string may be received and a match score between the text string and the language of the language detection model may be determined. The match score may be based on initial and final consonant scores, prefix and suffix scores, and/or vowel sequence scores for each word in the text string. If the match score meets a threshold value a follow-up action associated with the language may be performed."},"analysis":{"summary":null,"layman_explanation":null,"technical_analysis":null,"business_analysis":null,"faqs":null,"topics":[],"tech_cluster":null},"seo":{"title":"Training a language detection model for language autodetection from non-character sub-token signals","description":"In non-limiting examples of the present disclosure, systems, methods and devices for determining a language of a text string are presented. A language detection model may be maintained. The language d","keywords":[]},"attribution":{"source":"Patentable","source_url":"https://patentable.app","canonical_url":"https://patentable.app/patents/US-11947909","license":"CC-BY-4.0-like","license_terms":"AI-generated analysis on this page (summary, layman_explanation, technical_analysis, business_analysis, faqs) may be reused with attribution and a visible link back to the canonical URL above. Patent abstracts, claims, and bibliographic data are USPTO public domain.","required_link":"https://patentable.app/patents/US-11947909","citation_suggestion":"Patentable. \"Training a language detection model for language autodetection from non-character sub-token signals\" (US-11947909). https://patentable.app/patents/US-11947909","copyright_holder":"Nomic Interactive Technology LLC"},"links":{"html":"https://patentable.app/patents/US-11947909","json":"https://patentable.app/api/llm-context/US-11947909","site":"https://patentable.app","llms_txt":"https://patentable.app/llms.txt"},"generated_at":"2026-05-31T01:32:00.212Z"}