{"schema_version":"1.0","canonical_url":"https://patentable.app/patents/US-11984127","patent":{"patent_number":"US-11984127","title":"Training and using a transcript generation model on a multi-speaker audio stream","assignee":null,"inventors":[],"filing_date":"2021-12-31T00:00:00.000Z","publication_date":"2024-05-14T00:00:00.000Z","cpc_codes":["G10L","G10L","G10L","G10L"],"num_claims":20,"abstract":"The disclosure herein describes using a transcript generation model for generating a transcript from a multi-speaker audio stream. Audio data including overlapping speech of a plurality of speakers is obtained and a set of frame embeddings are generated from audio data frames of the obtained audio data using an audio data encoder. A set of words and channel change (CC) symbols are generated from the set of frame embeddings using a transcript generation model. The CC symbols are included between pairs of adjacent words that are spoken by different people at the same time. The set of words and CC symbols are transformed into a plurality of transcript lines, wherein words of the set of words are sorted into transcript lines based on the CC symbols, and a multi-speaker transcript is generated based on the plurality of transcript lines. The inclusion of CC symbols by the model enables efficient, accurate multi-speaker transcription."},"analysis":{"summary":null,"layman_explanation":null,"technical_analysis":null,"business_analysis":null,"faqs":null,"topics":[],"tech_cluster":null},"seo":{"title":"Training and using a transcript generation model on a multi-speaker audio stream","description":"The disclosure herein describes using a transcript generation model for generating a transcript from a multi-speaker audio stream. Audio data including overlapping speech of a plurality of speakers is","keywords":[]},"attribution":{"source":"Patentable","source_url":"https://patentable.app","canonical_url":"https://patentable.app/patents/US-11984127","license":"CC-BY-4.0-like","license_terms":"AI-generated analysis on this page (summary, layman_explanation, technical_analysis, business_analysis, faqs) may be reused with attribution and a visible link back to the canonical URL above. Patent abstracts, claims, and bibliographic data are USPTO public domain.","required_link":"https://patentable.app/patents/US-11984127","citation_suggestion":"Patentable. \"Training and using a transcript generation model on a multi-speaker audio stream\" (US-11984127). https://patentable.app/patents/US-11984127","copyright_holder":"Nomic Interactive Technology LLC"},"links":{"html":"https://patentable.app/patents/US-11984127","json":"https://patentable.app/api/llm-context/US-11984127","site":"https://patentable.app","llms_txt":"https://patentable.app/llms.txt"},"generated_at":"2026-05-31T14:10:04.261Z"}