{"schema_version":"1.0","canonical_url":"https://patentable.app/patents/US-8543402","patent":{"patent_number":"US-8543402","title":"Speaker segmentation in noisy conversational speech","assignee":null,"inventors":[],"filing_date":"2011-04-29T00:00:00.000Z","publication_date":"2013-09-24T00:00:00.000Z","cpc_codes":["G10L","G10L"],"num_claims":6,"abstract":"System and methods for robust multiple speaker segmentation in noisy conversational speech are presented. Robust voice activity detection is applied to detect temporal speech events. In order to get robust speech features and detect speech events in a noisy environment, a noise reduction algorithm is applied, using noise tracking. After noise reduction and voice activity detection, the incoming audio/speech is initially labeled as speech segments or silence segments. With no prior knowledge of the number of speakers, the system identifies one reliable speech segment near the beginning of the conversational speech and extracts speech features with a short latency, then learns a statistical model from the selected speech segment. This initial statistical model is used to identify the succeeding speech segments in a conversation. The statistical model is also continuously adapted and expanded with newly identified speech segments that match well to the model. The speech segments with low likelihoods are labeled with a second speaker ID, and a statistical model is learned from them. At the same time, these two trained speaker models are also updated/adapted once a reliable speech segment is identified. If a speech segment does not match well to the two speaker models, the speech segment is temporarily labeled as an outlier or as originating from a third speaker. This procedure is then applied recursively as needed when there are more than two speakers in a conversation."},"analysis":{"summary":null,"layman_explanation":null,"technical_analysis":null,"business_analysis":null,"faqs":null,"topics":[],"tech_cluster":null},"seo":{"title":"Speaker segmentation in noisy conversational speech","description":"System and methods for robust multiple speaker segmentation in noisy conversational speech are presented. Robust voice activity detection is applied to detect temporal speech events. In order to get r","keywords":[]},"attribution":{"source":"Patentable","source_url":"https://patentable.app","canonical_url":"https://patentable.app/patents/US-8543402","license":"CC-BY-4.0-like","license_terms":"AI-generated analysis on this page (summary, layman_explanation, technical_analysis, business_analysis, faqs) may be reused with attribution and a visible link back to the canonical URL above. Patent abstracts, claims, and bibliographic data are USPTO public domain.","required_link":"https://patentable.app/patents/US-8543402","citation_suggestion":"Patentable. \"Speaker segmentation in noisy conversational speech\" (US-8543402). https://patentable.app/patents/US-8543402","copyright_holder":"Nomic Interactive Technology LLC"},"links":{"html":"https://patentable.app/patents/US-8543402","json":"https://patentable.app/api/llm-context/US-8543402","site":"https://patentable.app","llms_txt":"https://patentable.app/llms.txt"},"generated_at":"2026-05-31T07:44:41.267Z"}