US-11495234

Data mining apparatus, method and system for speech recognition using the same

Published: November 8, 2022
Assignee: not available in the USPTO data we have
Inventors: not available in the USPTO data we have
Technical Abstract

A data mining device, and a speech recognition method and system using the same are disclosed. The speech recognition method includes selecting speech data including a dialect from speech data, analyzing and refining the speech data including a dialect, and learning an acoustic model and a language model through an artificial intelligence (AI) algorithm using the refined speech data including a dialect. The user is able to use a dialect speech recognition service which is improved using services such as eMBB, URLLC, or mMTC of 5G mobile communications.

Patent Claims
12 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 2

Original Legal Text

2. The speech recognition method of claim 1, further comprising collecting speech data.

Claim 3

Original Legal Text

3. The speech recognition method of claim 2, wherein, in the collecting speech data, speech data of users is collected through the users in a region in which different types of dialects are used, and through various types of speech recognition service domains.

Claim 5

Original Legal Text

5. The speech recognition method of claim 1, wherein, in the extracting features from the speech data including the dialect, at least one among pronunciation string features, lexical features, domain features, and frequency features of dialect speech is extracted.

Claim 6

Original Legal Text

6. The speech recognition method of claim 5, wherein the domain features comprise information on a type of an electronic apparatus providing a speech recognition service for the user, information on a region in which the electronic apparatus is located, and information on an age group of the user of the electronic apparatus.
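The four feature types of claim 5 and the three domain fields of claim 6 can be pictured as one record per dialect word. The following is a minimal sketch, not the patented implementation: the class names, field names, and the `extract_features` helper are hypothetical, and the claims do not say how each feature is computed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class DomainFeatures:
    """Domain features as listed in claim 6 (field names are assumptions)."""
    device_type: str  # type of electronic apparatus providing the service
    region: str       # region in which the apparatus is located
    age_group: str    # age group of the user of the apparatus


@dataclass(frozen=True)
class DialectWordFeatures:
    """One record per dialect word, covering the feature types of claim 5."""
    word: str
    pronunciation: str        # pronunciation string feature
    lexical: Tuple[str, ...]  # lexical feature, here simply the context words
    domain: DomainFeatures    # domain feature
    frequency: int            # frequency feature


def extract_features(
    word: str,
    pronunciation: str,
    context: Iterable[str],
    domain: DomainFeatures,
    corpus_counts: Dict[str, int],
) -> DialectWordFeatures:
    """Bundle the features of one dialect word (hypothetical helper)."""
    return DialectWordFeatures(
        word=word,
        pronunciation=pronunciation,
        lexical=tuple(context),
        domain=domain,
        frequency=corpus_counts.get(word, 0),  # unseen words count as 0
    )
```

Claim 5 requires only "at least one" of the four feature types, so a real system could leave some of these fields empty.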

Claim 7

Original Legal Text

7. The speech recognition method of claim 1, wherein, in the performing similar dialect word clustering, a degree of similarity between features is measured through weight calculation between features according to an unsupervised learning method, and a dialect word having a high degree of similarity to a threshold is clustered.
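As a rough illustration of claim 7, the sketch below scores two dialect words with a weighted sum of per-feature similarities and groups words whose score reaches a threshold. Everything concrete here is an assumption: the claim says the weights come from an unsupervised learning method but gives no values, so the fixed `WEIGHTS`, the `difflib` and Jaccard measures, the threshold of 0.7, and the greedy clustering loop are stand-ins chosen for brevity.

```python
from difflib import SequenceMatcher

# Hypothetical fixed weights; the claim derives weights by unsupervised
# learning and does not state any values.
WEIGHTS = {"pronunciation": 0.5, "lexical": 0.3, "domain": 0.2}


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as identical."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


def similarity(x, y, weights=WEIGHTS):
    """Weighted sum of per-feature similarities, in [0, 1]."""
    pron = SequenceMatcher(None, x["pronunciation"], y["pronunciation"]).ratio()
    lex = jaccard(x["lexical"], y["lexical"])
    dom = 1.0 if x["region"] == y["region"] else 0.0
    return (
        weights["pronunciation"] * pron
        + weights["lexical"] * lex
        + weights["domain"] * dom
    )


def cluster(words, threshold=0.7):
    """Greedy clustering: a word joins the first cluster whose first
    member is at least `threshold` similar, else it starts a new cluster."""
    clusters = []
    for w in words:
        for c in clusters:
            if similarity(w, c[0]) >= threshold:
                c.append(w)
                break
        else:
            clusters.append([w])
    return clusters
```

A production system would more likely use a standard clustering algorithm over learned feature vectors; the greedy loop only shows where the threshold enters.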

Claim 8

Original Legal Text

8. The speech recognition method of claim 1, wherein, in the extracting the core dialect word from the similar dialect word cluster, n number of objects having the highest frequency features in a cluster are extracted, and a core object is extracted through a feature similarity calculation with other objects in the cluster.
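The two steps of claim 8 (take the n most frequent objects, then pick a core object by feature similarity to the rest of the cluster) can be sketched as follows. This is an illustration under assumptions: each object is reduced to a hypothetical `(word, pronunciation, frequency)` tuple, pronunciation-string similarity from `difflib` stands in for the unspecified "feature similarity calculation", and "core" is taken to mean the candidate with the highest mean similarity to the other objects.

```python
from difflib import SequenceMatcher


def pron_similarity(a, b):
    """Character-level similarity of two pronunciation strings, in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def extract_core_word(cluster, n=3):
    """Return the core word of a cluster.

    `cluster` is a list of (word, pronunciation, frequency) tuples.
    """
    # Step 1: keep the n objects with the highest frequency feature.
    candidates = sorted(cluster, key=lambda item: item[2], reverse=True)[:n]

    # Step 2: score each candidate by its mean similarity to every
    # other object in the cluster and keep the best-scoring one.
    def mean_similarity(candidate):
        others = [o for o in cluster if o is not candidate]
        if not others:
            return 1.0  # a single-object cluster is its own core
        total = sum(pron_similarity(candidate[1], o[1]) for o in others)
        return total / len(others)

    return max(candidates, key=mean_similarity)[0]
```

Restricting step 2 to the top-n candidates keeps rare variants from being chosen as the core word even when they happen to sit near the center of the cluster.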

Claim 9

Original Legal Text

9. The speech recognition method of claim 1, wherein, in the standardizing the dialect corpus, an existing dialect word is replaced with a core object dialect word, and verification is performed through a similarity measurement between an original dialect sentence and a replaced sentence.

Plain English Translation

Claim 9 narrows the corpus-standardizing step of the speech recognition method of claim 1. When the dialect corpus is standardized, an existing dialect word is replaced with the core object dialect word, that is, the representative word extracted from its cluster of similar dialect words (claim 8 describes the extraction). The core word is itself a dialect word chosen to stand for its variants, rather than a standard-language equivalent. The replacement is then verified by measuring the similarity between the original dialect sentence and the sentence after replacement; the claim does not name a particular similarity metric. Read together with the abstract, the purpose is to refine dialect speech data so that variant forms of the same dialect word are unified before the acoustic model and the language model are trained, which is intended to improve recognition of dialect speech. The verification step checks that the replaced sentence stays close to the original.
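The replace-then-verify step of claim 9 can be sketched as below. This is a simplified illustration, not the claimed implementation: the `core_word_of` mapping, whitespace tokenization, the character-level `difflib` ratio, and the 0.8 threshold are all assumptions, since the claim leaves the similarity measure and any acceptance threshold unspecified.

```python
from difflib import SequenceMatcher


def standardize_sentence(sentence, core_word_of, threshold=0.8):
    """Replace dialect words with their cluster's core word and verify.

    `core_word_of` maps a dialect word to the core word of its cluster
    (hypothetical structure). Returns (sentence_to_keep, similarity_score).
    """
    tokens = sentence.split()
    replaced = " ".join(core_word_of.get(tok, tok) for tok in tokens)

    # Verification: compare the original and the replaced sentence.
    score = SequenceMatcher(None, sentence, replaced).ratio()

    # Keep the replacement only if the sentences are similar enough;
    # otherwise fall back to the original sentence.
    if score >= threshold:
        return replaced, score
    return sentence, score


if __name__ == "__main__":
    mapping = {"ya'll": "y'all", "youall": "y'all"}
    print(standardize_sentence("are ya'll coming home", mapping))
```

A semantic measure, such as similarity between sentence embeddings, would be closer to checking that meaning is preserved; the character-level ratio is used here only because it needs no external model.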

Claim 11

Original Legal Text

11. The data mining device of claim 10, wherein the one or more processors are further configured to extract at least one among pronunciation string features, lexical features, domain features, and frequency features of dialect speech.

Claim 12

Original Legal Text

12. The data mining device of claim 11, wherein the domain features comprise information on a type of an electronic apparatus providing a speech recognition service for the user, information on a region in which the electronic apparatus is located, and information on an age group of the user of the electronic apparatus.

Claim 13

Original Legal Text

13. The data mining device of claim 10, wherein the one or more processors are further configured to measure a degree of similarity between features through weight calculation between features according to an unsupervised learning method, and cluster a dialect word having a high degree of similarity to a threshold.

Claim 14

Original Legal Text

14. The data mining device of claim 10, wherein the one or more processors are further configured to extract n number of objects having the highest frequency features in a cluster, and extract a core object through a feature similarity calculation with other objects in the cluster.

Claim 15

Original Legal Text

15. The data mining device of claim 10, wherein the one or more processors are further configured to replace an existing dialect word with a core object dialect word, and perform verification through a similarity measurement between an original dialect sentence and a replaced sentence.

Plain English Translation

Claim 15 is the device counterpart of claim 9. The one or more processors of the data mining device of claim 10 are further configured to replace an existing dialect word with a core object dialect word and to verify the replacement by measuring the similarity between the original dialect sentence and the replaced sentence. The core object dialect word is the representative of a cluster of similar dialect words (claim 14 describes how the device extracts it), rather than a standard-language equivalent. The claim does not specify the similarity metric; measures such as cosine similarity or edit distance are possible choices but are not recited. It also recites no preprocessing or post-processing beyond replacement and verification. The verification step checks that the replaced sentence stays close to the original, so that the standardized dialect corpus remains reliable for the later data mining and model training.


Patent Metadata

Filing Date

May 30, 2019

Publication Date

November 8, 2022


Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Data mining apparatus, method and system for speech recognition using the same” (US-11495234). https://patentable.app/patents/US-11495234

