US-11495234

Data mining apparatus, method and system for speech recognition using the same

Published

November 8, 2022

Assignee

Not available in the USPTO data

Inventors

Not available in the USPTO data

Technical Abstract

A data mining device, and a speech recognition method and system using the same are disclosed. The speech recognition method includes selecting speech data including a dialect from speech data, analyzing and refining the speech data including a dialect, and learning an acoustic model and a language model through an artificial intelligence (AI) algorithm using the refined speech data including a dialect. The user is able to use a dialect speech recognition service which is improved using services such as eMBB, URLLC, or mMTC of 5G mobile communications.

Patent Claims

12 claims

Legal claims defining the scope of protection, as filed with the USPTO.

2. The speech recognition method of claim 1, further comprising collecting speech data.

3. The speech recognition method of claim 2, wherein, in the collecting speech data, speech data of users is collected through the users in a region in which different types of dialects are used, and through various types of speech recognition service domains.

5. The speech recognition method of claim 1, wherein, in the extracting features from the speech data including the dialect, at least one among pronunciation string features, lexical features, domain features, and frequency features of dialect speech is extracted.

6. The speech recognition method of claim 5, wherein the domain features comprise information on a type of an electronic apparatus providing a speech recognition service for the user, information on a region in which the electronic apparatus is located, and information on an age group of the user of the electronic apparatus.
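As an illustration of the feature types named in claims 5 and 6, the record below is a minimal sketch of how one dialect word and its features might be held in memory. The class and field names (DomainFeatures, DialectWordFeatures, device_type and so on) are assumptions made for this example; the claims name the kinds of information but do not define a data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainFeatures:
    """Domain features as listed in claim 6 (field names are hypothetical)."""
    device_type: str      # type of apparatus providing the speech service
    device_region: str    # region in which the apparatus is located
    user_age_group: str   # age group of the apparatus user


@dataclass(frozen=True)
class DialectWordFeatures:
    """One dialect word with the four feature types named in claim 5."""
    word: str
    pronunciation: str    # pronunciation string feature
    lexical: str          # lexical feature, e.g. a standard-language gloss
    frequency: int        # frequency feature: occurrences in the corpus
    domain: DomainFeatures


# Made-up sample record, for illustration only.
sample = DialectWordFeatures(
    word="yall",
    pronunciation="y ao l",
    lexical="you (plural)",
    frequency=42,
    domain=DomainFeatures("smart speaker", "region-A", "30s"),
)
```

Freezing the records keeps them hashable, which is convenient if they are later used as set members or dictionary keys during clustering.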

7. The speech recognition method of claim 1, wherein, in the performing similar dialect word clustering, a degree of similarity between features is measured through weight calculation between features according to an unsupervised learning method, and a dialect word having a high degree of similarity to a threshold is clustered.
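The clustering step in claim 7 can be sketched as follows. This is a simplified illustration, not the patented method: the feature weights are fixed by hand rather than learned by an unsupervised method, per-feature similarity is approximated with the standard library's difflib.SequenceMatcher, and the grouping is a greedy single pass. The names WEIGHTS, THRESHOLD, similarity and cluster_words, and the sample words, are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical weights and threshold; the claim does not give values.
WEIGHTS = {"pronunciation": 0.6, "lexical": 0.4}
THRESHOLD = 0.7


def similarity(a, b, weights=WEIGHTS):
    """Weighted average of per-feature string similarities, in [0, 1]."""
    total = sum(weights.values())
    score = sum(
        w * SequenceMatcher(None, a[name], b[name]).ratio()
        for name, w in weights.items()
    )
    return score / total


def cluster_words(words, threshold=THRESHOLD):
    """Greedy clustering: a word joins the first cluster whose first
    member it resembles at or above the threshold, else starts a new one."""
    clusters = []
    for word in words:
        for cluster in clusters:
            if similarity(word, cluster[0]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


# Made-up feature records, for illustration only.
words = [
    {"word": "w1", "pronunciation": "kamsa", "lexical": "thanks"},
    {"word": "w2", "pronunciation": "kamsaa", "lexical": "thanks"},
    {"word": "w3", "pronunciation": "zzz", "lexical": "qqq"},
]
clusters = cluster_words(words)
```

With these sample records, w1 and w2 fall into one cluster and w3 into another. A real system would likely compare against every cluster member or a centroid rather than only the first member.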

8. The speech recognition method of claim 1, wherein, in the extracting the core dialect word from the similar dialect word cluster, n number of objects having the highest frequency features in a cluster are extracted, and a core object is extracted through a feature similarity calculation with other objects in the cluster.
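One way to read claim 8 is: shortlist the n most frequent objects in a cluster, then pick the shortlisted object that is on average most similar to the rest of the cluster. The sketch below follows that reading. The mean-similarity criterion, the use of difflib.SequenceMatcher on pronunciation strings, and the names feature_similarity and extract_core_object are assumptions; the claim does not specify how the similarity calculation is aggregated.

```python
from difflib import SequenceMatcher


def feature_similarity(a, b):
    """Stand-in similarity on pronunciation strings, in [0, 1]."""
    return SequenceMatcher(None, a["pronunciation"], b["pronunciation"]).ratio()


def extract_core_object(cluster, n=3):
    """Pick a core object from the n most frequent objects in the cluster."""
    # Step 1: shortlist the n objects with the highest frequency feature.
    candidates = sorted(cluster, key=lambda o: o["frequency"], reverse=True)[:n]

    # Step 2: score each candidate by mean similarity to all other objects.
    def mean_similarity(candidate):
        others = [o for o in cluster if o is not candidate]
        if not others:
            return 0.0
        return sum(feature_similarity(candidate, o) for o in others) / len(others)

    return max(candidates, key=mean_similarity)


# Made-up cluster, for illustration only.
cluster = [
    {"word": "a", "pronunciation": "abcd", "frequency": 10},
    {"word": "b", "pronunciation": "abce", "frequency": 8},
    {"word": "c", "pronunciation": "abxx", "frequency": 7},
    {"word": "d", "pronunciation": "abcd", "frequency": 1},
]
core = extract_core_object(cluster, n=2)
```

Here the shortlist is "a" and "b"; "a" has the higher mean similarity to the other members, so it is returned as the core object.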

9. The speech recognition method of claim 1, wherein, in the standardizing the dialect corpus, an existing dialect word is replaced with a core object dialect word, and verification is performed through a similarity measurement between an original dialect sentence and a replaced sentence.
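The standardization and verification step in claim 9 might look like the sketch below: each dialect word is replaced by its cluster's core word, and the replacement is kept only if the new sentence stays close enough to the original. The character-level difflib.SequenceMatcher score, the 0.6 cutoff, whitespace tokenization, and the names standardize_sentence and core_map are assumptions for illustration; the claim does not say which similarity measurement is used.

```python
from difflib import SequenceMatcher


def standardize_sentence(sentence, core_map, min_similarity=0.6):
    """Replace dialect words with core words, then verify the result.

    core_map maps each dialect word to the core word of its cluster.
    Returns (sentence_to_keep, similarity_score).
    """
    tokens = sentence.split()
    candidate = " ".join(core_map.get(token, token) for token in tokens)

    # Verification: compare the original sentence with the replaced one.
    score = SequenceMatcher(None, sentence, candidate).ratio()
    if score >= min_similarity:
        return candidate, score
    return sentence, score  # reject the replacement, keep the original


# Made-up mapping, for illustration only.
core_map = {"yall": "y'all", "ya'll": "y'all"}
accepted, accepted_score = standardize_sentence("are yall coming", core_map)
rejected, rejected_score = standardize_sentence("a", {"a": "zzzzzzzz"})
```

In the first call the replacement is accepted because the two sentences differ by a single character; in the second the replacement shares nothing with the original, so the original sentence is kept.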

11. The data mining device of claim 10, wherein the one or more processors are further configured to extract at least one among pronunciation string features, lexical features, domain features, and frequency features of dialect speech.

12. The data mining device of claim 11, wherein the domain features comprise information on a type of an electronic apparatus providing a speech recognition service for the user, information on a region in which the electronic apparatus is located, and information on an age group of the user of the electronic apparatus.

13. The data mining device of claim 10, wherein the one or more processors are further configured to measure a degree of similarity between features through weight calculation between features according to an unsupervised learning method, and cluster a dialect word having a high degree of similarity to a threshold.

14. The data mining device of claim 10, wherein the one or more processors are further configured to extract n number of objects having the highest frequency features in a cluster, and extract a core object through a feature similarity calculation with other objects in the cluster.

15. The data mining device of claim 10, wherein the one or more processors are further configured to replace an existing dialect word with a core object dialect word, and perform verification through a similarity measurement between an original dialect sentence and a replaced sentence.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G10L

Patent Metadata

Filing Date

May 30, 2019

Publication Date

November 8, 2022
