A speaker recognition method and apparatus receives a first voice signal of a speaker, generates a second voice signal by enhancing the first voice signal through speech enhancement, generates a multi-channel voice signal by associating the first voice signal with the second voice signal, and recognizes the speaker based on the multi-channel voice signal.
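The flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the voice signal is already a small magnitude spectrogram (rows are frequency bins, columns are time frames), uses a per-bin minimum over time as a simple stand-in for stationary noise suppression through minimum pooling, and models "associating" the two signals as stacking them as channels. The names `enhance_min_pooling` and `to_multichannel` are hypothetical.

```python
from typing import List

# Assumption: a "voice signal" is a magnitude spectrogram,
# rows = frequency bins, columns = time frames.
Spectrogram = List[List[float]]


def enhance_min_pooling(spec: Spectrogram) -> Spectrogram:
    """Subtract a per-bin noise floor estimated as the minimum over time.

    Hypothetical stand-in for speech enhancement: stationary noise is
    assumed to be the smallest value each frequency bin ever takes.
    """
    enhanced = []
    for row in spec:
        floor = min(row)
        enhanced.append([max(value - floor, 0.0) for value in row])
    return enhanced


def to_multichannel(first: Spectrogram, second: Spectrogram) -> List[Spectrogram]:
    """Associate two signals of identical dimensions as channels 0 and 1."""
    same_rows = len(first) == len(second)
    if not same_rows or any(len(a) != len(b) for a, b in zip(first, second)):
        raise ValueError("both channels must share the same dimensions")
    return [first, second]


# Toy first voice signal: 2 frequency bins x 3 time frames.
first_signal = [
    [0.9, 0.5, 0.7],
    [0.2, 0.2, 0.6],
]
second_signal = enhance_min_pooling(first_signal)
multi_channel = to_multichannel(first_signal, second_signal)
```

Keeping the unprocessed signal as its own channel means a downstream recognizer still sees any speaker information that the enhancement step may have removed along with the noise.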
Legal claims defining the scope of protection, as filed with the USPTO.
1. A speaker recognition method comprising: receiving a first voice signal of a speaker; generating a second voice signal by enhancing the first voice signal through speech enhancement; generating a multi-channel voice signal by associating the first voice signal with the second voice signal; and recognizing the speaker based on the multi-channel voice signal, wherein generating the multi-channel voice signal comprises: extracting, from the first voice signal, a first feature vector including first voice information of the speaker; extracting, from the second voice signal, a second feature vector including second voice information of the speaker; and generating the multi-channel voice signal by associating the first feature vector with the second feature vector.
2. The speaker recognition method of claim 1, wherein generating the second voice signal comprises one or both of: enhancing the first voice signal by removing a noise signal estimated from the first voice signal; and enhancing the first voice signal by increasing energy corresponding to a voice of the speaker detected from the first voice signal.
3. The speaker recognition method of claim 2, wherein enhancing the first voice signal by removing the noise signal comprises at least one of: removing the noise signal by performing stationary noise suppression through minimum pooling on the first voice signal; removing the noise signal through channel normalization on the first voice signal; and removing the noise signal through sound source separation on the first voice signal.
4. The speaker recognition method of claim 1, wherein extracting the first feature vector comprises: extracting a variable-length first feature vector from the first voice signal; and extracting a fixed-length first feature vector including the first voice information of the speaker from the variable-length first feature vector.
5. The speaker recognition method of claim 4, wherein extracting the fixed-length first feature vector comprises: extracting the fixed-length first feature vector corresponding to a neural network that is trained to recognize the speaker from the variable-length first feature vector.
6. The speaker recognition method of claim 1, wherein extracting the second feature vector comprises: extracting a variable-length second feature vector from the second voice signal; and extracting a fixed-length second feature vector including the second voice information of the speaker from the variable-length second feature vector.
7. The speaker recognition method of claim 6, wherein extracting the variable-length second feature vector comprises: extracting the variable-length second feature vector using one or both of a feature extraction method based on a spectrum of the second voice signal, and a feature extraction method based on a neural network configured to extract a valid voice frequency interval based on a sync function of the second voice signal.
8. The speaker recognition method of claim 6, wherein extracting the fixed-length second feature vector comprises: extracting the fixed-length second feature vector corresponding to a neural network that is trained to recognize the speaker from the variable-length second feature vector.
9. The speaker recognition method of claim 1, wherein the multi-channel voice signal corresponds to a same utterance point as the first voice signal and the second voice signal, and includes a same dimension as the first voice signal and the second voice signal.
10. The speaker recognition method of claim 1, further comprising: determining whether to use multiple channels, wherein generating the second voice signal comprises: generating the second voice signal based on a first determination of whether to use the multiple channels.
11. The speaker recognition method of claim 10, wherein determining whether to use the multiple channels comprises: determining whether to use the multiple channels based on at least one of an operational load and a response speed according to a requirement of a speaker recognition apparatus, a magnitude of stationary noise included in the first voice signal, and a voice volume of the speaker corresponding to the first voice signal.
12. The speaker recognition method of claim 10, wherein generating the multi-channel voice signal comprises: generating the multi-channel voice signal based on a second determination of whether to use the multiple channels.
13. The speaker recognition method of claim 1, wherein generating the multi-channel voice signal comprises: determining a number of multiple channels; and generating the multi-channel voice signal by associating the first voice signal with the second voice signal based on the number of the multiple channels.
14. The speaker recognition method of claim 13, wherein determining the number of the multiple channels comprises: determining the number of the multiple channels based on one or both of a feature of the first voice signal and noise at a location at which the first voice signal is uttered.
15. The speaker recognition method of claim 14, wherein the feature of the first voice signal comprises: at least one of a magnitude of stationary noise included in the first voice signal, a voice volume of the speaker corresponding to the first voice signal, and a magnitude of additive noise corresponding to the first voice signal.
16. The speaker recognition method of claim 1, wherein receiving the first voice signal comprises: collecting the first voice signal through a voice signal collector including a microphone.
17. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the speaker recognition method of claim 1.
18. A speaker recognition method comprising: receiving a first voice signal of a speaker; generating a second voice signal by enhancing the first voice signal through speech enhancement; generating a multi-channel voice signal by associating the first voice signal with the second voice signal; and recognizing the speaker based on the multi-channel voice signal, wherein recognizing the speaker comprises: outputting a feature vector corresponding to the speaker by applying the multi-channel voice signal to a neural network; calculating a similarity score based on a result of comparing the feature vector and a registered feature vector of the speaker; and recognizing the speaker based on the similarity score.
19. A speaker recognition apparatus comprising: a microphone configured to collect a first voice signal of a speaker; and a processor configured to generate a second voice signal by enhancing the first voice signal through speech enhancement, generate a multi-channel voice signal by associating the first voice signal with the second voice signal, and recognize the speaker based on the multi-channel voice signal, wherein the processor is configured to generate the multi-channel voice signal by extracting, from the first voice signal, a first feature vector including first voice information of the speaker, extracting, from the second voice signal, a second feature vector including second voice information of the speaker, and generating the multi-channel voice signal by associating the first feature vector with the second feature vector.
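The scoring step recited in claim 18 (comparing an output feature vector with a registered feature vector and recognizing the speaker from a similarity score) might look like the sketch below. The claims do not name a similarity measure or a decision rule, so cosine similarity, the `0.7` threshold, and the function names `cosine_similarity` and `recognize` are all assumptions made for illustration. The neural network that would produce the feature vector is omitted.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (assumed score)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def recognize(
    feature: Sequence[float],
    registered: Sequence[float],
    threshold: float = 0.7,  # hypothetical value, not taken from the claims
) -> bool:
    """Accept the speaker when the similarity score reaches the threshold."""
    return cosine_similarity(feature, registered) >= threshold


# Toy vectors standing in for neural-network outputs.
registered_vector = [1.0, 0.0]
same_speaker = recognize([1.0, 0.1], registered_vector)
other_speaker = recognize([0.0, 1.0], registered_vector)
```

In practice the threshold would be tuned on held-out data to balance false accepts against false rejects.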
June 11, 2020
May 31, 2022