Speech Recognition Device and Speech Recognition Method

Published: May 4, 2010

Assignee: not available in the USPTO data we have

Inventors: Maki Yamada, Makoto Nishizaki, Yoshihisa Nakatoh, Shinichi Yoshizawa

Patent Claims

14 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A speech recognition apparatus for calculating a degree of likelihood of a non-language speech from an unidentified input speech, said speech recognition apparatus calculating, per path, a cumulative score of a language score, a word acoustic score and a garbage acoustic score and outputting a word string with a highest cumulative score as a recognition result of the unidentified input speech including a non-language speech, said apparatus comprising: a garbage acoustic model storage unit operable to store, in advance, a garbage acoustic model which is an acoustic model learned from a collection of a plurality of unnecessary words; a feature value calculation unit operable to calculate a feature parameter necessary for recognition by acoustically analyzing the unidentified input speech per frame which is a unit for speech analysis; a garbage acoustic score calculation unit operable to calculate the garbage acoustic score by comparing the feature parameter and the garbage acoustic model per frame; an estimate value calculation unit operable to calculate, per frame, an estimate value which indicates a degree of likelihood to be a non-language speech of one of the plurality of unnecessary words; a garbage acoustic score correction unit operable to correct the garbage acoustic score calculated by said garbage acoustic score calculation unit so as to raise the score only in the frame where the non-language speech is inputted by adding the estimate value to the garbage acoustic score, the estimate value being calculated by said estimate value calculation unit, and the garbage acoustic score being calculated by said garbage acoustic score calculation unit; and a recognition result output unit operable to output the word string as the recognition result of the unidentified input speech, the word string having the highest cumulative score of the language score, the word acoustic score, and the garbage acoustic score corrected by said garbage acoustic score correction unit.
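The scoring scheme recited in claim 1 can be illustrated with a minimal sketch. It assumes scores are log-domain values that add up along a path, and that each candidate path already records which frames it assigns to the garbage model. The names `Path`, `correct_garbage_scores`, `cumulative_score`, and `recognize` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Path:
    """One candidate word string with its partial scores (hypothetical layout)."""
    words: List[str]
    language_score: float
    word_acoustic_score: float
    garbage_frames: List[int]  # frame indices this path assigns to the garbage model


def correct_garbage_scores(garbage_scores, estimates):
    """Add the per-frame non-language estimate to the per-frame garbage score.

    Frames whose estimate is 0 keep their original score, so the score is
    raised only where non-language speech is estimated to be present.
    """
    return [g + e for g, e in zip(garbage_scores, estimates)]


def cumulative_score(path, corrected):
    """Sum the language, word acoustic, and corrected garbage scores of a path."""
    garbage = sum(corrected[t] for t in path.garbage_frames)
    return path.language_score + path.word_acoustic_score + garbage


def recognize(paths, garbage_scores, estimates):
    """Return the word string of the path with the highest cumulative score."""
    corrected = correct_garbage_scores(garbage_scores, estimates)
    best = max(paths, key=lambda p: cumulative_score(p, corrected))
    return best.words


# Made-up numbers: frames 1 and 2 carry non-language speech (estimate 3.0).
garbage = [-5.0, -5.0, -5.0, -5.0]
estimates = [0.0, 3.0, 3.0, 0.0]
paths = [
    Path(["hello"], -1.0, -12.0, []),
    Path(["<garbage>", "hello"], -2.0, -6.0, [1, 2]),
]
print(recognize(paths, garbage, estimates))
```

With these illustrative numbers, the path that routes frames 1 and 2 through the garbage model wins only after the correction is applied; with all estimates at zero the plain path scores higher.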

2. The speech recognition apparatus according to claim 1, wherein said estimate value calculation unit includes a non-language speech estimation unit operable to calculate, per frame, an estimate value which indicates a degree of likelihood to be a non-language speech of one of the plurality of unnecessary words, using a non-language speech estimate function, and said garbage acoustic score correction unit corrects the garbage acoustic score so as to raise the score by using the estimate value calculated by said non-language speech estimation unit in the frame where the non-language speech is inputted.

3. The speech recognition apparatus according to claim 2, wherein said non-language speech estimation unit calculates an estimate value which is high in a part where spectra of the unidentified input speech become repeating patterns, based on the feature parameter calculated, per frame, by said feature value calculation unit.
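Claim 3 does not say how repetition in the spectra is measured. One possible sketch, offered purely as an assumption, compares each frame's feature vector with earlier frames at several lags using cosine similarity, then subtracts the similarity to the immediately preceding frame so that a sustained, unchanging sound does not count as a repeating pattern. The function names and the lag range are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def repetition_estimate(frames, min_lag=2, max_lag=10):
    """Per-frame estimate that is high where the spectral pattern repeats.

    For frame t, take the best similarity to a frame min_lag..max_lag frames
    earlier, minus the similarity to frame t-1, clipped at zero.
    """
    estimates = []
    for t, frame in enumerate(frames):
        lagged = [
            cosine(frame, frames[t - lag])
            for lag in range(min_lag, max_lag + 1)
            if t - lag >= 0
        ]
        if not lagged:
            estimates.append(0.0)  # not enough history yet
            continue
        adjacent = cosine(frame, frames[t - 1])
        estimates.append(max(0.0, max(lagged) - adjacent))
    return estimates
```

An alternating sequence of two distinct spectra (a crude stand-in for laughter) yields high estimates once enough history is available, while a steady spectrum yields zero.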

4. The speech recognition apparatus according to claim 2, further comprising: a non-language estimation specific feature value calculation unit operable to calculate, per frame, the feature parameter which is necessary for estimating the non-language speech; and a non-language acoustic model storage unit operable to store, in advance, a non-language acoustic model which is an acoustic model modeling the features of non-languages, wherein said non-language speech estimation unit calculates, per frame, a non-language comparative score as the estimate value by comparing the feature parameter for estimating the non-language and the non-language acoustic model.
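The claim leaves the form of the non-language acoustic model open. As an illustration only, the sketch below assumes the model is a single diagonal-covariance Gaussian and uses its log-likelihood as the per-frame comparative score. The function name and the model layout (`mean`, `variance`) are assumptions, not details taken from the patent.

```python
import math


def non_language_comparative_score(feature, mean, variance):
    """Log-likelihood of one frame's feature vector under a diagonal Gaussian.

    feature, mean, and variance are equal-length sequences; every variance
    must be positive. Higher values mean a closer match to the model.
    """
    score = 0.0
    for x, mu, var in zip(feature, mean, variance):
        score += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return score
```

A frame that sits exactly on the model mean receives the highest score the model can assign, and the score falls off as the frame moves away from the mean.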

5. The speech recognition apparatus according to claim 4, further comprising a high frequency power retaining frame number calculation unit operable to calculate the number of frames that retain high frequency power based on the feature parameter for estimating the non-language which is calculated by said non-language estimation specific feature value calculation unit, wherein said non-language speech estimation unit calculates the non-language comparative score by comparing the feature parameter for estimating the non-language and the non-language acoustic model, and calculates the estimate value which indicates the likelihood to be the non-language from the non-language comparative score and the high frequency power retaining frame number.

6. The speech recognition apparatus according to claim 5, wherein said high frequency power retaining frame number calculation unit regards the high frequency power acquired by said feature value calculation unit operable to estimate the non-language as the frame of “high” high frequency power in a case where the high frequency power is higher than a predetermined threshold value.
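The thresholding in claims 5 and 6 can be sketched as a running count. This sketch assumes that the "number of frames that retain high frequency power" means the length of the current run of consecutive frames whose high-frequency power exceeds the threshold; that reading, the function name, and the sample values are assumptions. How the count is combined with the comparative score is not specified in the claims and is left out here.

```python
def high_power_run_lengths(high_freq_power, threshold):
    """For each frame, count how many consecutive frames (ending at that frame)
    have had high-frequency power strictly above the threshold."""
    run = 0
    runs = []
    for power in high_freq_power:
        run = run + 1 if power > threshold else 0
        runs.append(run)
    return runs


# Made-up power values; frames 1-2 and frame 4 exceed the 0.5 threshold.
print(high_power_run_lengths([0.1, 0.9, 0.8, 0.2, 0.7], threshold=0.5))
# prints [0, 1, 2, 0, 1]
```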

7. The speech recognition apparatus according to claim 2, further comprising a non-language corresponding character insert unit operable to (i) select at least one of an ideogram and an emoticon corresponding to the non-language speech based on the estimate value estimated by said non-language speech estimation unit and (ii) insert at least either of the selected ideogram and emoticon into the recognition result of said recognition result output unit.
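The character insertion in claim 7 might look like the following sketch. It assumes one estimate value per kind of non-language speech and a fixed lookup table from kind to emoticon. The table contents, the threshold, and the choice to append the emoticon at the end of the text are all hypothetical.

```python
# Hypothetical mapping from a kind of non-language speech to an emoticon.
EMOTICONS = {"laughter": ":-D", "cough": "(cough)"}


def insert_emoticon(text, estimates, threshold=0.5):
    """Append the emoticon for the most likely non-language speech, if any.

    estimates maps a kind of non-language speech to its estimate value.
    The text is returned unchanged when no estimate reaches the threshold
    or the best kind has no emoticon in the table.
    """
    if not estimates:
        return text
    kind, value = max(estimates.items(), key=lambda item: item[1])
    if value < threshold or kind not in EMOTICONS:
        return text
    return f"{text} {EMOTICONS[kind]}"
```

For example, `insert_emoticon("that is funny", {"laughter": 0.9, "cough": 0.1})` returns the text followed by the laughter emoticon, while low estimates leave the recognition result untouched.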

8. The speech recognition apparatus according to claim 2, further comprising an agent control unit operable to control an agent's movement which is displayed on a screen and composite tones of an agent's speech based on the recognition result outputted by said recognition result output unit and the estimate value estimated by said non-language speech estimation unit.

9. The speech recognition apparatus according to claim 1, wherein said estimate value calculation unit includes a non-language phenomenon estimation unit operable to calculate an estimate value of a non-language phenomenon which is related to the non-language speech based on user's information interlocking the non-language speech of one of the plurality of necessary words, and said garbage acoustic score correction unit corrects the garbage acoustic score so as to raise the score using the estimate value in the frame where the non-language phenomenon which is calculated by said non-language phenomenon estimation unit is inputted.

10. The speech recognition apparatus according to claim 9, further comprising a non-language character insert unit operable to (i) select at least either of an ideogram and an emoticon corresponding to the non-language speech, based on the estimate value estimated by said non-language phenomenon estimation unit, and (ii) insert at least one of the selected ideogram and emoticon into the recognition result of said recognition result output unit.

11. The speech recognition apparatus according to claim 9, further comprising an agent control unit operable to control an agent's movement which is displayed on a screen and composite tones of an agent's speech based on the recognition result outputted by said recognition result output unit and the estimate value estimated by said non-language speech estimation unit.

12. The speech recognition apparatus according to claim 1, further comprising a correcting parameter selection change unit operable to (i) have a user select a value of a correcting parameter for determining a degree of correction of the garbage acoustic score in said garbage acoustic score correction unit and (ii) change the selected value of the correcting parameter, wherein said garbage acoustic score correction unit corrects the garbage acoustic score, based on the correcting parameter.
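One way to read the correcting parameter of claim 12 is as a user-selected weight on the estimate value before it is added to the garbage acoustic score. The claim does not state how the parameter enters the correction, so the multiplicative form below, the function name, and the non-negativity check are assumptions.

```python
def correct_with_parameter(garbage_scores, estimates, weight):
    """Raise each frame's garbage score by weight * estimate.

    weight is the user-selected correcting parameter; 0 disables the
    correction and larger values apply a stronger correction.
    """
    if weight < 0:
        raise ValueError("correcting parameter must be non-negative")
    return [g + weight * e for g, e in zip(garbage_scores, estimates)]


# Made-up scores: only the second frame has a non-zero estimate.
print(correct_with_parameter([-5.0, -5.0], [0.0, 2.0], weight=0.5))
# prints [-5.0, -4.0]
```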

13. A speech recognition method used by a speech recognition apparatus for calculating a degree of likelihood of a non-language speech from an unidentified input speech, the speech recognition apparatus calculating a cumulative score of a language score, a word acoustic score, and a garbage acoustic score, per path, and outputting a word string with a highest cumulative score as a recognition result of the unidentified input speech including a non-language speech, said method comprising: a feature value calculation step of calculating a feature parameter necessary for recognition by acoustically analyzing the unidentified input speech per frame which is a unit for speech analysis; a garbage acoustic score calculation step of calculating the garbage acoustic score by comparing, per frame, the feature parameter and the garbage acoustic model which is stored in the garbage acoustic model storage unit and which is an acoustic model learned from a collection of unnecessary words; an estimate value calculation step of calculating, per frame, an estimate value which indicates a degree of likelihood to be a non-language speech of one of the plurality of unnecessary words; a garbage acoustic score correction step of correcting the garbage acoustic score calculated by the garbage acoustic score calculation step so as to raise the score only in the frame where the non-language speech is inputted by adding the estimate value to the garbage acoustic score, the estimate value being calculated in the estimate value calculation step, and the garbage acoustic score being calculated in the garbage acoustic score calculation step; and a recognition result output step of outputting, as the recognition result of the unidentified input speech, the word string with the highest cumulative score of the language score, the word acoustic score and the garbage acoustic score which is corrected by the garbage acoustic score correction step.

14. A program stored on a computer-readable medium for a speech recognition apparatus for calculating a degree of likelihood of a non-language speech from an unidentified input speech, the speech recognition apparatus calculating a cumulative score of a language score, a word acoustic score, and a garbage acoustic score, per path, and outputting a word string with a highest cumulative score as a recognition result of the unidentified input speech including a non-language speech, said program causing a computer to execute: a feature value calculation step of calculating a feature parameter necessary for recognition by acoustically analyzing the unidentified input speech per frame which is a unit for speech analysis; a garbage acoustic score calculation step of calculating the garbage acoustic score by comparing, per frame, the feature parameter and the garbage acoustic model which is stored in the garbage acoustic model storage unit and which is an acoustic model learned from a collection of unnecessary words; an estimate value calculation step of calculating, per frame, an estimate value which indicates a degree of likelihood to be a non-language speech of one of the plurality of unnecessary words; a garbage acoustic score correction step of correcting the garbage acoustic score calculated by the garbage acoustic score calculation step so as to raise the score only in the frame where the non-language speech is inputted by adding the estimate value to the garbage acoustic score, the estimate value being calculated in the estimate value calculation step, and the garbage acoustic score being calculated in the garbage acoustic score calculation step; and a recognition result output step of outputting, as the recognition result of the unidentified input speech, the word string with the highest cumulative score of the language score, the word acoustic score and the garbage acoustic score which is corrected by the garbage acoustic score correction step.

Patent Metadata

Filing Date: Unknown

Publication Date: May 4, 2010

Inventors: Maki Yamada, Makoto Nishizaki, Yoshihisa Nakatoh, Shinichi Yoshizawa
