Method and Apparatus for Performing Speech Segmentation

Published: March 7, 2006

Assignee: Not available in USPTO data

Technical Abstract

Patent Claims

16 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for performing a segmentation operation upon a synthesizing speech signal and an input speech signal, comprising the steps of: generating a synthesized speech signal and a speech element duration signal from said synthesizing speech signal; extracting a first feature parameter from said synthesized speech signal; extracting a second feature parameter from said input speech signal; and performing a dynamic programming matching operation upon said second feature parameter with reference to said first feature parameter and said speech element duration signal to obtain segmentation points of said input speech signal.
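The matching step of claim 1 can be illustrated with a small dynamic time warping routine. This is only a sketch under stated assumptions, not the patented implementation: the function names `dtw_path` and `segment_input`, the Euclidean frame distance, the three-way step pattern, and the representation of the speech element duration signal as a list of frame counts per element are all hypothetical choices made for illustration.

```python
import math


def dtw_path(ref, inp):
    """Align two sequences of feature vectors with plain DTW.

    Returns the warping path as a list of (ref_index, inp_index) pairs.
    The local distance and the step pattern are illustrative assumptions.
    """
    n, m = len(ref), len(inp)
    cost = [[math.inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    for i in range(n):
        for j in range(m):
            d = dist(ref[i], inp[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            best, prev = math.inf, None
            # Allowed predecessors: vertical, horizontal, diagonal.
            for pi, pj in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
                if pi >= 0 and pj >= 0 and cost[pi][pj] < best:
                    best, prev = cost[pi][pj], (pi, pj)
            cost[i][j] = d + best
            back[i][j] = prev

    # Backtrack from the end of both sequences to the origin.
    path = [(n - 1, m - 1)]
    while back[path[-1][0]][path[-1][1]] is not None:
        path.append(back[path[-1][0]][path[-1][1]])
    path.reverse()
    return path


def segment_input(ref, inp, durations):
    """Map element boundaries of the synthesized reference onto the input.

    `durations` holds the number of reference frames per speech element
    (assumed to sum to len(ref)). Returns, for each element after the
    first, the input frame index at which it begins.
    """
    first_inp = {}
    for i, j in dtw_path(ref, inp):
        first_inp.setdefault(i, j)  # earliest input frame matched to ref frame i

    boundaries, start = [], 0
    for d in durations[:-1]:
        start += d
        boundaries.append(first_inp[start])
    return boundaries
```

Here the reference boundaries come directly from the duration list, and the warping path carries them over to the input signal. A practical system would use spectral features and a constrained search region rather than the full cost matrix.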

2. The method as set forth in claim 1, wherein said synthesizing speech signal includes no paused intervals, said dynamic programming matching operation performing step comprising the steps of: determining whether or not there are paused intervals in said second feature parameter; and controlling a searched path width and weight of said second feature parameter for said dynamic programming operation in said paused intervals when there are said paused intervals in said second feature parameter.
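One way to picture the pause handling of claim 2 is a two-stage helper: find low-energy runs in the input, then assign per-frame matching controls that differ inside those runs. The claim does not say how pauses are detected or what values the path width and weight take, so the energy threshold, the minimum pause length, and the `(weight, path_width)` pairs below are hypothetical placeholders, as are the function names `find_pauses` and `dp_controls`.

```python
def find_pauses(energies, threshold=0.01, min_frames=3):
    """Return half-open (start, end) frame intervals whose energy stays
    below `threshold` for at least `min_frames` frames.

    Energy thresholding is an assumed detection method, not taken from
    the claims.
    """
    pauses, start = [], None
    for t, e in enumerate(energies):
        if e < threshold:
            if start is None:
                start = t
        else:
            if start is not None and t - start >= min_frames:
                pauses.append((start, t))
            start = None
    if start is not None and len(energies) - start >= min_frames:
        pauses.append((start, len(energies)))
    return pauses


def dp_controls(num_frames, pauses, normal=(1.0, 5), in_pause=(0.0, 20)):
    """Build a per-frame list of (weight, path_width) for the DP search.

    Inside a pause the distance weight is lowered and the search width is
    widened, so the reference (which has no pauses) is not forced to match
    silence. The numeric values are illustrative assumptions.
    """
    controls = [normal] * num_frames
    for start, end in pauses:
        for t in range(start, end):
            controls[t] = in_pause
    return controls
```

A matching routine could then multiply each local distance by the frame's weight and limit the candidate reference frames to the frame's path width.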

3. The method as set forth in claim 2, wherein said dynamic programming matching operation performing step further comprises a step of determining pause boundaries in accordance with said paused intervals and said segmentation points close thereto.

4. The method as set forth in claim 3, wherein said dynamic programming matching operation performing step comprises a step of performing a further dynamic programming matching operation upon said second feature parameter partitioned by said paused intervals in accordance with said first feature parameter and said speech element duration signal.

5. The method as set forth in claim 4, wherein said further dynamic programming matching operation performing step comprises the steps of: storing a start time and an end time of a possible silent vowel of said first feature parameter; comparing a weight distance of a path from zero time to said start time with a weight distance of a path from said zero time to said end time; and determining whether or not a vowel of said second feature parameter corresponding to said possible silent vowel of said first feature parameter is silent in accordance with a result of said comparing step.

6. The method as set forth in claim 5, wherein said further dynamic programming matching operation performing step increases a searched path width of said first feature parameter in an interval where the vowel of said second feature parameter is determined to be silent.

7. The method as set forth in claim 3, further comprising a step of modifying said segmentation points in accordance with a change of said second feature parameter in specific boundaries.

8. The method as set forth in claim 1, further comprising the steps of: forming speech synthesizing elements in accordance with said input speech signal and said segmentation points; and storing said speech synthesizing elements, said synthesized speech signal/speech element duration generating step generating said synthesized speech signal and said speech element duration signal from said stored speech synthesizing elements, said dynamic programming matching operation performing step performing said dynamic matching operation until said segmentation points are converged.
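The feedback loop of claim 8 can be sketched as a fixed-point iteration: synthesize a reference from the stored elements, match it against the input, rebuild the elements from the new segmentation points, and stop once the points no longer change. The function name `refine_until_converged`, the three callables, the exact-equality convergence test, and the `max_iters` cap are all assumptions made for illustration; the claim does not specify a convergence criterion.

```python
def refine_until_converged(input_feats, elements, synthesize, align,
                           build_elements, max_iters=20):
    """Repeat synthesis and DP matching until the segmentation points stop changing.

    synthesize(elements)                  -> (ref_feats, durations)
    align(ref_feats, input_feats, durs)   -> list of segmentation points
    build_elements(input_feats, points)   -> new synthesizing elements

    All three callables are hypothetical stand-ins supplied by the caller.
    Returns (points, iterations_used).
    """
    previous = None
    for iteration in range(1, max_iters + 1):
        ref_feats, durations = synthesize(elements)
        points = align(ref_feats, input_feats, durations)
        if points == previous:
            return points, iteration  # converged: same points twice in a row
        elements = build_elements(input_feats, points)
        previous = points
    return previous, max_iters  # iteration cap reached without convergence
```

Passing the stages in as callables keeps the loop independent of any particular synthesizer or matcher, which makes the convergence logic easy to test with toy stand-ins.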

9. An apparatus for performing a segmentation operation upon a synthesizing speech signal and an input speech signal, comprising: a speech synthesizing unit for generating a synthesized speech signal and a speech element duration signal from said synthesizing speech signal; a feature parameter extracting unit for extracting a first feature parameter from said synthesized speech signal and extracting a second feature parameter from said input speech signal; and a matching unit for performing a dynamic programming matching operation upon said second feature parameter with reference to said first feature parameter and said speech element duration signal to obtain segmentation points of said input speech signal.

10. The apparatus as set forth in claim 9, wherein said synthesizing speech signal includes no paused intervals, said matching unit comprising: means for determining whether or not there are paused intervals in said second feature parameter; and means for controlling a searched path width and weight of said second feature parameter for said dynamic programming operation in said paused intervals when there are said paused intervals in said second feature parameter.

11. The apparatus as set forth in claim 10, wherein said matching unit further comprises means for determining pause boundaries in accordance with said paused intervals and said segmentation points close thereto.

12. The apparatus as set forth in claim 11, wherein said matching unit comprises a further matching unit for performing a further dynamic programming matching operation upon said second feature parameter partitioned by said paused intervals in accordance with said first feature parameter and said speech element duration signal.

13. The apparatus as set forth in claim 12, wherein said further matching unit comprises: means for storing a start time and an end time of a possible silent vowel of said first feature parameter; means for comparing a weight distance of a path from a zero time to said start time with a weight distance of a path from said zero time to said end time; and means for determining whether or not a vowel of said second feature parameter corresponding to said possible silent vowel of said first feature parameter is silent in accordance with a result of said comparing means.

14. The apparatus as set forth in claim 13, wherein said further matching unit increases a searched path width of said first feature parameter in an interval where the vowel of said second feature parameter is determined to be silent.

15. The apparatus as set forth in claim 11, further comprising a segmentation point modifying unit for modifying said segmentation points in accordance with a change of said second feature parameter in specific boundaries.

16. The apparatus as set forth in claim 9, further comprising: a speech synthesizing element forming unit for forming speech synthesizing elements in accordance with said input speech signal and said segmentation points; and a storing unit for storing said speech synthesizing elements, said speech synthesizing unit generating said synthesized speech signal and said speech element duration signal from said stored speech synthesizing elements, said matching unit performing said dynamic programming matching operation until said segmentation points are converged.

Patent Metadata

Filing Date

Unknown

Publication Date

March 7, 2006

Inventors

Takuya Takizawa
