Method of Speech Segment Selection for Concatenative Synthesis Based on Prosody-Aligned Distance Measure

Published: January 1, 2008

Assignee: Not available in USPTO data

Patent Claims

11 claims

1. A method of speech segment selection for use in constructing a concatenative synthesizer's database based on a prosody-aligned distance measure, comprising the steps of: (A) segmenting speech stored in a speech corpus, which is recorded in advance, into a plurality of speech segments according to a unit type, wherein each of the speech segments has its own prosody; (B) locating pitch marks for each of the speech segments; (C) selecting one of the speech segments according to the unit type as a source segment and the remaining speech segments as target segments, and performing a prosody alignment between the source segment and each of the target segments by modifying the prosody of the source segment with a respective prosody of each of the target segments, so as to obtain a prosody-aligned source segment with respect to each of the target segments, wherein the pitch marks of the prosody-aligned source segment are time-aligned and pitch-aligned with the pitch marks of each of the target segments; (D) respectively measuring distortion between the prosody-aligned source segment and each of the target segments to obtain a distance between the prosody-aligned source segment and each of the target segments, and to obtain an average distance for the prosody-aligned source segment with respect to each of the target segments; and (E) selecting at least one speech segment previously selected as the source segment with a relatively small average distance to be used as a synthetic speech unit of the unit type for constructing the synthesizer's database.
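Steps (C) through (E) amount to filling an N × N table of distances in which every segment takes a turn as the source, then ranking segments by their average over the other N − 1 segments. The minimal Python sketch below assumes each segment is a plain list of samples; `select_units`, `align`, and `dist` are hypothetical names, with `align` standing in for the prosody modification of step (C) (for example PSOLA, per claim 5) and `dist` for the distortion measure of step (D) (for example an MFCC-based distance, per claim 7).

```python
from typing import Callable, List, Sequence, Tuple

Segment = Sequence[float]  # assumption: a segment is just its waveform samples


def select_units(
    segments: List[Segment],
    align: Callable[[Segment, Segment], Segment],
    dist: Callable[[Segment, Segment], float],
    k: int = 1,
) -> List[Tuple[int, float]]:
    """Rank segments of one unit type by average prosody-aligned distance.

    align(source, target) returns the source re-synthesized with the target's
    prosody (step C); dist(a, b) measures their distortion (step D).
    Returns up to k (index, average distance) pairs, smallest first (step E).
    """
    n = len(segments)
    if n < 2:
        raise ValueError("need at least two segments of the same unit type")
    averages: List[Tuple[int, float]] = []
    for i, source in enumerate(segments):        # each segment takes a turn as source
        total = 0.0
        for j, target in enumerate(segments):
            if j == i:
                continue
            aligned = align(source, target)        # prosody-aligned source w.r.t. S_j
            total += dist(aligned, target)         # D_ij
        averages.append((i, total / (n - 1)))      # D_i, averaged over N - 1 targets
    averages.sort(key=lambda pair: pair[1])        # smallest average distance first
    return averages[:k]
```

As a quick check with the identity function as `align` and Euclidean distance as `dist`, the segments `[0, 0]`, `[1, 1]` and `[3, 3]` have averages of 2√2, 1.5√2 and 2.5√2, so index 1 ranks first. The N(N − 1) calls to `align` are inherent to the method, since each segment is aligned to every other segment of its unit type.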

2. The method as claimed in claim 1, wherein in step (A), the unit type is a syllable.

3. The method as claimed in claim 1, wherein in step (A), the speech corpus is automatically segmented into a plurality of speech segments according to a unit type by a computer.

4. The method as claimed in claim 3, wherein the speech is segmented by using a Markov model.
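Claim 4 does not say how the Markov model is applied. A common approach, assumed here, is forced alignment: with the unit sequence of an utterance known, a left-to-right Viterbi search finds the unit boundaries that maximize the total frame log-likelihood. The sketch treats each unit as a single state with uniform transitions and takes the per-frame scores `loglik[t][m]` as given (in practice they would come from trained acoustic models); `forced_align` is a hypothetical name.

```python
import math
from typing import List, Sequence


def forced_align(loglik: Sequence[Sequence[float]]) -> List[int]:
    """Return the start frame of each unit in a left-to-right Viterbi alignment.

    loglik[t][m] is the log-likelihood of frame t under the model of the m-th
    unit of the known transcription. Units occur in order, each covering at
    least one frame.
    """
    T, M = len(loglik), len(loglik[0])
    if T < M:
        raise ValueError("fewer frames than units")
    NEG = -math.inf
    best = [[NEG] * M for _ in range(T)]        # best path score ending in unit m at t
    entered = [[False] * M for _ in range(T)]   # True if unit m starts at frame t
    best[0][0] = loglik[0][0]                   # the first frame belongs to unit 0
    for t in range(1, T):
        for m in range(M):
            stay = best[t - 1][m]
            enter = best[t - 1][m - 1] if m > 0 else NEG
            if enter > stay:
                best[t][m] = enter + loglik[t][m]
                entered[t][m] = True
            else:
                best[t][m] = stay + loglik[t][m]
    # Backtrack from the last frame, which must belong to the last unit.
    starts = [0] * M
    m = M - 1
    for t in range(T - 1, 0, -1):
        if entered[t][m]:
            starts[m] = t
            m -= 1
    return starts
```

For example, with six frames whose scores favor the first unit for frames 0 to 2 and the second unit for frames 3 to 5, `forced_align` returns `[0, 3]`.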

5. The method as claimed in claim 1, wherein in step (C), the prosody alignment is performed between the source segment and each target segment by using a pitch synchronous overlap-and-add (PSOLA) algorithm.
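One way to realize the alignment of claim 5 is time-domain PSOLA (TD-PSOLA). In the simplified Python sketch below, which is an assumption rather than the patent's specified implementation, every target pitch mark receives a copy of the Hann-tapered grain that spans from the previous to the next source pitch mark around the corresponding source mark; correspondence is found by linearly warping the target's time span onto the source's. Because grains are placed exactly on the target's pitch marks, the output takes on the target's pitch contour and duration, as step (C) requires. `psola_align` is a hypothetical name; energy normalization and special handling of unvoiced regions are omitted.

```python
import math
from typing import List, Sequence


def psola_align(x: Sequence[float], src_marks: Sequence[int],
                tgt_marks: Sequence[int], out_len: int) -> List[float]:
    """Simplified TD-PSOLA: re-synthesize x so its pitch marks fall on tgt_marks."""
    if len(src_marks) < 2 or not tgt_marks:
        raise ValueError("need at least two source marks and one target mark")
    y = [0.0] * out_len
    s0, s1 = src_marks[0], src_marks[-1]
    t0, t1 = tgt_marks[0], tgt_marks[-1]
    for tm in tgt_marks:
        # Map the target mark's relative position onto the source time span.
        frac = (tm - t0) / (t1 - t0) if t1 > t0 else 0.0
        src_time = s0 + frac * (s1 - s0)
        k = min(range(len(src_marks)), key=lambda i: abs(src_marks[i] - src_time))
        c = src_marks[k]
        # Grain extends to the neighboring source marks (mirrored at the ends).
        left = c - src_marks[k - 1] if k > 0 else src_marks[1] - c
        right = src_marks[k + 1] - c if k + 1 < len(src_marks) else c - src_marks[k - 1]
        for n in range(-left, right + 1):
            half = left if n < 0 else right
            # Hann taper: 1 at the grain center, 0 at the neighboring marks.
            w = 0.5 * (1.0 + math.cos(math.pi * n / half)) if half else 1.0
            src_idx, out_idx = c + n, tm + n
            if 0 <= src_idx < len(x) and 0 <= out_idx < out_len:
                y[out_idx] += w * x[src_idx]   # overlap-add at the target mark
    return y
```

With an impulse at every source pitch mark (period 20 samples) and target marks spaced 15 samples apart, the output is an impulse train on the target marks, because each grain's taper is 1 at its center and 0 at the neighboring source marks.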

6. The method as claimed in claim 1, wherein in step (D), the distance is $D_{ij} = \mathrm{dist}(\hat{S}_i\langle S_j\rangle, S_j)$, where $S_i$ is the source segment, $S_j$ is the target segment, and $\hat{S}_i\langle S_j\rangle$ is the waveform of the prosody-aligned source segment.

7. The method as claimed in claim 6, wherein step (D) measures the distortion between the prosody-aligned source segment and each of the target segments by using a Mel-frequency cepstrum coefficients (MFCC) algorithm.
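Claim 7 names MFCCs as the distortion measure but gives no parameters. The pure-Python sketch below assumes 256-sample frames with a 128-sample hop, 20 triangular mel filters, cepstral coefficients 1 through 12, and a distance equal to the mean Euclidean distance between corresponding frames; all of these are illustrative choices. Comparing frames one-to-one without time warping relies on the prosody-aligned source being time-aligned with the target, as step (C) specifies. A naive DFT keeps the example dependency-free where a real system would use an FFT; `mfcc` and `mfcc_distance` are hypothetical names.

```python
import cmath
import math
from typing import List, Sequence


def _mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def _mel_inv(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(x: Sequence[float], sr: int, frame_len: int = 256, hop: int = 128,
         n_filters: int = 20, n_ceps: int = 12) -> List[List[float]]:
    """Naive MFCCs: Hamming window, |DFT|^2, triangular mel filters, log, DCT-II."""
    n_bins = frame_len // 2 + 1
    # Filter edges evenly spaced on the mel scale, expressed as (fractional) DFT bins.
    edges = [_mel_inv(_mel(sr / 2) * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [f * frame_len / sr for f in edges]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        power = [abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                         for n in range(frame_len))) ** 2 for k in range(n_bins)]
        log_energies = []
        for m in range(1, n_filters + 1):
            lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
            e = 0.0
            for k in range(n_bins):
                if lo < k <= mid:                  # rising edge of the triangle
                    e += power[k] * (k - lo) / (mid - lo)
                elif mid < k < hi:                 # falling edge of the triangle
                    e += power[k] * (hi - k) / (hi - mid)
            log_energies.append(math.log(e + 1e-10))  # small floor avoids log(0)
        # DCT-II of the log filter energies, skipping c0 (overall level).
        ceps = [sum(log_energies[m] * math.cos(math.pi * c * (m + 0.5) / n_filters)
                    for m in range(n_filters)) for c in range(1, n_ceps + 1)]
        frames.append(ceps)
    return frames


def mfcc_distance(a: Sequence[float], b: Sequence[float], sr: int) -> float:
    """Mean Euclidean distance between corresponding MFCC frames of a and b."""
    fa, fb = mfcc(a, sr), mfcc(b, sr)
    n = min(len(fa), len(fb))
    if n == 0:
        raise ValueError("segments are shorter than one analysis frame")
    return sum(math.dist(fa[t], fb[t]) for t in range(n)) / n
```

Plugged in as the `dist` argument of a selection loop, `mfcc_distance(aligned, target, sr)` would play the role of $D_{ij}$ in claim 6.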

8. The method as claimed in claim 6, wherein step (D) measures the distortion between the prosody-aligned source segment and each of the target segments by using a perceptual speech quality measure (PSQM) method.

9. The method as claimed in claim 6, wherein the average distance of one speech segment $S_i$ among the other speech segments is $D_i = \frac{1}{N-1}\sum_{j=1,\, j\neq i}^{N} D_{ij}$, wherein $N$ is the number of speech segments.
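Because the alignment of step (C) modifies only the source segment, $D_{ij}$ and $D_{ji}$ compare different waveforms and need not be equal, so the average in claim 9 runs over the distances in which $S_i$ is the source. Written out for an illustrative corpus of $N = 4$ segments (the value of $N$ is chosen only for this example):

```latex
D_2 = \frac{1}{N-1}\sum_{j=1,\, j\neq 2}^{4} D_{2j}
    = \frac{1}{3}\left(D_{21} + D_{23} + D_{24}\right),
\qquad
D_{21} = \mathrm{dist}\bigl(\hat{S}_2\langle S_1\rangle, S_1\bigr)
\ \text{(not necessarily equal to } D_{12}\text{)}.
```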

10. The method as claimed in claim 9, wherein the value $i$ of the speech segment $S_i$ can be calculated according to an inverse function of the average distance, where the inverse function is $i = \arg\{D_i\}$.

11. The method as claimed in claim 10, wherein the value of $i$ of the speech segment $S_i$ with the smallest average distance can be calculated according to the inverse function $i_{\mathrm{opt}} = \arg\min_i \{D_i\}$.

Patent Metadata

Filing Date

Unknown

Inventors

Chih-Chung Kuo

Chi-Shiang Kuo
