Legal claims defining the scope of protection, as filed with the USPTO.
2. The TTS system of claim 1 wherein the operations further comprise converting a frequency domain signal combined from a neural network and the pre-existing knowledgebase into the corrected speech signal.
3. The TTS system of claim 1 wherein the data in the pre-existing knowledgebase of phonemes comprises average basic acoustic signal data of how a speaker speaks derived from the recorded audible speech.
4. The TTS system of claim 1 wherein the operations further comprise correcting for psychoacoustic perceived speech signal distortions of the pre-existing knowledgebase of phonemes.
5. The TTS system of claim 1 wherein the operations further comprise upsampling a frequency of an input vector to another frequency of an intermediate vector.
6. The TTS system of claim 5, wherein the input vector comprises at least one of a base frequency, a phoneme duration, and a phoneme sequence.
7. The TTS system of claim 1 wherein the operations further comprise correcting voiced phonemes of the pre-existing knowledgebase of phonemes.
8. The TTS system of claim 1 wherein the operations further comprise correcting unvoiced phonemes of the pre-existing knowledgebase of phonemes.
9. The TTS system of claim 1 wherein a neural network is configured based on psychoacoustic modeling of phonemes.
11. The method of claim 10 wherein the operations further comprise converting a frequency domain signal combined from a neural network and the pre-existing knowledgebase into the corrected speech signal.
12. The method of claim 10 wherein the data in the pre-existing knowledgebase of phonemes comprises average basic acoustic signal data of how a speaker speaks derived from the recorded audible speech.
13. The method of claim 10 wherein the operations further comprise correcting for psychoacoustic perceived speech signal distortions of the pre-existing knowledgebase of phonemes.
14. The method of claim 10 wherein the operations further comprise upsampling a frequency of an input vector to another frequency of an intermediate vector.
15. The method of claim 14, wherein the input vector comprises at least one of a base frequency, a phoneme duration, and a phoneme sequence.
16. The method of claim 10 wherein the operations further comprise correcting voiced phonemes of the pre-existing knowledgebase of phonemes.
17. The method of claim 10 wherein the operations further comprise correcting unvoiced phonemes of the pre-existing knowledgebase of phonemes.
18. The method of claim 10 wherein a neural network is configured based on psychoacoustic modeling of phonemes.
20. The non-transitory computer-readable medium of claim 19 wherein the operations further comprise correcting for psychoacoustic perceived speech signal distortions of the pre-existing knowledgebase of phonemes.
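As a reading aid only, and not as the claimed implementation, the upsampling recited in claims 5, 6, 14 and 15 can be sketched as follows. The sketch assumes that "upsampling a frequency" means going from one input entry per phoneme to one intermediate entry per fixed-length frame. The function name `upsample_by_duration`, the 5 ms frame length, and the sample values are hypothetical and do not come from the claims.

```python
from typing import List, Tuple


def upsample_by_duration(
    phonemes: List[str],
    durations_ms: List[float],
    base_freq_hz: List[float],
    frame_ms: float = 5.0,  # assumed frame length, not taken from the claims
) -> List[Tuple[str, float]]:
    """Expand a phoneme-rate input vector into a frame-rate intermediate vector.

    Each phoneme entry (symbol, base frequency) is repeated once per frame
    that its duration covers, with a minimum of one frame per phoneme.
    """
    if not (len(phonemes) == len(durations_ms) == len(base_freq_hz)):
        raise ValueError("all input lists must have the same length")
    if frame_ms <= 0:
        raise ValueError("frame_ms must be positive")

    frames: List[Tuple[str, float]] = []
    for symbol, duration, f0 in zip(phonemes, durations_ms, base_freq_hz):
        n_frames = max(1, round(duration / frame_ms))
        frames.extend([(symbol, f0)] * n_frames)
    return frames


# Hypothetical input: an unvoiced phoneme (f0 = 0) followed by a voiced one.
intermediate = upsample_by_duration(["h", "i"], [10.0, 20.0], [0.0, 120.0])
```

Here the input vector carries the three items named in claims 6 and 15 (base frequency, phoneme duration, phoneme sequence), and the intermediate vector is simply sampled at a higher rate. A real system might interpolate rather than repeat values.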
August 22, 2023