Legal claims defining the scope of protection, as filed with the USPTO.
1. An electronic device comprising: a memory; and a processor connected to the memory, wherein the processor is configured to: receive a user voice, acquire a first text corresponding to the user voice, acquire a second text for responding to the user voice based on the first text, acquire information regarding a type of an application for providing an output speech, wherein the type of the application is determined based on at least one of the first text corresponding to the user voice or the second text for responding to the user voice, acquire parameter information for determining a style of an output speech corresponding to the second text based on information on a type of a plurality of text-to-speech (TTS) databases, the first text, the second text, and the type of the application for providing the output speech, identify a TTS database corresponding to the parameter information among the plurality of TTS databases, identify a weight set corresponding to the parameter information among a plurality of weight sets acquired through a trained artificial intelligence model, adjust information on the output speech stored in the TTS database based on the weight set, synthesize the output speech based on the adjusted information on the output speech, and output the output speech corresponding to the second text.
2. The electronic device as claimed in claim 1, wherein the processor is further configured to: acquire the first text corresponding to the user voice by recognizing the user voice, and acquire the second text to respond to the user voice based on natural language processing for the first text corresponding to the user voice.
3. The electronic device as claimed in claim 1, wherein the processor is further configured to: acquire information on an acoustic feature of the user voice based on the user voice, and acquire at least one of the parameter information based on the acquired information on the acoustic feature.
4. The electronic device as claimed in claim 1, wherein the parameter information comprises at least one of context information of a user corresponding to the user voice or context information of the electronic device, and wherein the processor is further configured to acquire at least one of the context information of the user and the context information of the electronic device based on sensing information acquired from a sensing device.
5. The electronic device as claimed in claim 1, further comprising: a user interface, wherein the processor is further configured to change at least one of the parameter information based on a user instruction input through the user interface.
6. The electronic device as claimed in claim 1, wherein the parameter information comprises at least one of information on a language of the output speech, information on a speaker of the output speech, information on a type of an application that provides information on the output speech, information on a tone of the output speech, information on a user's preference regarding the output speech, context information of a user corresponding to the user voice, or context information of the electronic device.
7. The electronic device as claimed in claim 1, wherein the plurality of weight sets comprises a plurality of weights for adjusting information on output speeches stored in the plurality of TTS databases, respectively, and wherein the plurality of weight sets is acquired by inputting a learning speech corresponding to the parameter information to the trained artificial intelligence model.
8. A method of controlling an electronic device, the method comprising: receiving a user voice; acquiring a first text corresponding to the user voice; acquiring a second text for responding to the user voice based on the first text; acquiring information regarding a type of an application for providing an output speech, wherein the type of the application is determined based on at least one of the first text corresponding to the user voice or the second text for responding to the user voice; acquiring parameter information for determining a style of an output speech corresponding to the second text based on information on a type of a plurality of text-to-speech (TTS) databases, the first text, the second text, and the type of the application for providing the output speech; identifying a TTS database corresponding to the parameter information among the plurality of TTS databases; identifying a weight set corresponding to the parameter information among a plurality of weight sets acquired through a trained artificial intelligence model; adjusting information on the output speech stored in the TTS database based on the weight set; synthesizing the output speech based on the adjusted information on the output speech; and outputting the output speech corresponding to the second text.
9. The method as claimed in claim 8, wherein the acquiring of the text comprises: acquiring the first text corresponding to the user voice by recognizing the user voice; and acquiring the second text to respond to the user voice based on natural language processing for the first text corresponding to the user voice.
10. The method as claimed in claim 8, further comprising: acquiring information on an acoustic feature of the user voice based on the user voice; and acquiring at least one of the parameter information based on the acquired information on the acoustic feature.
11. The method as claimed in claim 8, wherein the parameter information includes at least one of context information of a user corresponding to the user voice or context information of the electronic device, and wherein the acquiring of the parameter information includes acquiring at least one of the context information of the user or the context information of the electronic device based on sensing information acquired from a sensing device.
12. The method as claimed in claim 8, wherein the acquiring of the parameter information includes changing at least one of the parameter information based on an input user instruction.
13. The method as claimed in claim 8, wherein the parameter information includes at least one of information on a language of the output speech, information on a speaker of the output speech, information on a type of an application that provides information on the output speech, information on a tone of the output speech, information on a user's preference regarding the output speech, context information of a user corresponding to the user voice, or context information of the electronic device.
14. The method as claimed in claim 8, wherein the plurality of weight sets includes a plurality of weights for adjusting information on output speeches stored in the plurality of TTS databases, respectively, and wherein the plurality of weight sets are acquired by inputting a learning speech corresponding to the parameter information to the trained artificial intelligence model.
15. A non-transitory computer-readable recording medium including a program that, when executed by at least one processor, performs a method of controlling an electronic device, the method comprising: receiving a user voice; acquiring a first text corresponding to the user voice; acquiring a second text for responding to the user voice based on the first text; acquiring information regarding a type of an application for providing an output speech, wherein the type of the application is determined based on at least one of the first text corresponding to the user voice or the second text for responding to the user voice; acquiring parameter information for determining a style of an output speech corresponding to the second text based on information on a type of a plurality of text-to-speech (TTS) databases, the first text, the second text, and the type of the application for providing the output speech; identifying a TTS database corresponding to the parameter information among the plurality of TTS databases; identifying a weight set corresponding to the parameter information among a plurality of weight sets acquired through a trained artificial intelligence model; adjusting information on the output speech stored in the TTS database based on the weight set; synthesizing the output speech based on the adjusted information on the output speech; and outputting the output speech corresponding to the second text.
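The processing chain recited in claims 1, 8 and 15 can be illustrated with a minimal, hypothetical Python sketch. It is not the patented implementation. It assumes that speech recognition and natural language processing have already produced the first and second texts. The following are all illustrative assumptions:

- the keyword rule used to determine the application type;
- keying each TTS database by a (language, speaker) pair;
- storing one prosody record (pitch, duration, energy) per word;
- fixed weight values, which stand in for weight sets that the claims describe as acquired through a trained artificial intelligence model.

The sketch shows how a database and a weight set could each be selected by the parameter information, and how the stored speech information could then be scaled by the weights. It also shows a claim 5/12-style user override of one parameter.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


# Hypothetical style parameters; claims 6 and 13 list language, speaker,
# application type and tone among the kinds of parameter information.
@dataclass(frozen=True)
class StyleParams:
    language: str
    speaker: str
    app_type: str
    tone: str


# Hypothetical stand-in for the "information on the output speech" stored in
# a TTS database: one prosody record per word.
@dataclass(frozen=True)
class Unit:
    pitch_hz: float
    duration_s: float
    energy: float


DbKey = Tuple[str, str]               # assumed (language, speaker) database type
TtsDb = Dict[str, Unit]               # word -> stored speech information
Weights = Tuple[float, float, float]  # (pitch, duration, energy) scale factors


def app_type_from_texts(first_text: str, second_text: str) -> str:
    """Toy keyword rule standing in for the application-type determination."""
    text = f"{first_text} {second_text}".lower()
    if "story" in text:
        return "storytelling"
    if "weather" in text:
        return "weather"
    return "assistant"


def acquire_params(app_type: str, db_keys: List[DbKey]) -> StyleParams:
    """Toy rule: take the first available database type and derive a tone
    from the application type (the texts themselves are not used here)."""
    language, speaker = db_keys[0]
    tone = "calm" if app_type == "storytelling" else "neutral"
    return StyleParams(language, speaker, app_type, tone)


def synthesize(second_text: str, params: StyleParams,
               tts_dbs: Dict[DbKey, TtsDb],
               weight_sets: Dict[Tuple[str, str], Weights]) -> List[Unit]:
    """Select a TTS database and a weight set, then scale the stored features.
    Returns adjusted features only; no waveform is generated."""
    db = tts_dbs[(params.language, params.speaker)]           # identify TTS database
    wp, wd, we = weight_sets[(params.app_type, params.tone)]  # identify weight set
    units = []
    for word in second_text.lower().split():
        stored = db.get(word)
        if stored is not None:  # toy database: skip words with no stored record
            units.append(Unit(stored.pitch_hz * wp,
                              stored.duration_s * wd,
                              stored.energy * we))
    return units


# Demo data; the weight values are placeholders for model-derived weight sets.
tts_dbs = {("en", "female_1"): {"once": Unit(200.0, 0.30, 1.0),
                                "upon": Unit(210.0, 0.25, 1.0)}}
weight_sets = {("storytelling", "calm"): (0.9, 1.2, 0.8),
               ("storytelling", "neutral"): (1.0, 1.1, 1.0),
               ("assistant", "neutral"): (1.0, 1.0, 1.0)}

first_text, second_text = "tell me a story", "once upon a time"
app = app_type_from_texts(first_text, second_text)
params = acquire_params(app, list(tts_dbs))
out = synthesize(second_text, params, tts_dbs, weight_sets)

# Claim 5/12-style change: a user instruction overrides one parameter.
overridden = replace(params, tone="neutral")
out_neutral = synthesize(second_text, overridden, tts_dbs, weight_sets)
```

Keying the weight sets separately from the databases mirrors the claims, which identify the TTS database and the weight set as two distinct selections driven by the same parameter information. A real system would pass the adjusted features to an acoustic model or vocoder to produce audio.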
May 17, 2022