Methods And Systems For Detecting And Processing Speech Signals

Published: December 25, 2018

Assignee: Not available

Inventors: Jay Pierre Civelli, Mikhal Shemer, Turaj Zakizadeh Shabestary, David Tapuska

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer-implemented method comprising: receiving, at a centralized processing device, a corresponding hotword confidence score from each of multiple media devices in communication with the centralized processing device via a network, each hotword confidence score indicating a likelihood that audio data corresponding to a first utterance of a user received by the corresponding media device includes a particular, predefined hotword; determining, by the centralized processing device, that two or more of the received hotword confidence scores satisfy a hotword score threshold; for each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold, receiving, at the centralized processing device, second audio data from the corresponding media device, the second audio data recorded by the corresponding media device and including a user speech command; and generating, by the centralized processing device, a request associated with the user speech command based on the second audio data received from each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold.

2. The computer-implemented method of claim 1, further comprising, in response to receiving the corresponding hotword confidence score from each of the two or more media devices in communication with the centralized processing device, instructing, by the centralized processing device, each of the two or more media devices to mute a corresponding loudspeaker.

3. The computer-implemented method of claim 1, further comprising, prior to receiving the second audio data from the corresponding media device, instructing, by the centralized processing device, the corresponding media device to activate a corresponding microphone.

4. The computer-implemented method of claim 1, wherein each of the multiple media devices are configured to: record the first audio data corresponding to the first user utterance; detect the particular, predefined hotword in the first audio data using a corresponding hotword data module; compute the corresponding hotword confidence score indicating the likelihood that the first audio data recorded by the corresponding media device includes the particular, predefined hotword; and transmit the corresponding hotword confidence score over the network to the centralized processing device.

5. The computer-implemented method of claim 1, further comprising: transmitting the request associated with the user speech command from the centralized processing device to an external server; receiving, at the centralized processing device, an audio response associated with the user speech command from the external server; and transmitting the audio response to at least one of the multiple media devices, the audio response when received by the at least one media device causing the at least one media device to play the audio response over a corresponding loudspeaker associated with the at least one media device.

6. The computer-implemented method of claim 1, wherein generating the request associated with the user speech command based on the second audio data comprises combining the second audio data received from each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold to generate the request associated with the user speech command.

7. The computer-implemented method of claim 1, further comprising, for each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold, determining, by the centralized computing device, a word transcription and a word segment score for each word in the second audio data received from the corresponding media device.

8. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving a corresponding hotword confidence score from each of multiple media devices in communication with the one or more computers via a network, each hotword confidence score indicating a likelihood that audio data corresponding to a first utterance of a user received by the corresponding media device includes a particular, predefined hotword; determining that two or more of the received hotword confidence scores satisfy a hotword score threshold; for each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold, receiving second audio data from the corresponding media device, the second audio data recorded by the corresponding media device and including a user speech command; and generating a request associated with the user speech command based on the second audio data received from each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold.

9. The system of claim 8, wherein the operations further comprise, in response to receiving the corresponding hotword confidence score from each of the two or more media devices, instructing each of the two or more media devices to mute a corresponding loudspeaker.

10. The system of claim 8, wherein the operations further comprise, prior to receiving the second audio data from the corresponding media device, instructing the corresponding media device to activate a corresponding microphone.

11. The system of claim 8, wherein each of the multiple media devices are configured to: record the first audio data corresponding to the first user utterance; detect the particular, predefined hotword in the first audio data using a corresponding hotword data module; compute the corresponding hotword confidence score indicating the likelihood that the first audio data recorded by the corresponding media device includes the particular, predefined hotword; and transmit the corresponding hotword confidence score over the network to the one or more computers.

12. The system of claim 8, wherein the operations further comprise: transmitting the request associated with the user speech command to an external server; receiving an audio response associated with the user speech command from the external server; and transmitting the audio response to at least one of the multiple media devices, the audio response when received by the at least one media device causing the at least one media device to play the audio response over a corresponding loudspeaker associated with the at least one media device.

13. The system of claim 8, wherein generating the request associated with the user speech command based on the second audio data comprises combining the second audio data received from each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold to generate the request associated with the user speech command.

14. The system of claim 8, wherein the operations further comprise, for each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold, determining a word transcription and a word segment score for each word in the second audio data received from the corresponding media device.

15. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising: receiving a corresponding hotword confidence score from each of multiple media devices in communication with the one or more computers via a network, each hotword confidence score indicating a likelihood that audio data corresponding to a first utterance of a user received by the corresponding media device includes a particular, predefined hotword; determining that two or more of the received hotword confidence scores satisfy a hotword score threshold; for each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold, receiving second audio data from the corresponding media device, the second audio data recorded by the corresponding media device and including a user speech command; and generating a request associated with the user speech command based on the second audio data received from each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold.

16. The computer-readable medium of claim 15, wherein the operations further comprise, in response to receiving the corresponding hotword confidence score from each of the two or more media devices, instructing each of the two or more media devices to mute a corresponding loudspeaker.

17. The computer-readable medium of claim 15, wherein the operations further comprise, prior to receiving the second audio data from the corresponding media device, instructing the corresponding media device to activate a corresponding microphone.

18. The computer-readable medium of claim 15, wherein each of the multiple media devices are configured to: record the first audio data corresponding to the first user utterance; detect the particular, predefined hotword in the first audio data using a corresponding hotword data module; compute the corresponding hotword confidence score indicating the likelihood that the first audio data recorded by the corresponding media device includes the particular, predefined hotword; and transmit the corresponding hotword confidence score over the network to the one or more computers.

19. The computer-readable medium of claim 15, wherein the operations further comprise: transmitting the request associated with the user speech command to an external server; receiving an audio response associated with the user speech command from the external server; and transmitting the audio response to at least one of the multiple media devices, the audio response when received by the at least one media device causing the at least one media device to play the audio response over a corresponding loudspeaker associated with the at least one media device.

20. The computer-readable medium of claim 15, wherein generating the request associated with the user speech command based on the second audio data comprises combining the second audio data received from each of the two or more media devices having hotword confidence scores that satisfy the hotword score threshold to generate the request associated with the user speech command.
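
Illustrative sketch

The flow recited in claim 1, together with the combining step of claim 6, might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `HotwordReport`, `select_devices`, `combine_audio`, `build_request`, and `fetch_second_audio` are hypothetical, "satisfy the threshold" is assumed to mean greater than or equal to, the default threshold of 0.8 is arbitrary, and combining is assumed to be a simple per-sample average, which the claims do not specify.

```python
from dataclasses import dataclass


@dataclass
class HotwordReport:
    device_id: str     # hypothetical identifier for a media device
    confidence: float  # likelihood that the first utterance contained the hotword


def select_devices(reports, threshold):
    """Return IDs of devices whose hotword score satisfies the threshold.

    Assumption: "satisfies" is read as score >= threshold.
    """
    return [r.device_id for r in reports if r.confidence >= threshold]


def combine_audio(streams):
    """Average sample sequences position by position.

    Assumption: streams are already time-aligned lists of samples; the
    result is truncated to the shortest stream. The claims do not say
    how the second audio data is combined.
    """
    if not streams:
        return []
    n = min(len(s) for s in streams)
    return [sum(s[i] for s in streams) / len(streams) for i in range(n)]


def build_request(reports, fetch_second_audio, threshold=0.8):
    """Sketch of claim 1 with the combining step of claim 6.

    fetch_second_audio is a hypothetical callable that returns the
    second audio data recorded by the device with the given ID.
    """
    selected = select_devices(reports, threshold)
    if len(selected) < 2:
        return None  # the claims address two or more qualifying devices
    streams = [fetch_second_audio(device_id) for device_id in selected]
    return {"devices": selected, "audio": combine_audio(streams)}
```

In this sketch the centralized processing device only requests second audio data from devices that passed the threshold, so a device that barely heard the hotword contributes nothing to the generated request.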

Patent Metadata

Filing Date: Unknown

Publication Date: December 25, 2018

Inventors: Jay Pierre Civelli, Mikhal Shemer, Turaj Zakizadeh Shabestary, David Tapuska
