Directional Processing of Far-Field Audio

Published October 1, 2019

Assignee: not available in the USPTO data we have

Inventors: Lae-Hoon Kim, Erik Visser, Asif Mohammad, Ian Ernan Liu, Ye Jiang

Patent Claims

30 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. An apparatus comprising: multiple microphones configured to generate multiple audio signals, each microphone of the multiple microphones configured to generate a respective audio signal of the multiple audio signals based on sound of a far-field acoustic environment as detected at the microphone; a signal processing system configured to process the multiple audio signals to generate at least one processed audio signal, the signal processing system configured to update one or more processing parameters while operating in a first operational mode and configured to use a static version of the one or more processing parameters while operating in a second operational mode; and a keyword detection system configured to perform keyword detection based on the at least one processed audio signal to determine whether the sound includes an utterance corresponding to a keyword and, based on a determination that the utterance corresponds to the keyword, to send a control signal to the signal processing system to change an operational mode of the signal processing system from the first operational mode to the second operational mode, wherein the signal processing system is configured to use directional parameters while operating in the second operational mode, and wherein the directional parameters are determined based on a direction of arrival of a portion of the sound corresponding to the utterance.
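The two-mode behavior recited in claim 1 can be illustrated with a small state machine. This is a minimal sketch under stated assumptions: the names `Mode`, `SignalProcessor`, `update`, `on_control_signal`, and `on_end_of_voice_input` are hypothetical, a plain gradient step stands in for whatever adaptation rule a real front end would use, and the direction of arrival (DOA) is passed in as a number rather than estimated.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    ADAPTIVE = 1  # first operational mode: parameters are updated
    STATIC = 2    # second operational mode: parameters are frozen


@dataclass
class SignalProcessor:
    """Hypothetical front end holding one vector of processing parameters."""
    params: List[float] = field(default_factory=lambda: [0.0, 0.0])
    mode: Mode = Mode.ADAPTIVE
    doa_deg: Optional[float] = None  # keyword DOA used by the static mode

    def update(self, gradient: List[float], step: float = 0.1) -> None:
        # Assumed gradient-descent update, applied only in the first mode.
        if self.mode is Mode.ADAPTIVE:
            self.params = [p - step * g for p, g in zip(self.params, gradient)]

    def on_control_signal(self, keyword_detected: bool, doa_deg: float) -> None:
        # A positive keyword decision freezes the parameters and records
        # the direction of arrival of the keyword utterance.
        if keyword_detected and self.mode is Mode.ADAPTIVE:
            self.mode = Mode.STATIC
            self.doa_deg = doa_deg

    def on_end_of_voice_input(self) -> None:
        # Resume adaptation once the voice input ends (cf. claims 12 and 22).
        self.mode = Mode.ADAPTIVE
        self.doa_deg = None


sp = SignalProcessor()
sp.update([1.0, -1.0])            # adapts: params become [-0.1, 0.1]
sp.on_control_signal(True, 30.0)  # keyword heard from 30 degrees
sp.update([5.0, 5.0])             # ignored: parameters are now static
print(sp.mode, sp.params, sp.doa_deg)
```

Keeping the frozen parameters and the DOA on the same object reflects the claim's coupling: the same control signal that stops adaptation also fixes the direction the static mode uses.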

2. The apparatus of claim 1, wherein the signal processing system is configured to generate a processed signal based on a direction of arrival (DOA) of the portion of the sound corresponding to the utterance and is configured to provide the processed signal to a voice recognition system.

3. The apparatus of claim 1, wherein the signal processing system includes a noise cancellation system configured to reduce a noise component of the far-field acoustic environment.

4. The apparatus of claim 3, wherein the one or more processing parameters include noise cancellation parameters, and wherein the noise cancellation system is configured to use adaptive noise cancellation parameters while operating in the first operational mode and is configured to use static noise cancellation parameters while operating in the second operational mode.
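Claim 4 does not name an adaptation rule. As one illustration, the sketch below uses a normalized LMS (NLMS) noise canceller, a standard technique chosen here as an assumption rather than taken from the patent. Its tap weights play the role of the noise cancellation parameters, and clearing the hypothetical `adapt` flag models the switch to static parameters in the second operational mode.

```python
import random
from typing import List


class NlmsNoiseCanceller:
    """Hypothetical NLMS noise canceller with a freeze switch."""

    def __init__(self, taps: int = 4, mu: float = 0.5, eps: float = 1e-8):
        self.w: List[float] = [0.0] * taps    # noise cancellation parameters
        self.buf: List[float] = [0.0] * taps  # recent noise-reference samples
        self.mu = mu
        self.eps = eps
        self.adapt = True  # True: first mode (adaptive); False: second (static)

    def process(self, primary: float, reference: float) -> float:
        # Push the newest reference sample into the delay line.
        self.buf = [reference] + self.buf[:-1]
        noise_est = sum(w * x for w, x in zip(self.w, self.buf))
        error = primary - noise_est  # output with the noise estimate removed
        if self.adapt:
            norm = self.eps + sum(x * x for x in self.buf)
            self.w = [w + self.mu * error * x / norm
                      for w, x in zip(self.w, self.buf)]
        return error


# The primary mic hears 0.8 times the noise reference: adapt, then freeze.
rng = random.Random(0)
nc = NlmsNoiseCanceller()
for _ in range(2000):
    n = rng.uniform(-1.0, 1.0)
    nc.process(0.8 * n, n)
nc.adapt = False
frozen = list(nc.w)
nc.process(1.0, 0.5)  # weights no longer change in the static mode
```

Freezing during the voice input is the point of the claim: an adaptive canceller left running could start treating the talker's own speech as noise to be removed.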

5. The apparatus of claim 1, wherein the signal processing system includes a beamformer, wherein the one or more processing parameters include beamforming parameters, and the beamformer is configured to use adaptive beamforming parameters while operating in the first operational mode and is configured to use static beamforming parameters while operating in the second operational mode.

6. The apparatus of claim 5, wherein the static beamforming parameters are determined based on the direction of arrival of the portion of the sound corresponding to the utterance.
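For claim 6, one simple way to turn a DOA into static beamforming parameters is a narrowband delay-and-sum beamformer for a uniform linear array. The claims do not specify the beamformer type, so this is an assumed, illustrative choice. The array geometry, frequency, speed of sound, angle convention, and function names are all hypothetical.

```python
import cmath
import math
from typing import List

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_vector(num_mics: int, spacing_m: float, freq_hz: float,
                    doa_deg: float) -> List[complex]:
    """Relative phase of a plane wave at each mic of a uniform linear array.
    doa_deg is measured from the array axis, 0 degrees on mic 0's side."""
    tau = spacing_m * math.cos(math.radians(doa_deg)) / SPEED_OF_SOUND
    return [cmath.exp(-2j * math.pi * freq_hz * m * tau) for m in range(num_mics)]


def delay_and_sum_weights(num_mics: int, spacing_m: float, freq_hz: float,
                          doa_deg: float) -> List[complex]:
    """Static weights steered at (and frozen to) the keyword's DOA."""
    a = steering_vector(num_mics, spacing_m, freq_hz, doa_deg)
    return [ai / num_mics for ai in a]


def beam_gain(weights: List[complex], spacing_m: float, freq_hz: float,
              source_deg: float) -> float:
    """Magnitude of w^H a(theta) for a unit plane wave from source_deg."""
    a = steering_vector(len(weights), spacing_m, freq_hz, source_deg)
    return abs(sum(w.conjugate() * ai for w, ai in zip(weights, a)))


w = delay_and_sum_weights(4, 0.04, 1000.0, 60.0)
print(round(beam_gain(w, 0.04, 1000.0, 60.0), 6))   # unit gain toward the talker
print(round(beam_gain(w, 0.04, 1000.0, 150.0), 3))  # attenuated off-axis source
```

Because the weights depend only on the DOA and the geometry, they can be computed once when the keyword is detected and then held fixed for the rest of the voice input.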

7. The apparatus of claim 1, wherein the signal processing system includes a nullformer, wherein the one or more processing parameters include nullforming parameters, and the nullformer is configured to use adaptive nullforming parameters while operating in the first operational mode and is configured to use static nullforming parameters while operating in the second operational mode.
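A nullformer does the opposite of a beamformer: it cancels sound from one direction, which can leave a noise reference once the target is removed. A minimal two-microphone sketch, assuming integer-sample delays and hypothetical function names, subtracts a delayed copy of one channel from the other; the delay computed from the DOA stands in for the static nullforming parameter.

```python
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s, assumed


def null_delay_samples(doa_deg: float, spacing_m: float, fs_hz: float) -> int:
    """Samples by which the target reaches mic 1 after mic 0
    (DOA measured from the array axis, 0 degrees on mic 0's side)."""
    tau = spacing_m * math.cos(math.radians(doa_deg)) / SPEED_OF_SOUND
    return round(tau * fs_hz)


def nullformer(x0: Sequence[float], x1: Sequence[float], k: int) -> List[float]:
    """y[n] = x1[n] - x0[n - k]; samples outside x0 are treated as zero."""
    out = []
    for n in range(len(x1)):
        j = n - k
        ref = x0[j] if 0 <= j < len(x0) else 0.0
        out.append(x1[n] - ref)
    return out


# Target reaches mic 1 three samples after mic 0; an interferer from
# broadside (same arrival time at both mics) is added to both channels.
target = [float(i % 7) for i in range(50)]
interferer = [1.0 if i % 5 == 0 else 0.0 for i in range(50)]
x0 = [t + v for t, v in zip(target, interferer)]
x1 = [t + v for t, v in zip([0.0] * 3 + target[:-3], interferer)]
y = nullformer(x0, x1, 3)  # target cancelled, interferer survives
```

The output contains only the interferer (filtered as v[n] - v[n-3]), which is the kind of noise-only signal a downstream noise canceller or environment analyzer could use.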

8. The apparatus of claim 1, wherein the signal processing system includes a beamformer, wherein the beamformer is configured to use adaptive beamforming parameters while operating in the first operational mode and configured to use static beamforming parameters while operating in the second operational mode.

9. The apparatus of claim 1, further comprising an acoustic environment analyzer to generate data descriptive of the far-field acoustic environment and to provide the data descriptive of the far-field acoustic environment to at least one of the keyword detection system or a voice recognition system, the data based on a noise signal corresponding to a noise component of the far-field acoustic environment.

10. The apparatus of claim 9, wherein the data descriptive of the far-field acoustic environment includes a signal-to-noise ratio, a noise type indicator, or a combination thereof.
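Claims 9 and 10 name a signal-to-noise ratio and a noise type indicator but not how to compute them. A rough sketch, assuming the processed signal approximates the speech and the noise signal (for example, a nullformer output) approximates the noise, computes the SNR as a power ratio in decibels and labels the noise stationary or nonstationary from how much its frame energies vary. The frame length, threshold, labels, and function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence, Union


def frame_energies(x: Sequence[float], frame_len: int) -> List[float]:
    """Mean power of each complete, non-overlapping frame."""
    return [sum(v * v for v in x[i:i + frame_len]) / frame_len
            for i in range(0, len(x) - frame_len + 1, frame_len)]


def describe_environment(processed: Sequence[float], noise: Sequence[float],
                         frame_len: int = 160,
                         cv_threshold: float = 0.5) -> Dict[str, Union[float, str]]:
    """Assumes len(noise) >= frame_len so at least one frame exists."""
    eps = 1e-12
    p_signal = sum(v * v for v in processed) / len(processed)
    p_noise = sum(v * v for v in noise) / len(noise)
    snr_db = 10.0 * math.log10((p_signal + eps) / (p_noise + eps))
    # Coefficient of variation of the noise frame energies: low suggests
    # steady noise (e.g. a fan), high suggests bursty noise. Heuristic only.
    energies = frame_energies(noise, frame_len)
    mean_e = statistics.fmean(energies)
    cv = statistics.pstdev(energies) / (mean_e + eps)
    noise_type = "stationary" if cv < cv_threshold else "nonstationary"
    return {"snr_db": snr_db, "noise_type": noise_type}


speech = [1.0, -1.0] * 800
steady = [0.1, -0.1] * 800
bursty = [0.0] * 800 + [1.0, -1.0] * 400
print(describe_environment(speech, steady))
print(describe_environment(speech, bursty))
```

A keyword detector or recognizer receiving this data could, for instance, adjust its decision threshold when the SNR is low; that use is likewise an assumption, not something the claims specify.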

11. The apparatus of claim 1, further comprising a voice recognition system to analyze a voice input and to initiate an action based on speech content of the voice input.

12. The apparatus of claim 11, wherein the voice recognition system is configured to generate a second control signal to change the operational mode of the signal processing system from the second operational mode to the first operational mode responsive to detection of an end of the voice input.
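Claim 12 does not say how the end of the voice input is detected. One common approach, assumed here for illustration, is an energy threshold with a hangover: the input is declared finished only after several consecutive quiet frames, so short pauses between words do not end it. The function name, threshold, and hangover length are hypothetical.

```python
from typing import Optional, Sequence


def detect_end_of_voice(frame_energies: Sequence[float], threshold: float,
                        hangover: int) -> Optional[int]:
    """Return the index of the frame at which the voice input is declared over
    (the last of `hangover` consecutive frames below `threshold`), or None if
    speech never started or has not yet ended."""
    quiet = 0
    heard_speech = False
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            heard_speech = True
            quiet = 0  # any loud frame resets the hangover counter
        elif heard_speech:
            quiet += 1
            if quiet >= hangover:
                return i
    return None


end = detect_end_of_voice([1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0], 0.1, 3)
print(end)
```

The returned frame index is where a second control signal, as in claim 12, would send the signal processing system back to its adaptive mode.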

13. The apparatus of claim 1, further comprising a network interface coupled to the signal processing system and configured to, after the keyword detection system detects the keyword, send a signal encoding a voice input to a voice recognition device via a network.

14. The apparatus of claim 1, wherein the signal processing system is configured to generate a plurality of directional processed audio signals including the at least one processed audio signal, each directional processed audio signal of the plurality of directional processed audio signals encoding sound from a portion of the far-field acoustic environment associated with a corresponding direction, and wherein the keyword detection system is configured to perform the keyword detection based on each directional processed audio signal of the plurality of directional processed audio signals.

15. The apparatus of claim 14, wherein the keyword detection system sends the control signal to the signal processing system based on determining that at least one directional processed audio signal of the plurality of directional processed audio signals encodes the portion of the sound corresponding to the utterance, and wherein the control signal indicates a direction of arrival associated with the portion of the sound corresponding to the utterance based on which directional processed audio signal of the plurality of directional processed audio signals encodes the portion of the sound corresponding to the utterance.

16. The apparatus of claim 15, wherein the keyword detection system is configured to determine a confidence metric for each directional processed audio signal of the plurality of directional processed audio signals, and wherein the control signal is generated based on which confidence metrics satisfy a confidence threshold.

17. The apparatus of claim 16, wherein the signal processing system is configured to provide each directional processed audio signal of the plurality of directional processed audio signals to the keyword detection system while operating in the first operational mode and is configured to provide only a subset of the directional processed audio signals to a voice recognition system while operating in the second operational mode.

18. The apparatus of claim 17, wherein the subset of the directional processed audio signals corresponds to particular directional processed audio signals that are associated with confidence metrics that satisfy the confidence threshold.
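Claims 15 through 18 describe running keyword detection on every directional signal, comparing per-beam confidence metrics against a threshold, and forwarding only the passing beams. A minimal sketch follows. It assumes that "satisfy" means "greater than or equal to", that each beam has a known look direction, and that the highest-confidence passing beam supplies the DOA carried by the control signal; `ControlSignal` and `make_control_signal` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class ControlSignal:
    keyword_detected: bool
    doa_deg: Optional[float] = None  # look direction of the best passing beam
    selected_beams: List[int] = field(default_factory=list)  # subset for recognition


def make_control_signal(confidences: Sequence[float],
                        beam_directions_deg: Sequence[float],
                        threshold: float) -> ControlSignal:
    # Keep every beam whose confidence meets the (assumed inclusive) threshold.
    selected = [i for i, c in enumerate(confidences) if c >= threshold]
    if not selected:
        return ControlSignal(keyword_detected=False)
    best = max(selected, key=lambda i: confidences[i])
    return ControlSignal(True, beam_directions_deg[best], selected)


sig = make_control_signal([0.2, 0.85, 0.9, 0.1], [0.0, 90.0, 180.0, 270.0], 0.8)
print(sig)
```

Returning the whole passing subset, not just the best beam, matches claim 17's idea of forwarding only some directional signals to voice recognition while still allowing more than one.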

19. The apparatus of claim 1, wherein the control signal indicates direction of arrival information associated with the portion of the sound corresponding to the utterance.

20. The apparatus of claim 14, wherein the signal processing system includes a beamformer and a rake combiner coupled to the beamformer, wherein the beamformer is configured to generate the plurality of directional processed audio signals and the rake combiner is configured to, while operating in the second operational mode, combine two or more of the plurality of directional processed audio signals to form a raked beamformer output based on the control signal.
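The claims do not define the rake combiner's internals. By analogy with a rake receiver, one plausible sketch, assumed here, aligns each selected beam to a reference beam by the lag that maximizes their cross-correlation and then sums the aligned beams with normalized weights (for example, the confidence metrics carried by the control signal). The function names, lag search range, and weighting are hypothetical.

```python
from typing import List, Sequence


def best_lag(ref: Sequence[float], x: Sequence[float], max_lag: int) -> int:
    """Lag L in [-max_lag, max_lag] maximizing sum over n of ref[n] * x[n + L]."""
    def score(lag: int) -> float:
        return sum(ref[n] * x[n + lag]
                   for n in range(len(ref)) if 0 <= n + lag < len(x))
    return max(range(-max_lag, max_lag + 1), key=score)


def rake_combine(beams: Sequence[Sequence[float]], weights: Sequence[float],
                 max_lag: int = 8) -> List[float]:
    """Align each beam to beams[0], then form a weighted sum (weights normalized)."""
    ref = beams[0]
    total_w = sum(weights)
    out = [0.0] * len(ref)
    for beam, w in zip(beams, weights):
        lag = best_lag(ref, beam, max_lag)
        for n in range(len(ref)):
            m = n + lag
            if 0 <= m < len(beam):
                out[n] += (w / total_w) * beam[m]
    return out


# Second beam: a reflection of the first, 3 samples late and half as strong.
direct = [0.0] * 5 + [1.0, -2.0, 3.0] + [0.0] * 12
reflection = [0.0] * 3 + [0.5 * v for v in direct[:-3]]
raked = rake_combine([direct, reflection], [1.0, 1.0])
```

Aligning before summing lets the delayed reflection add coherently to the direct path instead of smearing it, which is the usual motivation for rake-style combining.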

21. A method for processing sound of a far-field acoustic environment, the method comprising: obtaining multiple audio signals, each audio signal of the multiple audio signals generated by a respective microphone of multiple microphones based on the sound of the far-field acoustic environment as detected at the respective microphone; processing, at a signal processing system, the multiple audio signals to generate at least one processed audio signal; performing keyword detection, at a keyword detection system, based on the at least one processed audio signal to determine whether the sound includes an utterance corresponding to a keyword; and changing an operational mode of the signal processing system from a first operational mode to a second operational mode based on a determination that the utterance corresponds to the keyword, wherein the signal processing system updates one or more processing parameters while operating in the first operational mode and uses a static version of the one or more processing parameters while operating in the second operational mode, wherein the signal processing system uses directional parameters while operating in the second operational mode, and wherein the directional parameters are determined based on a direction of arrival of a portion of the sound corresponding to the utterance.

22. The method of claim 21, further comprising, after changing the operational mode of the signal processing system to the second operational mode and based on detection of an end of a voice input, changing the operational mode of the signal processing system from the second operational mode to the first operational mode.

23. The method of claim 21, further comprising generating a processed signal based on a direction of arrival (DOA) of the portion of the sound corresponding to the utterance and providing the processed signal to a voice recognition system.

24. The method of claim 21, further comprising, after detecting the keyword, sending a signal encoding a voice input to a voice recognition device via a network.

25. The method of claim 21, further comprising: generating, at the signal processing system, a plurality of directional processed audio signals including the at least one processed audio signal, each directional processed audio signal of the plurality of directional processed audio signals encoding sound from a portion of the far-field acoustic environment associated with a corresponding direction; and performing, at the keyword detection system, the keyword detection based on each directional processed audio signal of the plurality of directional processed audio signals.

26. The method of claim 25, further comprising: determining, at the keyword detection system, that at least one directional processed audio signal of the plurality of directional processed audio signals encodes the portion of the sound corresponding to the utterance; and sending, from the keyword detection system to the signal processing system, a control signal indicating direction of arrival information associated with the portion of the sound corresponding to the utterance based on which directional processed audio signal of the plurality of directional processed audio signals encodes the portion of the sound corresponding to the utterance, wherein the operational mode of the signal processing system is changed based on the control signal.

27. The method of claim 26, further comprising, while the signal processing system is operating in the second operational mode, combining two or more of the plurality of directional processed audio signals to form a raked output based on the control signal.

28. An apparatus for processing sound of a far-field acoustic environment, the apparatus comprising: means for generating multiple audio signals, each audio signal of the multiple audio signals generated based on the sound of the far-field acoustic environment; means for processing the multiple audio signals to generate at least one processed audio signal; means for keyword detection to determine, based on the at least one processed audio signal, whether the sound includes an utterance corresponding to a keyword; and means for changing an operational mode of the means for processing the multiple audio signals, the means for changing the operational mode configured to change the operational mode of the means for processing the multiple audio signals from a first operational mode to a second operational mode based on a determination that the utterance corresponds to the keyword by the means for keyword detection, wherein the means for processing the multiple audio signals is configured to update one or more processing parameters while operating in the first operational mode and is configured to use a static version of the one or more processing parameters while operating in the second operational mode, wherein the means for processing uses directional parameters while operating in the second operational mode, and wherein the directional parameters are determined based on a direction of arrival of a portion of the sound corresponding to the utterance.

29. The apparatus of claim 28, wherein the means for processing the multiple audio signals is configured to generate a plurality of directional processed audio signals including the at least one processed audio signal, each directional processed audio signal of the plurality of directional processed audio signals encoding sound from a portion of the far-field acoustic environment associated with a corresponding direction, and wherein the means for keyword detection is further configured to determine a confidence metric for each directional processed audio signal of the plurality of directional processed audio signals and to generate, based on which confidence metrics satisfy a confidence threshold, a control signal associated with changing the operational mode of the means for processing the multiple audio signals.

30. A non-transitory computer-readable medium storing instructions for processing sound of a far-field acoustic environment, the instructions executable by a processor to cause the processor to perform operations comprising: obtaining multiple audio signals, each audio signal of the multiple audio signals generated by a respective microphone of multiple microphones based on the sound of the far-field acoustic environment as detected at the respective microphone; processing the multiple audio signals to generate at least one processed audio signal; performing keyword detection based on the at least one processed audio signal to determine whether the sound includes an utterance corresponding to a keyword; and changing an operational mode of a signal processing system from a first operational mode to a second operational mode based on a determination that the utterance corresponds to the keyword, wherein the signal processing system updates one or more processing parameters while operating in the first operational mode and uses a static version of the one or more processing parameters while operating in the second operational mode, wherein the signal processing system uses directional parameters while operating in the second operational mode, and wherein the directional parameters are determined based on a direction of arrival of a portion of the sound corresponding to the utterance.

Patent Metadata

Filing Date

Unknown
