10431211

Directional Processing of Far-Field Audio

Published: October 1, 2019
Assignee: not available in USPTO data we have

Patent Claims
30 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. An apparatus comprising: multiple microphones configured to generate multiple audio signals, each microphone of the multiple microphones configured to generate a respective audio signal of the multiple audio signals based on sound of a far-field acoustic environment as detected at the microphone; a signal processing system configured to process the multiple audio signals to generate at least one processed audio signal, the signal processing system configured to update one or more processing parameters while operating in a first operational mode and configured to use a static version of the one or more processing parameters while operating in a second operational mode; and a keyword detection system configured to perform keyword detection based on the at least one processed audio signal to determine whether the sound includes an utterance corresponding to a keyword and, based on a determination that the utterance corresponds to the keyword, to send a control signal to the signal processing system to change an operational mode of the signal processing system from the first operational mode to the second operational mode, wherein the signal processing system is configured to use directional parameters while operating in the second operation mode, and wherein the directional parameters are determined based on a direction of arrival of a portion of the sound corresponding to the utterance.

Plain English Translation

This invention relates to an audio processing apparatus designed for far-field sound detection, particularly for voice-activated systems. The apparatus includes multiple microphones that capture audio signals from a distant acoustic environment. A signal processing system processes these signals to generate at least one processed audio signal. The system operates in two modes: a first mode where processing parameters are dynamically updated, and a second mode where static parameters are used. A keyword detection system analyzes the processed audio to identify keyword utterances. Upon detecting a keyword, the system sends a control signal to switch the signal processing system from the first mode to the second mode. In the second mode, the system applies directional parameters derived from the direction of the sound source corresponding to the keyword utterance. This ensures that subsequent audio processing focuses on the direction of the speaker, improving accuracy and reducing interference from other sounds. The invention enhances voice recognition in noisy or multi-speaker environments by dynamically adapting to the speaker's location.
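The mode switch in claim 1 can be pictured with a small sketch. This is only an illustration under stated assumptions, not the patented implementation: the class name SignalProcessor, the single noise_floor value standing in for the "one or more processing parameters", and the on_keyword method standing in for the control signal are all hypothetical.

```python
from enum import Enum


class Mode(Enum):
    ADAPTIVE = 1  # "first operational mode": parameters keep updating
    STATIC = 2    # "second operational mode": parameters are frozen


class SignalProcessor:
    """Hypothetical stand-in for the signal processing system of claim 1."""

    def __init__(self, smoothing=0.5):
        self.mode = Mode.ADAPTIVE
        self.smoothing = smoothing
        self.noise_floor = 0.0       # assumed example of a processing parameter
        self.steer_direction = None  # assumed directional parameter (degrees)

    def process(self, frame_level):
        # Only the first (adaptive) mode updates the parameter.
        if self.mode is Mode.ADAPTIVE:
            self.noise_floor += self.smoothing * (frame_level - self.noise_floor)
        return max(frame_level - self.noise_floor, 0.0)

    def on_keyword(self, doa_degrees):
        # Models the control signal sent by the keyword detection system:
        # freeze the parameters and record the direction of the utterance.
        self.mode = Mode.STATIC
        self.steer_direction = doa_degrees
```

After on_keyword is called, process keeps using the last noise_floor value instead of updating it, which mirrors the "static version" of the parameters described in the claim.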

Claim 2

Original Legal Text

2. The apparatus of claim 1, wherein the signal processing system is configured to generate a processed signal based on a direction of arrival (DOA) of the portion of the sound corresponding to the utterance and is configured to provide the processed signal to a voice recognition system.

Plain English Translation

Building on claim 1, the signal processing system generates a processed signal based on the direction of arrival (DOA) of the portion of the sound that corresponds to the keyword utterance, and provides that processed signal to a voice recognition system. In other words, once the direction of the speaker is known from the keyword, the audio handed to voice recognition is shaped by that direction. The claim does not specify how the DOA is estimated or which directional technique is applied.

Claim 3

Original Legal Text

3. The apparatus of claim 1, wherein the signal processing system includes a noise cancellation system configured to reduce a noise component of the far-field acoustic environment.

Plain English Translation

Building on claim 1, the signal processing system includes a noise cancellation system that reduces a noise component of the far-field acoustic environment. The claim does not name a particular noise cancellation algorithm. Reducing the noise component before keyword detection and voice recognition should make speech from a distant talker easier to detect and recognize.

Claim 4

Original Legal Text

4. The apparatus of claim 3, wherein the one or more processing parameters include noise cancellation parameters, and wherein the noise cancellation system is configured to use adaptive noise cancellation parameters while operating in the first operational mode and is configured to use static noise cancellation parameters while operating in the second operational mode.

Plain English Translation

Building on claim 3, the processing parameters that are updated or frozen include noise cancellation parameters. In the first operational mode, the noise cancellation system uses adaptive parameters that keep adjusting to the acoustic environment. In the second operational mode, which claim 1 says is entered when the keyword is detected, it uses static noise cancellation parameters. Holding the parameters fixed keeps the noise cancellation from continuing to adapt while the user's voice input is being processed.
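One way to picture adaptive versus static noise cancellation parameters is a least-mean-squares (LMS) noise canceller whose weight update can be switched off. The claim does not name LMS; it is swapped in here purely as a familiar example, and the function name lms_cancel, the step size, and the adapt flag are assumptions made for illustration.

```python
def lms_cancel(primary, reference, weights, step=0.1, adapt=True):
    """Subtract a filtered noise reference from the primary signal.

    adapt=True models the first (adaptive) mode: weights are updated.
    adapt=False models the second (static) mode: weights are held fixed.
    Returns (output_samples, final_weights).
    """
    weights = list(weights)
    history = [0.0] * len(weights)  # most recent reference samples
    output = []
    for desired, ref in zip(primary, reference):
        history = [ref] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = desired - noise_estimate  # noise-reduced output sample
        if adapt:
            # Standard LMS update; skipped entirely in static mode.
            weights = [w + step * error * h for w, h in zip(weights, history)]
        output.append(error)
    return output, weights
```

Running the function first with adapt=True and then passing the returned weights back in with adapt=False mimics learning the parameters before the keyword and freezing them afterwards.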

Claim 5

Original Legal Text

5. The apparatus of claim 1, wherein the signal processing system includes a beamformer, wherein the one or more processing parameters include beamforming parameters, and the beamformer is configured to use adaptive beamforming parameters while operating in the first operational mode and is configured to use static beamforming parameters while operating in the second operational mode.

Plain English Translation

Building on claim 1, the signal processing system includes a beamformer, and the processing parameters include beamforming parameters. A beamformer combines the signals from the multiple microphones so that sound from a chosen direction is emphasized. In the first operational mode the beamformer uses adaptive beamforming parameters; in the second operational mode, entered when the keyword is detected, it uses static beamforming parameters. The mode switch is driven by the keyword detection system's control signal, as set out in claim 1.

Claim 6

Original Legal Text

6. The apparatus of claim 5, wherein the static beamforming parameters are determined based on the direction of arrival of the portion of the sound corresponding to the utterance.

Plain English Translation

Building on claim 5, the static beamforming parameters used in the second operational mode are determined based on the direction of arrival (DOA) of the portion of the sound corresponding to the keyword utterance. Put simply, the beamformer is set according to the direction the keyword came from and then held there, rather than continuing to adapt, while the system operates in the second mode.
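As a rough illustration of static beamforming parameters derived from a direction of arrival, the sketch below computes narrowband delay-and-sum weights for a uniform linear microphone array. The patent text here does not specify the array geometry or the beamforming method, so the linear array, the single frequency, and the names steering_weights and array_response are all assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate value in air


def steering_weights(doa_deg, num_mics, spacing_m, freq_hz):
    """Delay-and-sum weights that point a linear array at doa_deg."""
    theta = math.radians(doa_deg)
    weights = []
    for m in range(num_mics):
        delay = m * spacing_m * math.sin(theta) / SPEED_OF_SOUND
        weights.append(cmath.exp(-2j * math.pi * freq_hz * delay) / num_mics)
    return weights


def array_response(weights, source_deg, spacing_m, freq_hz):
    """Magnitude of the beamformer output for a unit plane wave."""
    theta = math.radians(source_deg)
    total = 0j
    for m, w in enumerate(weights):
        delay = m * spacing_m * math.sin(theta) / SPEED_OF_SOUND
        arrival = cmath.exp(-2j * math.pi * freq_hz * delay)
        total += w.conjugate() * arrival
    return abs(total)
```

Computing the weights once from the keyword's DOA and reusing them unchanged corresponds to the static case; a source at the steered angle passes with unit gain, while sources at other angles are attenuated by an amount that depends on frequency and spacing.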

Claim 7

Original Legal Text

7. The apparatus of claim 1, wherein the signal processing system includes a nullformer, wherein the one or more processing parameters include nullforming parameters, and the nullformer is configured to use adaptive nullforming parameters while operating in the first operational mode and is configured to use static nullforming parameters while operating in the second operational mode.

Plain English Translation

Building on claim 1, the signal processing system includes a nullformer, and the processing parameters include nullforming parameters. Where a beamformer emphasizes sound from a direction, a nullformer suppresses sound arriving from a direction by placing a null there. In the first operational mode the nullformer uses adaptive nullforming parameters; in the second operational mode, entered when the keyword is detected, it uses static nullforming parameters.
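To make the idea of a null concrete, here is a minimal two-microphone sketch that cancels a narrowband plane wave from one chosen direction. The claim does not say how the nullformer is built or which direction is nulled; the two-microphone geometry, the single frequency, and the names null_weights and response are assumptions for illustration only.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate value in air


def phase_at_second_mic(angle_deg, spacing_m, freq_hz):
    """Phase lag of a plane wave at the second microphone."""
    delay = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    return 2 * math.pi * freq_hz * delay


def null_weights(null_deg, spacing_m, freq_hz):
    """Two-microphone weights whose output cancels sound from null_deg."""
    phi = phase_at_second_mic(null_deg, spacing_m, freq_hz)
    return [1 + 0j, -cmath.exp(-1j * phi)]


def response(weights, source_deg, spacing_m, freq_hz):
    """Magnitude of the weighted sum for a unit plane wave from source_deg."""
    phi = phase_at_second_mic(source_deg, spacing_m, freq_hz)
    arrivals = [1 + 0j, cmath.exp(-1j * phi)]
    return abs(sum(w.conjugate() * x for w, x in zip(weights, arrivals)))
```

Holding the returned weights fixed corresponds to static nullforming parameters; recomputing them as conditions change would correspond to the adaptive case.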

Claim 8

Original Legal Text

8. The apparatus of claim 1, wherein the signal processing system includes a beamformer, wherein the beamformer is configured to use adaptive beamforming parameters while operating in the first operational mode and configured to use static beamforming parameters while operating in the second operational mode.

Plain English Translation

Building on claim 1, the signal processing system includes a beamformer that uses adaptive beamforming parameters in the first operational mode and static beamforming parameters in the second operational mode. This claim resembles claim 5, but it does not recite that the "one or more processing parameters" of claim 1 include the beamforming parameters. As in claim 1, the switch to the second mode follows detection of the keyword.

Claim 9

Original Legal Text

9. The apparatus of claim 1, further comprising an acoustic environment analyzer to generate data descriptive of the far-field acoustic environment and to provide the data descriptive of the far-field acoustic environment to at least one of the keyword detection system or a voice recognition system, the data based on a noise signal corresponding to a noise component of the far-field acoustic environment.

Plain English Translation

Building on claim 1, the apparatus further includes an acoustic environment analyzer. The analyzer generates data descriptive of the far-field acoustic environment, based on a noise signal corresponding to a noise component of that environment, and provides the data to at least one of the keyword detection system or a voice recognition system. The receiving system can then take the acoustic conditions into account when detecting the keyword or recognizing speech. The claim does not specify how the receiving system uses the data.

Claim 10

Original Legal Text

10. The apparatus of claim 9, wherein the data descriptive of the far-field acoustic environment includes a signal-to-noise ratio, a noise type indicator, or a combination thereof.

Plain English Translation

Building on claim 9, the data descriptive of the far-field acoustic environment includes a signal-to-noise ratio (SNR), a noise type indicator, or both. The claim does not define the noise types or how the SNR is measured.
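For readers unfamiliar with the signal-to-noise ratio, the sketch below computes it in decibels from a speech segment and a noise segment and bundles it with a noise type label. The claim does not say how either value is obtained; the function names, the dictionary layout, and the idea that the noise type arrives as a ready-made label are assumptions.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a block of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def describe_environment(speech, noise, noise_type=None):
    # Hypothetical container for the "data descriptive of the far-field
    # acoustic environment"; noise_type is passed through as a label only.
    return {"snr_db": snr_db(speech, noise), "noise_type": noise_type}
```

A structure like the returned dictionary is one possible shape for the data handed to the keyword detection or voice recognition system.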

Claim 11

Original Legal Text

11. The apparatus of claim 1, further comprising a voice recognition system to analyze a voice input and to initiate an action based on speech content of the voice input.

Plain English Translation

Building on claim 1, the apparatus further includes a voice recognition system that analyzes a voice input and initiates an action based on the speech content of that input. For example, after the keyword is detected, a spoken command could be recognized and acted on. The claim does not limit the kind of action that is initiated.

Claim 12

Original Legal Text

12. The apparatus of claim 11, wherein the voice recognition system is configured to generate a second control signal to change the operational mode of the signal processing system from the second operational mode to the first operational mode responsive to detection of an end of the voice input.

Plain English Translation

Building on claim 11, the voice recognition system generates a second control signal when it detects the end of the voice input. This second control signal changes the signal processing system from the second operational mode back to the first operational mode. Together with claim 1, this gives a full cycle: the keyword detection system's control signal freezes the processing parameters when the keyword is detected, and the voice recognition system's control signal resumes parameter updates once the voice input ends.
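The return to the first mode at the end of the voice input can be sketched as follows. The claim does not say how the end of the voice input is detected; treating a short run of low-energy frames as the end is an assumption, as are the threshold, the hangover length, and the function names.

```python
def end_of_input_index(frame_energies, silence_threshold=0.01, hangover_frames=3):
    """Index of the frame at which the voice input is judged to have ended.

    Assumed rule: the input ends once hangover_frames consecutive frames
    fall below silence_threshold. Returns None if that never happens.
    """
    quiet = 0
    for i, energy in enumerate(frame_energies):
        quiet = quiet + 1 if energy < silence_threshold else 0
        if quiet >= hangover_frames:
            return i
    return None


def mode_trace(frame_energies):
    """Mode used for each frame, starting just after keyword detection."""
    end = end_of_input_index(frame_energies)
    mode = "static"  # second operational mode, entered on the keyword
    trace = []
    for i in range(len(frame_energies)):
        trace.append(mode)
        if i == end:
            mode = "adaptive"  # second control signal: resume updates
    return trace
```

In this sketch the frames up to and including the detected end are processed in the static mode, and later frames revert to the adaptive mode.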

Claim 13

Original Legal Text

13. The apparatus of claim 1, further comprising a network interface coupled to the signal processing system and configured to, after the keyword detection system detects the keyword, send a signal encoding a voice input to a voice recognition device via a network.

Plain English Translation

Building on claim 1, the apparatus further includes a network interface coupled to the signal processing system. After the keyword detection system detects the keyword, the network interface sends a signal encoding a voice input to a voice recognition device via a network. Voice recognition can therefore be performed by a separate device reached over the network rather than within the apparatus itself.

Claim 14

Original Legal Text

14. The apparatus of claim 1, wherein the signal processing system is configured to generate a plurality of directional processed audio signals including the at least one processed audio signal, each directional processed audio signal of the plurality of directional processed audio signals encoding sound from a portion of the far-field acoustic environment associated with a corresponding direction, and wherein the keyword detection system is configured to perform the keyword detection based on each directional processed audio signal of the plurality of directional processed audio signals.

Plain English Translation

Building on claim 1, the signal processing system generates a plurality of directional processed audio signals, including the at least one processed audio signal of claim 1. Each directional processed audio signal encodes sound from a portion of the far-field acoustic environment associated with a corresponding direction. The keyword detection system performs keyword detection on each of these directional signals, so a keyword spoken from any covered direction can be detected in the signal for that direction rather than in a single mixed signal.

Claim 15

Original Legal Text

15. The apparatus of claim 14, wherein the keyword detection system sends the control signal to the signal processing system based on determining that at least one directional processed audio signal of the plurality of directional processed audio signals encodes the portion of the sound corresponding to the utterance, and wherein the control signal indicates a direction of arrival associated with the portion of the sound corresponding to the utterance based on which directional processed audio signal of the plurality of directional processed audio signals encodes the portion of the sound corresponding to the utterance.

Plain English Translation

Building on claim 14, the keyword detection system sends the control signal when it determines that at least one of the directional processed audio signals encodes the portion of the sound corresponding to the keyword utterance. The control signal indicates a direction of arrival for the utterance, and that direction is derived from which directional processed audio signal encodes it. The direction of the speaker is thus inferred from which directional signal the keyword was found in.

Claim 16

Original Legal Text

16. The apparatus of claim 15, wherein the keyword detection system is configured to determine a confidence metric for each directional processed audio signal of the plurality of directional processed audio signals, and wherein the control signal is generated based on which confidence metrics satisfy a confidence threshold.

Plain English Translation

Building on claim 15, the keyword detection system determines a confidence metric for each directional processed audio signal, and the control signal is generated based on which confidence metrics satisfy a confidence threshold. The claim does not define how the confidence metric is calculated or the value of the threshold.

Claim 17

Original Legal Text

17. The apparatus of claim 16, wherein the signal processing system is configured to provide each directional processed audio signal of the plurality of directional processed audio signals to the keyword detection system while operating in the first operational mode and is configured to provide only a subset of the directional processed audio signals to a voice recognition system while operating in the second operational mode.

Plain English Translation

Building on claim 16, the signal processing system provides every directional processed audio signal to the keyword detection system while operating in the first operational mode, and provides only a subset of the directional processed audio signals to a voice recognition system while operating in the second operational mode. Before the keyword is detected, all directions are monitored; after it is detected, only selected directions are passed on for voice recognition.

Claim 18

Original Legal Text

18. The apparatus of claim 17, wherein the subset of the directional processed audio signals corresponds to particular directional processed audio signals that are associated with confidence metrics that satisfy the confidence threshold.

Plain English Translation

Building on claim 17, the subset of directional processed audio signals provided to the voice recognition system consists of the particular signals whose confidence metrics satisfy the confidence threshold. These are the confidence metrics that the keyword detection system determines for each directional signal under claim 16, so the directions in which the keyword was detected with sufficient confidence are the ones passed to voice recognition.
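The selection logic of claims 15 through 18 can be sketched as a simple threshold test over per-direction confidence values. The claims do not define the confidence metric or what "satisfy" means numerically; treating it as "greater than or equal to", keying the directional signals by angle in degrees, and the names select_beams and control_signal are assumptions.

```python
def select_beams(confidences, threshold):
    """Directions whose keyword confidence satisfies the threshold.

    confidences maps a beam direction in degrees to a confidence value.
    "Satisfies" is assumed here to mean greater than or equal to.
    """
    return sorted(d for d, c in confidences.items() if c >= threshold)


def control_signal(confidences, threshold):
    """Hypothetical control signal: None when no beam detected the keyword."""
    selected = select_beams(confidences, threshold)
    if not selected:
        return None
    # The selected directions double as the direction-of-arrival indication
    # and as the subset of beams to forward to voice recognition.
    return {"mode": "static", "directions_deg": selected}
```

For example, with confidences of 0.2, 0.91, 0.75 and 0.1 for beams at 0, 90, 180 and 270 degrees and a threshold of 0.7, only the 90 and 180 degree beams would be selected.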

Claim 19

Original Legal Text

19. The apparatus of claim 1, wherein the control signal indicates direction of arrival information associated with the portion of the sound corresponding to the utterance.

Plain English Translation

This invention relates to far-field audio processing for voice-activated systems, and specifically to what the control signal of claim 1 conveys. In claim 1, a keyword detection system analyzes processed audio and, on determining that an utterance corresponds to a keyword, sends a control signal to the signal processing system to change it from the first operational mode to the second. Claim 19 adds that this control signal indicates direction of arrival (DOA) information associated with the portion of the sound corresponding to the utterance. Because the signal processing system uses directional parameters in the second mode, and those parameters are determined from the utterance's direction of arrival, carrying DOA information in the control signal gives the signal processing system what it needs to focus on the speaker. The claim does not specify how the DOA is estimated; beamforming and time-difference-of-arrival (TDOA) analysis are common techniques for this purpose. This helps systems that rely on locating the speaker, particularly in noisy or multi-source environments.
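One common way to estimate a direction of arrival from a pair of microphones is through the time difference of arrival (TDOA). For a far-field source, the angle θ from broadside satisfies sin θ = c·τ / d, where τ is the TDOA, d is the microphone spacing, and c is the speed of sound. The sketch below applies that relation. It illustrates the general technique only; the claim does not say that the apparatus estimates DOA this way, and the function name, the 343 m/s constant, and the 0.1 m spacing are assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 °C


def doa_from_tdoa(tdoa_s: float, mic_spacing_m: float) -> float:
    """Far-field angle of arrival in degrees, measured from broadside.

    `tdoa_s` is the arrival-time difference between the two microphones
    in seconds; `mic_spacing_m` is the distance between them in meters.
    """
    ratio = SPEED_OF_SOUND_M_S * tdoa_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp against measurement noise
    return math.degrees(math.asin(ratio))


# Example: a delay of 0.05 / 343 s across a 0.1 m pair gives sin(theta) = 0.5.
angle = doa_from_tdoa(0.05 / SPEED_OF_SOUND_M_S, 0.1)  # about 30 degrees
```

A zero delay corresponds to a source directly broadside to the pair, and the clamp keeps noisy delay estimates from producing a math domain error.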

Claim 20

Original Legal Text

20. The apparatus of claim 14, wherein the signal processing system includes a beamformer and a rake combiner coupled to the beamformer, wherein the beamformer is configured to generate the plurality of directional processed audio signals and the rake combiner is configured to, while operating in the second operational mode, combine two or more of the plurality of directional processed audio signals to form a raked beamformer output based on the control signal.

Plain English Translation

This invention relates to signal processing systems for audio applications, specifically improving directional audio capture and processing. The problem addressed is enhancing audio quality in environments with multiple sound sources by dynamically adjusting signal processing to optimize directional audio output. The apparatus includes a signal processing system with a beamformer and a rake combiner. The beamformer generates multiple directional processed audio signals from input audio data, each representing sound from different directions. The rake combiner operates in multiple modes, including a second operational mode where it combines two or more of the directional processed audio signals to form a raked beamformer output. This combination is based on a control signal that determines which signals to combine and how, improving signal clarity and reducing interference. The system dynamically adapts to environmental conditions by selecting and combining the most relevant directional signals, enhancing audio quality in noisy or multi-source environments. The beamformer and rake combiner work together to provide flexible, high-quality directional audio processing.
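The combining step can be pictured as a weighted sum of selected beam outputs. The sketch below is a simplified illustration under stated assumptions: the claim does not specify weights or alignment, so the function `rake_combine`, its equal-weight example, and the idea that the control signal supplies the list of selected beam indices are all hypothetical. A practical rake combiner would typically also time-align the beams before summing, which is omitted here.

```python
from typing import List, Sequence


def rake_combine(beams: Sequence[Sequence[float]], selected: Sequence[int],
                 weights: Sequence[float]) -> List[float]:
    """Weighted sum of the selected beam outputs, sample by sample.

    All beams are assumed to have the same length and to be time-aligned.
    """
    if len(selected) < 2:
        raise ValueError("a raked output combines two or more beams")
    if len(selected) != len(weights):
        raise ValueError("one weight per selected beam is required")
    output = [0.0] * len(beams[selected[0]])
    for beam_index, weight in zip(selected, weights):
        for n, sample in enumerate(beams[beam_index]):
            output[n] += weight * sample
    return output


# Example: combine beams 0 and 2 with equal weights; beam 1 is ignored.
beams = [[1.0, 2.0], [9.0, 9.0], [3.0, 4.0]]
raked = rake_combine(beams, selected=[0, 2], weights=[0.5, 0.5])
print(raked)  # [2.0, 3.0]
```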

Claim 21

Original Legal Text

21. A method for processing sound of a far-field acoustic environment, the method comprising: obtaining multiple audio signals, each audio signal of the multiple audio signals generated by a respective microphone of multiple microphones based on the sound of the far-field acoustic environment as detected at the respective microphone; processing, at a signal processing system, the multiple audio signals to generate at least one processed audio signal; performing keyword detection, at a keyword detection system, based on the at least one processed audio signal to determine whether the sound includes an utterance corresponding to a keyword; and changing an operational mode of the signal processing system from a first operational mode to a second operational mode based on a determination that the utterance corresponds to the keyword, wherein the signal processing system updates one or more processing parameters while operating in the first operational mode and uses a static version of the one or more processing parameters while operating in the second operational mode, wherein the signal processing system uses directional parameters while operating in the second operation mode, and wherein the directional parameters are determined based on a direction of arrival of a portion of the sound corresponding to the utterance.

Plain English Translation

This invention relates to processing sound in a far-field acoustic environment, addressing challenges in accurately detecting and responding to spoken keywords from distant sound sources. The method involves capturing multiple audio signals from an array of microphones, each detecting sound from the environment. These signals are processed by a signal processing system to generate at least one processed audio signal. A keyword detection system analyzes the processed signal to identify if the sound contains an utterance matching a predefined keyword. Upon detecting such an utterance, the system transitions from a first operational mode to a second operational mode. In the first mode, the signal processing system dynamically updates its processing parameters, such as adaptive filters or beamforming weights, to optimize for general sound capture. In the second mode, the system locks these parameters into a static version, ensuring stability during keyword processing. Additionally, the second mode employs directional parameters derived from the direction of arrival of the keyword utterance, enhancing focus on the sound source. This approach improves keyword detection accuracy and reduces interference from background noise or other sound sources in far-field scenarios.
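The two operational modes can be sketched as a small state machine. This is a minimal sketch, not the patented implementation: the class `SignalProcessor`, its toy weight-update rule, and the `on_keyword` method are assumptions chosen only to show parameters that adapt in the first mode and stay fixed in the second.

```python
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    ADAPTIVE = 1  # first operational mode: parameters keep updating
    STATIC = 2    # second operational mode: parameters are frozen


class SignalProcessor:
    def __init__(self, weights: List[float], step: float = 0.1) -> None:
        self.mode = Mode.ADAPTIVE
        self.weights = list(weights)
        self.step = step
        self.steer_deg: Optional[float] = None  # directional parameter

    def update(self, target: List[float]) -> None:
        """Move the weights toward `target`, but only in the adaptive mode."""
        if self.mode is Mode.ADAPTIVE:
            self.weights = [w + self.step * (t - w)
                            for w, t in zip(self.weights, target)]

    def on_keyword(self, doa_deg: float) -> None:
        """Freeze the parameters and record the keyword's direction of arrival."""
        self.mode = Mode.STATIC
        self.steer_deg = doa_deg


proc = SignalProcessor([0.0, 0.0])
proc.update([1.0, 1.0])   # adaptive mode: weights move to [0.1, 0.1]
proc.on_keyword(45.0)     # keyword detected: switch to static mode
proc.update([1.0, 1.0])   # static mode: weights stay at [0.1, 0.1]
```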

Claim 22

Original Legal Text

22. The method of claim 21, further comprising after changing the operational mode of the signal processing system to the second operational mode and based on detection of an end of a voice input, changing the operational mode of the signal processing system from the second operational mode to the first operational mode.

Plain English Translation

This invention relates to signal processing systems for far-field audio, specifically to returning the system to its original operational mode once a voice input ends. In the method of claim 21, the signal processing system starts in a first operational mode, in which it updates its processing parameters, and switches to a second operational mode once a keyword utterance is detected. In the second mode it uses a static version of those parameters together with directional parameters determined from the utterance's direction of arrival. Claim 22 covers what happens next: after the switch to the second mode, and based on detection of an end of a voice input, the system changes back from the second mode to the first. In practice, the parameters stay fixed while the user speaks the input that follows the keyword, and updating resumes once that input ends. The claim does not specify how the end of the voice input is detected. The method is relevant to devices such as smart assistants and other voice-activated systems that alternate between waiting for a keyword and handling a spoken command.
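The round trip between modes can be sketched as a per-frame timeline. The claim does not say how the end of a voice input is detected, so the energy threshold and the "hangover" count of consecutive quiet frames used below are purely illustrative assumptions, as is the function name `mode_timeline`.

```python
from typing import List


def mode_timeline(frame_energy: List[float], keyword_frame: int,
                  silence_threshold: float, hangover_frames: int) -> List[int]:
    """Return the operational mode (1 or 2) in effect for each audio frame."""
    modes = []
    mode = 1
    quiet_run = 0
    for i, energy in enumerate(frame_energy):
        if mode == 1 and i == keyword_frame:
            mode = 2       # keyword detected: switch to the second mode
            quiet_run = 0
        elif mode == 2:
            quiet_run = quiet_run + 1 if energy < silence_threshold else 0
            if quiet_run >= hangover_frames:
                mode = 1   # end of voice input: return to the first mode
        modes.append(mode)
    return modes


energies = [0.1, 0.9, 0.8, 0.7, 0.05, 0.04, 0.03, 0.5]
print(mode_timeline(energies, keyword_frame=1,
                    silence_threshold=0.1, hangover_frames=2))
# [1, 2, 2, 2, 2, 1, 1, 1]
```

Requiring several quiet frames in a row, rather than one, is a common way to avoid ending the voice input on a brief pause between words.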

Claim 23

Original Legal Text

23. The method of claim 21, further comprising generating a processed signal based on a direction of arrival (DOA) of the portion of the sound corresponding to the utterance and providing the processed signal to a voice recognition system.

Plain English Translation

This invention relates to audio processing systems for improving voice recognition in noisy environments. The problem addressed is the difficulty of accurately capturing and processing speech in the presence of background noise, interference, or multiple sound sources. The invention provides a method for enhancing speech recognition by analyzing the direction of arrival (DOA) of sound waves to isolate and process the relevant speech signal. The method involves capturing audio signals from multiple microphones, which may be arranged in an array or distributed configuration. The system identifies a portion of the sound corresponding to a user's utterance by analyzing the audio signals to determine the direction from which the sound originates. This direction of arrival (DOA) information is used to distinguish the speech signal from other sounds in the environment. The system then generates a processed signal by applying spatial filtering or beamforming techniques to emphasize the sound coming from the identified direction while attenuating sounds from other directions. This processed signal, which has an improved signal-to-noise ratio, is then provided to a voice recognition system for further analysis. The method may also include additional steps such as noise suppression, echo cancellation, or adaptive filtering to further enhance the quality of the processed signal. The system can dynamically adjust the processing parameters based on changes in the acoustic environment or the position of the sound source. This approach improves the accuracy and reliability of voice recognition in real-world scenarios where background noise and interference are common.
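Delay-and-sum beamforming is one simple form of the spatial filtering mentioned above. The sketch below is illustrative only and is not taken from the patent: it assumes integer sample delays that have already been derived from the DOA and the array geometry, and the function name `delay_and_sum` is hypothetical.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Advance each channel by its integer delay (in samples) and average.

    `delays[m]` is how many samples later the target sound reaches
    microphone m than it reaches the earliest microphone.
    """
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(usable)
    ]


# Example: the same pulse reaches mic1 one sample after mic0.
mic0 = [0.0, 1.0, 0.0, 0.0, 0.0]
mic1 = [0.0, 0.0, 1.0, 0.0, 0.0]
aligned = delay_and_sum([mic0, mic1], [0, 1])     # [0.0, 1.0, 0.0, 0.0]
misaligned = delay_and_sum([mic0, mic1], [0, 0])  # [0.0, 0.5, 0.5, 0.0, 0.0]
```

With the correct delays the pulse adds coherently and keeps its full amplitude; with the wrong delays it is smeared and halved, which is how sound from other directions is attenuated.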

Claim 24

Original Legal Text

24. The method of claim 21, further comprising, after detecting the keyword, sending a signal encoding a voice input to a voice recognition device via a network.

Plain English Translation

This invention relates to voice recognition systems and methods for improving keyword detection and processing in voice-based interfaces. The problem addressed is the need for efficient and accurate handling of voice inputs, particularly after detecting specific keywords that trigger further processing. The method involves detecting a keyword in a voice input, which may be part of a larger voice command or query. After detection, the system sends a signal encoding the voice input to a voice recognition device via a network. This allows the voice recognition device to process the input further, such as converting speech to text or executing a command. The method may also include receiving the voice input from a microphone or another input device and preprocessing the input to enhance signal quality before keyword detection. The preprocessing may involve noise reduction, filtering, or normalization to improve accuracy. The system may also verify the detected keyword to reduce false positives before sending the signal to the voice recognition device. The network used for transmission can be a wired or wireless network, depending on the application. The voice recognition device may be a remote server or a local processing unit, depending on the system architecture. The method ensures reliable and efficient voice input processing in various applications, such as virtual assistants, smart home devices, or automotive voice control systems.

Claim 25

Original Legal Text

25. The method of claim 21, further comprising: generating, at the signal processing system, a plurality of directional processed audio signals including the at least one processed audio signal, each directional processed audio signal of the plurality of directional processed audio signals encoding sound from a portion of the far-field acoustic environment associated with a corresponding direction; and performing, at the keyword detection system, the keyword detection based on each directional processed audio signal of the plurality of directional processed audio signals.

Plain English Translation

This invention relates to audio signal processing and keyword detection in far-field acoustic environments. The problem addressed is the difficulty of accurately detecting keywords in noisy or multi-directional sound fields, where traditional systems may struggle to isolate relevant audio sources. The method involves processing audio signals from a far-field acoustic environment to enhance keyword detection. A signal processing system generates multiple directional processed audio signals, each encoding sound from a specific portion of the environment corresponding to a particular direction. This directional processing helps isolate audio sources based on their spatial origin. A keyword detection system then analyzes each directional processed audio signal independently to identify keywords. By evaluating multiple directional signals, the system improves the accuracy and reliability of keyword detection in complex acoustic scenarios, such as rooms with multiple speakers or background noise. The directional processing step ensures that the keyword detection system can focus on relevant sound sources, reducing interference from unrelated audio. This approach enhances performance in environments where traditional omnidirectional systems may fail to distinguish between different sound sources. The method is particularly useful in applications like smart home devices, voice assistants, and conference systems where precise audio source identification is critical.

Claim 26

Original Legal Text

26. The method of claim 25, further comprising: determining, at the keyword detection system, that at least one directional processed audio signal of the plurality of directional processed audio signals encodes the portion of the sound corresponding to the utterance; and sending a control signal, from the keyword detection system to the signal processing system, a control signal indicating direction of arrival information associated with the portion of the sound corresponding to the utterance based on which directional processed audio signal of the plurality of directional processed audio signals encodes the portion of the sound corresponding to the utterance, wherein the operational mode of the signal processing system is changed based on the control signal.

Plain English Translation

This invention relates to audio processing systems that detect and localize spoken keywords in a far-field environment. The problem addressed is identifying the direction of a speaker's voice so that later processing can focus on it. Building on claim 25, the signal processing system generates multiple directional processed audio signals, each encoding sound from a corresponding direction, and the keyword detection system performs keyword detection on each of them. Claim 26 adds two steps. First, the keyword detection system determines that at least one of the directional signals encodes the portion of the sound corresponding to the keyword utterance. Second, it sends a control signal to the signal processing system indicating direction of arrival (DOA) information, where that information is based on which directional signal encodes the utterance. The operational mode of the signal processing system is then changed based on the control signal. In effect, the direction associated with the signal in which the keyword was found serves as the estimate of the speaker's direction, which helps in noisy or multi-speaker environments.
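Deriving direction of arrival information from the identity of the directional signal that contains the keyword can be sketched as a lookup. The `ControlSignal` dataclass, the fixed table of beam angles, and the choice to report the first beam with a detection are hypothetical; the claim only requires that the DOA information be based on which directional signal encodes the utterance.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ControlSignal:
    beam_index: int  # which directional signal contained the keyword
    doa_deg: float   # direction associated with that signal


def control_signal_from_detections(
        detected: Sequence[bool],
        beam_angles_deg: Sequence[float]) -> Optional[ControlSignal]:
    """Build a control signal from the first beam in which the keyword was found.

    Returns None when no beam reports a detection, in which case the
    operational mode would be left unchanged.
    """
    for index, hit in enumerate(detected):
        if hit:
            return ControlSignal(beam_index=index, doa_deg=beam_angles_deg[index])
    return None


# Example: four beams pointing at 0, 90, 180 and 270 degrees (assumed layout).
signal = control_signal_from_detections([False, False, True, False],
                                        [0.0, 90.0, 180.0, 270.0])
print(signal)  # ControlSignal(beam_index=2, doa_deg=180.0)
```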

Claim 27

Original Legal Text

27. The method of claim 26, further comprising, while the signal processing system is operating in the second operational mode, combining two or more of the plurality of directional processed audio signals to form a raked output based on the control signal.

Plain English Translation

This invention relates to directional audio processing for far-field keyword detection. Building on claim 26, the signal processing system generates a plurality of directional processed audio signals, each encoding sound from a corresponding direction, and the keyword detection system sends a control signal indicating direction of arrival information for the keyword utterance. The operational mode of the signal processing system is changed based on that control signal, from the first operational mode, in which processing parameters are updated, to the second operational mode, in which a static version of the parameters is used. Claim 27 adds that, while the system is operating in the second mode, two or more of the directional processed audio signals are combined to form a "raked" output based on the control signal. Combining several directional signals rather than relying on a single one can help, for example, when the speaker's voice reaches the device along more than one path. The claim does not specify how the signals are weighted or aligned when they are combined.

Claim 28

Original Legal Text

28. An apparatus for processing sound of a far-field acoustic environment, the apparatus comprising: means for generating multiple audio signals, each audio signal of the multiple audio signals generated based on the sound of the far-field acoustic environment; means for processing the multiple audio signals to generate at least one processed audio signal; means for keyword detection to determine, based on the at least one processed audio signal, whether the sound includes an utterance corresponding to a keyword; and means for changing an operational mode of the means for processing the multiple audio signals, the means for changing the operational mode configured to change the operational mode of the means for processing the multiple audio signals from a first operational mode to a second operational mode based on a determination that the utterance corresponds to the keyword by the means for keyword detection, wherein the means for processing the multiple audio signals is configured to update one or more processing parameters while operating in a first operational mode and is configured to use a static version of the one or more processing parameters while operating in the second operational mode, wherein the means for processing uses directional parameters while operating in the second operation mode, and wherein the directional parameters are determined based on a direction of arrival of a portion of the sound corresponding to the utterance.

Plain English Translation

This invention relates to sound processing in far-field acoustic environments, addressing challenges in accurately detecting and responding to spoken keywords from distant sound sources. The apparatus captures multiple audio signals from the environment, each derived from the far-field sound. These signals are processed to generate at least one processed audio signal, which is then analyzed for keyword detection. If a keyword is detected, the system transitions from a first operational mode to a second operational mode. In the first mode, the processing parameters are dynamically updated, allowing for adaptive adjustments to environmental changes. In the second mode, the system uses static processing parameters, ensuring stability during keyword detection. Additionally, the second mode employs directional parameters based on the direction of arrival of the sound corresponding to the keyword, enhancing accuracy in identifying the source of the utterance. This approach improves reliability in far-field voice recognition by dynamically adapting to environmental conditions while maintaining precise directional tracking upon keyword detection.

Claim 29

Original Legal Text

29. The apparatus of claim 28, wherein the means for processing the multiple audio signals is configured to generate a plurality of directional processed audio signals including the at least one processed audio signal, each directional processed audio signal of the plurality of directional processed audio signals encoding sound from a portion of the far-field acoustic environment associated with a corresponding direction, and wherein the means for keyword detection is further configured to determine a confidence metric for each directional processed audio signal of the plurality of directional processed audio signals and to generate, based on which confidence metrics satisfy a confidence threshold, a control signal associated with changing the operational mode of the means for processing the multiple audio signals.

Plain English Translation

This invention relates to audio processing systems designed to enhance sound capture and keyword detection in far-field acoustic environments. The system addresses the challenge of accurately identifying and processing directional audio signals in noisy or multi-source environments, where traditional methods may struggle to distinguish between relevant and irrelevant sounds. The apparatus includes multiple microphones configured to capture audio signals from different directions in the far-field environment. A processing module processes these signals to generate directional audio outputs, each representing sound from a specific portion of the environment. A keyword detection module analyzes these directional signals to determine confidence metrics, indicating the likelihood that a keyword was detected in each direction. If a confidence metric exceeds a predefined threshold, the system generates a control signal that adjusts the processing module's operational mode, such as focusing on the direction with the highest confidence or suppressing other directions. This approach improves keyword detection accuracy by leveraging spatial audio information, allowing the system to dynamically adapt to the acoustic environment. The invention is particularly useful in applications like smart speakers, voice assistants, and conference systems where precise sound localization and adaptive processing are critical.

Claim 30

Original Legal Text

30. A non-transitory computer-readable medium storing instructions for processing sound of a far-field acoustic environment, the instructions executable by a processor to cause the processor to perform operations comprising: obtaining multiple audio signals, each audio signal of the multiple audio signals generated by a respective microphone of multiple microphones based on the sound of the far-field acoustic environment as detected at the respective microphone; processing the multiple audio signals to generate at least one processed audio signal; performing keyword detection based on the at least one processed audio signal to determine whether the sound includes an utterance corresponding to a keyword; and changing an operational mode of a signal processing system from a first operational mode to a second operational mode based on a determination that the utterance corresponds to the keyword, wherein the signal processing system updates one or more processing parameters while operating in a first operational mode and uses a static version of the one or more processing parameters while operating in the second operational mode, wherein the signal processing system uses directional parameters while operating in the second operation mode, and wherein the directional parameters are determined based on a direction of arrival of a portion of the sound corresponding to the utterance.

Plain English Translation

This invention relates to sound processing in far-field acoustic environments, addressing challenges in accurately detecting and responding to spoken keywords from distant sound sources. The claim covers a non-transitory computer-readable medium storing instructions that a processor executes to carry out the processing. The operations obtain multiple audio signals, each generated by a respective microphone, and process them to generate at least one processed audio signal. Keyword detection is performed on the processed audio to determine whether the sound includes an utterance corresponding to a keyword. On that determination, a signal processing system changes from a first operational mode, in which it updates one or more processing parameters, to a second operational mode, in which it uses a static version of those parameters. In the second mode, the system also uses directional parameters determined from the direction of arrival of the keyword utterance, focusing processing on the sound source. The invention is particularly useful in voice-activated devices where reliable far-field speech recognition is critical.

Patent Metadata

Filing Date

Unknown

Publication Date

October 1, 2019

Inventors

Lae-Hoon Kim
Erik Visser
Asif Mohammad
Ian Ernan Liu
Ye Jiang

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “DIRECTIONAL PROCESSING OF FAR-FIELD AUDIO” (10431211). https://patentable.app/patents/10431211

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/10431211. See llms.txt for full attribution policy.