An ear-worn device may include two or more microphones configured to generate time-domain audio signals, each of the two or more microphones configured to generate one of the time-domain audio signals; processing circuitry comprising analog processing circuitry, digital processing circuitry, beamforming circuitry, and short-time Fourier transformation (STFT) circuitry, the processing circuitry configured to generate, from the time-domain audio signals, one or more frequency-domain non-beamformed audio signals and one or more frequency-domain beamformed signals; and enhancement circuitry comprising neural network circuitry configured to receive multiple frequency-domain input audio signals originating from the one or more frequency-domain non-beamformed audio signals and the one or more frequency-domain beamformed signals, and implement a single neural network trained to generate, based on the multiple frequency-domain input audio signals, a noise-reduced and spatially-focused output audio signal or an output for generating a noise-reduced and spatially-focused output audio signal.
Legal claims defining the scope of protection, as filed with the USPTO.
. An ear-worn device, comprising:
. The ear-worn device of, further comprising interpolation circuitry configured to interpolate between:
. The ear-worn device of, wherein the generating of the spatially-focused output audio signal is based on a wearer selection of a size of a front-facing spatial region.
. A system comprising:
. The system of, wherein the processing device is configured to display multiple options for the size of the front-facing spatial region.
. The system of, wherein the processing device is configured to display exactly two, exactly three, or exactly four options for the size of the front-facing spatial region.
. The system of, wherein the processing device is configured, when displaying the multiple options for the size of the front-facing spatial region, to display graphical representations of the multiple options for the size of the front-facing spatial region.
. The ear-worn device of, wherein the multiple beamformed signals include a beamformed signal having a dipole, hypercardioid, supercardioid, or cardioid directional pattern.
. The ear-worn device of, wherein the mapping of gain to direction-of-arrival is relative to a wearer of the ear-worn device.
. The ear-worn device of, wherein the mapping of gain to direction-of-arrival comprises applying a gain of 1 to audio generated from sounds coming from a target spatial region and applying a gain of 0 to audio generated from sounds coming from other spatial regions.
. The ear-worn device of, wherein the target spatial region has an angle relative to a wearer of the ear-worn device of approximately equal to 50 degrees, approximately equal to 90 degrees, or between approximately 50 and approximately 90 degrees.
. The ear-worn device of, wherein the mapping of gain to direction-of-arrival comprises mapping more than two spatial regions each to a different gain, and one or more of the spatial regions are processed with gains not equal to 1 or 0.
. The ear-worn device of, further comprising:
. The ear-worn device of, wherein the multiple beamformed signals include a front-facing beamformed signal and a back-facing beamformed signal.
. The ear-worn device of, wherein the spatially-focused output audio signal comprises speech components with different gains based on their different directions-of-arrival.
Complete technical specification and implementation details from the patent document.
This application is a Continuation of U.S. Ser. No. 18/477,087, filed Sep. 28, 2023; which claims priority to U.S. Provisional Ser. No. 63/517,755, filed Aug. 4, 2023; which are incorporated herein by reference.
The present disclosure relates to an ear-worn device, such as a hearing aid.
Hearing aids are used to help those who have trouble hearing to hear better. Typically, hearing aids amplify received sound. Some hearing aids attempt to enhance incoming sound.
According to one aspect, an ear-worn device includes: two or more microphones configured to generate time-domain audio signals, each of the two or more microphones configured to generate one of the time-domain audio signals; processing circuitry including analog processing circuitry, digital processing circuitry, beamforming circuitry, and short-time Fourier transformation (STFT) circuitry, the processing circuitry configured to generate, from the time-domain audio signals, one or more frequency-domain non-beamformed audio signals and one or more frequency-domain beamformed signals; and enhancement circuitry including neural network circuitry configured to: receive multiple frequency-domain input audio signals originating from the one or more frequency-domain non-beamformed audio signals and the one or more frequency-domain beamformed signals; and implement a single neural network trained to generate, based on the multiple frequency-domain input audio signals, a noise-reduced and spatially-focused output audio signal or an output for generating a noise-reduced and spatially-focused output audio signal.
According to one aspect, an ear-worn device, includes two or more microphones configured to generate time-domain audio signals, each of the two or more microphones configured to generate one of the time-domain audio signals; processing circuitry including analog processing circuitry, digital processing circuitry, beamforming circuitry, and short-time Fourier transformation (STFT) circuitry, the processing circuitry configured to generate, from the time-domain audio signals, either or both of one or more frequency-domain non-beamformed audio signals and one or more frequency-domain beamformed signals; and enhancement circuitry including neural network circuitry configured to: receive multiple frequency-domain input audio signals originating from either or both of the one or more frequency-domain non-beamformed audio signals and the one or more frequency-domain beamformed signals; and implement a single neural network trained to generate, based on the multiple frequency-domain input audio signals, a noise-reduced and spatially-focused output audio signal or an output for generating a noise-reduced and spatially-focused output audio signal.
According to one aspect, an ear-worn device includes two or more microphones configured to generate time-domain audio signals, each of the two or more microphones configured to generate one of the time-domain audio signals; processing circuitry including analog processing circuitry, digital processing circuitry, beamforming circuitry, and short-time Fourier transformation (STFT) circuitry, the processing circuitry configured to generate, from the time-domain audio signals, either or both of one or more frequency-domain non-beamformed audio signals and one or more frequency-domain beamformed signals; and enhancement circuitry including neural network circuitry configured to: receive multiple frequency-domain input audio signals originating from either or both of the one or more frequency-domain non-beamformed audio signals and the one or more frequency-domain beamformed signals; and implement a neural network trained to generate, based on the multiple frequency-domain input audio signals, a noise-reduced output audio signal or an output for generating a noise-reduced output audio signal.
According to one aspect, an ear-worn device includes two or more microphones configured to generate time-domain audio signals, each of the two or more microphones configured to generate one of the time-domain audio signals; processing circuitry including analog processing circuitry, digital processing circuitry, beamforming circuitry, and short-time Fourier transformation (STFT) circuitry, the processing circuitry configured to generate, from the time-domain audio signals, either or both of one or more frequency-domain non-beamformed audio signals and one or more frequency-domain beamformed signals; and enhancement circuitry including neural network circuitry configured to: receive multiple frequency-domain input audio signals originating from either or both of the one or more frequency-domain non-beamformed audio signals and the one or more frequency-domain beamformed signals; and implement a neural network trained to generate, based on the multiple frequency-domain input audio signals, a spatially-focused output audio signal or an output for generating a spatially-focused output audio signal.
According to one aspect, an ear-worn device includes two or more microphones configured to generate audio signals, each of the two or more microphones configured to generate one of the audio signals; processing circuitry including analog processing circuitry, digital processing circuitry, beamforming circuitry, the processing circuitry configured to generate, from the time-domain audio signals, either or both of one or more non-beamformed audio signals and one or more beamformed signals; and enhancement circuitry including neural network circuitry configured to: receive multiple input audio signals originating from either or both of the one or more non-beamformed audio signals and the one or more beamformed signals; and implement a neural network trained to generate, based on the multiple input audio signals, a noise-reduced and spatially-focused output audio signal or an output for generating a noise-reduced and spatially-focused output audio signal.
According to one aspect, an ear-worn device, includes two or more microphones configured to generate audio signals, each of the two or more microphones configured to generate one of the audio signals; processing circuitry including analog processing circuitry, digital processing circuitry, beamforming circuitry, the processing circuitry configured to generate, from the time-domain audio signals, either or both of one or more non-beamformed audio signals and one or more beamformed signals; and enhancement circuitry including neural network circuitry configured to: receive multiple input audio signals originating from either or both of the one or more non-beamformed audio signals and the one or more beamformed signals; and implement a neural network trained to generate, based on the multiple input audio signals, a noise-reduced output audio signal or an output for generating a noise-reduced output audio signal.
According to one aspect, an ear-worn device includes two or more microphones configured to generate audio signals, each of the two or more microphones configured to generate one of the audio signals; processing circuitry including analog processing circuitry, digital processing circuitry, beamforming circuitry, the processing circuitry configured to generate, from the time-domain audio signals, either or both of one or more non-beamformed audio signals and one or more beamformed signals; and enhancement circuitry including neural network circuitry configured to: receive multiple input audio signals originating from either or both of the one or more non-beamformed audio signals and the one or more beamformed signals; and implement a neural network trained to generate, based on the multiple input audio signals, a spatially-focused output audio signal or an output for generating a spatially-focused output audio signal.
According to one aspect, an ear-worn device includes two or more microphones configured to generate time-domain audio signals, each of the two or more microphones configured to generate one of the time-domain audio signals; and enhancement circuitry including neural network circuitry configured to receive multiple frequency-domain input audio signals originating from the time-domain audio signals and implement a neural network trained to generate, based on the multiple frequency-domain input audio signals, a noise-reduced and spatially-focused output audio signal or an output for generating a noise-reduced and spatially-focused output audio signal.
According to one aspect, an ear-worn device, includes two or more microphones configured to generate time-domain audio signals, each of the two or more microphones configured to generate one of the time-domain audio signals; enhancement circuitry including neural network circuitry configured to receive multiple frequency-domain input audio signals originating from the time-domain audio signals and implement a neural network trained to generate, based on the multiple frequency-domain input audio signals, a noise-reduced output audio signal or an output for generating a noise-reduced output audio signal.
According to one aspect, an ear-worn device, includes two or more microphones configured to generate time-domain audio signals, each of the two or more microphones configured to generate one of the time-domain audio signals; and enhancement circuitry including neural network circuitry configured to receive multiple frequency-domain input audio signals originating from the time-domain audio signals and implement a neural network trained to generate, based on the multiple frequency-domain input audio signals, a spatially-focused output audio signal or an output for generating a spatially-focused output audio signal.
According to one aspect, an ear-worn device, includes two or more microphones configured to generate microphone audio signals, each of the two or more microphones configured to generate one of the microphone audio signals; and enhancement circuitry including neural network circuitry configured to receive multiple input audio signals originating from the microphone audio signals and implement a neural network trained to generate, based on the multiple input audio signals, a noise-reduced and spatially-focused output audio signal or an output for generating a noise-reduced and spatially-focused output audio signal.
According to one aspect, an ear-worn device includes two or more microphones configured to generate microphone audio signals, each of the two or more microphones configured to generate one of the microphone audio signals; and enhancement circuitry including neural network circuitry configured to receive multiple input audio signals originating from the microphone audio signals and implement a neural network trained to generate, based on the multiple input audio signals, a noise-reduced output audio signal or an output for generating a noise-reduced output audio signal.
According to one aspect, an ear-worn device, includes two or more microphones configured to generate microphone audio signals, each of the two or more microphones configured to generate one of the microphone audio signals; and enhancement circuitry including neural network circuitry configured to receive multiple input audio signals originating from the microphone audio signals and implement a neural network trained to generate, based on the multiple input audio signals, a noise-reduced and spatially-focused output audio signal or an output for generating a noise-reduced and spatially-focused output audio signal.
In some embodiments of any of the above aspects, the one or more frequency-domain beamformed signals include a frequency-domain beamformed signal having a dipole, hypercardioid, supercardioid, or cardioid directional pattern.
In some embodiments of any of the above aspects, the two or more microphones include a front microphone and a back microphone, and the one or more frequency-domain non-beamformed audio signals include a frequency-domain non-beamformed audio signal originating from the front microphone and a frequency-domain non-beamformed audio signal originating from the back microphone.
In some embodiments of any of the above aspects, the single neural network includes a recurrent network.
In some embodiments of any of the above aspects, the ear-worn device further includes interpolation circuitry configured to interpolate between 1. one of the multiple frequency-domain input audio signals, and 2. the noise-reduced and spatially-focused output, or a processed version thereof.
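As a minimal illustrative sketch of such interpolation, the following assumes a simple linear blend applied per frequency bin of one STFT frame. The function name `interpolate_bins` and the weight `alpha` are hypothetical; the disclosure does not specify the interpolation rule.

```python
def interpolate_bins(input_bins, enhanced_bins, alpha):
    """Blend one STFT frame of an input signal with the enhanced output.

    alpha is a hypothetical mix weight (an assumption, not from the source):
    alpha = 0.0 returns the input signal unchanged,
    alpha = 1.0 returns the enhanced (noise-reduced, spatially-focused) output.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(input_bins) != len(enhanced_bins):
        raise ValueError("frames must have the same number of bins")
    return [(1.0 - alpha) * x + alpha * y
            for x, y in zip(input_bins, enhanced_bins)]
```

Under this assumption, an intermediate `alpha` may leave some of the original sound in the output, which may be one way to trade off the strength of the enhancement against naturalness.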
In some embodiments of any of the above aspects, the noise-reduced and spatially-focused output audio signal uses a mapping of gains to respective spatial regions. In some embodiments of any of the above aspects, the mapping is predetermined. In some embodiments of any of the above aspects, the mapping of the gains to the respective spatial regions includes applying a gain of 1 to audio generated from sounds coming from a target spatial region and applying a gain of 0 to audio generated from sounds coming from other spatial regions. In some embodiments of any of the above aspects, the target spatial region has an angle relative to a wearer of the ear-worn device of approximately equal to or between 10-180 degrees. In some embodiments of any of the above aspects, the mapping of the gains to the respective spatial regions includes mapping more than two spatial regions each to a different gain, and one or more of the spatial regions are processed with gains not equal to 1 or 0. In some embodiments of any of the above aspects, the mapping is not predetermined. In some embodiments of any of the above aspects, the output for generating the noise-reduced and spatially-focused output audio signal includes a sound map indicating frequency components originating from each of multiple spatial regions. In some embodiments of any of the above aspects, the enhancement circuitry is further configured to apply a beam pattern to the sound map, and the beam pattern is based on a selection from a wearer of the ear-worn device. In some embodiments of any of the above aspects, the selection from the wearer of the ear-worn device includes a selection of a size of a front-facing spatial region to use for focusing. 
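As a minimal illustrative sketch of a predetermined mapping of gains to spatial regions, the following assumes that the target spatial region is centered directly in front of the wearer (0 degrees) and that the stated angle is the total angular width of the region. The function names, the 90-degree default, and the example region table are hypothetical. In the embodiments described above, a neural network may be trained to produce output consistent with such a mapping; the code only expresses the mapping itself.

```python
def wrap_deg(angle_deg):
    """Wrap an angle to the range [-180, 180), with 0 meaning straight ahead."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def region_gain(doa_deg, target_width_deg=90.0):
    """Two-region mapping: gain 1 inside the front-facing target region, gain 0 elsewhere."""
    return 1.0 if abs(wrap_deg(doa_deg)) <= target_width_deg / 2.0 else 0.0


def multi_region_gain(doa_deg, regions):
    """Multi-region mapping with intermediate gains.

    regions is a list of (half_width_deg, gain) pairs sorted by increasing
    half-width, e.g. [(30, 1.0), (90, 0.5), (180, 0.0)] (hypothetical values).
    """
    offset = abs(wrap_deg(doa_deg))
    for half_width_deg, gain in regions:
        if offset <= half_width_deg:
            return gain
    return 0.0
```

For example, with the hypothetical table `[(30, 1.0), (90, 0.5), (180, 0.0)]`, sound arriving from 60 degrees to one side would be mapped to a gain of 0.5.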
In some embodiments of any of the above aspects, the output for generating the noise-reduced and spatially-focused output audio signal includes values calculated for a metric from audio from the multiple beams, each of the multiple beams pointing at a different angle around a wearer of the ear-worn device, and the enhancement circuitry is configured to combine the audio from the multiple beams using the values for the metric.
In some embodiments of any of the above aspects, the neural network is trained on both captured data and synthetic data.
In some embodiments of any of the above aspects, the ear-worn device further includes an inertial measurement unit (IMU) and second processing circuitry configured to track head movements of a wearer of the ear-worn device using measurements from the IMU and cause an absolute coordinate system to be used for the spatial focusing based on the head movements.
In some embodiments of any of the above aspects, the processing circuitry is coupled between the two or more microphones and the enhancement circuitry; the analog processing circuitry is coupled between the two or more microphones and the digital processing circuitry; the digital processing circuitry is coupled between the analog processing circuitry and the beamforming circuitry; the beamforming circuitry is coupled between the digital processing circuitry and the STFT circuitry; the analog processing circuitry is configured to perform one or more of analog preamplification, analog filtering, and analog-to-digital conversion; and the digital processing circuitry is configured to perform one or more of wind reduction, input calibration, and anti-feedback processing.
In some embodiments of any of the above aspects, the processing circuitry is coupled between the two or more microphones and the enhancement circuitry; the analog processing circuitry is coupled between the two or more microphones and the digital processing circuitry; the digital processing circuitry is coupled between the analog processing circuitry and the STFT circuitry; the STFT circuitry is coupled between the digital processing circuitry and the beamforming circuitry; the analog processing circuitry is configured to perform one or more of analog preamplification, analog filtering, and analog-to-digital conversion; and the digital processing circuitry is configured to perform one or more of wind reduction, input calibration, and anti-feedback processing.
In some embodiments of any of the above aspects, the neural network circuitry is configured to output a single output based on the multiple frequency-domain input audio signals.
According to one aspect, an ear-worn device is configured to collect audio from multiple beams, each of the multiple beams pointing at a different angle around a wearer of the ear-worn device; calculate values for a metric from the audio from the multiple beams; and combine the audio from the multiple beams using the values for the metric. In some embodiments, the ear-worn device includes neural network circuitry configured to implement a neural network trained to calculate the values for the metric. In some embodiments, the metric includes signal-to-noise ratio. In some embodiments, the metric includes speaker power.
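As a minimal illustrative sketch of combining audio from multiple beams using a metric, the following assumes that a per-beam SNR estimate in decibels is already available (for example, from a neural network) and that the beams are combined with weights proportional to linear SNR. The weighting rule and the function name `combine_beams` are assumptions; the disclosure does not specify how the metric values are used in the combination.

```python
def combine_beams(beam_frames, snr_db):
    """Combine per-beam audio frames using SNR-proportional weights.

    beam_frames: list of equal-length frames, one per beam.
    snr_db: one SNR estimate (in dB) per beam, assumed to be given.
    Returns (combined_frame, weights).
    """
    if len(beam_frames) != len(snr_db) or not beam_frames:
        raise ValueError("need one SNR value per beam")
    linear = [10.0 ** (s / 10.0) for s in snr_db]  # dB to linear power ratio
    total = sum(linear)
    weights = [v / total for v in linear]          # weights sum to 1
    frame_len = len(beam_frames[0])
    combined = [
        sum(w * frame[i] for w, frame in zip(weights, beam_frames))
        for i in range(frame_len)
    ]
    return combined, weights
```

Under this assumption, a beam pointing at a talker with high SNR dominates the output, while beams dominated by noise contribute little.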
According to one aspect, an ear-worn device includes an inertial measurement unit (IMU) and second processing circuitry configured to track head movements of a wearer of the ear-worn device using measurements from the IMU and cause an absolute coordinate system to be used for the spatial focusing based on the head movements.
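As a minimal illustrative sketch of using an absolute coordinate system, the following assumes that the IMU measurements have already been reduced to a head yaw angle in degrees and that the target direction is stored in a fixed world frame. The function name and the sign convention (yaw and target measured in the same rotational direction) are assumptions, not taken from the disclosure.

```python
def target_relative_to_head(target_world_deg, head_yaw_deg):
    """Convert a world-frame target direction to a head-relative direction.

    Both angles are in degrees in the same (assumed) rotational convention.
    The result is wrapped to [-180, 180), with 0 meaning straight ahead.
    """
    return (target_world_deg - head_yaw_deg + 180.0) % 360.0 - 180.0
```

For example, if the target was straight ahead (0 degrees) and the wearer then turns the head by 30 degrees, the target lies at -30 degrees relative to the head, so the spatial focusing may remain on the original talker rather than following the head.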
Reducing noise in the output of hearing aids and other ear-worn devices is a difficult challenge. Reducing noise in scenarios in which the wearer is listening to one speaker while there are other interfering speakers in the vicinity is a particularly difficult challenge. Recently, neural networks for separating speech from noise have been developed. Further description of such neural networks for reducing noise may be found in U.S. Patent App. Pub. No. US20230232169A1, titled “Method, Apparatus and System for Neural Network Hearing Aid,” published on Jul. 20, 2023 and based on an application filed on Jan. 14, 2022, which is incorporated by reference herein in its entirety.
The inventors have recognized that neural networks that accept two or more input audio signals originating from two or more microphones on an ear-worn device may be better able to reduce noise and sound from interfering speakers. For general noise reduction, the inventors have recognized that if, at a previous time step, the neural network heard noise coming from a certain direction-of-arrival (DOA), the neural network may have a prior to cancel out noise from that DOA on the current time step. From another perspective, sound sources tend to move slowly with time, so if the neural network has identified a particular segment of sound as speech and knows its DOA, the neural network may reasonably infer that other sounds from the same direction are also speech.
For reducing sound from interfering speakers, conventional ear-worn devices may use beamforming to focus on sound received from certain directions and attenuate sounds received from other directions. This may involve processing sounds from different microphones in different ways (e.g., applying different delays to the signals received at different microphones). Conventional beamforming (both adaptive and non-adaptive) may provide an intelligibility boost, because it may enable focusing on sounds coming from in front of the wearer (from where it is assumed that sounds of interest originate) and attenuate sounds (e.g., background noise and interfering speakers) on the sides and back of the wearer.
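As a minimal illustrative sketch of conventional beamforming with two microphones, the following subtracts a delayed copy of the back-microphone signal from the front-microphone signal, which is one common way to obtain a front-facing pattern with a null toward the rear. It assumes an integer sample delay equal to the acoustic travel time between the microphones; the function names are hypothetical.

```python
def delay(signal, num_samples):
    """Delay a signal by an integer number of samples, padding with zeros."""
    if not 0 <= num_samples <= len(signal):
        raise ValueError("delay must be between 0 and the signal length")
    return [0.0] * num_samples + list(signal[:len(signal) - num_samples])


def front_facing_beam(front, back, delay_samples):
    """First-order differential beam: front minus delayed back.

    delay_samples is assumed to equal the inter-microphone travel time in
    samples, so sound arriving from directly behind cancels.
    """
    delayed_back = delay(back, delay_samples)
    return [f - b for f, b in zip(front, delayed_back)]
```

In this sketch, a sound from directly behind reaches the back microphone first and the front microphone `delay_samples` later, so the two terms cancel; a sound from the front does not cancel and passes through.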
However, conventional beamforming patterns (e.g., cardioids, supercardioids, hypercardioids, and dipoles) may also have shortcomings, including: 1. A theoretical beamforming pattern may become warped once it is implemented by microphones placed on a behind-the-ear hearing aid, due at least in part to interference from the head, torso, and ear of the wearer; this may cause performance to suffer. 2. In reverberant environments, the indirect path may come through from the front-facing direction. For example, in a reverberant room, when a speaker is talking from directly behind the wearer, that speaker's voice may reverberate all around the room and enter the hearing aid microphones from in front of the wearer; such sounds may not be attenuated by a front-facing beamforming pattern. 3. Conventional beamforming may work better on high-frequency sounds than low-frequency sounds. In other words, conventional beamforming may be better at using high-frequency sounds for sound localization than low-frequency sounds. 4. Generally, there is a limit to how much sound reduction conventional beamforming patterns can provide.
The inventors have addressed these shortcomings by developing neural networks that perform spatial focusing. Spatial focusing may include applying different gains to audio signals based on the locations of the sources of the sounds from which the audio signals were generated. The locations of the sounds may be derived from differences in timing of the sounds arriving at multiple microphones. The inventors have recognized that with a single microphone, speakers from different directions may sound the same to a neural network; in other words, a neural network may not be able to tell whether a speaker is in front of the wearer or behind the wearer (or, in general, where a speaker is located). A neural network using inputs from multiple microphones may break this ambiguity. Thus, the neural network may accept multiple input audio signals originating from two or more microphones on an ear-worn device and be trained to perform spatial focusing (in some embodiments, in addition to performing noise reduction), which may help to focus on sounds coming from a target direction and reduce sound coming from other directions. As one particular example, focusing on sounds originating from in front of the ear-worn device wearer may help to reduce sound from interfering speakers located behind and to the sides of the ear-worn device wearer; other target directions may be used as well.
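As a minimal illustrative sketch of how timing differences between two microphones carry direction information, the following estimates the lag between a front and a back microphone signal by cross-correlation and converts it to an angle under a far-field assumption. This is a classical technique shown only for illustration; it is not the neural network approach described herein. The function names, the 343 m/s speed of sound, and the example spacing and sample rate are assumptions.

```python
import math


def estimate_lag(front, back, max_lag):
    """Return the lag (in samples) by which `back` trails `front`."""
    n = len(front)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag, skipping out-of-range samples.
        score = sum(front[i] * back[i + lag]
                    for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_to_angle_deg(lag_samples, sample_rate_hz, mic_spacing_m,
                     speed_of_sound_m_s=343.0):
    """Far-field angle from the front-back axis: 0 = front, 180 = back."""
    tau = lag_samples / sample_rate_hz
    cos_theta = speed_of_sound_m_s * tau / mic_spacing_m
    cos_theta = max(-1.0, min(1.0, cos_theta))  # clip numerical overshoot
    return math.degrees(math.acos(cos_theta))
```

With a single microphone no such lag exists, which is one way to see why speakers in front of and behind the wearer may be indistinguishable without a second microphone.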
Generally, the inventors have developed neural networks that may accept two or more input audio signals originating from two or more microphones on an ear-worn device and that may provide a significantly larger increase in signal-to-noise ratio (SNR) and higher-performance speaker isolation than achievable with conventional noise reduction and/or beamforming techniques. Thus, these neural networks may be trained to perform noise reduction, spatial focusing, or both. Additionally, the inventors have developed different methods for how such neural networks may perform spatial focusing, including methods for mapping gains to respective spatial regions from which sounds originated, methods for generating sound maps and applying beam patterns using the sound maps, and methods for collecting sound using multiple beams around the wearer and weighting the sound using a metric such as SNR. Additionally, the inventors have developed different graphical user interfaces (GUIs) for a processing device (e.g., a smartphone or tablet) in operative communication with an ear-worn device; a wearer may use these GUIs to control spatial focusing performed by the ear-worn device. The neural networks may have sufficiently low latencies to enable implementation of the neural networks in an ear-worn device. In some embodiments, a single recurrent neural network may be used in conjunction with frequency-domain processing, which may be helpful for realizing low latencies.
illustrates a hearing aid, in accordance with certain embodiments described herein. The hearing aid may be any of the ear-worn devices or hearing aids described herein. The hearing aid includes a front microphone, a back microphone, a user input device, and a receiver. When the hearing aid is worn, the front microphone may be closer to the front of the wearer and the back microphone may be closer to the back of the wearer. The user input device may be configured for controlling certain functions of the hearing aid, such as volume, activation of neural network circuitry (e.g., any of the neural network circuitry described below), etc. A processing device (e.g., a smartphone or tablet) in operative communication with the hearing aid may also be used by the wearer to control certain functions of the hearing aid. Any of the data paths described below may be implemented in the hearing aid. In particular, as described above, certain embodiments described herein include neural networks that accept two or more input audio signals originating from two or more microphones. The two or more microphones may be the front microphone and the back microphone of the hearing aid.
illustrates the hearing aid on a wearer, in accordance with certain embodiments described herein. shows the wearer from the back, and as illustrated, the front microphone is closer to the front of the wearer and the back microphone is closer to the back of the wearer. While illustrate a behind-the-ear hearing aid, hearing aids with other form factors may be used as well.
illustrates eyeglasses including built-in hearing aids, in accordance with certain embodiments described herein. The eyeglasses may be any of the ear-worn devices or hearing aids described herein. The eyeglasses have a left temple, a right temple, and a front rim. The eyeglasses further include receivers connected to each of the left temple and the right temple. illustrates microphones disposed on the left temple. It should be appreciated that microphones may also be disposed on the right temple (but not visible in the figure). It should also be appreciated that microphones may also be disposed on the front rim (but not visible in the figure). While illustrates five microphones on the left temple, more or fewer microphones may be disposed on a temple or rim. In some embodiments (such as that of ), the inlets for the microphones may be disposed on the inner side of the temples and/or rim (i.e., the side facing toward the wearer's face), thereby reducing visibility of the inlets to other people. In some embodiments, the inlets for the microphones may be disposed on the upper side of the temples and/or rim, thereby reducing visibility of the inlets to other people. In some embodiments, the inlets for the microphones may be disposed on the outer side of the temples and/or rim (i.e., the side facing away from the wearer's face). Any of the data paths described below may be implemented in the eyeglasses. In particular, as described above, certain embodiments described herein include neural networks that accept two or more input audio signals originating from two or more microphones. The two or more microphones may be the microphones of the eyeglasses. It should be appreciated that while illustrate a hearing aid and eyeglasses, other ear-worn devices such as cochlear implants or earphones may be used as well.
illustrates a data path in an ear-worn device (e.g., a hearing aid, such as the hearing aid and/or the eyeglasses described above), in accordance with certain embodiments described herein. The data path includes microphones, processing circuitry, enhancement circuitry, processing circuitry, and a receiver. The enhancement circuitry includes neural network circuitry. It should be appreciated that the data path may include more circuitry and components than shown (e.g., anti-feedback circuitry, calibration circuitry, etc.) and such circuitry and components may be disposed between certain of the circuitry and components illustrated in .
In the data path, the processing circuitry is coupled between the microphones and the enhancement circuitry. The enhancement circuitry is coupled between the processing circuitry and the processing circuitry. The processing circuitry is coupled between the enhancement circuitry and the receiver. As referred to herein, if element A is described as coupled between element B and element C, there may be other elements between elements A and B and/or between elements A and C.
The microphones may include two or more (e.g., 2, 3, 4, or more) microphones. For example, the microphones may include two microphones, a front microphone that is closer to the front of the wearer of the ear-worn device and a back microphone that is closer to the back of the wearer of the ear-worn device (e.g., as in the hearing aid). As another example, the microphones may include more than two microphones in an array (e.g., as in the eyeglasses). As another example, one microphone may be on a first ear-worn device and one microphone may be on a second ear-worn device coupled wirelessly to the first ear-worn device. The microphones may be configured to receive sound signals and generate time-domain audio signals from the sound signals. The time-domain audio signals may represent multiple individual audio signals, each generated by one of the microphones. Thus, each of the time-domain audio signals may originate from one of the microphones.
In some embodiments, the processing circuitry may include analog processing circuitry. The analog processing circuitry may be configured to perform analog processing on the time-domain audio signals received from the microphones. For example, the analog processing circuitry may be configured to perform one or more of analog preamplification, analog filtering, and analog-to-digital conversion. Thus, the analog processing circuitry may be configured to generate analog-processed time-domain audio signals from the time-domain audio signals. The analog-processed time-domain audio signals may include multiple individual signals, each an analog-processed version of one of the time-domain audio signals. As referred to herein, analog processing circuitry may include analog-to-digital conversion circuitry, and an analog-processed signal may be a digital signal that has been converted from analog to digital by analog-to-digital conversion circuitry.
In some embodiments, the processing circuitry may include digital processing circuitry. The digital processing circuitry may be configured to perform digital processing on the analog-processed time-domain audio signals received from the analog processing circuitry. For example, the digital processing circuitry may be configured to perform one or more of wind reduction, input calibration, and anti-feedback processing. Thus, the digital processing circuitry may be configured to generate digital-processed time-domain audio signals from the analog-processed time-domain audio signals. The digital-processed time-domain audio signals may include multiple individual signals, each a digital-processed version of one of the analog-processed time-domain audio signals.
In some embodiments, the processing circuitry may include beamforming circuitry. In some embodiments, the beamforming circuitry may be configured to operate in the time domain, and the enhancement circuitry may be configured to operate in the frequency domain. In such embodiments, STFT circuitry in the processing circuitry may be coupled between the beamforming circuitry and the enhancement circuitry, and the beamforming circuitry may be configured to perform beamforming on two or more of the digital-processed time-domain audio signals received from the digital processing circuitry. Thus, the beamforming circuitry may be configured to generate one or more time-domain beamformed audio signals from two or more of the digital-processed time-domain audio signals. The time-domain beamformed audio signals may include one or more individual signals, each a beamformed version of two or more digital-processed time-domain audio signals. Beamforming will be described in further detail below.
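As a rough illustration of time-domain beamforming on two microphone signals, the following sketch uses a delay-and-subtract scheme. This particular scheme is an assumption chosen for brevity, not necessarily the beamforming used in the embodiments. The helper names (`delay_samples`, `delay_and_subtract`) and the one-sample inter-microphone travel time are hypothetical.

```python
import numpy as np

def delay_samples(x, d):
    """Delay x by d whole samples, padding the start with zeros."""
    return np.concatenate([np.zeros(d), x[:len(x) - d]])

def delay_and_subtract(front, back, d):
    """Subtract a delayed copy of the back signal from the front signal.

    Sound arriving from behind reaches the back microphone first and the
    front microphone d samples later, so it cancels in the output.
    """
    return front - delay_samples(back, d)

rng = np.random.default_rng(0)
source = rng.standard_normal(1000)  # stand-in for a digitized sound source
d = 1                               # assumed travel time between microphones, in samples

# Talker behind the wearer: back microphone hears the source first.
rear_out = delay_and_subtract(delay_samples(source, d), source, d)

# Talker in front of the wearer: front microphone hears the source first.
front_out = delay_and_subtract(source, delay_samples(source, d), d)
```

Under these assumptions the rear-arriving source is cancelled while the front-arriving source is retained, which is one simple way two time-domain signals can be combined into a single beamformed signal.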
The STFT circuitry may be configured to perform STFT on one or more of the time-domain beamformed audio signals and/or one or more of the digital-processed time-domain audio signals. The STFT may convert a signal within a short time window (e.g., on the order of milliseconds) from a time-domain signal into a frequency-domain signal. Thus, the STFT circuitry may be configured to generate one or more frequency-domain beamformed audio signals from the time-domain beamformed audio signals and to generate one or more frequency-domain non-beamformed audio signals from one or more of the digital-processed time-domain (non-beamformed) audio signals. The one or more frequency-domain beamformed audio signals may include one or more individual signals, each a frequency-domain version of one of the time-domain beamformed signals. The one or more frequency-domain non-beamformed audio signals may include one or more individual signals, each a frequency-domain version of one of the digital-processed time-domain signals.
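The windowed conversion described above can be sketched in a few lines. The 16 kHz sample rate, 256-sample (16 ms) window, and 128-sample hop are illustrative assumptions, as is the function name `stft`.

```python
import numpy as np

def stft(signal, frame_len=256, hop=128):
    """Return complex spectra with shape (num_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping short windows and taper each one.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    # One FFT per window turns each short time segment into frequency bins.
    return np.fft.rfft(frames, axis=1)

sample_rate = 16000                      # assumed sample rate in Hz
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)      # one second of a 1 kHz test tone

spec = stft(tone)
peak_bin = int(np.argmax(np.abs(spec[0])))
peak_hz = peak_bin * sample_rate / 256   # centre frequency of the strongest bin
```

Each row of `spec` is the frequency-domain version of one short window of the input, so the same routine could be applied to a beamformed or a non-beamformed time-domain signal.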
In some embodiments, the beamforming circuitry may be configured to operate in the frequency domain, and the enhancement circuitry may be configured to operate in the frequency domain. In such embodiments, the STFT circuitry may be coupled between the processing circuitry and the beamforming circuitry. The STFT circuitry may be configured to perform STFT on the digital-processed time-domain audio signals received from the digital processing circuitry. The STFT may convert a signal within a short time window (e.g., on the order of milliseconds) from a time-domain signal into a frequency-domain signal. Thus, the STFT circuitry may be configured to generate one or more frequency-domain non-beamformed audio signals from the digital-processed time-domain audio signals. The one or more frequency-domain non-beamformed audio signals may include one or more individual signals, each a frequency-domain version of one of the digital-processed time-domain signals. The beamforming circuitry may be configured to perform beamforming on two or more of the frequency-domain non-beamformed audio signals received from the STFT circuitry. Thus, the beamforming circuitry may be configured to generate one or more frequency-domain beamformed audio signals from two or more of the frequency-domain non-beamformed audio signals. The frequency-domain beamformed audio signals may include one or more individual signals, each a beamformed version of two or more frequency-domain non-beamformed audio signals. Beamforming will be described in further detail below.
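In the frequency domain, a time delay becomes a per-bin phase rotation, so a beamformer can be written as a weighted combination of spectra. The sketch below assumes a simple delay-and-sum beamformer steered toward the front, which is not necessarily the beamforming used in the embodiments. The microphone spacing, sample rate, frame length, and the name `delay_and_sum` are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
MIC_SPACING = 0.01      # m, assumed front-to-back microphone spacing
SAMPLE_RATE = 16000     # Hz, assumed
FRAME_LEN = 256         # samples per STFT window, assumed

freqs_hz = np.fft.rfftfreq(FRAME_LEN, d=1.0 / SAMPLE_RATE)
travel_s = MIC_SPACING / SPEED_OF_SOUND  # time for sound to cross the spacing

def delay_and_sum(front_spec, back_spec):
    """Delay the front spectrum by the travel time, then average with the back."""
    steering = np.exp(-2j * np.pi * freqs_hz * travel_s)
    return 0.5 * (front_spec * steering + back_spec)

rng = np.random.default_rng(0)
# Stand-in spectrum of one frame of a sound source.
source = rng.standard_normal(freqs_hz.size) + 1j * rng.standard_normal(freqs_hz.size)
lag = np.exp(-2j * np.pi * freqs_hz * travel_s)

# Front talker: the back microphone receives a lagged copy of the source.
front_talker = delay_and_sum(source, source * lag)
# Rear talker: the front microphone receives the lagged copy instead.
rear_talker = delay_and_sum(source * lag, source)
```

With these assumed values, the front-arriving frame passes through at full magnitude, while the rear-arriving frame is attenuated by a factor that grows with frequency.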
Thus, in some embodiments (e.g., when the beamforming circuitry is configured to operate on time-domain signals), the analog processing circuitry may be coupled between the two or more microphones and the digital processing circuitry, the digital processing circuitry may be coupled between the analog processing circuitry and the beamforming circuitry, and the beamforming circuitry may be coupled between the digital processing circuitry and the STFT circuitry. In some embodiments (e.g., when the beamforming circuitry is configured to operate on frequency-domain signals), the analog processing circuitry may be coupled between the two or more microphones and the digital processing circuitry, the digital processing circuitry may be coupled between the analog processing circuitry and the STFT circuitry, and the STFT circuitry may be coupled between the digital processing circuitry and the beamforming circuitry.
The enhancement circuitry includes the neural network circuitry, which may be configured to receive multiple input audio signals originating from the one or more frequency-domain non-beamformed audio signals and the one or more frequency-domain beamformed audio signals. As referred to herein, a first signal may be said to originate from a second signal if the first signal is the same as the second signal or results from processing of the second signal. Thus, when this description describes that neural network circuitry, for example, receives first signals originating from second signals, it should be understood that in some embodiments, the neural network circuitry may receive the second signals themselves. In some embodiments, the neural network circuitry may receive processed versions of the second signals. As particular examples, in some embodiments, the multiple input audio signals received by the neural network circuitry may be the one or more frequency-domain non-beamformed audio signals and the one or more frequency-domain beamformed audio signals. In some embodiments, the multiple input audio signals received by the neural network circuitry may be versions of the one or more frequency-domain non-beamformed audio signals and the one or more frequency-domain beamformed audio signals that have been processed further by the enhancement circuitry.
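One way multiple frequency-domain signals might be presented to a single network is as one real-valued feature vector per frame. The following sketch is an assumption about input formatting only. The function name `stack_inputs`, the real/imaginary layout, and the signal counts are hypothetical and are not taken from the embodiments.

```python
import numpy as np

def stack_inputs(non_beamformed, beamformed):
    """Combine one frame of each input spectrum into a single feature vector.

    Both arguments are lists of complex arrays with one value per frequency bin.
    """
    spectra = np.stack(list(non_beamformed) + list(beamformed))  # (signals, bins)
    # Neural networks typically take real inputs, so split real and imaginary parts.
    return np.concatenate([spectra.real, spectra.imag], axis=0).ravel()

num_bins = 129  # assumed: 256-sample STFT window
rng = np.random.default_rng(0)

def random_frame():
    """Stand-in for one STFT frame of an audio signal."""
    return rng.standard_normal(num_bins) + 1j * rng.standard_normal(num_bins)

front_mic, back_mic = random_frame(), random_frame()  # non-beamformed inputs
front_beam = random_frame()                           # beamformed input

features = stack_inputs([front_mic, back_mic], [front_beam])
```

Here three input signals of 129 bins each yield a 774-element vector for the frame.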
The neural network circuitry may be configured to implement a neural network (e.g., a recurrent neural network) trained to perform noise reduction and/or spatial focusing. Thus, in some embodiments, the neural network may be trained to reduce noise (i.e., reduce non-speech). In some embodiments, the neural network may be trained to perform spatial focusing. In some embodiments, the neural network may be trained to both reduce noise and perform spatial focusing. It should be appreciated that a neural network may be considered trained to perform noise reduction even if the neural network itself does not generate a noise-reduced audio signal; a neural network that generates an output for use in generating a noise-reduced audio signal may still be considered trained to perform noise reduction. For example, the neural network may generate a mask that may be used to generate a noise-reduced audio signal. It should also be appreciated that a neural network may be considered trained to perform spatial focusing even if the neural network itself does not generate a spatially-focused audio signal; a neural network that generates an output for use in generating a spatially-focused audio signal may still be considered trained to perform spatial focusing. The output may be, as non-limiting examples, a mask for generating a spatially-focused audio signal, a sound map, a mask for generating a sound map, or values calculated for a metric from audio from multiple beams (each of the multiple beams pointing at a different angle around a wearer of the ear-worn device). In some embodiments, the neural network circuitry may be configured to output a single output based on the multiple input audio signals. In some embodiments, the output of the enhancement circuitry may be the output of the neural network circuitry.
In some embodiments, the output of the neural network circuitry may undergo further processing (e.g., by the interpolation circuitry and/or the processing circuitry described below) prior to being outputted as the output of the enhancement circuitry.
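To illustrate how a mask may be used to generate a noise-reduced audio signal, the sketch below multiplies one frame of a frequency-domain signal by a per-bin gain. The mask values here are made up for the example; in the embodiments they would come from the trained neural network. The name `apply_mask` and the clipping of gains to the range 0 to 1 are assumptions.

```python
import numpy as np

def apply_mask(noisy_spec, mask):
    """Scale each frequency bin by a gain between 0 (suppress) and 1 (keep)."""
    gains = np.clip(mask, 0.0, 1.0)
    return gains * noisy_spec

# One frame with four frequency bins, chosen by hand for illustration.
noisy = np.array([1 + 1j, 2 + 0j, 0.5j, -3 + 4j])
mask = np.array([1.0, 0.5, 0.0, 0.2])  # hypothetical network output

enhanced = apply_mask(noisy, mask)
```

Because the gains are real, each bin keeps its phase and only its magnitude changes; bins the mask marks as noise are attenuated or removed.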
The processing circuitry may be configured to perform further processing on the output of the enhancement circuitry. For example, the processing circuitry may include digital processing circuitry configured to perform one or more of wide-dynamic range compression and output calibration. Additionally, the processing circuitry may include inverse STFT (iSTFT) circuitry configured to perform inverse STFT on the output of the digital processing circuitry. The iSTFT may be configured to convert a frequency-domain signal into a time-domain signal having a short time window.
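The inverse transform can be sketched as an inverse FFT per frame followed by overlap-add of the resulting short time windows. The sketch below assumes a periodic Hann analysis window with 50% overlap, for which overlapping windows sum to one. The window choice, frame length, and function names are assumptions rather than details of the embodiments.

```python
import numpy as np

def periodic_hann(n):
    """Hann window whose copies shifted by n / 2 sum to exactly one."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

def stft(x, frame_len=256):
    hop = frame_len // 2
    win = periodic_hann(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, frame_len=256):
    """Convert each spectrum to a short time segment and overlap-add them."""
    hop = frame_len // 2
    frames = np.fft.irfft(spec, n=frame_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(2048)  # stand-in for a time-domain audio signal
y = istft(stft(x))             # round trip with no processing in between
```

With no processing between the two transforms, the interior of `y` reproduces `x`; only the first and last half-window, which are covered by a single frame, differ.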
The receiver (which may be the same as the receivers and/or) may be configured to play back the output of the iSTFT circuitry as sound into the ear of the user. The receiver may also be configured to implement digital-to-analog conversion prior to the playing back.
March 17, 2026