A beamforming device of the present invention includes a probability estimation unit, a steering vector unit, and a beamforming unit. The probability estimation unit estimates, based on an input vector, a speech existence probability corresponding to the probability that a target speech signal exists. The steering vector unit provides an estimated steering vector according to the speech existence probability and the input vector. The beamforming unit calculates a weight vector based on the speech existence probability, the input vector, and the estimated steering vector to provide an output vector. Because both the steering vector and the weight vector are derived from the speech existence probability estimated from the input vector, the beamforming device can extract the target speech signal from the input signal more accurately.
Legal claims defining the scope of protection, as filed with the USPTO.
. A beamforming device, comprising:
. The beamforming device of, wherein a noise spatial covariance inverse matrix for the noise included in the input vector is calculated according to a variance-weighted spatial covariance inverse matrix in the previous frame.
. The beamforming device of, wherein an estimated time-varying variance included in the noise spatial covariance inverse matrix is calculated by weighted-averaging a time-varying variance in the previous frame.
. The beamforming device of, further comprising:
. The beamforming device of, further comprising:
. The beamforming device of, wherein the estimated steering vector is determined according to a re-estimated time-varying variance calculated based on the target speech mask.
. The beamforming device of, wherein the weight vector is determined according to the re-estimated time-varying variance calculated based on the target speech mask.
. The beamforming device of, wherein the time-varying variance is determined according to power of an output signal calculated based on the target speech mask.
. The beamforming device of, wherein the variance-weighted spatial covariance inverse matrix is determined according to the re-estimated time-varying variance calculated based on the target speech mask.
. The beamforming device of, further comprising:
. The beamforming device of, wherein when the diagonal component of the target speech signal spatial covariance matrix is the negative number, the target speech mask of the current frame is the same as the target speech mask of the previous frame, and the estimated steering vector of the current frame is the same as the estimated steering vector of the previous frame.
. The beamforming device of, wherein the input vector is composed of a portion of the input vector.
Complete technical specification and implementation details from the patent document.
This application claims the benefit of priority to Korean Patent Application No. 10-2023-0055999, filed Apr. 28, 2023, the contents of which are incorporated herein by reference in their entirety.
The present invention relates to a beamforming device.
A sound signal input through a microphone may include not only the target speech required for speech recognition but also noise that interferes with speech recognition. Various studies are being conducted to improve speech recognition performance by removing noise from the sound input signal and extracting only the desired target speech.
The present invention provides a beamforming device capable of more accurately extracting a target speech signal from an input signal by estimating a speech existence probability corresponding to a probability that the target speech signal exists based on an input vector to provide a steering vector and a weight vector.
According to an embodiment of the present invention, a beamforming device may include a probability estimation unit, a steering vector unit, and a beamforming unit. The probability estimation unit may estimate a speech existence probability corresponding to a probability that a target speech signal exists based on an input vector. The steering vector unit may provide an estimated steering vector according to the speech existence probability and the input vector. The beamforming unit may calculate a weight vector based on the speech existence probability, the input vector, and the estimated steering vector to provide an output vector.
In an embodiment, the speech existence probability may be determined according to a target speech signal spatial covariance matrix for the target speech signal included in the input vector.
In an embodiment, the target speech signal spatial covariance matrix for the target speech signal included in the input vector may be calculated according to a noise spatial covariance matrix.
In an embodiment, the noise spatial covariance matrix for noise included in the input vector may be calculated according to a noise spatial covariance matrix estimate of a previous frame, that is, the frame immediately preceding a current frame.
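The dependence on the previous frame's estimate can be sketched as SPP-weighted exponential smoothing, a common approach in SPP-based beamforming. This is a minimal sketch under assumptions: the function name `update_noise_scm` and the forgetting factor `alpha` are hypothetical, and the patent's exact update rule is not stated in this passage.

```python
import numpy as np


def update_noise_scm(phi_n_prev: np.ndarray, x: np.ndarray, spp: float,
                     alpha: float = 0.95) -> np.ndarray:
    """Hypothetical SPP-weighted recursive update of the noise SCM (one frequency bin).

    phi_n_prev: (M, M) noise SCM estimate of the previous frame
    x:          (M,) complex input vector of the current frame
    spp:        speech existence probability of the current frame, in [0, 1]
    alpha:      assumed forgetting factor
    """
    # When speech is likely (spp -> 1) the smoothing factor tends to 1 and the
    # previous-frame estimate is kept; when speech is absent it tends to alpha.
    a = alpha + (1.0 - alpha) * spp
    return a * phi_n_prev + (1.0 - a) * np.outer(x, x.conj())
```

With `spp = 1` the estimate is frozen, so speech-dominated frames do not leak into the noise statistics.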
In an embodiment, a noise spatial covariance inverse matrix for the noise included in the input vector may be calculated according to a variance-weighted spatial covariance inverse matrix in the previous frame.
In an embodiment, an estimated time-varying variance included in the noise spatial covariance inverse matrix may be calculated by weighted-averaging a time-varying variance of the previous frame.
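A minimal sketch of these two steps, assuming a variance-weighted covariance of the form R_t = beta * R_{t-1} + x_t x_t^H / lambda_t (as in weighted-MPDR-style beamformers): its inverse can be propagated from the previous frame's inverse with the Sherman-Morrison formula, and the variance used in that update is a weighted average of previous-frame variances. The forgetting factors `beta` and `gamma` and both function names are hypothetical.

```python
import numpy as np


def estimate_variance(lam_est_prev: float, lam_prev: float, gamma: float = 0.7) -> float:
    """Hypothetical weighted average of the previous frame's time-varying variance."""
    return gamma * lam_est_prev + (1.0 - gamma) * lam_prev


def update_inverse_scm(p_prev: np.ndarray, x: np.ndarray, lam: float,
                       beta: float = 0.98, eps: float = 1e-10) -> np.ndarray:
    """Sherman-Morrison update of R_t^{-1} for R_t = beta * R_{t-1} + x x^H / lam.

    p_prev: (M, M) inverse of the variance-weighted SCM of the previous frame
    """
    lam = max(lam, eps)
    px = p_prev @ x                              # R_{t-1}^{-1} x
    denom = beta * lam + np.real(np.vdot(x, px))  # beta*lam + x^H R_{t-1}^{-1} x
    # p_prev is Hermitian, so P x x^H P = (P x)(P x)^H
    return (p_prev - np.outer(px, px.conj()) / denom) / beta
```

The rank-one update avoids a full M x M inversion in every frame, which is why recursive inverse updates of this kind are popular in frame-by-frame beamformers.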
In an embodiment, the beamforming device may further include a probability providing unit. The probability providing unit may provide the speech existence probability based on the target speech signal spatial covariance matrix.
In an embodiment, the beamforming device may further include a mask unit. The mask unit may provide a target speech mask according to the speech existence probability.
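How the mask unit maps probability to mask is not specified in this passage. Two plausible choices are a soft mask equal to the SPP itself and a thresholded binary mask; the sketch below offers both. The function name `target_speech_mask` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def target_speech_mask(spp, threshold=None):
    """Hypothetical mapping from speech existence probability to a target speech mask.

    threshold=None returns a soft mask equal to the (clipped) SPP;
    otherwise a binary mask is returned (threshold is an assumed parameter).
    """
    spp = np.clip(np.asarray(spp, dtype=float), 0.0, 1.0)
    if threshold is None:
        return spp
    return (spp > threshold).astype(float)
```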
In an embodiment, the estimated steering vector may be determined according to a re-estimated time-varying variance calculated based on the target speech mask.
In an embodiment, the weight vector may be determined according to the re-estimated time-varying variance calculated based on the target speech mask.
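One way to make both the steering vector and the weight vector depend on a mask-based variance is sketched below. The steering vector is taken as the principal eigenvector of a mask-weighted, variance-normalized target covariance, and the weights follow a variance-weighted MPDR (distortionless) form. Both choices are assumptions for illustration, not formulas confirmed by the source; `estimate_steering_vector`, `weighted_mpdr_weights`, and the reference microphone index `ref` are hypothetical.

```python
import numpy as np


def estimate_steering_vector(X, mask, lam, ref=0, eps=1e-10):
    """X: (T, M) input vectors of one frequency bin; mask, lam: (T,) arrays."""
    weights = np.asarray(mask) / np.maximum(lam, eps)
    # Mask-weighted, variance-normalized target SCM: sum_t w_t x_t x_t^H / sum_t w_t
    phi_s = (X.T * weights) @ X.conj() / max(weights.sum(), eps)
    _, vecs = np.linalg.eigh(phi_s)  # eigenvalues in ascending order
    h = vecs[:, -1]                  # principal eigenvector
    return h / h[ref]                # normalize to the reference microphone


def weighted_mpdr_weights(X, lam, h, eps=1e-10):
    """w = R^{-1} h / (h^H R^{-1} h) with R = mean_t x_t x_t^H / lam_t."""
    inv_lam = 1.0 / np.maximum(lam, eps)
    R = (X.T * inv_lam) @ X.conj() / X.shape[0]
    R_inv_h = np.linalg.solve(R + eps * np.eye(R.shape[0]), h)
    return R_inv_h / np.vdot(h, R_inv_h)
```

The normalization in `weighted_mpdr_weights` makes w^H h = 1, so the component arriving along the estimated steering vector passes undistorted.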
In an embodiment, the variance-weighted spatial covariance inverse matrix may be determined according to the re-estimated time-varying variance calculated based on the target speech mask.
In an embodiment, the time-varying variance may be determined according to power of an output signal calculated based on the target speech mask.
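A minimal sketch of deriving the time-varying variance from output power, assuming the output is y_t = w^H x_t and the mask scales the output before its power is taken. The power floor `floor` (which keeps 1/lambda finite) and both function names are hypothetical.

```python
import numpy as np


def beamformer_output(w, X):
    """y_t = w^H x_t for every frame; X has shape (T, M)."""
    return X @ w.conj()


def time_varying_variance(y, mask, floor=1e-6):
    """Hypothetical re-estimated variance: power of the mask-weighted output, floored."""
    power = np.abs(np.asarray(mask) * y) ** 2
    return np.maximum(power, floor)
```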
In an embodiment, the beamforming device may further include a determination unit. The determination unit may determine whether a diagonal component of the target speech signal spatial covariance matrix estimate is a negative number.
In an embodiment, when the diagonal component of the target speech signal spatial covariance matrix estimate is a negative number, the target speech mask of the current frame may be the same as that of the previous frame, and the estimated steering vector of the current frame may be the same as that of the previous frame.
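The determination and fallback logic described in the two paragraphs above can be sketched as follows. A diagonal entry of a covariance matrix is a power, so a negative value signals an invalid estimate. The function names are hypothetical.

```python
import numpy as np


def has_negative_diagonal(phi_s):
    """True if any diagonal entry (a power estimate) of the target SCM is negative."""
    return bool(np.any(np.real(np.diag(phi_s)) < 0.0))


def select_mask_and_steering(phi_s, mask_cur, h_cur, mask_prev, h_prev):
    """Reuse the previous frame's mask and steering vector when the estimate is invalid."""
    if has_negative_diagonal(phi_s):
        return mask_prev, h_prev
    return mask_cur, h_cur
```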
In an embodiment, when the beamforming device operates with a single channel, the input vector may be constructed by varying the frame and frequency indices relative to the current frame and a reference frequency.
In an embodiment, the input vector may be composed of a portion of the input vector.
In addition to the technical problems of the present invention described above, other features and advantages of the present invention will be described below, or may be clearly understood by those skilled in the art from such description and explanation.
Throughout the specification, when reference numerals are assigned to components in the drawings, like reference numerals designate like components even when those components are shown in different drawings.
Meanwhile, the terms used in the present specification should be understood as follows.
Singular expressions should be understood as including plural expressions, unless the context clearly defines otherwise, and the scope of rights should not be limited by these terms.
Also, it should be understood that terms such as “include” and “have” do not preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.
Hereinafter, preferred embodiments of the present invention designed to solve the above problems will be described in detail with reference to the accompanying drawings.
are diagrams for describing a beamforming device according to embodiments of the present invention.
Referring to, a beamforming device according to an embodiment of the present invention may include a probability estimation unit, a steering vector unit, and a beamforming unit. The probability estimation unit may estimate, based on an input vector X, a speech existence probability SPP corresponding to the probability that a target speech signal TSS exists. For example, the target speech signal may reach a microphone through the acoustic path between the target speaker and the microphone (characterized by a transfer function, that is, a steering vector), and the microphone input may also include noise. Here, the microphone input may be the input vector X according to the present invention.
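The signal model described in this paragraph can be written compactly as follows. The notation is assumed: M is the number of microphones, h_f is the steering vector (acoustic transfer function) at frequency f, s_{t,f} is the target speech at a reference microphone, and n_{t,f} is the noise.

```latex
\mathbf{x}_{t,f} = \mathbf{h}_f \, s_{t,f} + \mathbf{n}_{t,f},
\qquad \mathbf{x}_{t,f} \in \mathbb{C}^{M}
```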
In addition, the speech existence probability SPP may be defined as the posterior probability that the target speech signal TSS exists in the input vector X at time t and frequency f, and may be expressed as [Equation 1] below using Bayes' rule.
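Consider assumed hypothesis notation in which H_1 and H_0 denote the presence and absence of the target speech signal, q is the prior probability of absence, and Λ_{t,f} is the generalized likelihood ratio. Under that notation, Bayes' rule gives one form consistent with this description:

```latex
p_{t,f} = P(\mathcal{H}_1 \mid \mathbf{x}_{t,f})
= \frac{(1-q)\, p(\mathbf{x}_{t,f} \mid \mathcal{H}_1)}
       {(1-q)\, p(\mathbf{x}_{t,f} \mid \mathcal{H}_1) + q\, p(\mathbf{x}_{t,f} \mid \mathcal{H}_0)}
= \frac{\Lambda_{t,f}}{1 + \Lambda_{t,f}}
```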
Here, p may be the speech existence probability, that is, the posterior probability that the target speech signal exists in the input vector, and Λ may be a generalized likelihood ratio. The generalized likelihood ratio may be expressed as [Equation 2] below.
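Let q be the prior probability that no target speech is present, and let H_1 and H_0 be the presence and absence hypotheses (assumed notation). A generalized likelihood ratio consistent with the definitions that follow is:

```latex
\Lambda_{t,f} = \frac{1-q}{q} \cdot
\frac{p(\mathbf{x}_{t,f} \mid \mathcal{H}_1)}{p(\mathbf{x}_{t,f} \mid \mathcal{H}_0)}
```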
Here, the prior probability that no target speech signal exists may be set to a constant between 0 and 1, and the remaining two terms may be the likelihood of the input vector when the target speech signal exists in it and the likelihood of the input vector when the target speech signal does not exist in it.
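[Equation 1] and [Equation 2] combine into a logistic function of the log likelihood ratio. The sketch below takes log-likelihoods as inputs so that very large or very small ratios do not overflow. The function name is hypothetical, and `q` is the prior probability of speech absence.

```python
import numpy as np


def speech_existence_probability(log_lik_h1, log_lik_h0, q=0.5):
    """p = Lambda / (1 + Lambda), Lambda = (1 - q) / q * p(x|H1) / p(x|H0).

    q is the prior probability that no target speech is present (0 < q < 1).
    Working with log-likelihoods avoids overflow of the likelihood ratio.
    """
    log_glr = np.log((1.0 - q) / q) + np.asarray(log_lik_h1) - np.asarray(log_lik_h0)
    # Lambda / (1 + Lambda) is the logistic function of log(Lambda)
    return 1.0 / (1.0 + np.exp(-np.clip(log_glr, -500.0, 500.0)))
```

For example, a likelihood ratio of 3 with q = 0.5 gives Λ = 3 and p = 0.75.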
According to an embodiment, the speech existence probability SPP may be determined according to a target speech signal spatial covariance matrix TGM for the target speech signal TSS included in the input vector X. Rearranging [Equation 1] above, the speech existence probability may be expressed as [Equation 3] below.
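Multichannel SPP estimation commonly models the input vector as zero-mean complex Gaussian. Under that model the covariance is Φ_N when speech is absent and Φ_N + Φ_S when speech is present. If Φ_S is additionally assumed to be rank one, the speech existence probability takes the closed form below. The Gaussian and rank-one assumptions and the symbols ξ and β are assumptions here, not confirmed by the source.

```latex
p = \left[ 1 + \frac{q}{1-q}\,(1+\xi)\,
      \exp\!\left( -\frac{\beta}{1+\xi} \right) \right]^{-1},
\qquad
\xi = \operatorname{tr}\!\left( \boldsymbol{\Phi}_N^{-1} \boldsymbol{\Phi}_S \right),
\qquad
\beta = \mathbf{x}^{H} \boldsymbol{\Phi}_N^{-1} \boldsymbol{\Phi}_S \boldsymbol{\Phi}_N^{-1} \mathbf{x}
```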
Here, the two matrices in [Equation 3] may be a noise spatial covariance matrix and the target speech signal spatial covariance matrix, respectively.
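A sketch of computing the speech existence probability directly from the two spatial covariance matrices. It assumes a zero-mean complex Gaussian input with covariance `phi_n` (speech absent) or `phi_n + phi_s` (speech present). This general form needs no rank-one assumption and reduces to the familiar closed form when `phi_s` is rank one. The function name and the Gaussian model are assumptions.

```python
import numpy as np


def multichannel_spp(x, phi_n, phi_s, q=0.5):
    """Speech existence probability for one input vector under an assumed model:
        H0: x ~ CN(0, phi_n)            (target speech absent)
        H1: x ~ CN(0, phi_n + phi_s)    (target speech present)
    q is the prior probability that target speech is absent.
    """
    phi_x = phi_n + phi_s
    _, logdet_n = np.linalg.slogdet(phi_n)
    _, logdet_x = np.linalg.slogdet(phi_x)
    # Quadratic forms x^H Phi^{-1} x (real for Hermitian Phi)
    quad_n = np.real(np.vdot(x, np.linalg.solve(phi_n, x)))
    quad_x = np.real(np.vdot(x, np.linalg.solve(phi_x, x)))
    # Log generalized likelihood ratio; the pi^M normalizers cancel
    log_glr = np.log((1.0 - q) / q) + (logdet_n - logdet_x) + (quad_n - quad_x)
    return float(1.0 / (1.0 + np.exp(-np.clip(log_glr, -500.0, 500.0))))
```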
According to an embodiment, the target speech signal spatial covariance matrix TGM for the target speech signal TSS included in the input vector X may be calculated according to the noise spatial covariance matrix. For example, the target speech signal spatial covariance matrix TGM for the target speech signal TSS may be expressed as [Equation 4] below.
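The terms defined in the next paragraph suggest a subtraction form. In the version below, the symbols for the target, noise, and input spatial covariance matrices are assumed. Because the result is a difference of two estimates, its diagonal entries can come out negative, which is the case the determination unit described earlier checks for.

```latex
\hat{\boldsymbol{\Phi}}_{S,t} = \boldsymbol{\Phi}_{X,t} - \hat{\boldsymbol{\Phi}}_{N,t}
```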
Here, the terms of [Equation 4] may be the target speech signal spatial covariance matrix, the noise spatial covariance matrix, and the spatial covariance matrix for the input vector X, respectively. The spatial covariance matrix for the input vector X may be expressed as [Equation 5] below.
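The spatial covariance matrix of the input vector is typically either the instantaneous outer product or a recursive average of it. The recursive form below, with an assumed smoothing factor α, covers both cases, since α = 0 reduces it to x_t x_t^H.

```latex
\boldsymbol{\Phi}_{X,t} = \alpha\, \boldsymbol{\Phi}_{X,t-1} + (1-\alpha)\, \mathbf{x}_t \mathbf{x}_t^{H},
\qquad 0 \le \alpha < 1
```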
Here, x may be the input vector,
April 14, 2026