Systems and methods are described for modifying one of far-end signal playback and capture of local audio on an audio device. Frames of both a far-end audio stream and a near-end audio stream may be analyzed using a measure of voice activity, the analyzing producing voice data associated with each frame. Based on the voice data, a conference state may be determined, and one of playback of the far-end audio stream and capture of local audio on an audio device may be modified based on the determined conference state. By associating the likely intent with a predefined state, the device may further cull or remove unwanted or unlikely content from the device input and output. This may have a substantial advantage in allowing for full duplex operation in the case of more meaningful and continuing voice activity, particularly in the case where there are many connected endpoints.
Legal claims defining the scope of protection, as filed with the USPTO.
1. A method for improving audio processing on an audio device during a conference call, the method comprising the steps of: receiving, by the audio device, a far-end audio stream; splitting the received far-end audio stream into a plurality of frames; analyzing each frame using a measure of voice activity to determine if there is voice activity within each frame, the analyzing producing far-end voice data associated with each frame; analyzing frames of a local input audio stream using the measure of voice activity to produce near-end voice data; determining a conference state based on the far-end voice data and the near-end voice data; and based on the determined conference state, modifying at least one of voice-activity thresholds for the measure of voice activity or noise suppression of the audio device.
2. The method of claim 1, the determining the conference state comprising: calculating a transaction parameter value based on the far-end voice data and the near-end voice data; and based on the calculated transaction parameter value, assigning the conference state to the conference call.
3. The method of claim 2, the calculated transaction parameter value falling within one of a plurality of predetermined ranges, each predetermined range being associated with a conference state.
4. The method of claim 3, a predetermined range being assigned to each of a near-end presentation state, a far-end presentation state, and a conversation state.
5. The method of claim 1, further comprising filtering the voice data to remove short bursts, and/or applying a hold-over on the far-end voice data that indicates presence of voice for a predetermined additional time period after voice sound has ended in the far-end audio stream.
6. A method for improving audio processing on an audio device during a conference call, the method comprising the steps of: receiving, by the audio device, a plurality of far-end audio streams, each far-end audio stream being transmitted with an associated data stream, each associated data stream comprising context information for the corresponding far-end audio stream; splitting each far-end audio stream into a plurality of frames; analyzing each frame of each far-end audio stream using a measure of voice activity to determine if there is voice activity within each frame, the analyzing producing far-end voice data associated with each frame of the corresponding far-end audio stream; analyzing frames of a local input audio stream using the measure of voice activity to produce near-end voice data; determining a conference state based on the far-end voice data for each far-end audio stream and the near-end voice data; and modifying at least one of voice-activity thresholds for the measure of voice activity or noise suppression of the audio device based on the determined conference state and the context information for each far-end audio stream of the plurality, playback of the plurality of far-end audio streams being performed on a speaker of the audio device.
7. The method of claim 6, further comprising identifying a far-end audio stream of the plurality of audio streams that is contributing a nuisance, the identifying being based on the voice data associated with the far-end audio stream contributing the nuisance, and modifying playback of the far-end audio stream contributing the nuisance in response to the identifying the far-end audio stream contributing the nuisance.
8. A method for improving audio processing on an audio device during a conference call, the method comprising the steps of: receiving, by the audio device, a far-end audio stream; splitting the received far-end audio stream into a plurality of frames; analyzing each frame using two measures of voice activity to determine if there is voice activity within each frame, the analyzing producing both binary voice data and continuous voice data for each frame; determining whether a nuisance state exists within the far-end audio stream by, for each frame: defining a nuisance parameter value for the far-end audio stream that decays over time; modifying the nuisance parameter value based on one or more rules, the binary voice data for the frame, and the continuous voice data for the frame; and comparing the modified nuisance parameter value to a threshold, wherein a nuisance state is identified for the far-end audio stream when the modified nuisance parameter value exceeds the threshold; and when a nuisance state is identified, modifying at least one of a playback of the far-end audio stream on a speaker of the audio device and capture of local audio on the audio device.
9. The method of claim 8, the one or more rules comprising a rule that the nuisance parameter value is increased when a length of an activity burst indicated by the binary voice data is less than a predetermined threshold.
10. The method of claim 8, the continuous voice data having a value within a range of values corresponding to a level of detected voice activity, the one or more rules comprising a rule that the nuisance parameter value is increased when the continuous voice data value falls below a first threshold value within the range of values.
11. The method of claim 8, the continuous voice data having a value within a range of values corresponding to a level of detected voice activity, the one or more rules comprising a rule that the nuisance parameter value is decreased when the continuous voice data value rises above a second threshold value within the range of values.
12. The method of claim 8, comprising the further steps of: receiving a plurality of incoming audio streams, each incoming audio stream being transmitted with an associated data stream, each associated data stream comprising nuisance parameter values for the corresponding audio stream; selecting an audio stream of the plurality of incoming audio streams having a lowest nuisance parameter value; combining the plurality of incoming audio streams into a mixed far-end audio stream; and modifying playback of the mixed far-end audio stream based only on the lowest nuisance parameter value.
13. An audio device comprising, a near-end audio processor that receives audio signal data from a microphone, and analyzes frames of the audio signal data using a measure of voice activity to produce near-end voice data; a far-end audio processor that receives an incoming audio stream, splits the incoming audio stream into a plurality of frames, and analyzes each frame using the measure of voice activity to produce far-end voice data; and a conference modeling circuit coupled to both the near-end audio processor and the far-end audio processor, the conference modeling circuit determining a conference state based on both the near-end voice data and the far-end voice data, and, based on the determined conference state, modifying at least one of voice-activity thresholds for the measure of voice activity or noise suppression of the audio device.
14. An audio device comprising, a far-end audio processor that receives an incoming audio stream, splits the incoming audio stream into a plurality of frames, and analyzes each frame using two measures of voice activity to produce both binary voice data and continuous voice data associated with each frame; and a conference modeling circuit coupled to the far-end audio processor, the conference modeling circuit: determining whether a nuisance state exists within the incoming audio stream by, for each frame: defining a nuisance parameter value for the incoming audio stream that decays over time; modifying the nuisance parameter value based on one or more rules, the binary voice data for the frame, and the continuous voice data for the frame; and comparing the modified nuisance parameter value to a threshold, wherein a nuisance state is identified for the incoming audio stream when the modified nuisance parameter value exceeds the threshold; and when a nuisance state is identified, modifying at least one of a playback of the incoming audio stream on a speaker of the audio device and capture of local audio on the audio device.
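As an illustration of claims 2 to 4, the following is a minimal Python sketch of mapping a transaction parameter value to one of the three named conference states. The claims do not define how the transaction parameter is calculated or where the range boundaries lie, so the formula (the near-end share of voice-active frames), the constants `FAR_PRESENTATION_MAX` and `NEAR_PRESENTATION_MIN`, and the function names are all assumptions made for this example.

```python
from typing import Sequence

# Hypothetical range boundaries; the claims only say the ranges are predetermined.
FAR_PRESENTATION_MAX = 0.2
NEAR_PRESENTATION_MIN = 0.8


def transaction_parameter(near_voice: Sequence[bool], far_voice: Sequence[bool]) -> float:
    """Share of voice-active frames that came from the near end (assumed formula)."""
    near_active = sum(near_voice)
    far_active = sum(far_voice)
    total = near_active + far_active
    if total == 0:
        # Assumption: no voice on either side is treated as balanced.
        return 0.5
    return near_active / total


def conference_state(value: float) -> str:
    """Assign a conference state from the range the value falls into."""
    if value <= FAR_PRESENTATION_MAX:
        return "far_end_presentation"
    if value >= NEAR_PRESENTATION_MIN:
        return "near_end_presentation"
    return "conversation"
```

In this sketch, a window where nearly all voice-active frames are local would be classified as a near-end presentation, and the device could then adjust voice-activity thresholds or noise suppression accordingly, as recited in claim 1.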
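The two post-processing steps of claim 5, removing short bursts and applying a hold-over, might be sketched as follows. This is a batch version operating on a list of per-frame voice flags; the function names and the frame-count parameters `min_len` and `hold_frames` are hypothetical, and the claims give no numeric values for either.

```python
from typing import List, Sequence


def remove_short_bursts(flags: Sequence[bool], min_len: int) -> List[bool]:
    """Clear runs of voice-active frames that are shorter than min_len frames."""
    out = list(flags)
    n = len(out)
    i = 0
    while i < n:
        if out[i]:
            j = i
            while j < n and out[j]:
                j += 1  # advance to the end of the current active run
            if j - i < min_len:
                for k in range(i, j):
                    out[k] = False
            i = j
        else:
            i += 1
    return out


def apply_hold_over(flags: Sequence[bool], hold_frames: int) -> List[bool]:
    """Keep reporting voice for hold_frames frames after voice has ended."""
    out = []
    remaining = 0
    for active in flags:
        if active:
            remaining = hold_frames
            out.append(True)
        elif remaining > 0:
            remaining -= 1
            out.append(True)
        else:
            out.append(False)
    return out
```

Because `remove_short_bursts` needs to see the end of a burst before deciding whether to keep it, a real-time device would presumably need some look-ahead delay or a different formulation; the hold-over step is causal as written.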
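The per-frame nuisance logic of claims 8 to 11 can be illustrated with a small tracker. This is a sketch under stated assumptions rather than the patented implementation: the class name `NuisanceTracker`, every numeric constant, the multiplicative form of the decay, and the choice to apply the continuous-value rules only on frames the binary measure marks as active are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NuisanceTracker:
    # All constants are hypothetical; the claims give no numeric values.
    decay: float = 0.99          # per-frame multiplicative decay (assumed form)
    short_burst_frames: int = 5  # bursts shorter than this raise the value
    low_conf: float = 0.3        # "first threshold" on the continuous voice data
    high_conf: float = 0.7       # "second threshold" on the continuous voice data
    step: float = 1.0            # size of each increase or decrease
    threshold: float = 3.0       # nuisance state when the value exceeds this
    value: float = 0.0
    _burst_len: int = 0

    def update(self, binary_voice: bool, continuous_voice: float) -> bool:
        """Process one frame and return True if a nuisance state is identified."""
        self.value *= self.decay  # the value decays over time
        if binary_voice:
            self._burst_len += 1
            if continuous_voice < self.low_conf:
                self.value += self.step  # low-confidence activity
            elif continuous_voice > self.high_conf:
                self.value -= self.step  # confident voice
        else:
            if 0 < self._burst_len < self.short_burst_frames:
                self.value += self.step  # a short burst has just ended
            self._burst_len = 0
        self.value = max(self.value, 0.0)  # assumption: clamp at zero
        return self.value > self.threshold
```

With these example constants, a stream of brief, low-confidence bursts pushes the value over the threshold within a few frames, while sustained confident speech pulls it back down. When `update` returns `True`, the device would modify playback of the stream or capture of local audio, as the claims recite.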
August 2, 2017
September 8, 2020