Speaker Tracking on a Single Core in a Packet Based Conferencing System

Published: December 5, 2006

Assignee: Not available in USPTO data

Technical Abstract

Patent Claims

11 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A single core speaker tracking system in a distributed conferencing system, comprising: a system communication bus; a plurality of nodes connected to said system communication bus, each of said plurality of nodes comprising a processor that comprises: a state machine module for receiving a signal from each of a plurality of local participants connected to said node for classification of said signals as voice or noise and for producing an indicator of the classification of each of said signals; a voice activity detection module, operatively connected to the state machine module and local communication channels of each node, for classifying voice signals in the signal from each of the plurality of local participants; a feature extraction module, operatively connected to said state machine and the voice activity detection module, for receiving said indicator and for measuring and extracting at least one feature of a signal feature of a power level, a short-term energy, a long-term energy, and a zero crossing level from each of said local participant signals classified as voice by the voice activity detection module; a speaker tracking module, operatively connected to said feature extraction module, for selecting a number of said local participant signals for transmission to the system communication bus and for selecting a number of local participant signals for muting based upon said extracted signal features; and a multiplexer, operatively connected to said speaker tracking module and said system communication bus, for passing said local participant signals selected by the speaker tracking module for transmission to said system communication bus and for blocking the remaining local participant signals selected for muting from said system communication bus.
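As a rough illustration of the per-node pipeline in claim 1, the sketch below computes three of the listed features (short-term energy, power level in dB, zero crossings) for one frame of samples and gates frames the way the multiplexer does. Treating long-term energy as an exponential moving average with a smoothing factor `alpha`, and every function and field name here, are assumptions for illustration; the claim does not say how any feature is computed.

```python
import math
from dataclasses import dataclass


@dataclass
class ChannelFeatures:
    """Hypothetical per-channel feature record (names are illustrative)."""
    short_term_energy: float = 0.0
    long_term_energy: float = 0.0
    power_db: float = -100.0
    zero_crossings: int = 0


def short_term_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def zero_crossings(frame):
    """Number of sign changes between consecutive samples (0 counts as positive)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def update_features(feat, frame, alpha=0.9):
    """Refresh one channel's features from a frame classified as voice.

    Long-term energy is assumed to be an exponential moving average of the
    short-term energy; alpha is a hypothetical smoothing factor.
    """
    ste = short_term_energy(frame)
    feat.short_term_energy = ste
    feat.power_db = 10.0 * math.log10(max(ste, 1e-10))  # floor avoids log10(0)
    feat.zero_crossings = zero_crossings(frame)
    feat.long_term_energy = alpha * feat.long_term_energy + (1.0 - alpha) * ste
    return feat


def multiplex(frames, selected):
    """Pass frames of selected channels to the bus; drop (mute) all others."""
    return {ch: frame for ch, frame in frames.items() if ch in selected}
```

In this sketch, muting a channel simply means its frame is omitted from the dictionary that `multiplex` returns.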

2. The single core speaker tracking system of claim 1, wherein: said multiplexer further communicates said features of each of said selected signals to said plurality of conferencing nodes.
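Claim 2 has the multiplexer also send the features of each selected signal to the other nodes. The patent gives no wire format, so the record layout below (network byte order, 16-bit node and channel ids, two 32-bit floats per record) is purely hypothetical; it only shows how such a feature report could be serialized with Python's standard `struct` module.

```python
import struct

# Hypothetical record layout (not from the patent): network byte order,
# 16-bit node id, 16-bit channel id, float32 short-term and long-term energy.
RECORD = struct.Struct("!HHff")


def pack_selected(node_id, selected):
    """Serialize (channel, short_term, long_term) tuples for the selected channels."""
    return b"".join(RECORD.pack(node_id, ch, ste, lte) for ch, ste, lte in selected)


def unpack_selected(payload):
    """Decode a payload back into (node_id, channel, short_term, long_term) tuples."""
    return [RECORD.unpack_from(payload, off)
            for off in range(0, len(payload), RECORD.size)]
```

Because the energies are stored as float32, most values are rounded in transit; values such as 0.5 round-trip exactly.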

3. The single core speaker tracking system according to claim 1, wherein: said speaker tracking module uses a high threshold to select signals and a low threshold to exclude signals from the selection.
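One plausible reading of claim 3 is a hysteresis rule: a signal enters the selection only when its energy rises above the high threshold and leaves it only when its energy falls below the low threshold, so a speaker hovering between the two keeps its previous state. This interpretation and the function name are assumptions; the claim does not spell out the rule.

```python
def hysteresis_select(energies, previously_selected, high, low):
    """Select channels with two thresholds (one reading of claim 3).

    energies: dict channel -> measured energy for this frame.
    A channel not currently selected must exceed `high` to be selected;
    a selected channel stays selected until its energy drops below `low`.
    """
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    selected = set()
    for ch, e in energies.items():
        if ch in previously_selected:
            if e >= low:          # still loud enough to stay in the selection
                selected.add(ch)
        elif e > high:            # newcomer must clear the high threshold
            selected.add(ch)
    return selected
```

The gap between `high` and `low` keeps a speaker whose level fluctuates near a single threshold from being switched in and out on every frame.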

4. The single core speaker tracking system according to claim 1, wherein: said speaker tracking module selects said signals on the basis of a measured energy level of said signal.

5. The single core speaker tracking system according to claim 1, wherein: said speaker tracking module considers a previous selected status of each speaker in determination of a current selection of speakers.

6. The single core speaker tracking system of claim 1, wherein said selection of local participant signals is based upon the one or more extracted features of said signals to select a strongest speaker signal while maintaining channel continuity of previously selected speaker signals.
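Claims 4 to 6 combine energy-based selection with continuity for speakers that were already selected. The sketch below ranks channels by measured energy but scales a previously selected channel's energy by a hypothetical continuity factor `1 + bonus`, so a newcomer must be clearly louder to displace it. This weighting scheme is an illustrative assumption, not the patented method.

```python
def select_with_continuity(energies, previous, max_active, bonus=0.5):
    """Pick up to max_active channels, favoring channels already selected.

    energies: dict channel -> measured energy.
    previous: set of channels selected in the previous frame.
    bonus: hypothetical continuity margin applied to previously selected channels.
    """
    def score(ch):
        # Previously selected channels get a boost so they are not displaced
        # by a newcomer of nearly equal energy.
        return energies[ch] * (1.0 + bonus) if ch in previous else energies[ch]

    # Highest score first; channel id breaks ties deterministically.
    ranked = sorted(energies, key=lambda ch: (-score(ch), ch))
    return ranked[:max_active]
```

With `bonus=0` the function reduces to plain strongest-speaker selection.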

7. A method for tracking speakers on a single core of a multi-core distributed conferencing system, comprising: setting a maximum number of active speakers to be transmitted by each core; determining the number of channels on said single core; determining the voice or noise state of each speaker on said single core; assigning a valid identifier to each speaker signal on said single core which is identified as a voice signal; extracting values for one of a feature of each identified voice signal of a short-term energy, a long term energy, and an energy variation of each speaker signal identified as voice; detecting idle channels; comparing the number of assigned valid speakers to the allowed maximum number of active speakers and designating the valid speakers as current active speakers when the number of valid speakers does not exceed the allowed maximum number of active speakers; updating the parameters and features for the current active speakers based upon the features extracted from the valid speakers when the identity of the current active speakers is the same as the identity of the previous active speakers; and communicating the current active speaker identifications and the extracted values for the one of the voice signal features to the other cores in the conferencing system when said active speakers are not on an idle channel.

8. The method for tracking speakers of claim 7, further comprising: ordering the speakers according to one of the long term and short term energies of the current signals when there is a change in the identity of one or more of the speakers from the previous frame; and sequentially selecting the valid speakers beginning with the speaker having a highest energy as the current active speakers.

9. The method for tracking speakers of claim 7, further comprising: reclassifying previous active speakers as current active speakers when a number of current valid speakers is less than one.

10. The method for tracking speakers of claim 7, further comprising: ordering the speakers according to the long term and the short term energies of the current signals when there are more valid speakers than the allowed maximum number of active speakers; and sequentially selecting the valid speakers beginning with a speaker having a highest energy as the current active speakers.
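The per-frame decision described in claims 7 to 10 can be sketched as a single function. It assumes the valid (voice) channels have already been identified and that each channel's energies are available as a hypothetical `(long_term, short_term)` tuple; ranking by long-term energy first and short-term energy second is one possible ordering, not necessarily the patented one. Filtering out idle channels before reporting to the other cores is shown as a separate helper.

```python
def select_active(valid, energies, previous_active, max_active):
    """One frame of the claim 7 to 10 flow (sketch).

    valid: list of channel ids classified as voice this frame.
    energies: dict channel -> (long_term, short_term) energy.
    previous_active: list of active channels from the previous frame.
    """
    if not valid:
        # Claim 9: no valid speakers, so the previous active speakers carry over.
        return list(previous_active)
    if len(valid) <= max_active and set(valid) == set(previous_active):
        # Claim 7: same speakers as before; keep them and update their features.
        return list(previous_active)
    # Claims 8 and 10: order by energy and take speakers from the loudest down.
    ranked = sorted(valid, key=lambda ch: (-energies[ch][0], -energies[ch][1], ch))
    return ranked[:max_active]


def channels_to_report(active, idle):
    """Claim 7: only active speakers not on an idle channel are sent to other cores."""
    return [ch for ch in active if ch not in idle]
```

When the number of valid speakers does not exceed the maximum, the slice keeps all of them, so claim 7's "designate the valid speakers as current active speakers" still holds after a change in identity.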

11. A method for tracking speakers on a single core of a multi-core distributed conferencing system, comprising: setting a maximum number of active speakers to be transmitted by each core; determining a number of channels on said single core; determining a voice or a noise state of each speaker on said single core; assigning a valid identifier to each speaker signal on said single core which is identified as a voice signal; extracting values for one of a signal feature of short-term energies, long term energies, a standard energy, and an energy variation of each speaker signal identified as voice; detecting idle channels; comparing the assigned valid speakers to the allowed maximum number of active speakers and comparing the energy of each valid speaker to a threshold energy standard deviation when the number of valid speakers exceeds the allowed maximum number of active speakers; determining the number of speakers with energy above the noise threshold; updating one of the extracted values for the features for the current active speakers based upon the features extracted from the valid speakers above the threshold, when the number of valid speakers above the threshold is equal to the allowed maximum number of active speakers; ordering the speakers according to the long term and short term energies of the current signals when there are more valid speakers above the threshold than the allowed maximum number of active speakers; sequentially selecting the valid speakers beginning with the speaker having a highest energy as the current active speakers; lowering the threshold energy standard deviation when the number of valid speakers above the previous threshold energy standard deviation is less than the allowed maximum number of active speakers and re-comparing the assigned valid speakers to the revised lowered threshold; and communicating the current active speaker identifications and the extracted values for the one of the voice signal features to the other cores in the conferencing system when said active speakers are not on an idle channel.
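Claim 11 adds an adaptive threshold expressed in units of energy standard deviation. A minimal sketch follows, assuming the threshold is `mean + k * stdev` of the valid speakers' energies and that lowering it means decreasing `k` by a fixed `step`; the formula and the parameters `k`, `step` and `k_min` are all hypothetical. If `k` reaches its floor without enough speakers above the threshold, the sketch falls back to ranking every valid speaker, a case the claim does not address.

```python
import statistics


def adaptive_select(energies, max_active, k=2.0, step=0.5, k_min=-3.0):
    """Claim 11 sketch: select speakers above an adaptive std-dev threshold.

    energies: dict channel -> energy for valid (voice) speakers only.
    Threshold = mean + k * population stdev (assumed form).
    """
    def rank(chs):
        return sorted(chs, key=lambda ch: (-energies[ch], ch))

    valid = list(energies)
    if len(valid) <= max_active:
        # Not more valid speakers than allowed: no threshold test needed.
        return rank(valid)
    values = [energies[ch] for ch in valid]
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    while True:
        threshold = mean + k * sd
        above = [ch for ch in valid if energies[ch] >= threshold]
        if len(above) >= max_active or k <= k_min:
            break
        k -= step  # too few speakers above: lower the threshold and re-compare
    pool = above if len(above) >= max_active else valid
    # Equal count: all of `above`; more than allowed: loudest first, truncated.
    return rank(pool)[:max_active]
```

Starting from a high `k` means that in a loud room only clearly dominant speakers are chosen, while in a quiet room the threshold relaxes until the allowed number of speakers is filled.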

Patent Metadata

Filing Date

Unknown

Publication Date

December 5, 2006

Inventors

Dunling Li
