US-10334384

Scheduling playback of audio in a virtual acoustic space

Published: June 25, 2019

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

A method for processing audio data, the method comprising: receiving audio data corresponding to a plurality of instances of audio, including at least one of: (a) audio data from multiple endpoints, recorded separately or (b) audio data from a single endpoint corresponding to multiple talkers and including spatial information for each of the multiple talkers; rendering the audio data in a virtual acoustic space such that each of the instances of audio has a respective different virtual position in the virtual acoustic space; and scheduling the instances of audio to be played back with a playback overlap between at least two of the instances of audio, wherein the scheduling is performed, at least in part, according to a set of perceptually-motivated rules.
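The rendering step in the abstract, in which each instance of audio gets its own virtual position, can be illustrated with a minimal sketch. It assumes a simple frontal arc and constant-power stereo panning as a stand-in for the virtual acoustic space; the patent text here does not specify a rendering technique, and the names `assign_azimuths`, `pan_gains`, and `spread_deg` are hypothetical.

```python
import math


def assign_azimuths(talkers, spread_deg=120.0):
    """Give every talker a distinct azimuth, spread evenly over a frontal arc.

    Assumption: a single horizontal arc is enough to separate the talkers.
    """
    n = len(talkers)
    if n == 1:
        return {talkers[0]: 0.0}
    step = spread_deg / (n - 1)
    return {t: -spread_deg / 2 + i * step for i, t in enumerate(talkers)}


def pan_gains(azimuth_deg, spread_deg=120.0):
    """Return constant-power (left, right) gains for an azimuth on the arc.

    Assumption: stereo panning approximates a virtual position; a real
    renderer might use binaural filtering instead.
    """
    # Map [-spread/2, +spread/2] onto [0, pi/2] so left^2 + right^2 == 1.
    theta = (azimuth_deg / spread_deg + 0.5) * (math.pi / 2)
    return math.cos(theta), math.sin(theta)


positions = assign_azimuths(["alice", "bob", "carol"])
gains = {talker: pan_gains(az) for talker, az in positions.items()}
```

Distinct positions matter because the scheduling step deliberately overlaps instances of audio, and spatial separation is what lets a listener tell the overlapped instances apart.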

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for processing audio data, the method comprising: receiving audio data corresponding to a plurality of instances of speech spoken by a plurality of talkers, the audio data including at least one of: (a) audio data from multiple endpoints, recorded separately or (b) audio data from a single endpoint corresponding to multiple talkers and including spatial information for each of the multiple talkers; rendering the audio data in a virtual acoustic space such that each of the talkers has a respective different virtual position in the virtual acoustic space; and scheduling the instances of speech to be played back such that an amount of playback overlap between at least two of the instances of speech is greater than an amount of original overlap between two corresponding instances of speech, wherein the scheduling is performed, at least in part, according to a set of perceptually-motivated rules.

2. The method of claim 1, wherein the audio data comprises live audio data, said scheduling being performed dynamically while the live audio data is generated.

3. The method of claim 2, wherein the audio data comprises conference audio data corresponding to a teleconference or in-person conference, some or all of the talkers being conference participants.

4. The method of claim 3, wherein the conference audio data is pre-recorded.

5. The method of claim 3, wherein the live audio data comprises at least some of the conference audio data.

6. The method of claim 2, wherein the live audio data comprises internet audio data streamed from an Internet-based audio or video streaming service, one or more of the talkers being people featured in the internet audio data.

7. The method of claim 2, wherein the live audio data comprises call audio data received in a voice or video call, one or more of the talkers being far-end participants of the call.

8. The method of claim 1, wherein the set of perceptually-motivated rules includes a rule indicating that two of the instances of speech from a single one of the talkers should not overlap in time.

9. The method of claim 1, wherein the set of perceptually-motivated rules includes a rule indicating that two of the instances of speech should not overlap in time if the two instances of speech correspond to a single endpoint.

10. The method of claim 1, wherein, given two of said instances of speech A and B, the set of perceptually-motivated rules includes a rule allowing the playback of B to begin before the playback of A is complete, but not before the playback of A has started.

11. The method of claim 1, wherein, given two of said instances of speech A and B, the set of perceptually-motivated rules includes a rule allowing the playback of B to begin no sooner than a time T before the playback of A is complete, wherein T is greater than zero.

12. The method of claim 1, comprising determining a measure of perceptual similarity between the instances of speech, wherein the set of perceptually-motivated rules includes a rule that the playback overlap between two of the instances of speech is allowed on condition of being perceptually dissimilar by more than a predetermined amount according to said measure.

13. The method of claim 1, comprising determining a measure of perceptual similarity between two of the instances of speech, wherein the set of perceptually-motivated rules includes a rule that determines a length of the playback overlap between two of the instances of speech based on said measure.

14. The method of claim 1, wherein at least some of the instances of speech are scheduled to be played back at a faster rate than a rate at which the instance of speech was recorded.

15. The method of claim 1, comprising using a search engine to determine search results based on one or more search parameters derived from a user input, wherein at least some of said instances of speech correspond to the search results.

16. The method of claim 1, comprising, via a user interface, providing a listener with an option to switch to a non-overlapped playback mode to listen to a portion of one of said instances of speech in more detail.

17. The method of claim 1, further comprising: receiving further audio data corresponding to one or more instances of non-speech audio; wherein said rendering comprises rendering the audio data in a virtual acoustic space such that each of the instances of speech and each of the instances of non-speech audio has a respective different virtual position in the virtual acoustic space; and wherein said scheduling comprises: scheduling the instances of speech and non-speech audio to be played back with a playback overlap between at least one of the instances of speech and at least one of the instances of non-speech audio, and/or between at least two of the instances of non-speech audio.

18. The method of claim 1, wherein the audio data comprises conference audio data corresponding to a recording of a complete or substantially complete conference.

19. A non-transitory medium having software stored thereon, the software including instructions for controlling at least one device to perform the method of claim 1.

20. An apparatus, comprising: an interface system; and a control system configured for communication with the interface system, the control system being further configured to perform operations of: receiving, via the interface system, audio data corresponding to a plurality of instances of speech, the audio data including at least one of: (a) speech data from multiple endpoints, recorded separately or (b) speech data from a single endpoint corresponding to multiple conference participants and including spatial information for each of the multiple talkers; rendering the speech data for each of the talkers to a separate virtual position in a virtual acoustic space; and scheduling the instances of speech to be played back such that an amount of playback overlap between at least two of the instances of speech is greater than an amount of original overlap between two corresponding instances of speech, wherein the scheduling is performed, at least in part, according to a set of perceptually-motivated rules.
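The scheduling rules recited in claims 8, 10, and 11 can be combined into a small greedy scheduler. The sketch below is one possible reading of those rules under stated assumptions, not the patented implementation: `Segment`, `schedule`, `overlap`, and `max_overlap` (standing in for the time T of claim 11) are hypothetical names, and the scheduler processes instances of speech in their original order.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    talker: str
    start: float     # original start time in seconds
    duration: float  # length in seconds


def overlap(a_start, a_dur, b_start, b_dur):
    """Length in seconds for which two intervals overlap (0 if disjoint)."""
    return max(0.0, min(a_start + a_dur, b_start + b_dur) - max(a_start, b_start))


def schedule(segments, max_overlap=1.0):
    """Return [(segment, playback_start)] with overlap added between talkers.

    max_overlap plays the role of T in claim 11 (assumed value, in seconds).
    """
    plan = []
    talker_free = {}   # talker -> end time of that talker's last segment
    prev_start = 0.0   # playback start of the previously scheduled segment
    latest_end = 0.0   # latest playback end among scheduled segments
    for seg in sorted(segments, key=lambda s: s.start):
        t = max(
            latest_end - max_overlap,          # claim 11: at most T before A ends
            prev_start,                        # claim 10: not before A has started
            talker_free.get(seg.talker, 0.0),  # claim 8: same talker never overlaps
        )
        plan.append((seg, t))
        talker_free[seg.talker] = t + seg.duration
        prev_start = t
        latest_end = max(latest_end, t + seg.duration)
    return plan


segments = [
    Segment("alice", 0.0, 4.0),
    Segment("bob", 4.0, 3.0),
    Segment("alice", 7.0, 2.0),
]
for seg, t in schedule(segments):
    print(f"{seg.talker}: original {seg.start:.1f}s -> playback {t:.1f}s")
```

In this example the first two instances of speech did not overlap originally but overlap by one second in playback, which matches the claim 1 condition that playback overlap exceed original overlap. The two instances from the same talker stay sequential.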

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

H04S G10L H04M H04L H04R

Patent Metadata

Filing Date: February 3, 2016

Publication Date: June 25, 2019
