US-11178447

Audio synchronization for audio and video streaming

Published: November 16, 2021

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

A computing device may receive video content along with first audio content that is synchronized with the video content. The computing device may also receive second audio content that is not synchronized with the video content. The computing device may, in turn, transmit output content that includes the video content and the second audio content. A second portion of the second audio content may be identified that has second audio characteristics that are within a selected range of similarity to first audio characteristics of a first portion of the first audio content. A temporal offset may be calculated between the first portion and the second portion. The video content and the second audio content may be synchronized within the output content by delaying, by an amount of the temporal offset, a transmission of the second audio content relative to a transmission of video content.
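As a rough illustration of the approach in the abstract, the Python sketch below searches the second audio for the portion most similar to a chosen portion of the first audio, derives a temporal offset from the two positions, and delays the second audio by that offset. The function names, the cosine-similarity measure, the 0.9 threshold, and the representation of audio as lists of samples are all assumptions made for illustration; the abstract does not prescribe any of them. The sketch also assumes the second audio arrives earlier than the video, so the offset is non-negative.

```python
import math


def similarity(a, b):
    """Cosine similarity of two equal-length sample sequences (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def estimate_offset(first_audio, second_audio, start, length, min_similarity=0.9):
    """Return the offset in samples between the first portion and its best match.

    The first portion is first_audio[start:start + length]. A candidate counts
    only if its similarity is at least min_similarity (the "selected range").
    Returns None when no candidate qualifies.
    """
    reference = first_audio[start:start + length]
    best_pos, best_score = None, None
    for pos in range(len(second_audio) - length + 1):
        score = similarity(reference, second_audio[pos:pos + length])
        if score >= min_similarity and (best_pos is None or score > best_score):
            best_pos, best_score = pos, score
    return None if best_pos is None else start - best_pos


def delay(audio, offset):
    """Delay audio by `offset` samples by prepending silence (offset >= 0 assumed)."""
    return [0.0] * offset + list(audio)


# Hypothetical test signals: the same short sound appears 15 samples earlier
# in the second audio than in the first audio.
event = [0.0, 1.0, -0.5, 0.75, -1.0, 0.25, 0.5, -0.25]
first_audio = [0.0] * 20 + event + [0.0] * 10
second_audio = [0.0] * 5 + event + [0.0] * 25

offset = estimate_offset(first_audio, second_audio, start=20, length=len(event))
delayed_second_audio = delay(second_audio, offset)
```

After the delay, the sound occupies the same sample positions in both audio streams, which is the condition under which the second audio lines up with video that is already synchronized to the first audio.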

Patent Claims

19 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computing system comprising: one or more processors; and one or more memories having stored therein computing instructions that, upon execution by the one or more processors, cause the computing system to perform operations comprising: receiving, by a computing device, video content and first audio content, wherein the first audio content is synchronized with the video content, and wherein the video content is included in output content that is transmitted by the computing device; receiving, by the computing device, second audio content that is not synchronized with the video content when the second audio content is received by the computing device, wherein the second audio content is also included in the output content; determining first audio characteristics of a first portion of the first audio content; comparing the first audio characteristics to a plurality of audio characteristics of a plurality of portions of the second audio content to identify a portion of the second audio content that matches the first portion, wherein a maximum offset threshold between the first audio content and the second audio content is determined, wherein the plurality of portions of the second audio content are within the maximum offset threshold relative to the first portion, and wherein attempts are made to match the first portion only with the plurality of portions of the second audio content that are within the maximum offset threshold relative to the first portion; identifying a second portion of the plurality of portions of the second audio content that has second audio characteristics of the plurality of audio characteristics that are within a selected range of similarity to the first audio characteristics; calculating a temporal offset between the first portion and the second portion; and synchronizing, within the output content, the video content and the second audio content, wherein the synchronizing is achieved by delaying, by an amount of the temporal offset, a transmission of the second audio content relative to a transmission of video content.

2. The computing system of claim 1, wherein the video content is live video content that is transmitted by the computing device and played to viewers using live streaming techniques.

3. The computing system of claim 1, wherein the video content is provided by a camera, and wherein the first audio content is provided by an audio device that is integrated with the camera.

4. The computing system of claim 1, wherein the first audio characteristics are first frequency domain characteristics of the first portion, and wherein the second audio characteristics are second frequency domain characteristics of the second portion.

5. A computer-implemented method comprising: receiving, by a computing device, first video content and first audio content, wherein the first audio content is synchronized with the first video content, and wherein the first video content is included in output content that is transmitted by the computing device; receiving, by the computing device, second audio content that is not synchronized with the first video content when the second audio content is received by the computing device, wherein the second audio content is also included in the output content; determining first audio characteristics of a first portion of the first audio content; comparing the first audio characteristics to a plurality of audio characteristics of a plurality of portions of the second audio content to identify a portion of the second audio content that matches the first portion, wherein a maximum offset threshold between the first audio content and the second audio content is determined, and wherein the plurality of portions of the second audio content are within the maximum offset threshold relative to the first portion, and wherein attempts are made to match the first portion only with the plurality of portions of the second audio content that are within the maximum offset threshold relative to the first portion; identifying a second portion of the second audio content that has second audio characteristics that are within a selected range of similarity to the first audio characteristics; calculating a temporal offset between the first portion and the second portion; and performing a first synchronization, within the output content, of the first video content and the second audio content, wherein the first synchronization is achieved by delaying, based on an amount of the temporal offset, a transmission of the second audio content relative to a transmission of the first video content.

6. The computer-implemented method of claim 5, further comprising: receiving, by the computing device, second video content, wherein the second video content is synchronized with the second audio content when the second video content and the second audio content are received by the computing device; and performing a second synchronization of the first video content and the second video content, wherein the second synchronization is achieved by delaying a transmission of the second video content based on the amount of the temporal offset.

7. The computer-implemented method of claim 6, wherein the second video content is also included in the output content, and wherein the output content comprises a picture-in-picture display that includes the first video content and the second video content.

8. The computer-implemented method of claim 6, wherein the second video content is provided by a camera, and wherein the second audio content is provided by an audio device that is integrated with the camera.

9. The computer-implemented method of claim 5, wherein the plurality of portions include the second portion and a third portion, and wherein the second portion and the third portion partially overlap one another.

10. The computer-implemented method of claim 5, wherein the first audio content is provided by a first audio device, wherein the second audio content is provided by a second audio device, and wherein the second audio device is a higher quality audio device than the first audio device.

11. The computer-implemented method of claim 5, wherein the first audio characteristics are first frequency domain characteristics of the first portion, and wherein the second audio characteristics are second frequency domain characteristics of the second portion.

12. The computer-implemented method of claim 11, wherein the first frequency domain characteristics and the second frequency domain characteristics are determined based at least in part on a Fast Fourier Transform.

13. The computer-implemented method of claim 5, wherein the first video content includes video of a user, and wherein the first audio content and the second audio content include audio of words that are spoken by the user in the first video content.

14. One or more non-transitory computer-readable storage media having stored thereon computing instructions that, upon execution by a computing device, cause the computing device to perform operations comprising: receiving video content and first audio content, wherein the first audio content is synchronized with the video content, and wherein the video content is included in output content that is transmitted by the computing device; receiving second audio content that is not synchronized with the video content when the second audio content is received by the computing device, wherein the second audio content is also included in the output content; determining first audio characteristics of a first portion of the first audio content; comparing the first audio characteristics to a plurality of audio characteristics of a plurality of portions of the second audio content to identify a portion of the second audio content that matches the first portion, wherein a maximum offset threshold between the first audio content and the second audio content is determined, and wherein the plurality of portions of the second audio content are within the maximum offset threshold relative to the first portion, and wherein attempts are made to match the first portion only with the plurality of portions of the second audio content that are within the maximum offset threshold relative to the first portion; identifying a second portion of the second audio content that has second audio characteristics that are within a selected range of similarity to the first audio characteristics; calculating a temporal offset between the first portion and the second portion; and synchronizing, within the output content, the video content and the second audio content, wherein the synchronizing is achieved by adjusting, based on an amount of the temporal offset, a relative timing between a transmission of the second audio content and a transmission of the video content.

15. The one or more non-transitory computer-readable storage media of claim 14, wherein the adjusting the relative timing comprises delaying the transmission of the second audio content relative to the transmission of the video content.

16. The one or more non-transitory computer-readable storage media of claim 14, wherein the adjusting the relative timing comprises delaying the transmission of the video content relative to the transmission of the second audio content.

17. The one or more non-transitory computer-readable storage media of claim 14, wherein the first audio characteristics are first frequency domain characteristics of the first portion, and wherein the second audio characteristics are second frequency domain characteristics of the second portion.

18. The one or more non-transitory computer-readable storage media of claim 14, wherein the first audio content is not included in the output content.

19. The computer-implemented method of claim 5, wherein the first audio content is not included in the output content.
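Several elements recited in the claims above can be sketched together in Python: frequency domain characteristics obtained with a Fast Fourier Transform (claims 4, 11, 12, 17), candidate portions that partially overlap (claim 9), a search limited to a maximum offset threshold (claims 1, 5, 14), and a signed timing adjustment that delays either the second audio or the video (claims 15, 16). This is only one possible reading, not the claimed implementation. The function names, the use of a magnitude spectrum as the characteristics, the Euclidean distance as the similarity measure, and all parameter values are assumptions made for illustration.

```python
import cmath
import math
import random


def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even, odd = fft(samples[0::2]), fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def characteristics(portion):
    """Magnitude spectrum of a portion (assumed form of the characteristics)."""
    return [abs(c) for c in fft(portion)[: len(portion) // 2 + 1]]


def distance(a, b):
    """Euclidean distance between two spectra; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_offset(first_audio, second_audio, start, length, max_offset, hop, max_distance):
    """Match the first portion only against portions within max_offset of it.

    Candidates start every `hop` samples; with hop < length they partially
    overlap. Returns the offset in samples, or None if no candidate is within
    max_distance (the assumed "selected range of similarity").
    """
    reference = characteristics(first_audio[start:start + length])
    lowest = max(0, start - max_offset)
    highest = min(len(second_audio) - length, start + max_offset)
    best = None
    for pos in range(lowest, highest + 1, hop):
        d = distance(reference, characteristics(second_audio[pos:pos + length]))
        if d <= max_distance and (best is None or d < best[0]):
            best = (d, pos)
    return None if best is None else start - best[1]


def timing_adjustment(offset):
    """Return (audio_delay, video_delay) in samples for a signed offset."""
    return (offset, 0) if offset >= 0 else (0, -offset)


# Hypothetical signals: the second audio leads the first audio by 12 samples.
rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(200)]
first_audio, second_audio = source, source[12:]

found = find_offset(first_audio, second_audio, start=64, length=16,
                    max_offset=32, hop=4, max_distance=1e-6)
audio_delay, video_delay = timing_adjustment(found)
```

In this sketch the maximum offset threshold bounds how many candidate portions are transformed and compared, and a true offset larger than the threshold simply yields no match. The hop size sets the granularity at which an offset can be detected, so a real system would presumably choose it together with the window length and sample rate.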

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

H04N

Patent Metadata

Filing Date

May 5, 2020

Publication Date

November 16, 2021
