Systems and methods are presented for extracting the audio components of a portion of a video so that those components can be edited. In one or more aspects, a system includes a receiving component configured to receive a video as an upload from a client device over a network and an identification component configured to identify two or more different audio components of an audio track of the video. The system further comprises an extraction component configured to extract and separate the two or more different audio components, and an editing component configured to generate an editing interface and to receive, via that interface, input regarding editing the two or more different audio components separately.
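The idea of editing separated audio components independently and then recombining them can be illustrated with a minimal Python sketch. It assumes the components have already been separated into equal-rate sample lists; the separation itself is not shown. The names `LayerEdit`, `apply_edit`, and `mix_layers`, the layer labels, and the choice of volume and mute as example edits are all hypothetical and are not taken from the disclosed system.

```python
from dataclasses import dataclass


@dataclass
class LayerEdit:
    """Hypothetical per-layer edit: a linear gain and an optional mute."""
    gain: float = 1.0
    mute: bool = False


def apply_edit(samples, edit):
    """Return a new sample list with the edit applied to one layer only."""
    if edit.mute:
        return [0.0] * len(samples)
    return [s * edit.gain for s in samples]


def mix_layers(layers, edits):
    """Edit each layer independently, then sum the layers sample by sample.

    Layers without an entry in `edits` pass through unchanged.
    No clipping or normalization is performed in this sketch.
    """
    length = max((len(s) for s in layers.values()), default=0)
    mixed = [0.0] * length
    for name, samples in layers.items():
        edited = apply_edit(samples, edits.get(name, LayerEdit()))
        for i, value in enumerate(edited):
            mixed[i] += value
    return mixed


# Toy, already-separated layers (assumed input, three samples each).
layers = {
    "dialogue": [0.5, -0.5, 0.25],
    "music": [0.25, 0.25, 0.25],
    "traffic": [1.0, 1.0, 1.0],
}
# Boost the dialogue, leave the music alone, mute the traffic noise.
edits = {"dialogue": LayerEdit(gain=2.0), "traffic": LayerEdit(mute=True)}
edited_track = mix_layers(layers, edits)
```

Keeping each edit attached to a named layer means one component can be changed without touching the others, which is the property the abstract describes. A real implementation would also need to guard against clipping after the layers are summed.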
Legal claims defining the scope of protection, as filed with the USPTO.
1. A system, comprising: a memory having stored thereon computer executable components; a processor that executes at least the following computer executable components: a receiving component configured to receive a video as an upload to a website from a client device over a network; an identification component configured to: analyze audio frequencies of an audio track of the video; identify patterns in the audio frequencies; identify two or more different and concurrent audio layers of the audio track based on the patterns; and identify at least one of the audio layers as a dialogue audio layer based on the identified patterns including at least one pattern corresponding to one or more voices within the audio track; an extraction component configured to extract and separate the audio layers; an editing component configured to: generate an editing interface on the website, the interface including a set of editing options and a representation of each of the audio layers; receive, via the editing interface, input from the client device over the network regarding editing the audio layers separately, the input including a selection of at least one of the editing options and at least one of the representations of the audio layers; edit the selected audio layers based on the selected editing options; and generate an edited audio track comprising the audio layers as edited; and a reproduction component configured to combine the edited audio track with an extracted video track of the video to generate an edited video to post on the website.
2. The system of claim 1, wherein the system is located at a server device accessible to one or more client devices via the network.
3. The system of claim 1, wherein the extraction component is further configured to separate the audio track from a video track of the video.
4. The system of claim 1, wherein the input regarding editing the two or more different and concurrent audio layers includes at least one of, a request to modify volume, a request to mute, a request to add a sound effect, a request to remove a sound effect, or a request to change pitch.
5. The system of claim 1, wherein the input regarding editing the two or more different and concurrent audio layers includes a request to apply a first editing option to a first one of the two or more different and concurrent audio layers and a request to apply a second editing option to a second one of the two or more different and concurrent audio layers, wherein the first one of the two or more different and concurrent audio layers includes the dialogue audio layer and the first editing option and the second editing option are different.
6. The system of claim 1, wherein the audio track comprises a plurality of sequential segments respectively associated with sequential frames of the video, wherein the identification component is configured to identify two or more different and concurrent audio layers respectively associated with respective segments of the sequential segments.
7. The system of claim 1, further comprising an inference component configured to analyze the two or more different and concurrent audio layers and determine or infer an editing option to apply to at least one of the two or more different and concurrent audio layers.
8. The system of claim 1, wherein the two or more different and concurrent audio layers span along an entirety of the audio track.
9. The system of claim 1, wherein the representations of each of the audio layers are presented within a respective frame of a set of layered frames.
10. The system of claim 1, wherein the identification component identifies a set of audio layers not including the dialogue audio layer as background noise.
11. The system of claim 1, further comprising an automatic enhancement component configured to automatically edit the audio layers by increasing a volume of the dialogue audio layer and decreasing or muting a volume of a remaining set of audio layers, wherein the extracted audio layers are automatically edited in response to the selected editing option including a selection that corresponds to a dialogue enhancement option.
12. The system of claim 1, further comprising a matching component configured to match one of the audio layers with a reference file, wherein the set of editing options includes an option to replace the matched audio layer with the reference file.
13. The system of claim 12, wherein the editing component replaces the matched audio layer with the reference file in response to the selected editing option including an option to replace the matched audio layer with the reference file.
14. The system of claim 13, wherein the reference file includes a music track.
15. The system of claim 1, wherein the identification component is configured to identify patterns in the audio frequencies by referencing a look-up table storing patterns corresponding to previously identified sounds.
16. The system of claim 1, wherein the identification component is configured to identify patterns in the audio frequencies by employing a voice to text recognition to convert spoken language into a text file to identify the dialogue audio layer.
17. The system of claim 1, further comprising a media tagging component configured to associate metadata with each of the identified audio layers.
18. A method comprising: using a processor to execute the following computer executable instructions stored in a memory to perform the following acts: receiving a video as an upload from a client device over a network; analyzing audio frequencies of an audio track of the video; identifying patterns in the audio frequencies; identifying two or more different and concurrent audio layers of the audio track based on the patterns; identifying at least one of the audio layers as a dialogue audio layer based on the identified patterns including at least one pattern corresponding to one or more voices within the audio track; separating the two or more different and concurrent audio layers; generating an editing interface on a website, the interface including a set of editing options and a representation of each of the two or more different and concurrent audio layers; receiving input from the client device over the network regarding editing the two or more different and concurrent audio layers separately via the editing interface; editing the two or more different and concurrent audio layers based on the input; and generating an edited audio track comprising the two or more different and concurrent audio layers as edited.
19. The method of claim 18, wherein the receiving the input comprises receiving at least one of, a request to modify volume, a request to mute, a request to add a sound effect, a request to remove a sound effect, or a request to change pitch.
20. The method of claim 18, further comprising combining the edited audio track with an extracted video track of the video to generate an edited video.
21. The method of claim 18, wherein the receiving the input comprises receiving a request to apply a first editing option to a first one of the two or more different and concurrent audio layers and a second editing option to a second one of the two or more different and concurrent audio layers, wherein the first one of the two or more different and concurrent audio layers includes the dialogue audio layer and the first editing option and the second editing option are different.
22. The method of claim 18, further comprising analyzing the two or more different audio components and determining or inferring an editing option to apply to at least one of the two or more different and concurrent audio layers.
23. A non-transitory computer-readable storage storing computer-readable instructions that, in response to execution, cause a computing system to perform operations, comprising: receiving a video as an upload from a client device over a network; analyzing audio frequencies of an audio track of the video; identifying patterns in the audio frequencies; identifying two or more different and concurrent audio layers of the audio track based on the patterns; identifying at least one of the audio layers as a dialogue audio layer based on the identified patterns including at least one pattern corresponding to one or more voices within the audio track; separating the two or more different and concurrent audio layers; generating an editing interface on a website; receiving a request from the client device over the network to apply an editing option to a subset of the two or more different and concurrent audio layers via the editing interface; applying the editing option to only the subset of the two or more different and concurrent audio layers in response to the request; generating an edited audio track comprising the subset of the two or more different and concurrent audio layers in response to the applied editing option; and combining the edited audio track with an extracted video track of the video to generate an edited video.
24. The non-transitory computer-readable storage of claim 23, wherein the editing option includes at least one of, an option to modify volume, an option to mute, an option to add a sound effect, an option to remove a sound effect, or an option to change pitch.
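The claims recite analyzing audio frequencies and identifying a pattern that corresponds to voices, without specifying how. One crude way to illustrate the idea is to measure how much of a layer's spectral energy falls in a band where speech energy is typically concentrated. The Python sketch below is only an illustration under stated assumptions: the 300 to 3400 Hz band, the 0.5 threshold, the naive DFT, and the names `magnitude_spectrum`, `voice_band_ratio`, `label_layer`, and `tone` are all hypothetical and are not the patented identification method.

```python
import cmath
import math

# Assumed voice band (roughly the telephony band); not from the claims.
VOICE_BAND_HZ = (300.0, 3400.0)


def magnitude_spectrum(samples, sample_rate):
    """Return (frequency_hz, magnitude) pairs from a naive O(n^2) DFT."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            s * cmath.exp(-2j * math.pi * k * i / n)
            for i, s in enumerate(samples)
        )
        spectrum.append((k * sample_rate / n, abs(acc)))
    return spectrum


def voice_band_ratio(samples, sample_rate):
    """Fraction of spectral energy that lies inside VOICE_BAND_HZ."""
    spectrum = magnitude_spectrum(samples, sample_rate)
    total = sum(m * m for _, m in spectrum)
    if total == 0:
        return 0.0
    low, high = VOICE_BAND_HZ
    in_band = sum(m * m for f, m in spectrum if low <= f <= high)
    return in_band / total


def label_layer(samples, sample_rate, threshold=0.5):
    """Label a layer 'dialogue' if most of its energy is in the voice band."""
    ratio = voice_band_ratio(samples, sample_rate)
    return "dialogue" if ratio >= threshold else "background"


def tone(freq_hz, sample_rate=8000, n=400):
    """Generate a test sine tone (a stand-in for a separated layer)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


mid_band_label = label_layer(tone(1000.0), 8000)   # energy inside the band
hum_label = label_layer(tone(60.0), 8000)          # low-frequency hum
```

A heuristic this simple would mislabel any in-band sound, such as music, as dialogue. It is meant only to show the shape of a frequency-pattern test; the claims also mention other approaches, such as a look-up table of previously identified sound patterns and voice to text recognition.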
June 24, 2013
February 23, 2016