US-11482243

System and method for automatically identifying and ranking key moments in media

Published: October 25, 2022

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

A system including a key moments engine (KME) and a method for automatically identifying and ranking key moments in a media asset. The KME extracts an audio stream from the media asset and stores the audio stream as an audio file. The KME divides the audio file into sub-second audio segments. The KME computes an audio signal level for each of the sub-second audio segments and generates an array of audio signal levels for the audio file. The KME generates clusters of the audio signal levels from the array of audio signal levels. The KME dynamically determines threshold levels for classifying the audio signal levels in the array using the clusters. The KME identifies the key moments from the classified audio signal levels and computes a rank for each of the identified key moments based on ranking criteria.
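The steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say how the signal level is computed or how thresholds are derived from the clusters, so the RMS level in dBFS, the midpoint-between-centroids threshold rule, the half-second segment length, and all function names below are assumptions made for the example.

```python
import math

def segment_levels_dbfs(samples, sample_rate, segment_seconds=0.5, floor_db=-100.0):
    """RMS level in dBFS for each sub-second segment of samples normalized to [-1.0, 1.0]."""
    size = max(1, int(sample_rate * segment_seconds))
    levels = []
    for start in range(0, len(samples), size):
        chunk = samples[start:start + size]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        # Digital silence has no finite dB value, so clamp it to an assumed floor.
        levels.append(20 * math.log10(rms) if rms > 0 else floor_db)
    return levels

def kmeans_1d(values, k=3, iterations=50):
    """Plain 1-D k-means (Lloyd's algorithm); returns the centroids in ascending order."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the centroids evenly over the value range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)

def thresholds_from_clusters(centroids):
    """Assumed rule: the midpoint between adjacent centroids is a threshold."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]

def classify(levels, thresholds):
    """Label each level as low, medium, or high using two thresholds."""
    low_medium, medium_high = thresholds
    return ["high" if v >= medium_high else "medium" if v >= low_medium else "low"
            for v in levels]

def key_moments(labels, segment_seconds=0.5):
    """Merge runs of consecutive 'high' segments into (start_seconds, end_seconds) pairs."""
    moments, run_start = [], None
    for i, label in enumerate(labels + ["low"]):  # sentinel closes a trailing run
        if label == "high" and run_start is None:
            run_start = i
        elif label != "high" and run_start is not None:
            moments.append((run_start * segment_seconds, i * segment_seconds))
            run_start = None
    return moments
```

A caller would chain these as `segment_levels_dbfs`, then `kmeans_1d`, `thresholds_from_clusters`, `classify`, and `key_moments`. Because the thresholds come from the clusters rather than from fixed constants, a quiet drama and a loud sports broadcast each get thresholds suited to their own level distribution.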

Patent Claims

22 claims

Legal claims defining the scope of protection, as filed with the USPTO.

2. The system of claim 1, wherein the audio signal level is measured in decibels relative to full scale.

3. The system of claim 1, wherein one or more of the computer program instructions, which when executed by the at least one processor, cause the at least one processor to store the audio signal level as an absolute value of the audio signal level for the generation of the clusters free of errors.

4. The system of claim 1, wherein the key moments engine is configured to execute at least one of a plurality of clustering algorithms for generating the clusters of the audio signal levels, wherein the at least one of the clustering algorithms is a k-means clustering algorithm.

5. The system of claim 1, wherein the audio signal levels are classified as one of high audio signal levels, medium audio signal levels, and low audio signal levels.

6. The system of claim 1, wherein one or more of the computer program instructions, which when executed by the at least one processor, cause the at least one processor to process events with non-distinguishable audio signal levels for the identification of the key moments.

7. The system of claim 1, wherein one or more of the computer program instructions, which when executed by the at least one processor, cause the at least one processor to represent the each of the identified key moments using a start time code and an end time code.

8. The system of claim 7, wherein one or more of the computer program instructions, which when executed by the at least one processor, cause the at least one processor to adjust the start time code and the end time code of the each of the identified key moments to boundaries of shots of the media asset to enhance a visual representation of the identified key moments.

9. The system of claim 7, wherein one or more of the computer program instructions, which when executed by the at least one processor, cause the at least one processor to discard fringe shots that end just before the start time code of the each of the identified key moments and fringe shots that start just after the end time code of the each of the identified key moments.

10. The system of claim 1, wherein one or more of the computer program instructions, which when executed by the at least one processor, cause the at least one processor to create a dictionary for the each of the identified key moments for storing the audio signal levels of audio segments in the each of the identified key moments, and a start time code and an end time code of an event defining the each of the identified key moments, wherein the dictionary comprises shots that fall inside the each of the identified key moments.

11. The system of claim 1, wherein the ranking criteria comprise one or more of variations in the audio signal levels of the identified key moments, presence of one or more key actors in the media asset determined using a reference database of key actors, presence of scenes of interest in the media asset, and an average audio signal level of the each of the identified key moments.

12. The system of claim 1, wherein one or more of the computer program instructions, which when executed by the at least one processor, cause the at least one processor to generate a report comprising an audio plot of the each of the identified key moments and the shots in the each of the identified key moments, the computed rank of the each of the identified key moments, and a start time code and an end time code of the each of the identified key moments.

14. The method of claim 13, wherein the audio signal level is measured in decibels relative to full scale.

15. The method of claim 13, further comprising storing the audio signal level as an absolute value of the audio signal level by the key moments engine for the generation of the clusters free of errors.

16. The method of claim 13, wherein the key moments engine is configured to execute at least one of a plurality of clustering algorithms for generating the clusters of the audio signal levels, wherein the at least one of the clustering algorithms is a k-means clustering algorithm.

17. The method of claim 13, wherein the audio signal levels are classified as one of high audio signal levels, medium audio signal levels, and low audio signal levels.

18. The method of claim 13, further comprising processing events with non-distinguishable audio signal levels by the key moments engine for the identification of the key moments.

19. The method of claim 13, further comprising representing the each of the identified key moments using a start time code and an end time code by the key moments engine.

20. The method of claim 19, further comprising adjusting the start time code and the end time code of the each of the identified key moments to boundaries of shots of the media asset by the key moments engine to enhance a visual representation of the identified key moments.

21. The method of claim 19, further comprising discarding, by the key moments engine, fringe shots that end just before the start time code of the each of the identified key moments and fringe shots that start just after the end time code of the each of the identified key moments.

22. The method of claim 13, further comprising creating a dictionary for the each of the identified key moments by the key moments engine for storing the audio signal levels of audio segments in the each of the identified key moments, and a start time code and an end time code of an event defining the each of the identified key moments, wherein the dictionary comprises shots that fall inside the each of the identified key moments.

23. The method of claim 13, wherein the ranking criteria comprise one or more of variations in the audio signal levels of the identified key moments, presence of one or more key actors in the media asset determined using a reference database of key actors, presence of scenes of interest in the media asset, and an average audio signal level of the each of the identified key moments.

24. The method of claim 13, further comprising generating, by the key moments engine, a report comprising an audio plot of the each of the identified key moments and the shots in the each of the identified key moments, the computed rank of the each of the identified key moments, and a start time code and an end time code of the each of the identified key moments.
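As an informal illustration of the claims on shot-boundary adjustment, the per-moment dictionary, and ranking, the sketch below shows one possible reading. It is not the claimed implementation: the claims do not define how time codes are moved to shot boundaries or how ranking criteria are combined, so widening each moment outward to the enclosing shots, the weighted sum of average level and variation, and every name here (`snap_to_shots`, `rank_moments`, the dictionary keys, the weights) are assumptions for the example.

```python
from statistics import mean, pstdev

def snap_to_shots(start, end, shots):
    """Widen (start, end) outward to the nearest enclosing shot boundaries.

    `shots` is a list of (shot_start, shot_end) pairs in seconds.
    If no shot boundary lies on a given side, that time code is left unchanged.
    """
    snapped_start = max((s for s, _ in shots if s <= start), default=start)
    snapped_end = min((e for _, e in shots if e >= end), default=end)
    return snapped_start, snapped_end

def rank_moments(moments, weight_mean=1.0, weight_variation=1.0):
    """Return copies of the moment dictionaries with a 'rank' key (1 = highest score).

    Each moment is a dict with 'start', 'end', and 'levels' (dBFS values, where
    values closer to 0 are louder). The score is an assumed weighted sum of the
    average level and the population standard deviation of the levels.
    """
    def score(moment):
        levels = moment["levels"]
        return weight_mean * mean(levels) + weight_variation * pstdev(levels)

    ordered = sorted(moments, key=score, reverse=True)
    return [dict(moment, rank=position + 1) for position, moment in enumerate(ordered)]
```

The other ranking criteria named in the claims, such as the presence of key actors or scenes of interest, would need additional inputs (for example, results from a face-recognition step) and are left out of this sketch.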

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G10L G11B G06V

Patent Metadata

Filing Date

June 9, 2021

Publication Date

October 25, 2022
