US-11482243

System and method for automatically identifying and ranking key moments in media

Published: October 25, 2022
Assignee: not available in USPTO data we have
Inventors: not available in USPTO data we have
Technical Abstract

A system including a key moments engine (KME) and a method for automatically identifying and ranking key moments in a media asset. The KME extracts an audio stream from the media asset and stores the audio stream as an audio file. The KME divides the audio file into sub-second audio segments. The KME computes an audio signal level for each of the sub-second audio segments and generates an array of audio signal levels for the audio file. The KME generates clusters of the audio signal levels from the array of audio signal levels. The KME dynamically determines threshold levels for classifying the audio signal levels in the array using the clusters. The KME identifies the key moments from the classified audio signal levels and computes a rank for each of the identified key moments based on ranking criteria.
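
The first steps in the abstract (dividing the audio into sub-second segments and building an array of audio signal levels) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `segment_levels_dbfs`, the 0.5-second segment length, the use of RMS as the level measure, and the -120 dB floor for silent segments are all assumptions made for the example.

```python
import math


def segment_levels_dbfs(samples, sample_rate, segment_seconds=0.5):
    """Return one level in dBFS per fixed-length segment.

    samples: floats normalized to [-1.0, 1.0], where 1.0 is full scale.
    segment_seconds: hypothetical sub-second segment length.
    """
    seg_len = max(1, int(sample_rate * segment_seconds))
    floor_db = -120.0  # assumed floor for silent segments; avoids log10(0)
    levels = []
    for start in range(0, len(samples), seg_len):
        seg = samples[start:start + seg_len]
        # RMS is one possible level measure; the source does not name one.
        rms = math.sqrt(sum(s * s for s in seg) / len(seg))
        if rms > 0:
            levels.append(max(floor_db, 20.0 * math.log10(rms)))
        else:
            levels.append(floor_db)
    return levels
```

Under these assumptions, a full-scale segment maps to 0 dBFS and a half-amplitude segment to roughly -6 dBFS. The resulting list plays the role of the array of audio signal levels that the later clustering step consumes.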

Patent Claims
22 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 2

Original Legal Text

2. The system of claim 1, wherein the audio signal level is measured in decibels relative to full scale.

Plain English translation pending...
Claim 3

Original Legal Text

3. The system of claim 1, wherein one or more of the computer program instructions, which when executed by the at least one processor, cause the at least one processor to store the audio signal level as an absolute value of the audio signal level for the generation of the clusters free of errors.

Plain English Translation

Claim 3 narrows the system of claim 1. The processor stores each audio signal level as its absolute value before the key moments engine generates clusters, so that the clusters are generated free of errors. Levels measured in decibels relative to full scale (claim 2) are typically zero or negative, so taking the absolute value gives the clustering step a uniform, non-negative array of values to work on. The claim does not say which specific errors are avoided; it states only that storing absolute values lets the clusters be generated free of errors.

Claim 4

Original Legal Text

4. The system of claim 1, wherein the key moments engine is configured to execute at least one of a plurality of clustering algorithms for generating the clusters of the audio signal levels, wherein the at least one of the clustering algorithms is a k-means clustering algorithm.

Plain English Translation

Claim 4 narrows the system of claim 1. The key moments engine can execute one or more clustering algorithms to group the audio signal levels into clusters, and at least one of those algorithms is k-means. K-means partitions the data into k clusters by assigning each value to its nearest cluster center and minimizing the variance within each cluster. Per the abstract, the resulting clusters are then used to determine the threshold levels that classify the audio signal levels, and the key moments are identified from the classified levels.
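
The k-means step named in claim 4 can be illustrated on a one-dimensional array of levels. The sketch below is a plain Lloyd's-algorithm loop written for this page; the function name `kmeans_1d`, the choice of k = 3, and the evenly spread starting centers are assumptions, and the patent's actual configuration is not given in the text above.

```python
def kmeans_1d(values, k=3, iterations=50):
    """Cluster scalar values with Lloyd's algorithm; return sorted centers."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the initial centers evenly over the range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            # Assign each value to the nearest current center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; empty groups stay put.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)
```

For example, `kmeans_1d([10, 11, 12, 30, 31, 32, 50, 51, 52])` converges to three centers, one per visible group of values. A production system would more likely call an existing library implementation; the loop is spelled out here only to show the mechanics.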

Claim 5

Original Legal Text

5. The system of claim 1, wherein the audio signal levels are classified as one of high audio signal levels, medium audio signal levels, and low audio signal levels.

Plain English Translation

Claim 5 narrows the system of claim 1. Each audio signal level in the array is classified as one of three classes: high, medium, or low. Per the abstract, the threshold levels that separate these classes are not fixed in advance. The key moments engine determines them dynamically from the clusters of audio signal levels, and the key moments are then identified from the classified levels.
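
One way to turn cluster centers into classification thresholds is to place each threshold halfway between adjacent centers. The source does not say how the thresholds are derived from the clusters, so the midpoint rule below is an assumption, as are the function names `thresholds_from_centers` and `classify_level`. Levels are taken as raw dBFS values here, so a larger (less negative) value means a louder segment.

```python
def thresholds_from_centers(centers):
    """Use midpoints between adjacent sorted centers as class boundaries."""
    ordered = sorted(centers)
    return [(a + b) / 2.0 for a, b in zip(ordered, ordered[1:])]


def classify_level(level_dbfs, thresholds):
    """Map a dBFS level to 'low', 'medium', or 'high' given two thresholds."""
    low_cut, high_cut = thresholds
    if level_dbfs < low_cut:
        return "low"
    if level_dbfs < high_cut:
        return "medium"
    return "high"
```

With hypothetical centers of -40, -20, and -6 dBFS, the thresholds fall at -30 and -13 dBFS. Because the thresholds are recomputed from each asset's own clusters, a quiet drama and a loud sports broadcast would each get their own cut points.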

Claim 6

Original Legal Text

6. The system of claim 1, wherein one or more of the computer program instructions, which when executed by the at least one processor, cause the at least one processor to process events with non-distinguishable audio signal levels for the identification of the key moments.

Plain English translation pending...
Claim 7

Original Legal Text

7. The system of claim 1, wherein one or more of the computer program instructions, which when executed by the at least one processor, cause the at least one processor to represent the each of the identified key moments using a start time code and an end time code.

Plain English Translation

Claim 7 narrows the system of claim 1. Each identified key moment is represented by a start time code and an end time code, which locate the moment precisely within the media asset. This lets a user or downstream tool jump directly to a key moment without reviewing the entire asset. Claims 8 and 9 build on this representation by adjusting the time codes to shot boundaries and by discarding fringe shots.
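
The start and end time codes in claim 7 can be derived from a sequence of classified segments. The sketch below treats every unbroken run of "high" segments as one key moment, which is an assumption for illustration; the source does not define the exact rule. The names `find_key_moments` and `to_time_code`, the 0.5-second segment length, and the HH:MM:SS.mmm format are likewise hypothetical.

```python
def find_key_moments(labels, segment_seconds=0.5, target="high"):
    """Return (start_seconds, end_seconds) for each run of target labels."""
    moments = []
    run_start = None
    for i, label in enumerate(labels):
        if label == target and run_start is None:
            run_start = i  # a new run begins at this segment
        elif label != target and run_start is not None:
            moments.append((run_start * segment_seconds, i * segment_seconds))
            run_start = None
    if run_start is not None:
        # The final run extends to the end of the audio.
        moments.append((run_start * segment_seconds, len(labels) * segment_seconds))
    return moments


def to_time_code(seconds):
    """Format a duration in seconds as HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"
```

A caller would typically format each pair for display, for example `[(to_time_code(s), to_time_code(e)) for s, e in find_key_moments(labels)]`.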

Claim 8

Original Legal Text

8. The system of claim 7, wherein one or more of the computer program instructions, which when executed by the at least one processor, cause the at least one processor to adjust the start time code and the end time code of the each of the identified key moments to boundaries of shots of the media asset to enhance a visual representation of the identified key moments.

Plain English translation pending...
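
Claim 8's adjustment of time codes to shot boundaries might look like the sketch below. It snaps each time code to the nearest known boundary, which is only one possible reading; the claim does not say whether the adjustment moves to the nearest boundary or always expands outward. The function name `snap_to_shot_boundaries` and the list of boundary times are hypothetical.

```python
def snap_to_shot_boundaries(start, end, shot_boundaries):
    """Move start and end (in seconds) to the nearest shot boundary."""
    def nearest(t):
        return min(shot_boundaries, key=lambda b: abs(b - t))

    return nearest(start), nearest(end)
```

For a key moment from 10.0 s to 20.0 s and boundaries at 0.0, 4.2, 9.8, 20.3, and 25.0 s, this yields 9.8 s and 20.3 s, so playback starts and ends on a cut instead of mid-shot. A fuller version would also need to handle a very short moment whose start and end snap to the same boundary.
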
Claim 9

Original Legal Text

9. The system of claim 7, wherein one or more of the computer program instructions, which when executed by the at least one processor, cause the at least one processor to discard fringe shots that end just before the start time code of the each of the identified key moments and fringe shots that start just after the end time code of the each of the identified key moments.

Plain English Translation

This system is designed for automatically identifying and ranking key moments within media content. Once these key moments are identified, each is represented by a start time code and an end time code. To refine the boundaries of the identified key moments, the system's software is configured to discard specific video segments referred to as "fringe shots." These are shots that either end just before the start time code of an identified key moment or start just after its end time code. Excluding these peripheral shots at the edges gives each key moment a more focused and accurate definition.
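
The fringe-shot rule in claim 9 can be sketched as a filter over a list of shots. The claim says "just before" and "just after" without giving a number, so the `tolerance` parameter (0.5 seconds here) is an assumption, as is the function name `discard_fringe_shots` and the representation of a shot as a `(start, end)` pair in seconds.

```python
def discard_fringe_shots(shots, moment_start, moment_end, tolerance=0.5):
    """Drop shots that end just before, or start just after, a key moment.

    tolerance: hypothetical meaning of "just", in seconds.
    """
    kept = []
    for shot_start, shot_end in shots:
        ends_just_before = moment_start - tolerance <= shot_end <= moment_start
        starts_just_after = moment_end <= shot_start <= moment_end + tolerance
        if not (ends_just_before or starts_just_after):
            kept.append((shot_start, shot_end))
    return kept
```

With a key moment from 10.0 s to 20.0 s, a shot ending at 9.8 s and a shot starting at 20.2 s would both be discarded, while shots that fall inside or substantially overlap the moment are kept.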

Claim 10

Original Legal Text

10. The system of claim 1, wherein one or more of the computer program instructions, which when executed by the at least one processor, cause the at least one processor to create a dictionary for the each of the identified key moments for storing the audio signal levels of audio segments in the each of the identified key moments, and a start time code and an end time code of an event defining the each of the identified key moments, wherein the dictionary comprises shots that fall inside the each of the identified key moments.

Plain English translation pending...
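
The per-moment dictionary described in claim 10 could be assembled roughly as follows. The key names, the function name `build_moment_dictionary`, and the 0.5-second segment length are assumptions for illustration; the claim specifies only what the dictionary stores (the audio signal levels of the segments in the key moment, the start and end time codes, and the shots that fall inside the moment).

```python
def build_moment_dictionary(levels, shots, start, end, segment_seconds=0.5):
    """Collect the levels, time codes, and contained shots for one key moment.

    levels: one audio signal level per segment for the whole audio file.
    shots: (start_seconds, end_seconds) pairs for the whole media asset.
    """
    first = round(start / segment_seconds)
    last = round(end / segment_seconds)
    return {
        "start_time_code": start,
        "end_time_code": end,
        "audio_signal_levels": levels[first:last],
        # Keep only shots that lie entirely inside the key moment.
        "shots": [s for s in shots if s[0] >= start and s[1] <= end],
    }
```

Keeping the levels and shots together in one structure means later steps, such as ranking or report generation, can work on a single key moment without re-reading the full arrays.
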
Claim 11

Original Legal Text

11. The system of claim 1, wherein the ranking criteria comprise one or more of variations in the audio signal levels of the identified key moments, presence of one or more key actors in the media asset determined using a reference database of key actors, presence of scenes of interest in the media asset, and an average audio signal level of the each of the identified key moments.

Plain English translation pending...
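
Claim 11 lists the ranking criteria but not how they are combined. The sketch below adds them into a single score purely as an illustration: the additive formula, the bonus of 5.0 for a key actor or scene of interest, the use of population standard deviation as the measure of variation, and the names `score_moment` and `rank_moments` are all assumptions.

```python
from statistics import mean, pstdev


def score_moment(levels_dbfs, has_key_actor=False, has_scene_of_interest=False):
    """Combine the four criteria into one hypothetical score."""
    score = mean(levels_dbfs)      # average audio signal level
    score += pstdev(levels_dbfs)   # variation in the audio signal levels
    score += 5.0 if has_key_actor else 0.0
    score += 5.0 if has_scene_of_interest else 0.0
    return score


def rank_moments(moments):
    """Return (rank, name) pairs, rank 1 being the highest score.

    moments: dicts with 'name', 'levels', and optional 'key_actor'
    and 'scene' flags.
    """
    ordered = sorted(
        moments,
        key=lambda m: score_moment(
            m["levels"], m.get("key_actor", False), m.get("scene", False)
        ),
        reverse=True,
    )
    return [(rank, m["name"]) for rank, m in enumerate(ordered, start=1)]
```

In this sketch the key-actor and scene flags are supplied by the caller. In the claim, the presence of key actors is determined using a reference database of key actors, which is outside the scope of the example.
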
Claim 12

Original Legal Text

12. The system of claim 1, wherein one or more of the computer program instructions, which when executed by the at least one processor, cause the at least one processor to generate a report comprising an audio plot of the each of the identified key moments and the shots in the each of the identified key moments, the computed rank of the each of the identified key moments, and a start time code and an end time code of the each of the identified key moments.

Plain English translation pending...
Claim 14

Original Legal Text

14. The method of claim 13, wherein the audio signal level is measured in decibels relative to full scale.

Plain English Translation

Claim 14 narrows the method of claim 13. Per the abstract, the key moments engine computes an audio signal level for each sub-second audio segment; this claim specifies that the level is measured in decibels relative to full scale (dBFS). dBFS is a common unit in digital audio that expresses a signal level relative to the maximum amplitude the system can represent. Full scale corresponds to 0 dBFS, so levels below full scale are negative. The claim mirrors claim 2, which applies the same limitation to the system.

Claim 15

Original Legal Text

15. The method of claim 13, further comprising storing the audio signal level as an absolute value of the audio signal level by the key moments engine for the generation of the clusters free of errors.

Plain English Translation

Claim 15 narrows the method of claim 13. The key moments engine stores each audio signal level as its absolute value so that the clusters are generated free of errors. Per the abstract, the clusters are generated from the array of audio signal levels, the threshold levels for classifying the levels are determined from the clusters, and the key moments are identified from the classified levels. Storing absolute values therefore affects the first step of that chain: the clustering algorithm operates on a uniform, non-negative representation of the levels rather than on signed values. The claim mirrors claim 3, which applies the same limitation to the system.

Claim 16

Original Legal Text

16. The method of claim 13, wherein the key moments engine is configured to execute at least one of a plurality of clustering algorithms for generating the clusters of the audio signal levels, wherein the at least one of the clustering algorithms is a k-means clustering algorithm.

Plain English translation pending...
Claim 17

Original Legal Text

17. The method of claim 13, wherein the audio signal levels are classified as one of high audio signal levels, medium audio signal levels, and low audio signal levels.

Plain English translation pending...
Claim 18

Original Legal Text

18. The method of claim 13, further comprising processing events with non-distinguishable audio signal levels by the key moments engine for the identification of the key moments.

Plain English translation pending...
Claim 19

Original Legal Text

19. The method of claim 13, further comprising representing the each of the identified key moments using a start time code and an end time code by the key moments engine.

Plain English translation pending...
Claim 20

Original Legal Text

20. The method of claim 19, further comprising adjusting the start time code and the end time code of the each of the identified key moments to boundaries of shots of the media asset by the key moments engine to enhance a visual representation of the identified key moments.

Plain English translation pending...
Claim 21

Original Legal Text

21. The method of claim 19, further comprising discarding, by the key moments engine, fringe shots that end just before the start time code of the each of the identified key moments and fringe shots that start just after the end time code of the each of the identified key moments.

Plain English translation pending...
Claim 22

Original Legal Text

22. The method of claim 13, further comprising creating a dictionary for the each of the identified key moments by the key moments engine for storing the audio signal levels of audio segments in the each of the identified key moments, and a start time code and an end time code of an event defining the each of the identified key moments, wherein the dictionary comprises shots that fall inside the each of the identified key moments.

Plain English translation pending...
Claim 23

Original Legal Text

23. The method of claim 13, wherein the ranking criteria comprise one or more of variations in the audio signal levels of the identified key moments, presence of one or more key actors in the media asset determined using a reference database of key actors, presence of scenes of interest in the media asset, and an average audio signal level of the each of the identified key moments.

Plain English translation pending...
Claim 24

Original Legal Text

24. The method of claim 13, further comprising generating, by the key moments engine, a report comprising an audio plot of the each of the identified key moments and the shots in the each of the identified key moments, the computed rank of the each of the identified key moments, and a start time code and an end time code of the each of the identified key moments.

Plain English translation pending...
Patent Metadata

Filing Date

June 9, 2021

Publication Date

October 25, 2022

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “System and method for automatically identifying and ranking key moments in media” (US-11482243). https://patentable.app/patents/US-11482243

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/US-11482243. See llms.txt for full attribution policy.