US-11962997

System and method for adaptive audio signal generation, coding and rendering

Published: April 16, 2024
Assignee: not available in USPTO data we have
Inventors: not available in USPTO data we have
Technical Abstract

Embodiments are described for an adaptive audio system that processes audio data comprising a number of independent monophonic audio streams. One or more of the streams has associated with it metadata that specifies whether the stream is a channel-based or object-based stream. Channel-based streams have rendering information encoded by means of channel name; and the object-based streams have location information encoded through location expressions encoded in the associated metadata. A codec packages the independent audio streams into a single serial bitstream that contains all of the audio data. This configuration allows for the sound to be rendered according to an allocentric frame of reference, in which the rendering location of a sound is based on the characteristics of the playback environment (e.g., room size, shape, etc.) to correspond to the mixer's intent. The object position metadata contains the appropriate allocentric frame of reference information required to play the sound correctly using the available speaker positions in a room that is set up to play the adaptive audio content.
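
As a rough illustration of the packaging idea in the abstract, the following Python sketch tags each monophonic stream as channel-based or object-based and serializes all streams into one byte string. The `MonoStream` class, the length-prefixed JSON layout, and the function names are assumptions made for this example; the patent's actual bitstream format is not described on this page.

```python
import json
import struct
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MonoStream:
    """One independent monophonic stream plus its metadata (hypothetical layout)."""
    samples: bytes                       # raw audio payload for this stream
    kind: str                            # "channel" or "object"
    channel_name: Optional[str] = None   # rendering info for channel-based streams
    position: Optional[Tuple[float, float, float]] = None  # location for object-based streams


def pack(streams: List[MonoStream]) -> bytes:
    """Serialize all streams into a single serial byte string."""
    out = bytearray()
    for s in streams:
        meta = json.dumps(
            {"kind": s.kind, "channel_name": s.channel_name, "position": s.position}
        ).encode("utf-8")
        # Two big-endian 32-bit lengths: metadata size, then payload size.
        out += struct.pack(">II", len(meta), len(s.samples))
        out += meta + s.samples
    return bytes(out)


def unpack(blob: bytes) -> List[MonoStream]:
    """Recover the streams written by pack()."""
    streams = []
    offset = 0
    while offset < len(blob):
        meta_len, pcm_len = struct.unpack_from(">II", blob, offset)
        offset += 8
        meta = json.loads(blob[offset:offset + meta_len].decode("utf-8"))
        offset += meta_len
        samples = blob[offset:offset + pcm_len]
        offset += pcm_len
        pos = meta["position"]
        streams.append(
            MonoStream(
                samples=samples,
                kind=meta["kind"],
                channel_name=meta["channel_name"],
                position=tuple(pos) if pos is not None else None,
            )
        )
    return streams
```

A renderer built on such a layout could read the `kind` field first and then route the stream either to a named speaker channel or to a position-based panner.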

Patent Claims
6 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 2

Original Legal Text

2. The system of claim 1, wherein the metadata elements associated with each object-based monophonic audio stream further indicate spatial parameters controlling the playback of a corresponding sound component comprising one or more of: sound position, sound width, and sound velocity.

Plain English Translation

This dependent claim narrows the system of claim 1. The metadata elements associated with each object-based monophonic audio stream also indicate spatial parameters that control how the corresponding sound component is played back. Those parameters comprise one or more of sound position, sound width, and sound velocity. In practical terms, the metadata can tell the renderer where a sound should appear, how wide it should seem, and how it moves, so that each sound component is rendered with the spatial characteristics its metadata specifies.
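
The three spatial parameters named in claim 2 could be carried in a small record such as the one sketched below. The `SpatialParams` class, its normalized units, and the linear interpretation of velocity are assumptions for illustration only; the claim does not define units or how velocity is applied.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class SpatialParams:
    """Hypothetical per-object spatial metadata: position, width, velocity."""
    position: Vec3                      # where the sound is placed
    width: float = 0.0                  # perceived spread; 0.0 is treated here as a point source
    velocity: Vec3 = (0.0, 0.0, 0.0)    # assumed to mean change in position per second

    def position_at(self, seconds: float) -> Vec3:
        """Linearly extrapolate the position after `seconds` (an assumption)."""
        return tuple(p + v * seconds for p, v in zip(self.position, self.velocity))
```

For example, an object starting at `(0.0, 0.5, 0.0)` with velocity `(0.25, 0.0, 0.0)` would, under this linear assumption, sit at `(0.5, 0.5, 0.0)` two seconds later.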

Claim 3

Original Legal Text

3. The system of claim 1, wherein the playback location for each of the plurality of object-based monophonic audio streams is independently specified with respect to either an egocentric frame of reference or an allocentric frame of reference, wherein the egocentric frame of reference is taken in relation to a listener in the playback environment, and wherein the allocentric frame of reference is taken with respect to a characteristic of the playback environment.

Plain English Translation

This dependent claim narrows the system of claim 1 by specifying how playback locations are expressed. The playback location of each object-based monophonic audio stream is specified independently, using either an egocentric or an allocentric frame of reference. The egocentric frame of reference is defined in relation to a listener in the playback environment. The allocentric frame of reference is defined with respect to a characteristic of the playback environment; the abstract gives room size and shape as examples of such characteristics. Because the choice is made per stream, one object can be positioned relative to the listener while another is positioned relative to the room.
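
The difference between the two frames of reference can be made concrete with a minimal sketch. It assumes that allocentric coordinates are fractions of the room dimensions and that egocentric coordinates are offsets from the listener in the same units as the room size; neither convention comes from the patent text, and `Location` and `resolve` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Location:
    """Per-stream playback location with its own frame of reference."""
    frame: str    # "egocentric" or "allocentric"
    coords: Vec3


def resolve(loc: Location, room_size: Vec3, listener: Vec3) -> Vec3:
    """Convert a location into absolute room coordinates."""
    if loc.frame == "allocentric":
        # Assumed convention: coords are fractions (0..1) of each room dimension.
        return tuple(c * s for c, s in zip(loc.coords, room_size))
    if loc.frame == "egocentric":
        # Assumed convention: coords are offsets from the listener's position.
        return tuple(l + c for l, c in zip(listener, loc.coords))
    raise ValueError(f"unknown frame of reference: {loc.frame!r}")
```

Under these conventions, an allocentric location keeps the same relative place in rooms of different sizes, while an egocentric location follows the listener position passed to `resolve`.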

Claim 5

Original Legal Text

5. A non-transitory computer readable storage medium comprising a sequence of instructions, wherein, when executed by a system for processing audio signals, the sequence of instructions causes the system to perform the method of claim 4.

Plain English Translation

This claim covers a non-transitory computer-readable storage medium that stores a sequence of instructions. When a system for processing audio signals executes those instructions, the system performs the method of claim 4. The claim adds no processing steps of its own; its scope is set entirely by claim 4, which is not reproduced on this page. In effect, it extends protection from the method itself to the stored software that carries out the method.

Claim 7

Original Legal Text

7. The method of claim 6, wherein the metadata elements associated with each object-based monophonic audio stream further indicate spatial parameters controlling the playback of a corresponding sound component comprising one or more of: sound position, sound width, and sound velocity.

Plain English Translation

This dependent claim narrows the method of claim 6 and mirrors system claim 2. The metadata elements associated with each object-based monophonic audio stream also indicate spatial parameters that control playback of the corresponding sound component, comprising one or more of sound position, sound width, and sound velocity. Sound position sets where the sound is perceived, sound width sets its perceived spread, and sound velocity describes its motion. Because these parameters travel in the metadata rather than in the audio samples, the spatial rendering can be controlled without altering the audio data itself.

Claim 8

Original Legal Text

8. The method of claim 6, wherein the playback location for each of the plurality of object-based monophonic audio streams comprises a spatial position relative to a screen within a playback environment, or a surface that encloses the playback environment, and wherein the surface comprises a front plane, a back plane, a left plane, right plane, an upper plane, and a lower plane, and/or is independently specified with respect to either an egocentric frame of reference or an allocentric frame of reference, wherein the egocentric frame of reference is taken in relation to a listener in the playback environment, and wherein the allocentric frame of reference is taken with respect to a characteristic of the playback environment.

Plain English Translation

This dependent claim narrows the method of claim 6 by describing how playback locations may be expressed. The playback location of each object-based monophonic audio stream comprises a spatial position relative to a screen within the playback environment, or relative to a surface that encloses the playback environment. That enclosing surface comprises six planes: front, back, left, right, upper, and lower. In addition or as an alternative, each location is independently specified with respect to either an egocentric frame of reference, taken in relation to a listener in the playback environment, or an allocentric frame of reference, taken with respect to a characteristic of the playback environment.
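
One way to picture the six-plane enclosing surface in claim 8 is to address a point by naming a plane and giving two coordinates on it. The sketch below assumes a unit cube with x running left to right, y front to back, and z bottom to top; the `Plane` enum, `plane_point`, and the axis convention are all hypothetical and not taken from the patent.

```python
from enum import Enum
from typing import Tuple


class Plane(Enum):
    """The six planes of the enclosing surface named in the claim."""
    FRONT = "front"
    BACK = "back"
    LEFT = "left"
    RIGHT = "right"
    UPPER = "upper"
    LOWER = "lower"


def plane_point(plane: Plane, u: float, v: float) -> Tuple[float, float, float]:
    """Map (u, v) in [0, 1] on a plane to a point on the unit cube (assumed axes)."""
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        raise ValueError("u and v must lie in [0, 1]")
    mapping = {
        Plane.FRONT: (u, 0.0, v),   # y fixed at the front wall
        Plane.BACK: (u, 1.0, v),    # y fixed at the back wall
        Plane.LEFT: (0.0, u, v),    # x fixed at the left wall
        Plane.RIGHT: (1.0, u, v),   # x fixed at the right wall
        Plane.LOWER: (u, v, 0.0),   # z fixed at the floor
        Plane.UPPER: (u, v, 1.0),   # z fixed at the ceiling
    }
    return mapping[plane]
```

A point produced this way is in normalized room coordinates, so it could be scaled to a specific room in the same manner as any other allocentric position.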

Claim 9

Original Legal Text

9. A non-transitory computer readable storage medium comprising a sequence of instructions, wherein, when executed by a system for processing audio signals, the sequence of instructions causes the system to perform the method of claim 6.

Plain English Translation

This claim covers a non-transitory computer-readable storage medium that stores a sequence of instructions. When a system for processing audio signals executes those instructions, the system performs the method of claim 6. The claim adds no processing steps of its own; its scope is set by claim 6, which is not reproduced on this page. Claims 7 and 8, which depend on claim 6, indicate that the method involves object-based monophonic audio streams with associated metadata elements and playback locations.

Patent Metadata

Filing Date

August 8, 2022

Publication Date

April 16, 2024

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “System and method for adaptive audio signal generation, coding and rendering” (US-11962997). https://patentable.app/patents/US-11962997

