Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A computer-implemented method comprising: receiving, at a server device, a first media stream comprising first media data from a first media object, wherein the first media data was captured by one of: the server device; or a first client device; receiving, at the server device, a second media stream comprising second media data from a second media object, wherein the second media data was captured by one of: the server device; the first client device; or a second client device; mixing, at the server device, the first media data and the second media data into a third media stream; compiling, while mixing the third media stream, a metadata stream comprising information enabling: client-side unmixing of the first media data and the second media data from the third media stream; and client-side processing of the first media data separate from the second media data; transmitting, from the server device to one or more additional devices, the third media stream; and transmitting, from the server device to the one or more additional devices, the metadata stream to enable each of the one or more additional devices to: unmix the first media data and the second media data from the third media stream; and process the first media data separate from the second media data.
This invention relates to a system for capturing, mixing, and transmitting multiple media streams while preserving the ability to separate and process the individual streams on the client side. The problem addressed is the need to combine multiple media sources (such as video, audio, or other data) into a single stream for transmission while still allowing recipients to recover and process the original media data independently. A server device receives at least two media streams, each carrying media data from a different media object. The first media data may have been captured by the server itself or by a first client device; the second may have been captured by the server, the first client device, or a second client device, so both streams can originate from the same device. The server mixes the two sets of media data into a single third stream. While mixing, the server compiles a metadata stream containing the information clients need to unmix the original media data from the mixed stream and to process each separately, for example by applying different effects, filters, or analyses to each. The server transmits both the mixed stream and the metadata stream to one or more additional devices, which use the metadata to recover the original media data. Sending one mixed stream rather than several separate streams may reduce bandwidth while keeping client-side processing flexible. Possible uses include video conferencing, live broadcasting, and multimedia editing, where multiple sources must be combined but processed individually.
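As a minimal sketch of the server-side steps, the following Python mixes two equal-length mono sample lists into a two-channel stream and records how they were mixed. The claim does not specify a mixing scheme: the 2x2 matrix mix, the `MixMetadata` record, and the `mix_streams` function are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Matrix = Tuple[Tuple[float, float], Tuple[float, float]]


@dataclass
class MixMetadata:
    """Hypothetical metadata record describing one mixed block."""
    source_ids: Tuple[str, str]
    matrix: Matrix  # how the two sources were combined into two channels


def mix_streams(
    first: Sequence[float],
    second: Sequence[float],
    matrix: Matrix = ((1.0, 0.5), (0.5, 1.0)),
) -> Tuple[List[Tuple[float, float]], MixMetadata]:
    """Mix two mono sources into a two-channel stream and compile metadata."""
    if len(first) != len(second):
        raise ValueError("sources must have the same number of samples")
    (a, b), (c, d) = matrix
    # Assumption: unmixing is only possible if the matrix can be inverted.
    if abs(a * d - b * c) < 1e-12:
        raise ValueError("mixing matrix must be invertible")
    mixed = [(a * x + b * y, c * x + d * y) for x, y in zip(first, second)]
    metadata = MixMetadata(source_ids=("first", "second"), matrix=matrix)
    return mixed, metadata
```

Because the matrix is invertible and travels in the metadata, a receiver could in principle recover both sources from the two mixed channels.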
2. The computer-implemented method of claim 1, further comprising: receiving, at the one or more additional devices, the third media stream; receiving, at the one or more additional devices, the metadata stream; using, at the one or more additional devices, the metadata stream to unmix the first media data and the second media data from the third media stream; and differently processing, at the one or more additional devices, the first media data and the second media data.
This invention relates to a computer-implemented method for processing media streams in a distributed system. Building on claim 1, this claim adds the receiving side. The one or more additional devices receive the mixed (third) media stream and the metadata stream, use the metadata to unmix the first and second media data from the mixed stream, and then process the first media data differently from the second media data. The difference claimed is between the two sets of media data, not merely between devices: once separated, each source can be handled in its own way. Because only one mixed stream plus metadata is transmitted, this may reduce bandwidth compared with sending each source separately while still allowing per-source processing at the receiving devices.
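A client-side counterpart can be sketched as follows. It assumes, purely for illustration, that the server combined two mono sources into a two-channel stream with an invertible 2x2 matrix and sent that matrix as metadata; the claim itself does not say how unmixing works. The names `unmix` and `process_separately` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = Tuple[Tuple[float, float], Tuple[float, float]]


def unmix(
    mixed: Sequence[Tuple[float, float]], matrix: Matrix
) -> Tuple[List[float], List[float]]:
    """Recover both sources by applying the inverse of the mixing matrix."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("mixing matrix is not invertible")
    first = [(d * m0 - b * m1) / det for m0, m1 in mixed]
    second = [(a * m1 - c * m0) / det for m0, m1 in mixed]
    return first, second


def process_separately(
    mixed: Sequence[Tuple[float, float]],
    matrix: Matrix,
    on_first: Callable[[float], float],
    on_second: Callable[[float], float],
) -> Tuple[List[float], List[float]]:
    """Unmix, then apply a different per-sample operation to each source."""
    first, second = unmix(mixed, matrix)
    return [on_first(s) for s in first], [on_second(s) for s in second]
```

For example, passing a doubling function for the first source and a muting function for the second would boost one source and silence the other on that device only.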
3. The computer-implemented method of claim 2, wherein differently processing the first media data and the second media data comprises spatially localizing, at the one or more additional devices while presenting the first media data and the second media data to a user of the one or more additional devices, the first media data and the second media data at different spatial locations in the user's auditory field.
This invention relates to audio processing techniques for presenting media data to users in a spatially localized manner. The problem addressed is the need to keep several audio sources distinguishable when they are presented together. Building on claim 2, the receiving device, after unmixing the first and second media data, processes them differently by placing each at a different spatial location in the user's auditory field while presenting them to the user. As a result, the two sources are perceived as coming from different locations, which can improve clarity and reduce auditory confusion. The claim does not specify how the spatial effect is produced; it may involve adjusting audio parameters such as phase, amplitude, or delay. This approach is particularly useful where multiple concurrent audio sources, such as several speakers in a conference, must be told apart.
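One simple way to place two sources at different positions is constant-power stereo panning, sketched below. This is an assumption for illustration only: the claim does not name a technique, and real spatial audio rendering is often more elaborate than amplitude panning. The functions `pan_gains` and `spatialize` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pan_gains(pan: float) -> Tuple[float, float]:
    """Return (left, right) gains for pan in [-1, 1]; -1 is full left."""
    pan = max(-1.0, min(1.0, pan))
    angle = (pan + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)


def spatialize(
    first: Sequence[float],
    second: Sequence[float],
    first_pan: float = -0.5,
    second_pan: float = 0.5,
) -> List[Tuple[float, float]]:
    """Render two unmixed mono sources at different stereo positions."""
    if len(first) != len(second):
        raise ValueError("sources must have the same number of samples")
    first_l, first_r = pan_gains(first_pan)
    second_l, second_r = pan_gains(second_pan)
    return [
        (first_l * x + second_l * y, first_r * x + second_r * y)
        for x, y in zip(first, second)
    ]
```

With the default pan values, the first source sits left of centre and the second right of centre, so a listener on headphones would hear them as separate positions.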
4. The computer-implemented method of claim 3, wherein: the one or more additional devices comprise a head-mounted device capable of measuring a head pose of the user; and the first media data and the second media data are spatially localized relative to the head pose of the user.
This invention relates to a computer-implemented method for spatializing audio relative to a user's head pose, for example in a virtual or augmented reality setting. Building on claim 3, the one or more additional devices include a head-mounted device capable of measuring the user's head pose. The first and second media data are spatially localized relative to that head pose, so the location at which each source is placed in the user's auditory field is determined with reference to the measured pose of the head. The claim does not specify which components of head pose are measured or how the rendering is updated. One possible benefit is that sources can appear to stay at stable positions as the user turns their head, which can make concurrent sources easier to tell apart and the experience more immersive.
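The idea of localizing a source relative to head pose can be sketched with a single angle. The example below assumes only head yaw is measured and that azimuths are in degrees, with positive values to the user's right; a full head pose would also include pitch, roll, and position. The function names are hypothetical.

```python
from typing import Dict


def wrap_degrees(angle: float) -> float:
    """Wrap an angle into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def relative_azimuth(source_azimuth_deg: float, head_yaw_deg: float) -> float:
    """Azimuth of a source as heard by a listener facing head_yaw_deg."""
    return wrap_degrees(source_azimuth_deg - head_yaw_deg)


def localize_sources(
    source_azimuths_deg: Dict[str, float], head_yaw_deg: float
) -> Dict[str, float]:
    """Recompute every source's head-relative azimuth for the current yaw."""
    return {
        name: relative_azimuth(azimuth, head_yaw_deg)
        for name, azimuth in source_azimuths_deg.items()
    }
```

Calling `localize_sources` each time a new yaw reading arrives would keep each source anchored in the room rather than glued to the listener's head.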
5. The computer-implemented method of claim 2, wherein differently processing the first media data and the second media data comprises independently adjusting, at the one or more additional devices, a volume level of the first media data and a volume level of the second media data.
This invention relates to client-side audio control for mixed media streams. Building on claim 2, the receiving device, after unmixing the first and second media data, processes them differently by independently adjusting the volume level of each. This lets a user raise or lower the volume of one source without affecting the other, even though both arrived in a single mixed stream. Such control is useful in scenarios like multi-participant conferences, where a listener may want one speaker louder than another, and each receiving device can apply its own settings based on individual preferences or environmental conditions.
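Independent volume control over two already-unmixed sources might look like the sketch below. Expressing volume in decibels and summing the two sources into one mono output are assumptions for illustration; `db_to_gain` and `apply_volumes` are hypothetical names.

```python
from typing import List, Sequence


def db_to_gain(decibels: float) -> float:
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (decibels / 20.0)


def apply_volumes(
    first: Sequence[float],
    second: Sequence[float],
    first_db: float = 0.0,
    second_db: float = 0.0,
) -> List[float]:
    """Scale each source by its own gain, then sum them for playback."""
    if len(first) != len(second):
        raise ValueError("sources must have the same number of samples")
    first_gain = db_to_gain(first_db)
    second_gain = db_to_gain(second_db)
    return [first_gain * x + second_gain * y for x, y in zip(first, second)]
```

Lowering `second_db` to -20 leaves the first source untouched while reducing the second to one tenth of its amplitude.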
6. The computer-implemented method of claim 1, wherein: the first media stream is received from the first client device; the first media object is a user of the first client device; the second media stream is received from the second client device; the second media object is a user of the second client device; the server device is a cloud-based server hosting a virtualized conference for a user of the one or more additional devices, the user of the first client device, and the user of the second client device; and the computer-implemented method further comprises: capturing, at the first client device, the first media data from the user of the first client device; and capturing, at the second client device, the second media data from the user of the second client device.
This invention relates to cloud-based virtualized conferencing systems that facilitate real-time communication between multiple participants. The problem addressed is the need for efficient and scalable media streaming in virtual conferences, where participants from different client devices interact in a shared virtual environment. The method involves a cloud-based server that hosts a virtual conference for multiple users, including those from a first client device and a second client device. The server receives a first media stream from the first client device, where the media stream includes data representing a user of that device. Similarly, the server receives a second media stream from the second client device, containing data representing its user. The media data for each stream is captured at the respective client devices before transmission to the server. The virtual conference allows participants to interact in a shared virtual space, with the server managing the distribution of media streams between devices. This ensures synchronized and seamless communication among all users, regardless of their physical locations. The system enables real-time collaboration by processing and relaying media data efficiently, supporting features like video, audio, or other interactive elements in a virtualized environment. The cloud-based architecture ensures scalability and reliability for large-scale conferences.
7. The computer-implemented method of claim 1, further comprising: receiving, at the server device, a fourth media stream comprising third media data from a third media object; using a psychoacoustic model to predict that a user of the one or more additional devices would be unable to perceive the third media data if presented to the user of the one or more additional devices; and refraining, at the server device, from mixing the third media data into the third media stream.
This invention relates to audio processing in multimedia systems, specifically optimizing audio mixing in real-time communication or streaming applications. The problem addressed is wasted bandwidth and processing when transmitting audio that users cannot perceive, for example because of psychoacoustic masking. Building on claim 1, the server device receives a fourth media stream containing third media data from a third media object (e.g., a sound effect or background noise). The server uses a psychoacoustic model to predict whether a user of the one or more additional devices would be able to perceive this data if it were presented. If the model predicts the user would be unable to perceive it (for instance because louder sounds would mask it), the server refrains from mixing the third media data into the third media stream. This avoids transmitting and processing data that would not be heard, so that only perceptually relevant audio is mixed, without degrading the user experience. The approach is particularly useful in bandwidth-constrained environments or systems with limited processing power.
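The gating decision can be illustrated with a deliberately crude stand-in for a psychoacoustic model: a stream is predicted inaudible if its RMS level falls more than a fixed margin below the loudest stream already in the mix. This level-difference rule and the 40 dB default are assumptions for illustration, not the model the patent uses, and real psychoacoustic models also account for frequency and timing. All function names are hypothetical.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a block of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_db(samples: Sequence[float]) -> float:
    """Block level in dB relative to full scale; silence is -infinity."""
    value = rms(samples)
    return -math.inf if value == 0.0 else 20.0 * math.log10(value)


def predicted_inaudible(
    candidate: Sequence[float],
    already_mixed: Sequence[Sequence[float]],
    margin_db: float = 40.0,
) -> bool:
    """Crude masking rule: True if the candidate is far below the loudest stream."""
    loudest = max((level_db(s) for s in already_mixed), default=-math.inf)
    return level_db(candidate) < loudest - margin_db
```

A server following this sketch would simply skip any stream for which `predicted_inaudible` returns True when building the mix.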
8. The computer-implemented method of claim 1, wherein: the one or more additional devices comprise a third client device and a fourth client device; and the computer-implemented method further comprises: receiving, at the third client device, the third media stream and the metadata stream; receiving, at the fourth client device, the third media stream and the metadata stream; using, at each of the third client device and the fourth client device, the metadata stream to unmix the first media data and the second media data from the third media stream; and performing, at the third client device, a first operation on the first media data but not the second media data; performing, at the fourth client device, a second operation on the first media data but not the second media data, wherein the first operation and the second operation are different operations.
This invention relates to a system for processing and distributing mixed media streams together with an accompanying metadata stream, so that receiving devices can perform selective operations on individual media components. The problem addressed is the need to distribute one mixed stream to several devices that each want to treat its components differently. Building on claim 1, the third media stream (which combines the first and second media data) and the metadata stream are transmitted to multiple client devices, including a third and a fourth client device. Each of these devices uses the metadata stream to unmix the first and second media data from the third media stream. The third client device then performs a first operation on the first media data but not on the second media data. The fourth client device performs a second, different operation on the first media data, again not on the second media data. This allows different devices to process the same mixed stream in distinct ways based on their own requirements.
9. The computer-implemented method of claim 1, wherein: the metadata stream further comprises at least one label of the first media data and at least one label of the second media data; the server device comprises a sensor array capable of spatial selectivity; receiving the first media stream comprises capturing, by the server device via the sensor array, the first media data from a first direction in a sound field; receiving the second media stream comprises capturing, by the server device via the sensor array, the second media data from a second direction in the sound field; the at least one label of the first media data comprises the first direction; and the at least one label of the second media data comprises the second direction.
This invention relates to a computer-implemented method for processing media data streams, specifically audio data, using a server device equipped with a sensor array capable of spatial selectivity. The method addresses the challenge of accurately capturing and labeling directional audio data from different sources in a sound field. The server device receives a first media stream containing first media data captured from a first direction in the sound field and a second media stream containing second media data captured from a second direction in the sound field. The sensor array enables spatial selectivity, allowing the device to distinguish between audio sources based on their directional origin. The metadata stream associated with the media data includes at least one label for the first media data, which specifies the first direction, and at least one label for the second media data, which specifies the second direction. This labeling ensures that the directional information of the audio sources is preserved and can be used for further processing, such as spatial audio rendering or source separation. The method enhances the accuracy and utility of directional audio data in applications like virtual reality, teleconferencing, and sound localization.
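A common way to obtain spatial selectivity from a sensor array is delay-and-sum beamforming, although the claim does not name a technique. The sketch below only computes the per-sensor arrival delays for a chosen direction and builds a direction label for the metadata. It assumes a uniform linear array, a distant source, a speed of sound of 343 m/s, and a direction measured from broadside with positive angles toward sensor 0. `steering_delays` and `DirectionLabel` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def steering_delays(
    num_sensors: int,
    spacing_m: float,
    direction_deg: float,
    sample_rate_hz: float,
) -> List[float]:
    """Arrival delay, in samples, at each sensor relative to sensor 0."""
    per_sensor_s = (
        spacing_m * math.sin(math.radians(direction_deg)) / SPEED_OF_SOUND_M_S
    )
    return [n * per_sensor_s * sample_rate_hz for n in range(num_sensors)]


@dataclass
class DirectionLabel:
    """Hypothetical metadata label tying media data to its capture direction."""
    source_id: str
    direction_deg: float
```

A beamformer would compensate each sensor signal for these delays before summing, and the server could attach a `DirectionLabel` for each captured direction to the metadata stream.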
10. The computer-implemented method of claim 1, wherein: the metadata stream further comprises at least one label of the first media data and at least one label of the second media data; the first client device comprises a simultaneous mapping and localization subsystem configured to map an environment of the first client device and localize the first client device within the environment; receiving the first media stream comprises receiving the first media stream from the first client device; receiving the second media stream comprises receiving the second media stream from the first client device; the computer-implemented method further comprises: capturing, by the first client device, the first media data from a first object in the environment; and capturing, by the first client device, the second media data from a second object in the environment; the at least one label of the first media data comprises an attribute of the first object; and the at least one label of the second media data comprises an attribute of the second object.
This invention relates to a computer-implemented method for processing media data streams from a client device equipped with a simultaneous mapping and localization subsystem (a capability more commonly called SLAM, simultaneous localization and mapping). The method addresses the challenge of capturing and labeling media data from multiple objects within a mapped environment. Building on claim 1, both the first media stream and the second media stream are received from the same first client device. That device captures the first media data from a first object in its environment and the second media data from a second object. The metadata stream contains at least one label for each set of media data, and each label includes an attribute of the respective object, such as its type or position. The subsystem on the client device maps the environment and localizes the device within it, which can give spatial context to the captured data. This approach supports applications in augmented reality, robotics, or environmental monitoring where object recognition and localization are important.
11. The computer-implemented method of claim 1, wherein: mixing the first media data and the second media data into the third media stream comprises performing a convolution operation on the first media data and the second media data to produce the third media stream; the information enabling unmixing of the first media data and the second media data from the third media stream comprises information enabling, at each of the one or more additional devices, a deconvolution operation to be performed on the third media stream to produce the first media data and the second media data; and at least one of the one or more additional devices unmixes the first media data and the second media data from the third media stream by performing the deconvolution operation on the third media stream.
This invention relates to a method for mixing and unmixing media data streams, particularly in distributed computing or communication systems where multiple devices process or transmit media content. The problem addressed is the combination of multiple media streams into a single stream for transmission or processing while preserving the ability to separate the original streams later. Building on claim 1, the first and second media data are mixed into the third media stream by performing a convolution operation on them. Convolution is a mathematical operation that combines two signals; it can be reversed by deconvolution when enough information about the operation is known. Accordingly, the metadata stream carries the information each additional device needs to perform a deconvolution operation on the mixed stream and recover the first and second media data, and at least one of the additional devices unmixes the stream in this way. How accurately the original data can be recovered depends on the signals and on the information provided, which the claim does not detail. This approach is useful in scenarios such as video conferencing, where multiple audio or video streams may be combined for transmission and later separated for individual playback, and in systems where bandwidth or processing efficiency is a concern.
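The convolution and deconvolution pair can be illustrated on short sample lists. The sketch below is not the patent's mixing scheme: it assumes, for illustration only, that a signal is convolved with a short kernel and that the kernel is the information carried in the metadata, so the receiver can undo the operation by polynomial division. `convolve` and `deconvolve` are hypothetical names.

```python
from typing import List, Sequence


def convolve(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def deconvolve(mixed: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Recover the signal from convolve(signal, kernel) given the kernel."""
    if not kernel or kernel[0] == 0.0:
        raise ValueError("kernel must start with a non-zero coefficient")
    length = len(mixed) - len(kernel) + 1
    recovered: List[float] = []
    for i in range(length):
        acc = mixed[i]
        # Subtract contributions of samples that were already recovered.
        for j in range(1, min(i, len(kernel) - 1) + 1):
            acc -= kernel[j] * recovered[i - j]
        recovered.append(acc / kernel[0])
    return recovered
```

This direct inversion is exact only in the noise-free case and can become numerically unstable for some kernels, which is one reason practical deconvolution often uses regularized methods instead.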
12. A computer-implemented method comprising: receiving, at one or more client devices from a server device, a media stream comprising first media data from a first media object and second media data from a second media object, wherein: the server device mixed the first media data and the second media data into the media stream; the first media data was captured by one of: the server device; or a first additional client device; and the second media data was captured by one of: the server device; the first additional client device; or a second additional client device; receiving, at the one or more client devices from the server device, a metadata stream comprising information enabling: client-side extraction of the first media data and the second media data from the media stream; and client-side processing of the first media data separate from the second media data; using, at each of the one or more client devices, the metadata stream to extract the first media data and the second media data from the media stream; differently processing, at each of the one or more client devices, the first media data and the second media data; and presenting, to a user of each of the one or more client devices, the first media data or the second media data.
This invention relates to a method for receiving and processing mixed media data from multiple sources, described from the client's point of view. The problem addressed is the efficient transmission and client-side handling of combined media streams where client devices need to treat the individual components differently. A server device mixes first media data from a first media object and second media data from a second media object into a single media stream; the data may have been captured by the server itself or by other client devices. Alongside the media stream, the server sends a metadata stream containing the information client devices need to extract the individual components and process them separately. Each client device receives both streams, uses the metadata to extract the first and second media data, processes the two differently, and presents the first media data or the second media data to its user. Transmitting a single mixed stream may reduce bandwidth while still allowing client-side customization of the content. Possible uses include real-time or on-demand media distribution such as video conferencing, live broadcasts, or interactive multimedia applications.
13. The computer-implemented method of claim 12, wherein differently processing the first media data and the second media data comprises spatially localizing, at the one or more client devices before presenting the first media data and the second media data, the first media data and the second media data at different spatial locations in the user's auditory field.
This invention relates to audio processing techniques for enhancing spatial localization of media data in a user's auditory field. The problem addressed is the need to improve the perception of distinct audio sources when multiple media streams are presented simultaneously, ensuring clear spatial separation to avoid auditory confusion. The method involves processing first and second media data streams differently to spatially localize them at distinct positions in the user's auditory field before presentation. This is achieved using one or more client devices, which adjust the audio signals to create the illusion of different spatial origins. The technique ensures that the user perceives the media data as originating from separate locations, improving clarity and reducing interference between overlapping audio sources. The method may also include dynamically adjusting the spatial positions based on user interaction or environmental factors to maintain optimal auditory separation. This approach is particularly useful in applications such as virtual reality, augmented reality, or multimedia playback systems where multiple audio streams must be distinctly presented to the user.
14. The computer-implemented method of claim 13, further comprising measuring a head pose of the user, wherein the first media data and the second media data are spatially localized relative to the head pose of the user.
This invention relates to a computer-implemented method for spatializing audio relative to a user's head pose, particularly in augmented or virtual reality environments. Building on claim 13, the method further comprises measuring the user's head pose, and the first and second media data are spatially localized relative to that pose. In other words, the different locations in the user's auditory field at which the client device places the two sources are determined with reference to the measured head pose. The claim does not say how the head pose is measured; sensors or cameras are typical options. Unlike claim 4, this claim does not require a head-mounted device. One possible benefit is that sources can appear to stay at stable positions as the user moves their head, which can reduce disorientation and help the user tell concurrent sources apart.
15. The computer-implemented method of claim 12, wherein differently processing the first media data and the second media data comprises independently adjusting, at the one or more client devices, a volume level of the first media data and a volume level of the second media data.
This invention relates to audio processing in multimedia systems, specifically for independently adjusting volume levels of different media streams at client devices. The problem addressed is the need for flexible audio control in environments where multiple media sources are played simultaneously, such as in video conferencing, gaming, or multimedia playback, where users may want to prioritize or balance audio from different sources without affecting the overall system settings. The method involves processing first and second media data streams at one or more client devices. The key innovation is the ability to independently adjust the volume levels of these streams at the client side, allowing users to customize their audio experience without requiring changes to the source or server-side configurations. This enables scenarios where, for example, a user can increase the volume of a video stream while reducing the volume of background music or vice versa, all while maintaining the original audio balance for other users. The solution leverages client-side processing to dynamically modify audio levels, ensuring that adjustments are localized to individual devices. This approach avoids the need for centralized control, reducing latency and complexity. The method is particularly useful in collaborative environments where different participants may have varying audio preferences or requirements. By enabling granular volume control at the client level, the invention enhances user experience and adaptability in multimedia applications.
16. A system comprising: at least one physical processor; and physical memory comprising computer-executable instructions that, when executed by the physical processor, cause the physical processor to: receive, at a server device, a first media stream comprising first media data from a first media object, wherein the first media data was captured by one of: the server device; or a first client device; receive, at the server device, a second media stream comprising second media data from a second media object, wherein the second media data was captured by one of: the server device; the first client device; or a second client device; mix, at the server device, the first media data and the second media data into a third media stream; compile, while mixing the third media stream, a metadata stream comprising information enabling: client-side unmixing of the first media data and the second media data from the third media stream; and client-side processing of the first media data separate from the second media data; transmit, from the server device to one or more additional devices, the third media stream; and transmit, from the server device to the one or more additional devices, the metadata stream to enable each of the one or more additional devices to: unmix the first media data and the second media data from the third media stream; and process the first media data separate from the second media data.
The system relates to real-time media processing and streaming, specifically for combining multiple media sources into a single stream while preserving the ability to separate and process individual components on the client side. The problem addressed is the need to efficiently transmit multiple media streams (e.g., video, audio) from different sources while allowing recipients to selectively access or manipulate individual streams without requiring separate transmissions. The system includes a server device with at least one physical processor and memory storing executable instructions. The server receives a first media stream from a first media object, captured either by the server or a first client device, and a second media stream from a second media object, captured by the server, the first client device, or a second client device. The server mixes the first and second media data into a single third media stream. Simultaneously, it compiles a metadata stream containing information that enables client devices to unmix the original streams and process them independently. The server then transmits both the mixed media stream and the metadata stream to one or more additional devices. Recipients can use the metadata to separate the original media components and handle them separately, such as applying different effects, filters, or routing them to different outputs. This approach reduces bandwidth usage by transmitting a single combined stream while maintaining flexibility for downstream processing.
17. The system of claim 16, wherein: the first media stream is received from the first client device; the first media object is a user of the first client device; the second media stream is received from the second client device; the second media object is a user of the second client device; and the server device is a cloud-based server hosting a virtualized conference for a user of the one or more additional devices, the user of the first client device, and the user of the second client device.
This claim narrows the system of claim 16 to a cloud-hosted virtual conference. The first media stream is received from the first client device, and its media object is the user of that device. Likewise, the second media stream is received from the second client device, and its media object is the user of that device. The server device is a cloud-based server hosting a virtualized conference whose participants are the user of the first client device, the user of the second client device, and a user of the one or more additional devices. As in claim 16, the server mixes the two participants' media into a single stream and transmits it, together with the metadata stream, to the additional devices, which can then unmix each participant's media and process it separately.
18. The system of claim 16, wherein the physical memory further comprises additional computer-executable instructions that, when executed by the physical processor, cause the physical processor to: receive, at the server device, a fourth media stream comprising third media data from a third media object; use a psychoacoustic model to predict that a user of the one or more additional devices would be unable to perceive the third media data if presented to the user of the one or more additional devices; and refrain, at the server device, from mixing the third media data into the third media stream.
This claim extends the system of claim 16 with selective exclusion of imperceptible media, which saves mixing work and bandwidth. The server device receives a fourth media stream carrying third media data from a third media object. It then uses a psychoacoustic model to predict whether a user of the one or more additional devices would be able to perceive that third media data if it were presented. If the model predicts the data would be imperceptible, for example because louder sounds would mask it, the server refrains from mixing the third media data into the third (mixed) media stream. The first and second media data are still mixed and transmitted with the metadata stream as in claim 16. This is particularly useful in real-time applications such as video conferencing or live audio broadcasting, where latency and bandwidth are constrained.
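The decision to leave an imperceptible stream out of the mix can be sketched as follows. The claim does not describe the psychoacoustic model, so this example substitutes a deliberately crude stand-in: a stream is treated as masked when its level falls more than a fixed margin below the loudest stream. The margin `MASKING_MARGIN_DB` and the function names are hypothetical, and a real psychoacoustic model would work per frequency band and over time.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical margin: a stream this many dB below the loudest one is
# treated as masked. This value is not taken from the patent.
MASKING_MARGIN_DB = 40.0


def level_db(samples: Sequence[float]) -> float:
    """Return the mean-square level of a block of samples in dB."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def streams_to_mix(
    streams: Dict[str, Sequence[float]],
    margin_db: float = MASKING_MARGIN_DB,
) -> List[str]:
    """Return the names of streams predicted to be perceptible.

    Crude stand-in for a psychoacoustic model: silent streams and streams
    more than margin_db below the loudest stream are left out of the mix.
    """
    levels = {name: level_db(samples) for name, samples in streams.items()}
    loudest = max(levels.values(), default=float("-inf"))
    return [
        name
        for name, level in levels.items()
        if math.isfinite(level) and level >= loudest - margin_db
    ]
```

In this sketch the server would pass only the returned stream names to its mixing step, so a very quiet third stream never enters the mixed output.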
19. The system of claim 16, wherein: the metadata stream further comprises at least one label of the first media data and at least one label of the second media data; the server device comprises a sensor array capable of spatial selectivity; the sensor array receives the first media stream from a first direction in a sound field; the sensor array receives the second media stream from a second direction in the sound field; the at least one label of the first media data comprises the first direction; and the at least one label of the second media data comprises the second direction.
This claim extends the system of claim 16 with direction labels. The server device comprises a sensor array capable of spatial selectivity, which receives the first media stream from a first direction in a sound field and the second media stream from a second direction in the same sound field. The metadata stream includes at least one label for the first media data and at least one label for the second media data, and those labels comprise the respective directions. Recipients can therefore distinguish the media sources by spatial origin after unmixing, which supports applications such as directional audio analysis, source separation, or spatial audio rendering. This is useful where distinguishing between multiple audio sources matters, such as conference systems, surveillance, or immersive audio applications.
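One way a recipient might use a direction label is to place each unmixed source in a stereo field. The sketch below is an illustration only: the claim says the label comprises a direction but not how it is encoded, so the `DirectionLabel` type, the azimuth convention (-90 degrees is hard left, +90 is hard right), and the constant-power panning are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DirectionLabel:
    """Hypothetical metadata label carrying a source's direction."""
    source_id: str
    azimuth_deg: float  # assumed convention: -90 = left, 0 = center, +90 = right


def stereo_gains(label: DirectionLabel) -> Tuple[float, float]:
    """Map the labeled direction to (left, right) constant-power pan gains."""
    clamped = max(-90.0, min(90.0, label.azimuth_deg))
    # Map [-90, +90] degrees onto a pan angle in [0, pi/2].
    angle = (clamped + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(angle), math.sin(angle)
```

A client could multiply each unmixed source's samples by its own pair of gains before summing the sources for playback, so that each source is heard from roughly the direction in which it was captured.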
20. The system of claim 16, wherein: the metadata stream further comprises at least one label of the first media data and at least one label of the second media data; the first client device comprises a simultaneous mapping and localization subsystem configured to map an environment of the first client device and localize the first client device within the environment; the first media data is captured by the first client device from a first object in the environment; the second media data is captured by the first client device from a second object in the environment; the at least one label of the first media data comprises an attribute of the first object; and the at least one label of the second media data comprises an attribute of the second object.
This claim extends the system of claim 16 with object-attribute labels, and is relevant to augmented reality, computer vision, and spatial computing. The first client device comprises a simultaneous mapping and localization subsystem that maps the device's environment and localizes the device within it. The first client device captures both the first media data, from a first object in the environment, and the second media data, from a second object in the same environment. The metadata stream, which under claim 16 is compiled by the server device while mixing, includes at least one label for each media data, and each label comprises an attribute of the corresponding object. The claim does not specify the attribute; it could be, for example, the object's type or location. Recipients can use these labels to tell the two objects' media apart after unmixing and to process each separately.
October 27, 2020