Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A computer system comprising: an input module configured to: receive geometry information that is indicative of a variation in geometry of a subject over a time period, the time period comprising a plurality of time instants; and receive a plurality of images of the subject at each time instant of the plurality of time instants, each image associated with a respective viewpoint of the subject, and each image comprising a view-dependent texture map of the subject that is dependent on the respective viewpoint of the subject; an autoencoder that is configured to: jointly encode texture information and the geometry information to provide a latent vector; and infer, using the latent vector: an inferred geometry of the subject for a predicted viewpoint; and an inferred view-dependent texture of the subject for the predicted viewpoint; and a rendering module that is configured to render a reconstructed image of the subject for the predicted viewpoint using the inferred geometry and the inferred view-dependent texture.
This invention relates to a computer system for reconstructing and rendering dynamic 3D models of subjects, addressing challenges in capturing and visualizing time-varying geometry and appearance from multiple viewpoints. The system receives geometry data representing changes in a subject's shape over time, along with multiple images captured at different viewpoints for each time instant. Each image includes a view-dependent texture map that varies based on the viewpoint. An autoencoder processes this input by jointly encoding both texture and geometry information into a compact latent vector. Using this latent vector, the system infers the subject's geometry and view-dependent texture for a new, predicted viewpoint. A rendering module then generates a reconstructed image of the subject from this predicted viewpoint, combining the inferred geometry and texture. The approach enables dynamic 3D reconstruction and novel view synthesis, useful in applications like virtual reality, medical imaging, or animation where accurate representation of time-varying subjects is critical. The system handles view-dependent effects, such as lighting and reflections, by learning texture variations tied to specific viewpoints, improving realism in rendered outputs.
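The joint encoding step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the layer sizes, the two-branch layout, and the names `geometry_branch`, `texture_branch`, and `joint_head` are assumptions, and the random weights stand in for a trained network. Decoding is left out here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N_VERTS, TEX_RES, LATENT_DIM = 8, 4, 16

def linear(in_dim, out_dim):
    """Random, untrained linear layer standing in for a learned network."""
    weights = rng.normal(scale=0.1, size=(in_dim, out_dim))
    return lambda x: x @ weights

# Assumed layout: one branch per modality, merged by a joint head.
geometry_branch = linear(N_VERTS * 3, 32)
texture_branch = linear(TEX_RES * TEX_RES * 3, 32)
joint_head = linear(64, LATENT_DIM)

def encode(vertices, texture):
    """Jointly encode geometry and texture into a single latent vector."""
    g = np.tanh(geometry_branch(vertices.reshape(-1)))
    t = np.tanh(texture_branch(texture.reshape(-1)))
    return joint_head(np.concatenate([g, t]))

vertices = rng.normal(size=(N_VERTS, 3))            # geometry at one time instant
texture = rng.uniform(size=(TEX_RES, TEX_RES, 3))   # texture at the same instant
latent = encode(vertices, texture)
```

Because both branches feed the same head, a change in either the geometry or the texture changes the latent vector, which is the sense in which the encoding is joint.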
2. The computer system of claim 1, wherein, for each time instant of the plurality of time instants, the autoencoder is configured to average the respective view-dependent texture maps associated with the plurality of images.
This claim narrows the system of claim 1 by specifying how the autoencoder handles the multiple images captured at each time instant. For every time instant, the autoencoder averages the view-dependent texture maps associated with the images taken from the different viewpoints. The result is a single texture per time instant that summarizes the subject's appearance across views, and it can serve as the texture information that is encoded together with the geometry information. Averaging may also reduce noise and inconsistencies between individual views. The approach is relevant to applications such as virtual reality and computer graphics, where consistent texture reconstruction matters.
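The per-time-instant averaging can be sketched in a few lines. The array layout `(time, view, height, width, channel)` and the function name `average_textures` are assumptions made for illustration; the claim only states that the texture maps are averaged.

```python
import numpy as np

def average_textures(textures):
    """Average the per-view texture maps at each time instant.

    textures: array of shape (T, V, H, W, 3), an assumed layout with
              T time instants and V viewpoints.
    returns:  array of shape (T, H, W, 3), one averaged texture per instant.
    """
    return textures.mean(axis=1)

rng = np.random.default_rng(0)
textures = rng.uniform(size=(2, 3, 4, 4, 3))  # 2 instants, 3 views, 4x4 RGB
mean_textures = average_textures(textures)
```

A plain mean treats every view equally. Weighting views, for example by visibility, would be a design choice beyond what the claim states.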
3. The computer system of claim 1, wherein the autoencoder is a conditional autoencoder that is configured to condition at least one variable that is associated with each image, and the latent vector does not contain any information about the at least one variable.
This invention relates to a computer system for processing images using a conditional autoencoder. The system addresses the challenge of extracting meaningful latent representations from images while ensuring that certain variables associated with the images do not influence the latent vector. The autoencoder is designed to condition on at least one variable linked to each image, such as metadata or labels, but the resulting latent vector is independent of these variables. This allows the system to generate a latent representation that captures only the intrinsic features of the image, free from external biases or conditions. The autoencoder consists of an encoder that compresses the input image into a latent vector and a decoder that reconstructs the image from this vector. The conditioning mechanism ensures that the latent vector remains unaffected by the specified variables, enabling applications like unsupervised learning, data generation, or feature extraction where variable independence is critical. The system may be used in domains such as medical imaging, where patient metadata should not influence diagnostic features, or in generative models where controlled variable separation is required. The invention improves upon traditional autoencoders by explicitly enforcing variable independence in the latent space, enhancing the robustness and interpretability of the learned representations.
4. The computer system of claim 3, wherein the at least one variable comprises the respective viewpoint associated with each image, and the latent vector does not contain any viewpoint information.
This invention relates to computer vision systems that process images from multiple viewpoints to generate a latent vector representation. The system addresses the challenge of extracting viewpoint-invariant features from images captured from different perspectives, which is critical for tasks like object recognition, scene understanding, or 3D reconstruction. The system includes a neural network that processes input images and produces a latent vector, which is a compact numerical representation of the image content. The latent vector is designed to exclude viewpoint information, ensuring that the representation remains consistent regardless of the camera angle or position. The system also includes a mechanism to track or adjust the viewpoint associated with each input image, allowing the network to separate viewpoint-dependent features from viewpoint-invariant features. By removing viewpoint information from the latent vector, the system enables more robust and generalizable image analysis, as the representation focuses solely on the intrinsic properties of the objects or scenes in the images. This approach improves performance in applications where viewpoint variations could otherwise introduce noise or inconsistencies.
5. The computer system of claim 4, wherein the autoencoder is configured to infer the inferred geometry and the inferred view-dependent texture by using the latent vector as well as a view vector of the subject for the predicted viewpoint.
This claim builds on claim 4, where the latent vector contains no viewpoint information. To reconstruct the subject for a predicted viewpoint, the autoencoder uses the latent vector together with a view vector of the subject for that viewpoint. From this combined input it infers both the geometry and the view-dependent texture, allowing rendering from novel perspectives. Because the viewpoint is supplied separately, the latent vector can describe the subject's state while the view vector accounts for appearance changes that depend on viewing direction. The inputs still come from the multi-view images and geometry information of claim 1. The system is relevant to applications such as virtual reality, augmented reality, and 3D modeling, where accurate and adaptable 3D representations are essential.
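A view-conditioned decoder of this kind can be sketched as below. Everything here is an assumption for illustration: the definition of the view vector as a unit direction from subject to camera, the concatenation of latent and view vector, and the untrained random weights `w_geometry` and `w_texture`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
LATENT_DIM, N_VERTS, TEX_RES = 16, 8, 4

def view_vector(camera_position, subject_center):
    """Unit vector from the subject toward the camera (assumed definition)."""
    d = np.asarray(camera_position, dtype=float) - np.asarray(subject_center, dtype=float)
    return d / np.linalg.norm(d)

# Untrained random weights standing in for a learned decoder.
w_geometry = rng.normal(scale=0.1, size=(LATENT_DIM + 3, N_VERTS * 3))
w_texture = rng.normal(scale=0.1, size=(LATENT_DIM + 3, TEX_RES * TEX_RES * 3))

def decode(latent, view):
    """Infer geometry and view-dependent texture from latent plus view vector."""
    x = np.concatenate([latent, view])
    geometry = (x @ w_geometry).reshape(N_VERTS, 3)
    texture = (x @ w_texture).reshape(TEX_RES, TEX_RES, 3)
    return geometry, texture

latent = rng.normal(size=LATENT_DIM)
front = view_vector([0, 0, 1], [0, 0, 0])
side = view_vector([2, 0, 0], [0, 0, 0])
_, texture_front = decode(latent, front)
_, texture_side = decode(latent, side)
```

Decoding the same latent vector with two different view vectors yields two different textures, which illustrates how viewpoint can be supplied at decode time instead of being stored in the latent vector.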
6. The computer system of claim 1, wherein the latent vector comprises a representation of a facial expression of the subject.
This claim narrows the system of claim 1 to the case where the latent vector includes a representation of the subject's facial expression. The latent vector is still produced by jointly encoding texture and geometry information, so the expression is captured in a compact numerical form. The autoencoder can then infer the geometry and view-dependent texture for that expression, and the rendering module can render the subject's face for the predicted viewpoint. This is relevant to applications such as VR or AR presentations of a person's face.
7. The computer system of claim 1, wherein the geometry information comprises a three-dimensional mesh of the subject that is tracked over the time period.
This invention relates to computer systems for tracking the geometry of a subject, such as a human or object, over time. The system captures and processes three-dimensional (3D) mesh data representing the subject's geometry, enabling dynamic tracking of its shape and movement. The 3D mesh consists of interconnected vertices, edges, and faces that form a detailed surface representation of the subject. By analyzing this mesh over a time period, the system can monitor changes in the subject's geometry, such as deformations, movements, or structural variations. This technology is useful in applications like motion capture, medical imaging, virtual reality, and robotics, where accurate tracking of 3D geometry is essential. The system may include additional components for data acquisition, processing, and visualization to support real-time or post-processing analysis of the tracked geometry. The invention improves upon existing methods by providing a more precise and comprehensive representation of the subject's geometry over time, enhancing accuracy in applications requiring dynamic 3D tracking.
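One common way to store a tracked mesh is a fixed face list shared by all frames plus one set of vertex positions per time instant. The sketch below assumes that layout; the class name `TrackedMesh` and its methods are hypothetical, and the claim itself only requires that the mesh be tracked over the time period.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class TrackedMesh:
    faces: np.ndarray     # (F, 3) vertex indices, shared by every frame (assumed)
    vertices: np.ndarray  # (T, N, 3) vertex positions, one set per time instant

    def frame(self, t):
        """Vertex positions at time instant t."""
        return self.vertices[t]

    def displacement(self, t0, t1):
        """Per-vertex distance moved between two time instants."""
        return np.linalg.norm(self.vertices[t1] - self.vertices[t0], axis=1)

faces = np.array([[0, 1, 2]])
rest = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
moved = rest.copy()
moved[2] += [0.0, 0.0, 0.5]  # only the third vertex moves
mesh = TrackedMesh(faces=faces, vertices=np.stack([rest, moved]))
moved_by = mesh.displacement(0, 1)
```

Keeping the faces fixed means vertex `i` refers to the same surface point in every frame, so geometry at different time instants can be compared vertex by vertex.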
8. The computer system of claim 7, wherein, for each image: the image has a plurality of pixels, each pixel having an associated color; and the computer system is configured to unwrap the view-dependent texture map by casting rays through each pixel and assigning an intersected texture coordinate to the associated color of each respective pixel.
This invention relates to computer graphics systems that process view-dependent texture maps, which are textures that change appearance based on the viewer's perspective. The problem addressed is efficiently unwrapping these textures to generate a consistent 2D representation from a 3D model, ensuring accurate color mapping regardless of the viewing angle. The system processes 3D models with view-dependent textures, where each texture's appearance varies based on the observer's position. For each image in the texture map, the system analyzes the image's pixels, each of which has an associated color value. To unwrap the texture, the system casts rays through each pixel in the image. These rays intersect with the 3D model, and the system records the texture coordinates at the intersection points. These coordinates are then assigned to the corresponding pixel colors, effectively mapping the 3D texture data to a 2D image while preserving the view-dependent color information. This approach ensures that the unwrapped texture maintains visual consistency, allowing for accurate rendering from different perspectives. The method is particularly useful in applications requiring high-fidelity texture mapping, such as virtual reality, gaming, and 3D modeling, where dynamic lighting and perspective changes are common. The system's ability to handle view-dependent textures efficiently improves rendering quality and performance in real-time applications.
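The ray-casting unwrap can be sketched for a single triangle. The setup is an assumption for illustration: a pinhole camera at the origin looking along +z, one triangle with known per-vertex texture coordinates, nearest-texel assignment, and no occlusion handling. The intersection test is the standard Möller-Trumbore algorithm; the function names are hypothetical.

```python
import numpy as np

def intersect(origin, direction, triangle, eps=1e-9):
    """Möller-Trumbore ray/triangle test; returns barycentric weights or None."""
    v0, v1, v2 = triangle
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = e1 @ p
    if abs(det) < eps:
        return None  # ray is parallel to the triangle plane
    s = origin - v0
    u = (s @ p) / det
    q = np.cross(s, e1)
    v = (direction @ q) / det
    t = (e2 @ q) / det
    if u < 0 or v < 0 or u + v > 1 or t <= 0:
        return None  # hit lies outside the triangle or behind the camera
    return np.array([1 - u - v, u, v])

def unwrap(image, triangle, triangle_uv, focal, tex_res):
    """Cast a ray through each pixel and write its color at the hit's texel."""
    h, w, _ = image.shape
    texture = np.zeros((tex_res, tex_res, 3))
    filled = np.zeros((tex_res, tex_res), dtype=bool)
    origin = np.zeros(3)
    for y in range(h):
        for x in range(w):
            direction = np.array([x + 0.5 - w / 2, y + 0.5 - h / 2, focal])
            weights = intersect(origin, direction, triangle)
            if weights is None:
                continue
            u, v = weights @ triangle_uv  # interpolated texture coordinate
            tx = min(int(u * tex_res), tex_res - 1)
            ty = min(int(v * tex_res), tex_res - 1)
            texture[ty, tx] = image[y, x]
            filled[ty, tx] = True
    return texture, filled

rng = np.random.default_rng(0)
image = rng.uniform(size=(4, 4, 3))
triangle = np.array([[-2.0, -2.0, 2.0], [2.5, -2.0, 2.0], [-2.0, 2.5, 2.0]])
triangle_uv = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
texture, filled = unwrap(image, triangle, triangle_uv, focal=2.0, tex_res=4)
```

Texels that no ray reaches stay unfilled, which is why the sketch returns a `filled` mask alongside the texture. A full mesh would need the nearest hit across all triangles.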
9. The computer system of claim 1, wherein: the computer system is configured to infer the inferred geometry and the inferred view-dependent texture in real time and render an animated series of reconstructed images of the subject in real time; and the animated series of reconstructed images comprises a virtual reality (VR) or augmented reality (AR) presentation for display on a VR or AR headset.
This invention relates to real-time 3D reconstruction and rendering of subjects for virtual reality (VR) or augmented reality (AR) applications. The system addresses the challenge of dynamically generating high-fidelity 3D models with accurate geometry and view-dependent textures in real time, enabling immersive VR/AR experiences. The computer system captures input data, such as images or sensor measurements, of a subject and processes this data to infer both the geometry and view-dependent texture of the subject. The inferred geometry represents the 3D shape of the subject, while the view-dependent texture captures how the subject's appearance changes with different viewing angles or lighting conditions. The system performs these inferences in real time, allowing for immediate rendering of the reconstructed 3D model. The rendered output is an animated series of reconstructed images, which can be displayed on a VR or AR headset. This enables users to interact with a dynamically generated 3D representation of the subject in an immersive environment. The system ensures that the reconstructed images maintain visual fidelity and realism, adapting to changes in the subject's appearance as needed. This technology is particularly useful for applications requiring real-time 3D reconstruction, such as virtual try-on, telepresence, or interactive AR/VR experiences.
10. A computer-implemented method, comprising: receiving geometry information that is indicative of a variation in geometry of a subject over a time period, the time period comprising a plurality of time instants; receiving a plurality of images of the subject at each time instant of the plurality of time instants, each image associated with a respective viewpoint of the subject, and each image comprising a view-dependent texture map of the subject that is dependent on a respective viewpoint of the subject; jointly encoding, by an autoencoder, texture information and the geometry information to provide a latent vector; and inferring, using the latent vector: an inferred geometry of the subject for a predicted viewpoint; and an inferred view-dependent texture of the subject for the predicted viewpoint; and rendering a reconstructed image of the subject for the predicted viewpoint using the inferred geometry and the inferred view-dependent texture.
This invention relates to computer vision and 3D reconstruction, specifically addressing the challenge of dynamically reconstructing and rendering subjects with varying geometry and view-dependent textures over time. The method captures geometry information representing changes in a subject's shape across multiple time instants and acquires multiple images of the subject from different viewpoints at each time instant. Each image includes a view-dependent texture map, meaning the texture appearance varies based on the viewpoint. An autoencoder jointly encodes both the texture and geometry data into a compact latent vector. This latent vector is then used to infer the subject's geometry and view-dependent texture for a new, predicted viewpoint. Finally, the system renders a reconstructed image of the subject from this predicted viewpoint by combining the inferred geometry and texture. The approach enables dynamic 3D reconstruction and rendering of subjects with time-varying shapes and appearance, useful in applications like virtual reality, animation, and medical imaging. The method leverages deep learning to efficiently encode and decode complex spatiotemporal data, allowing for realistic reconstructions from novel viewpoints.
11. The computer-implemented method of claim 10, further comprising conditioning the respective viewpoint associated with each image, wherein the latent vector does not contain any viewpoint information.
This claim adds viewpoint conditioning to the method of claim 10. The autoencoder is conditioned on the respective viewpoint associated with each image, so viewpoint is treated as a known input and not as something the latent vector must capture. As a result, the latent vector contains no viewpoint information and can focus on the subject itself. This mirrors system claims 3 and 4. Keeping viewpoint out of the latent vector supports inferring geometry and view-dependent texture for a predicted viewpoint, because the desired viewpoint can be supplied separately at inference time.
12. The computer-implemented method of claim 10, wherein the latent vector comprises a representation of a facial expression of the subject.
This claim narrows the method of claim 10 to the case where the latent vector includes a representation of the subject's facial expression. The latent vector is produced by jointly encoding texture information and geometry information, so the expression is captured in a compact numerical form. The method then uses that vector to infer geometry and view-dependent texture and to render the subject's face for the predicted viewpoint. This is the method counterpart of system claim 6.
13. The computer-implemented method of claim 10, wherein the geometry information comprises a three-dimensional mesh of the subject that is tracked over the time period.
This invention relates to computer-implemented methods for tracking the geometry of a subject over time, particularly in applications such as medical imaging, augmented reality, or motion capture. The method addresses the challenge of accurately capturing and representing the dynamic changes in a subject's geometry, which is essential for applications requiring precise spatial and temporal tracking. The method involves generating a three-dimensional mesh representation of the subject, which is a structured collection of vertices, edges, and faces that define the subject's surface geometry. This mesh is continuously updated over a defined time period to reflect movements or deformations of the subject. The tracking process ensures that the mesh remains accurate and consistent, allowing for real-time or post-processing analysis of the subject's geometry. The three-dimensional mesh may be derived from various input sources, such as depth sensors, cameras, or other imaging modalities, and is processed to maintain geometric integrity while adapting to changes in the subject's position or shape. This approach enables applications such as medical diagnostics, where tracking anatomical structures is critical, or in virtual reality, where realistic avatars require precise motion tracking. The method ensures that the mesh remains topologically consistent, avoiding distortions or artifacts that could compromise the accuracy of the tracking results.
14. The computer-implemented method of claim 10, comprising: inferring the inferred geometry and the inferred view-dependent texture in real time; and rendering an animated series of reconstructed images of the subject in real time, wherein the rendered animated series of reconstructed images comprises a virtual reality (VR) or augmented reality (AR) presentation for display on a VR or AR headset.
This claim adds real-time operation and headset display to the method of claim 10. The geometry and view-dependent texture are inferred in real time, and an animated series of reconstructed images of the subject is rendered in real time. The rendered series forms a virtual reality (VR) or augmented reality (AR) presentation for display on a VR or AR headset. The inferred geometry represents the subject's 3D structure, while the inferred view-dependent texture captures surface appearance that varies with the observer's perspective. Real-time operation matters for headset-based displays, where responsiveness is critical, and is relevant to applications such as telepresence or interactive AR/VR environments.
15. A computer system, comprising: a multi-camera setup comprising a plurality of cameras arranged proximate a subject, each camera configured to capture an image of a subject that is associated with a respective viewpoint of the subject; an input module configured to: receive geometry information that is indicative of a variation in geometry of the subject over a time period, the time period comprising a plurality of time instants; and receive, from the plurality of cameras, a plurality of images of the subject at each time instant of the plurality of time instants, each image associated with the respective viewpoint of the subject, and each image comprising a view-dependent texture map of the subject that is dependent on the respective viewpoint of the subject; an autoencoder that is configured to: jointly encode texture information and the geometry information to provide a first latent vector; and infer, using the first latent vector: an inferred geometry of the subject for a predicted viewpoint; and an inferred view-dependent texture of the subject for the predicted viewpoint; and a rendering module that is configured to render a reconstructed image of the subject for the predicted viewpoint using the inferred geometry and the inferred view-dependent texture.
This invention relates to a computer system for capturing, processing, and rendering dynamic 3D representations of a subject using a multi-camera setup. The system addresses the challenge of accurately reconstructing and rendering a subject from multiple viewpoints over time, accounting for changes in both geometry and appearance. The system includes a multi-camera setup where multiple cameras capture images of a subject from different viewpoints simultaneously. Each camera generates a view-dependent texture map of the subject, meaning the appearance varies based on the camera's perspective. Additionally, the system receives geometry information that describes how the subject's shape changes over time. An autoencoder processes this data by jointly encoding the texture and geometry information into a compact latent vector. This latent vector is then used to infer the subject's geometry and appearance for a new, predicted viewpoint that may not have been directly captured by any of the original cameras. The inferred geometry and texture are combined to render a reconstructed image of the subject from this new viewpoint. The system enables dynamic 3D reconstruction and novel view synthesis, allowing for realistic rendering of a subject from arbitrary perspectives, even when the subject's geometry and appearance are changing over time. This is useful in applications such as virtual reality, augmented reality, and 3D modeling.
16. The computer system of claim 15, wherein the computer system is configured to infer the inferred geometry and the inferred view-dependent texture in real time and render an animated series of reconstructed images of the subject in real time.
This invention relates to computer vision and real-time 3D reconstruction, addressing the challenge of accurately capturing and rendering dynamic scenes with both geometric and texture details. The system reconstructs a 3D model of a subject from input images, inferring both the underlying geometry and view-dependent textures. The inferred geometry represents the three-dimensional structure of the subject, while the inferred view-dependent texture accounts for variations in appearance due to lighting, material properties, or other environmental factors. The system processes input images to extract these features, then generates a reconstructed 3D model that can be rendered from arbitrary viewpoints. A key aspect is the ability to perform these computations in real time, allowing for the generation of an animated series of reconstructed images as the subject moves or changes. This enables applications in virtual reality, augmented reality, real-time surveillance, and interactive media, where low-latency rendering of dynamic scenes is critical. The system dynamically updates the inferred geometry and textures to maintain accuracy as the subject's appearance evolves, ensuring smooth and realistic reconstructions.
17. The computer system of claim 16, wherein: the rendered animated series of reconstructed images comprise a virtual reality (VR) or augmented reality (AR) presentation for display on a VR or AR headset; and the computer system is configured to learn correspondence between the plurality of images from the multi-camera setup and images from cameras mounted on the VR or AR headset.
This invention relates to computer systems for generating animated series of reconstructed images, particularly for virtual reality (VR) or augmented reality (AR) applications. The system addresses the challenge of accurately mapping and synchronizing images from multiple cameras to create immersive VR or AR experiences. The system includes a multi-camera setup that captures images from different perspectives, which are then processed to reconstruct a three-dimensional scene. The reconstructed images are rendered as an animated series, forming the basis for VR or AR presentations displayed on headsets. A key feature is the system's ability to learn and establish correspondence between the images from the multi-camera setup and images captured by cameras mounted on the VR or AR headset. This correspondence ensures seamless integration of real-world and virtual elements, enhancing the realism and interactivity of the VR or AR experience. The system dynamically adjusts the reconstructed images based on the headset's camera inputs, allowing for real-time adjustments and improved user immersion. This approach improves the accuracy and fluidity of VR or AR environments by leveraging multi-camera data and headset-mounted cameras to create a cohesive visual experience.
18. The computer system of claim 15, wherein the computer system is further configured to: use the reconstructed image to re-render a plurality of simulated headset images of the subject, each simulated headset image being associated with a viewpoint of a plurality of simulated VR or AR headset cameras; receive a plurality of received headset images of the subject from a plurality of VR or AR headset cameras; and jointly encode the plurality of simulated headset images and the plurality of received headset images to provide a second latent vector.
This claim extends the system of claim 15 to work with images from VR or AR headset cameras. The system uses the reconstructed image to re-render a plurality of simulated headset images of the subject, each associated with the viewpoint of one of a plurality of simulated VR or AR headset cameras. It also receives headset images of the subject captured by actual VR or AR headset cameras. The simulated and received headset images are then jointly encoded to provide a second latent vector, distinct from the first latent vector that encodes texture and geometry. Encoding both kinds of image together can help the system relate what the headset cameras see to the reconstructions produced from the multi-camera setup.
19. The computer system of claim 18, wherein the computer system is configured to condition the plurality of simulated headset images and the plurality of received headset images such that the second latent vector does not contain information indicating whether a received headset image is a simulated headset image or a received headset image.
This claim builds on claim 18, in which simulated and received headset images are jointly encoded into a second latent vector. The system conditions on whether each headset image is simulated or received, so that the second latent vector contains no information indicating which of the two an image is. Differences between simulated and received images could otherwise leak into the latent vector. With that information kept out, the second latent vector describes the subject in the same way regardless of where a headset image came from.
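One way to picture this conditioning is a shared encoder that never sees where an image came from, paired with a decoder that is told the source explicitly. The sketch below is an assumption about how such conditioning might be wired. It uses untrained random weights, so it shows only where the source indicator enters, not the training that would actually keep source information out of the latent vector. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
IMG_SIDE, LATENT_DIM = 8, 8
IMG_DIM = IMG_SIDE * IMG_SIDE

# One-hot source indicators (assumed encoding).
SIMULATED = np.array([1.0, 0.0])
RECEIVED = np.array([0.0, 1.0])

# Untrained random weights standing in for learned networks.
w_encoder = rng.normal(scale=0.1, size=(IMG_DIM, LATENT_DIM))
w_decoder = rng.normal(scale=0.1, size=(LATENT_DIM + 2, IMG_DIM))

def encode(headset_image):
    """Shared encoder: it is applied to both sources and gets no source flag."""
    return np.tanh(headset_image.reshape(-1) @ w_encoder)

def decode(latent, source):
    """Decoder conditioned on the source indicator."""
    return np.concatenate([latent, source]) @ w_decoder

simulated_image = rng.uniform(size=(IMG_SIDE, IMG_SIDE))
received_image = rng.uniform(size=(IMG_SIDE, IMG_SIDE))

second_latent_sim = encode(simulated_image)
second_latent_real = encode(received_image)

# The same latent vector decoded under each source indicator.
as_simulated = decode(second_latent_real, SIMULATED)
as_received = decode(second_latent_real, RECEIVED)
```

Since the decoder receives the source indicator directly, a trained model would have no need to store that information in the latent vector.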
20. The computer system of claim 15, wherein the first latent vector comprises a representation of a facial expression of the subject.
This claim narrows the system of claim 15 to the case where the first latent vector includes a representation of the subject's facial expression. The first latent vector is produced by jointly encoding texture information and geometry information obtained with the multi-camera setup, so the expression is captured in a compact numerical form. The autoencoder uses this vector to infer geometry and view-dependent texture, and the rendering module renders the subject's face for the predicted viewpoint. The first latent vector is distinct from the second latent vector of claim 18, which encodes headset images.
March 10, 2020