Techniques are disclosed for generating digital faces. In some examples, a style-based generator receives as inputs initial tensor(s) and style vector(s) corresponding to user-selected semantic attribute styles, such as the desired expression, gender, age, identity, and/or ethnicity of a digital face. The style-based generator is trained to process such inputs and output low-resolution appearance map(s) for the digital face, such as a texture map, a normal map, and/or a specular roughness map. The low-resolution appearance map(s) are further processed using a super-resolution generator that is trained to take the low-resolution appearance map(s) and low-resolution 3D geometry of the digital face as inputs and output high-resolution appearance map(s) that align with high-resolution 3D geometry of the digital face. Such high-resolution appearance map(s) and high-resolution 3D geometry can then be used to render standalone images or the frames of a video that include the digital face.
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A computer-implemented method for rendering one or more images of a digital face, the method comprising: generating, via a first machine learning model, one or more first appearance maps based on a user selection of one or more styles associated with one or more attributes of a digital face; generating, via a second machine learning model, one or more second appearance maps and a first three-dimensional (3D) geometry associated with the digital face based on jointly upsampling the one or more first appearance maps and a second 3D geometry associated with the digital face, wherein the one or more second appearance maps are aligned with the first 3D geometry; and rendering one or more images including the digital face based on the one or more second appearance maps and the first 3D geometry.
This invention relates to computer-implemented methods for generating and rendering digital faces whose semantic attributes are chosen by a user. A first machine learning model generates one or more initial appearance maps, such as a texture map, a normal map, or a specular roughness map, from the user's selection of styles for attributes of the face. A second machine learning model then jointly upsamples these initial appearance maps together with an existing 3D geometry of the face, producing refined appearance maps and a new 3D geometry. The joint upsampling yields refined appearance maps that are aligned with the new geometry. Finally, the system renders one or more images of the digital face from the refined appearance maps and the new 3D geometry. According to the abstract, the rendered output can be standalone images or the frames of a video.
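The three steps of claim 1 can be sketched as a data flow. This is a minimal sketch, not the patented implementation: `style_generator`, `joint_upsample`, and `render` are hypothetical stand-ins (random values, nearest-neighbor repetition, and Lambertian shading) for the trained models and the renderer, and the 16-to-64-pixel resolutions and the encoding of geometry as a 3-channel position map are assumptions made only for illustration.

```python
import numpy as np

def style_generator(style_ids, low_res=16, seed=0):
    """Stand-in for the first model: style selection -> low-res appearance maps."""
    rng = np.random.default_rng(seed + sum(style_ids.values()))
    return {
        "texture": rng.random((low_res, low_res, 3)),
        "normal": rng.random((low_res, low_res, 3)),
        "roughness": rng.random((low_res, low_res, 1)),
    }

def joint_upsample(maps, geometry, factor=4):
    """Stand-in for the second model: upsample maps and geometry together."""
    names = list(maps)
    # Stack all maps and the geometry along channels so one operation resizes them all.
    stacked = np.concatenate([maps[n] for n in names] + [geometry], axis=-1)
    up = stacked.repeat(factor, axis=0).repeat(factor, axis=1)
    out, start = {}, 0
    for n in names:
        width = maps[n].shape[-1]
        out[n] = up[..., start:start + width]
        start += width
    return out, up[..., start:]

def render(maps, geometry, light=(0.0, 0.0, 1.0)):
    """Stand-in renderer: shades the texture by the normal map (geometry unused here)."""
    normals = maps["normal"] * 2.0 - 1.0
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True) + 1e-8
    shade = np.clip(normals @ np.asarray(light), 0.0, 1.0)[..., None]
    return maps["texture"] * shade

styles = {"expression": 2, "age": 1}   # hypothetical style indices
low_maps = style_generator(styles)
low_geometry = np.zeros((16, 16, 3))   # assumed low-res position map
high_maps, high_geometry = joint_upsample(low_maps, low_geometry)
image = render(high_maps, high_geometry)
```

Stacking the maps and the geometry before resizing is one simple way to keep them pixel-aligned; a trained super-resolution generator would replace the repetition step.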
2. The computer-implemented method of claim 1, wherein the one or more first appearance maps have a lower resolution than the one or more second appearance maps, and the second 3D geometry has a lower resolution than the first 3D geometry.
Claim 2 narrows claim 1 by fixing the direction of the upsampling. The first appearance maps, produced by the first machine learning model, have a lower resolution than the second appearance maps, produced by the second machine learning model. Likewise, the second 3D geometry, which is an input to the joint upsampling, has a lower resolution than the first 3D geometry, which is its output. In other words, the second model takes low-resolution appearance maps and low-resolution geometry and returns higher-resolution versions of both. The ordinal labels are easy to misread: the "first" 3D geometry is the high-resolution output, and the "second" 3D geometry is the low-resolution input.
3. The computer-implemented method of claim 1, wherein generating the one or more first appearance maps based on the user selection of one or more styles comprises controlling one or more adaptive instance normalization (AdaIN) operations based on the user selection of one or more styles.
Claim 3 specifies how the user's style selection steers the first machine learning model: the selection controls one or more adaptive instance normalization (AdaIN) operations inside the model. An AdaIN operation normalizes each channel of a feature map to zero mean and unit variance and then rescales and shifts it using parameters derived from a style input. According to the abstract, the style-based generator receives style vectors that correspond to the user-selected attribute styles, so choosing a different style changes the statistics that AdaIN imposes on the features and therefore changes the appearance maps that are generated. The claim does not require that different styles be applied to different regions of an image.
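A minimal NumPy sketch of a single AdaIN operation may make the mechanism concrete. It follows the standard AdaIN formula (normalize each channel, then apply a style-derived scale and bias); the `style_vector`, the `affine` projection, and all array sizes are hypothetical and are not taken from the patent.

```python
import numpy as np

def adain(features, scale, bias, eps=1e-5):
    """Adaptive instance normalization.

    features: array of shape (C, H, W)
    scale, bias: per-channel style parameters of shape (C,)
    """
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / (std + eps)
    return scale[:, None, None] * normalized + bias[:, None, None]

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 8))   # 4-channel feature map
style_vector = rng.normal(size=6)       # hypothetical vector for one selected style
affine = rng.normal(size=(8, 6))        # hypothetical learned projection to (scale, bias)
scale, bias = np.split(affine @ style_vector, 2)
styled = adain(features, scale, bias)
```

After the operation, each channel's mean equals its `bias` entry and its standard deviation is approximately the magnitude of its `scale` entry, which is how a style input can impose its statistics on the features.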
4. The computer-implemented method of claim 3, wherein the one or more AdaIN operations are performed in conjunction with one or more convolution operations.
Claim 4 adds that the AdaIN operations of claim 3 are performed in conjunction with one or more convolution operations. Convolution operations apply learned filters to feature maps, and AdaIN operations adjust the per-channel mean and variance of the features according to the selected style. The claim does not specify the type of convolution or the order in which the two operations are applied. Claim 5 describes one arrangement, in which multiple sets of convolution and AdaIN operations run in parallel within each semantics transfer block.
5. The computer-implemented method of claim 4, wherein the first machine learning model comprises a plurality of semantics transfer blocks, and performing the one or more AdaIN operations in conjunction with the one or more convolution operations comprises, for each semantics transfer block included in the plurality of semantics transfer blocks: performing multiple sets of convolution operations and AdaIN operations in parallel; determining a weighted sum of outputs of the multiple sets of convolution operations and AdaIN operations; and upscaling the weighted sum of outputs.
Claim 5 describes the internal structure of the first machine learning model. The model comprises a plurality of semantics transfer blocks. Within each block, multiple sets of convolution operations and AdaIN operations are performed in parallel, a weighted sum of the outputs of those sets is determined, and the weighted sum is upscaled. The convolution operations apply learned filters to the features, and the AdaIN operations adjust the per-channel mean and variance of the features according to the selected styles. Each block therefore both mixes the contributions of its parallel sets and increases the spatial size of the features it passes on.
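The per-block steps recited in claim 5 (parallel convolution and AdaIN sets, a weighted sum, then upscaling) can be sketched as follows. This is a minimal sketch under assumptions the claim does not state: each convolution is reduced to a 1x1 convolution, upscaling is nearest-neighbor by a factor of 2, and the number of branches, the mixing weights, and all array sizes are made up for illustration.

```python
import numpy as np

def adain(x, scale, bias, eps=1e-5):
    """Normalize each channel of x (C, H, W), then apply per-channel scale and bias."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    return scale[:, None, None] * (x - mean) / (std + eps) + bias[:, None, None]

def semantics_transfer_block(x, branches, weights):
    """x: (C, H, W). branches: list of (kernel (C, C), scale (C,), bias (C,))."""
    outputs = []
    for kernel, scale, bias in branches:
        # 1x1 convolution: a per-pixel linear mix of the input channels.
        conv = np.einsum("oc,chw->ohw", kernel, x)
        outputs.append(adain(conv, scale, bias))
    # Weighted sum of the parallel branch outputs.
    mixed = sum(w * out for w, out in zip(weights, outputs))
    # Nearest-neighbor 2x upscaling of the mixed features.
    return mixed.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))
branches = [
    (rng.normal(size=(4, 4)), rng.normal(size=4), rng.normal(size=4))
    for _ in range(3)                      # three hypothetical parallel sets
]
weights = np.array([0.5, 0.3, 0.2])        # hypothetical mixing weights
out = semantics_transfer_block(x, branches, weights)
```

Stacking several such blocks doubles the spatial size at each stage, which is one way a generator can grow a small initial tensor into a full appearance map.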
6. The computer-implemented method of claim 1, wherein the one or more second appearance maps include at least one of a texture map, a normal map, or a specular roughness map.
Claim 6 specifies what the second appearance maps, which are the output of the second machine learning model, include: at least one of a texture map, a normal map, or a specular roughness map. A texture map defines surface color variations. A normal map encodes fine surface detail, such as bumps or wrinkles, as per-pixel surface orientation. A specular roughness map controls how sharply light reflects off the surface, which affects glossiness. A renderer uses these maps together with the 3D geometry to shade the digital face.
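To illustrate how the three map types named in claim 6 each feed a shading computation, here is a minimal Blinn-Phong style sketch. It is not the renderer described in the patent: the `shade` function, the roughness-to-shininess mapping, and the 0.04 specular strength are assumptions chosen only to give every map a visible role.

```python
import numpy as np

def shade(texture, normal_map, roughness, light_dir, view_dir):
    """texture, normal_map: (H, W, 3) in [0, 1]; roughness: (H, W, 1) in (0, 1]."""
    n = normal_map * 2.0 - 1.0                          # decode [0, 1] -> [-1, 1]
    n = n / np.linalg.norm(n, axis=-1, keepdims=True)
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    v = np.asarray(view_dir, dtype=float)
    v = v / np.linalg.norm(v)
    h = (l + v) / np.linalg.norm(l + v)                 # half vector
    diffuse = np.clip(n @ l, 0.0, 1.0)[..., None] * texture
    shininess = 2.0 / np.maximum(roughness, 1e-3) ** 2  # assumed mapping
    specular = np.clip(n @ h, 0.0, 1.0)[..., None] ** shininess
    return np.clip(diffuse + 0.04 * specular, 0.0, 1.0)

texture = np.full((4, 4, 3), 0.8)                       # uniform surface color
normal_map = np.tile([0.5, 0.5, 1.0], (4, 4, 1))        # flat surface facing +z
roughness = np.full((4, 4, 1), 0.5)
image = shade(texture, normal_map, roughness, (0, 0, 1), (0, 0, 1))
```

With the light and the viewer both directly in front of a flat surface, every pixel receives the full diffuse color plus the full specular term.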
7. The computer-implemented method of claim 1, wherein the one or more attributes of the digital face include at least one of expression, gender, age, identity, or ethnicity.
Claim 7 lists the attributes of the digital face for which the user can select styles: at least one of expression, gender, age, identity, or ethnicity. These attributes are inputs that condition the generation of the face. They are selected by the user, not detected or classified from an existing image or video. For example, a user might select a particular expression and age, and the first machine learning model of claim 1 would generate appearance maps for a face with those characteristics.
8. The computer-implemented method of claim 1, wherein the second machine learning model comprises a super-resolution generator.
Claim 8 specifies that the second machine learning model comprises a super-resolution generator. According to the abstract, the super-resolution generator is trained to take the low-resolution appearance maps and low-resolution 3D geometry of the digital face as inputs and to output high-resolution appearance maps that align with high-resolution 3D geometry of the face. The first machine learning model does not analyze input images; it generates the low-resolution appearance maps that the super-resolution generator then upsamples.
9. The computer-implemented method of claim 1, wherein the first 3D geometry associated with the digital face conveys a non-neutral facial expression.
Claim 9 specifies that the first 3D geometry, which is the geometry output by the second machine learning model and used for rendering, conveys a non-neutral facial expression. A non-neutral expression is any deviation from a neutral or resting facial state, such as a smile or a frown. The rendered digital face can therefore show an expression rather than only a neutral pose, with the second appearance maps aligned to the expressive geometry as claim 1 requires.
10. A non-transitory computer-readable storage medium including instructions that, when executed by a processing unit, cause the processing unit to perform steps for rendering one or more images of a digital face, the steps comprising: generating, via a first machine learning model, one or more first appearance maps based on a user selection of one or more styles associated with one or more attributes of a digital face; generating, via a second machine learning model, one or more second appearance maps and a first three-dimensional (3D) geometry associated with the digital face based on jointly upsampling the one or more first appearance maps and a second 3D geometry associated with the digital face, wherein the one or more second appearance maps are aligned with the first 3D geometry; and rendering one or more images including the digital face based on the one or more second appearance maps and the first 3D geometry.
Claim 10 recites the same steps as claim 1, framed as instructions stored on a non-transitory computer-readable storage medium and executed by a processing unit. A first machine learning model generates appearance maps based on user-selected styles for attributes of a digital face. A second machine learning model then jointly upsamples these appearance maps together with an existing 3D geometry of the face, producing refined appearance maps and an updated 3D geometry. The refined appearance maps are aligned with the updated geometry, which keeps surface detail consistent with facial structure. Finally, the system renders one or more images of the digital face using the refined appearance maps and the updated 3D geometry. The two models do not divide the work into appearance and geometry: the first model generates the initial appearance maps, and the second model upsamples appearance and geometry together.
11. The computer-readable storage medium of claim 10, wherein generating the one or more first appearance maps based on the user selection of one or more styles comprises controlling one or more adaptive instance normalization (AdaIN) operations based on the user selection of one or more styles.
Claim 11 applies the limitation of claim 3 to the storage medium of claim 10. The user's selection of styles controls one or more adaptive instance normalization (AdaIN) operations in the first machine learning model. An AdaIN operation normalizes each channel of a feature map and then rescales and shifts it using parameters derived from a style input. Here the styles are choices for attributes of the digital face, such as its expression or age, so the selection determines the statistics imposed on the model's features and, through them, the first appearance maps that the model generates.
12. The computer-readable storage medium of claim 11, wherein the first machine learning model comprises a plurality of semantics transfer blocks, and generating the one or more first appearance maps based on the user selection of one or more styles includes, for each semantics transfer block included in the plurality of semantics transfer blocks: performing multiple sets of convolution operations and AdaIN operations in parallel; determining a weighted sum of outputs of the multiple sets of convolution operations and AdaIN operations; and upscaling the weighted sum of outputs.
Claim 12 applies the block structure of claim 5 to the storage medium of claim 11. The first machine learning model comprises a plurality of semantics transfer blocks. Each block performs multiple sets of convolution operations and AdaIN (adaptive instance normalization) operations in parallel, determines a weighted sum of the outputs of those sets, and upscales the weighted sum. The result of passing through the blocks is the set of first appearance maps for the digital face, generated according to the user-selected styles.
13. The computer-readable storage medium of claim 12, wherein one or more weights used in the weighted sum of outputs, the multiple sets of convolution operations and AdaIN operations, one or more initial tensors, and one or more style vectors associated with the one or more attributes of the digital face are determined while training the first machine learning model.
Claim 13 identifies what is determined while training the first machine learning model: the weights used in the weighted sum of outputs, the multiple sets of convolution operations and AdaIN operations, one or more initial tensors, and one or more style vectors associated with the attributes of the digital face. In other words, the initial tensors and the style vectors are learned during training rather than fixed in advance or designed by hand. The abstract describes these as the inputs that the style-based generator receives, with the style vectors corresponding to the user-selected semantic attribute styles.
14. The computer-readable storage medium of claim 10, wherein the first machine learning model is trained using a progressive training technique.
Claim 14 specifies that the first machine learning model, which generates the first appearance maps, is trained using a progressive training technique. The claim does not define the technique further. For image generators, progressive training commonly refers to training at a low output resolution first and then adding layers that raise the resolution as training proceeds. Whether the patent uses exactly this scheme is not stated in the text shown here.
15. The computer-readable storage medium of claim 10, wherein the second machine learning model is trained using ground truth and adversarial learning techniques.
Claim 15 specifies that the second machine learning model, which performs the joint upsampling, is trained using ground truth and adversarial learning techniques. Training with ground truth means comparing the model's outputs against known target data. Adversarial learning, as the term is commonly used for generative models, means training the generator against a discriminator network that tries to distinguish generated outputs from real ones, which pushes the generator toward realistic detail. In this context the term refers to how the generator is trained, not to defending a model against adversarial attacks.
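For a generator such as the second machine learning model of claim 15, one common way to combine ground truth with adversarial learning is a loss that adds a reconstruction term to a discriminator-based term. The sketch below assumes an L1 reconstruction loss, a non-saturating adversarial loss, and a hypothetical weighting `adv_weight`; the patent text shown here does not state which losses are actually used.

```python
import numpy as np

def generator_loss(predicted, ground_truth, disc_logits_on_predicted, adv_weight=0.01):
    """Combined generator loss (illustrative, not taken from the patent).

    predicted, ground_truth: arrays of the same shape (e.g. high-res maps)
    disc_logits_on_predicted: discriminator logits for the generated outputs
    """
    # Ground-truth term: mean absolute error against the known target.
    reconstruction = np.abs(predicted - ground_truth).mean()
    # Adversarial term: -log(sigmoid(logit)) written stably as softplus(-logit).
    adversarial = np.logaddexp(0.0, -disc_logits_on_predicted).mean()
    return reconstruction + adv_weight * adversarial

target = np.ones((4, 4))
good = generator_loss(target, target, np.full(3, 50.0))        # matches target, fools critic
bad = generator_loss(np.zeros((4, 4)), target, np.full(3, -2.0))  # wrong output, critic not fooled
```

The loss is near zero when the output matches the ground truth and the discriminator scores it as real, and it grows when either condition fails.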
16. The computer-readable storage medium of claim 10, wherein the one or more first appearance maps have a lower resolution than the one or more second appearance maps, and the second 3D geometry has a lower resolution than the first 3D geometry.
Claim 16 applies the limitation of claim 2 to the storage medium of claim 10. The first appearance maps, generated by the first machine learning model, have a lower resolution than the second appearance maps, generated by the second machine learning model. The second 3D geometry, which is an input to the joint upsampling, has a lower resolution than the first 3D geometry, which is its output. The second model therefore turns low-resolution appearance maps and low-resolution geometry into higher-resolution versions of both. The claim does not describe switching between resolutions at render time; the low-resolution data is an intermediate result, and rendering uses the second appearance maps and the first 3D geometry.
17. The computer-readable storage medium of claim 10, wherein the one or more second appearance maps include at least one of a texture map, a normal map, or a specular roughness map.
Claim 17 applies the limitation of claim 6 to the storage medium of claim 10. The second appearance maps, generated by the second machine learning model, include at least one of a texture map, a normal map, or a specular roughness map. These maps encode surface color, per-pixel surface orientation, and how sharply the surface reflects light, respectively. The maps are generated by the model from the first appearance maps and the second 3D geometry, and they are then used with the first 3D geometry to render images of the digital face.
18. The computer-readable storage medium of claim 10, wherein the one or more attributes of the digital face include at least one of expression, gender, age, identity, or ethnicity.
Claim 18 applies the limitation of claim 7 to the storage medium of claim 10. The attributes of the digital face for which the user selects styles include at least one of expression, gender, age, identity, or ethnicity. These attributes are chosen by the user to condition the generation of a new digital face. The claim does not cover capturing a face image or extracting attributes from one.
19. The computer-readable storage medium of claim 10, wherein the second machine learning model comprises a super-resolution generator.
Claim 19 applies the limitation of claim 8 to the storage medium of claim 10: the second machine learning model comprises a super-resolution generator. According to the abstract, this generator is trained to take the low-resolution appearance maps and low-resolution 3D geometry of the digital face as inputs and to output high-resolution appearance maps that align with high-resolution 3D geometry. Its inputs come from the first machine learning model and an existing geometry of the face, not from a photograph that needs to be enlarged.
20. A computing device comprising: a memory storing an application; and a processor coupled to the memory, wherein when executed by the processor, the application causes the processor to: generate, via a first machine learning model, one or more first appearance maps based on a user selection of one or more styles associated with one or more attributes of a digital face; generate, via a second machine learning model, one or more second appearance maps and a first three-dimensional (3D) geometry associated with the digital face based on jointly upsampling the one or more first appearance maps and a second 3D geometry associated with the digital face, wherein the one or more second appearance maps are aligned with the first 3D geometry; and render one or more images including the digital face based on the one or more second appearance maps and the first 3D geometry.
June 8, 2020
February 22, 2022