US-20260057597-A1

Generation of Texture Data Based on Pairs of Multi-View Digital Images

Published: February 26, 2026

Assignee: not available in USPTO data we have

Inventors: Romain Rouffet, Vladimir Kim, Valentin Deschaintre, Thibault Groueix, Rosalie Martin, +2 more

Technical Abstract

A texture data generation computing system generates texture data for 3D digital objects based on pairs of multi-view digital images. A rendering engine generates a multi-view rendered image including a set of rendered views depicting a 3D digital object. A diffusion image generation model generates a multi-view diffusion-generated image including a set of diffusion-generated views depicting the 3D digital object with a visual appearance. In addition, the diffusion image generation model determines, for each diffusion-generated view, a respective cross-frame attention feature set describing additional diffusion-generated views. Based on a texture depicted in the set of diffusion-generated views, the texture data generation computing system modifies a texture data object. In some cases, the texture data generation computing system provides the modified texture data object to an additional computing system configured to modify a digital graphical environment based on the texture data object.

Patent Claims

1. A method for generating a texture data object, the method comprising: receiving appearance input data and a three-dimensional (“3D”) mesh describing a digital object; rendering, via a rendering engine, a first multi-view rendered image of the digital object, wherein the first multi-view rendered image includes a first set of multiple rendered views depicting the digital object and excluding the appearance input data; generating, via a trained neural network implementing a diffusion model, a second multi-view diffusion-generated image of the digital object, wherein the second multi-view diffusion-generated image includes a second set of multiple diffusion-generated views depicting the digital object having an initial texture, wherein the trained neural network generates the second multi-view diffusion-generated image of the digital object based on a combination of the first multi-view rendered image and the appearance input data; performing a first modification to a texture data object to describe the initial texture depicted in the second multi-view diffusion-generated image, wherein the first modified texture data object includes first data values that are calculated based on the initial texture; and providing the first modified texture data object to an additional computing component configured to, responsive to receiving the first modified texture data object, render the digital object having the initial texture described by the first modified texture data object.

2. The method of claim 1, further comprising: generating a mask image based on the first multi-view rendered image, wherein the mask image includes multiple mask regions; and generating a noisy image, wherein the noisy image includes multiple noisy regions, wherein, in the first multi-view rendered image, each particular rendered view included in the first set of multiple rendered views corresponds to i) a respective mask region of the multiple mask regions and ii) a respective noisy region of the multiple noisy regions, wherein the trained neural network implementing the diffusion model is further configured for: determining, for each respective noisy region of the multiple noisy regions, a respective set of cross-frame attention features, the respective set of cross-frame attention features including at least one cross-frame attention feature for one or more additional noisy region of the multiple noisy regions; and modifying each respective noisy region based on the respective set of cross-frame attention features, wherein, in the second multi-view diffusion-generated image, each particular diffusion-generated view included in the second set of multiple diffusion-generated views depicts a respective initial texture that is generated based on a corresponding set of cross-frame attention features for a corresponding noisy region of the multiple noisy regions.

3. The method of claim 1, wherein performing the first modification to the texture data object further comprises: for each particular diffusion-generated view in the second set of multiple diffusion-generated views: determining a respective texture data value describing a respective initial texture depicted by the particular diffusion-generated view; and calculating a respective average data value that is based on a combination of i) the respective texture data value associated with the particular diffusion-generated view and ii) at least one additional respective texture data value associated with at least one additional particular diffusion-generated view in the second set of multiple diffusion-generated views, wherein the first data values are calculated based on the respective average data value for each particular diffusion-generated view in the second set of multiple diffusion-generated views.

4. The method of claim 1, further comprising: rendering, via the rendering engine, a third multi-view rendered image of the digital object, wherein the third multi-view rendered image includes a third set of multiple rendered views depicting the digital object having the initial texture described by the first modified texture data object; generating, via the trained neural network implementing the diffusion model, a fourth multi-view diffusion-generated image of the digital object, wherein the fourth multi-view diffusion-generated image includes a fourth set of multiple diffusion-generated views depicting the digital object having a refined texture; and performing a second modification to the first modified texture data object to describe the refined texture depicted in the fourth multi-view diffusion-generated image, wherein the second modified texture data object includes second data values that are calculated based on the refined texture.

5. The method of claim 4, wherein the trained neural network generates the fourth multi-view diffusion-generated image of the digital object based on a denoising technique applied to the third multi-view rendered image.

6. The method of claim 4, wherein the rendering engine is configured to render the first set of multiple rendered views and the third set of multiple rendered views using a same set of viewpoints of the 3D mesh.

7. The method of claim 4, wherein performing the second modification to the first modified texture data object further comprises: for each particular triangle included in the 3D mesh: identifying, from the fourth set of multiple diffusion-generated views, a particular diffusion-generated view having a viewing direction that is within a similarity threshold to a normal of the particular triangle; and determining a respective texture data value describing a respective refined texture depicted by the particular diffusion-generated view, wherein the second data values are calculated based on the respective texture data value for each particular triangle included in the 3D mesh.

8. The method of claim 4, further comprising: rendering, via the rendering engine, a sampling set of multiple rendered views depicting the 3D mesh for the digital object having the refined texture described by the second modified texture data object; selecting, from the sampling set, at least one rendered view that is identified as omitting the refined texture; generating, via the trained neural network implementing the diffusion model, an additional image depicting an additional diffusion-generated view, the additional diffusion-generated view depicting the digital object having an additional texture, wherein the trained neural network generates the additional image based on a combination of the at least one rendered view and the refined texture; and performing a third modification to the second modified texture data object to describe the additional texture depicted in the additional image, wherein the third modified texture data object includes third data values that are calculated based on the additional texture.

9. A system for generating a texture data object, the system comprising: a rendering engine configured for: rendering a first multi-view rendered image of a digital object described by a three-dimensional (“3D”) mesh, wherein the first multi-view rendered image includes a first set of multiple rendered views depicting the digital object; and a trained neural network implementing a diffusion model, the trained neural network configured for: generating a second multi-view diffusion-generated image of the digital object, wherein the second multi-view diffusion-generated image includes a second set of multiple diffusion-generated views depicting the digital object having an initial texture, wherein the trained neural network generates the second multi-view diffusion-generated image of the digital object based on a combination of the first multi-view rendered image and appearance input data; the system being configured for: performing a first modification to a texture data object to describe the initial texture depicted in the second multi-view diffusion-generated image, wherein the first modified texture data object includes first data values that are calculated based on the initial texture; and providing the first modified texture data object to an additional computing component configured to, responsive to receiving the first modified texture data object, render the digital object having the initial texture described by the first modified texture data object.

10. The system of claim 9, the system being further configured for: generating a mask image based on the first multi-view rendered image, wherein the mask image includes multiple mask regions; and generating a noisy image, wherein the noisy image includes multiple noisy regions, wherein, in the first multi-view rendered image, each particular rendered view included in the first set of multiple rendered views corresponds to i) a respective mask region of the multiple mask regions and ii) a respective noisy region of the multiple noisy regions, wherein the trained neural network implementing the diffusion model is further configured for: determining, for each respective noisy region of the multiple noisy regions, a respective set of cross-frame attention features, the respective set of cross-frame attention features including at least one cross-frame attention feature for one or more additional noisy region of the multiple noisy regions; and modifying each respective noisy region based on the respective set of cross-frame attention features, wherein, in the second multi-view diffusion-generated image, each particular diffusion-generated view included in the second set of multiple diffusion-generated views depicts a respective initial texture that is generated based on a corresponding set of cross-frame attention features for a corresponding noisy region of the multiple noisy regions.

11. The system of claim 9, wherein performing the first modification to the texture data object further comprises: for each particular diffusion-generated view in the second set of multiple diffusion-generated views: determining a respective texture data value describing a respective initial texture depicted by the particular diffusion-generated view; and calculating a respective average data value that is based on a combination of i) the respective texture data value associated with the particular diffusion-generated view and ii) at least one additional respective texture data value associated with at least one additional particular diffusion-generated view in the second set of multiple diffusion-generated views, wherein the first data values are calculated based on the respective average data value for each particular diffusion-generated view in the second set of multiple diffusion-generated views.

12. The system of claim 9, wherein: the rendering engine is further configured for rendering a third multi-view rendered image of the digital object, wherein the third multi-view rendered image includes a third set of multiple rendered views depicting the digital object having the initial texture described by the first modified texture data object; the trained neural network implementing the diffusion model is further configured for generating a fourth multi-view diffusion-generated image of the digital object, wherein the fourth multi-view diffusion-generated image includes a fourth set of multiple diffusion-generated views depicting the digital object having a refined texture; and the system is further configured for performing a second modification to the first modified texture data object to describe the refined texture depicted in the fourth multi-view diffusion-generated image, wherein the second modified texture data object includes second data values that are calculated based on the refined texture.

13. The system of claim 12, wherein performing the second modification to the first modified texture data object further comprises: for each particular triangle included in the 3D mesh: identifying, from the fourth set of multiple diffusion-generated views, a particular diffusion-generated view having a viewing direction that is within a similarity threshold to a normal of the particular triangle; and determining a respective texture data value describing a respective refined texture depicted by the particular diffusion-generated view, wherein the second data values are calculated based on the respective texture data value for each particular triangle included in the 3D mesh.

14. The system of claim 12, wherein: the rendering engine is further configured for rendering a sampling set of multiple rendered views depicting the 3D mesh for the digital object having the refined texture described by the second modified texture data object; the system is further configured for selecting, from the sampling set, at least one rendered view that is identified as omitting the refined texture; the trained neural network implementing the diffusion model is further configured for generating an additional image depicting an additional diffusion-generated view, the additional diffusion-generated view depicting the digital object having an additional texture, wherein the trained neural network generates the additional image based on a combination of the at least one rendered view and the refined texture; and the system is further configured for performing a third modification to the second modified texture data object to describe the additional texture depicted in the additional image, wherein the third modified texture data object includes third data values that are calculated based on the additional texture.

15. A non-transitory computer-readable medium embodying program code for generating a texture data object, the program code comprising instructions which, when executed by a processor, cause the processor to perform: receiving appearance input data and a three-dimensional (“3D”) mesh describing a digital object; rendering, via a rendering engine, a first multi-view rendered image of the digital object, wherein the first multi-view rendered image includes a first set of multiple rendered views depicting the digital object and excluding the appearance input data; generating, via a trained neural network implementing a diffusion model, a second multi-view diffusion-generated image of the digital object, wherein the second multi-view diffusion-generated image includes a second set of multiple diffusion-generated views depicting the digital object having an initial texture, wherein the trained neural network generates the second multi-view diffusion-generated image of the digital object based on a combination of the first multi-view rendered image and the appearance input data; performing a first modification to a texture data object to describe the initial texture depicted in the second multi-view diffusion-generated image, wherein the first modified texture data object includes first data values that are calculated based on the initial texture; and providing the first modified texture data object to an additional computing component configured to, responsive to receiving the first modified texture data object, render the digital object having the initial texture described by the first modified texture data object.

16. The non-transitory computer-readable medium of claim 15, the program code further comprising instructions which cause the processor to perform: generating a mask image based on the first multi-view rendered image, wherein the mask image includes multiple mask regions; and generating a noisy image, wherein the noisy image includes multiple noisy regions, wherein, in the first multi-view rendered image, each particular rendered view included in the first set of multiple rendered views corresponds to i) a respective mask region of the multiple mask regions and ii) a respective noisy region of the multiple noisy regions, wherein the trained neural network implementing the diffusion model is further configured for: determining, for each respective noisy region of the multiple noisy regions, a respective set of cross-frame attention features, the respective set of cross-frame attention features including at least one cross-frame attention feature for one or more additional noisy region of the multiple noisy regions; and modifying each respective noisy region based on the respective set of cross-frame attention features, wherein, in the second multi-view diffusion-generated image, each particular diffusion-generated view included in the second set of multiple diffusion-generated views depicts a respective initial texture that is generated based on a corresponding set of cross-frame attention features for a corresponding noisy region of the multiple noisy regions.

17. The non-transitory computer-readable medium of claim 15, wherein performing the first modification to the texture data object further comprises: for each particular diffusion-generated view in the second set of multiple diffusion-generated views: determining a respective texture data value describing a respective initial texture depicted by the particular diffusion-generated view; and calculating a respective average data value that is based on a combination of i) the respective texture data value associated with the particular diffusion-generated view and ii) at least one additional respective texture data value associated with at least one additional particular diffusion-generated view in the second set of multiple diffusion-generated views, wherein the first data values are calculated based on the respective average data value for each particular diffusion-generated view in the second set of multiple diffusion-generated views.

18. The non-transitory computer-readable medium of claim 15, the program code further comprising instructions which cause the processor to perform: rendering, via the rendering engine, a third multi-view rendered image of the digital object, wherein the third multi-view rendered image includes a third set of multiple rendered views depicting the digital object having the initial texture described by the first modified texture data object; generating, via the trained neural network implementing the diffusion model, a fourth multi-view diffusion-generated image of the digital object, wherein the fourth multi-view diffusion-generated image includes a fourth set of multiple diffusion-generated views depicting the digital object having a refined texture; and performing a second modification to the first modified texture data object to describe the refined texture depicted in the fourth multi-view diffusion-generated image, wherein the second modified texture data object includes second data values that are calculated based on the refined texture.

19. The non-transitory computer-readable medium of claim 18, wherein performing the second modification to the first modified texture data object further comprises: for each particular triangle included in the 3D mesh: identifying, from the fourth set of multiple diffusion-generated views, a particular diffusion-generated view having a viewing direction that is within a similarity threshold to a normal of the particular triangle; and determining a respective texture data value describing a respective refined texture depicted by the particular diffusion-generated view, wherein the second data values are calculated based on the respective texture data value for each particular triangle included in the 3D mesh.

20. The non-transitory computer-readable medium of claim 18, the program code further comprising instructions which cause the processor to perform: rendering, via the rendering engine, a sampling set of multiple rendered views depicting the 3D mesh for the digital object having the refined texture described by the second modified texture data object; selecting, from the sampling set, at least one rendered view that is identified as omitting the refined texture; generating, via the trained neural network implementing the diffusion model, an additional image depicting an additional diffusion-generated view, the additional diffusion-generated view depicting the digital object having an additional texture, wherein the trained neural network generates the additional image based on a combination of the at least one rendered view and the refined texture; and performing a third modification to the second modified texture data object to describe the additional texture depicted in the additional image, wherein the third modified texture data object includes third data values that are calculated based on the additional texture.

Detailed Description

This disclosure relates generally to the field of texturing three-dimensional digital objects, and more specifically relates to generating texture maps via neural network models.

A three-dimensional (“3D”) digital object includes texture data, which provides an appearance for the 3D digital object. The texture data, such as a texture map, can be applied to the 3D digital object during generation, e.g., rendering, of the 3D digital object. In some cases, the 3D digital object with the texture data is included in a digital graphical environment, such as a computer-implemented game, a virtual reality (“VR”) environment, or other types of digital graphical environments.

In some cases, it is desirable for a 3D digital object to have high-quality texture data that provides a particular appearance, such as a realistic appearance or an appearance with a particular artistic style. Contemporary techniques for generating high-quality texture data often rely on extensive manual effort to modify a texture map or other texture data, such as manual “painting” techniques for modifying individual areas of a texture map. However, using manual effort to generate high-quality texture data can be inefficient, requiring a large expenditure of time by one or more highly trained specialists, such as a graphical design specialist.

According to certain embodiments, a texture data generation computing system generates texture data for 3D digital objects. A rendering engine included in the texture data generation computing system generates at least one multi-view rendered digital image including a set of rendered views depicting a 3D digital object. Based on the multi-view rendered digital image, a diffusion image generation model included in the texture data generation computing system generates at least one multi-view diffusion-generated digital image including a set of diffusion-generated views depicting the 3D digital object with a requested visual appearance. In addition, the diffusion image generation model determines, for each diffusion-generated view, a respective cross-frame attention feature set that describes relationships among additional diffusion-generated views in the set. Based on the at least one multi-view diffusion-generated digital image, the texture data generation computing system generates or modifies a texture data object, such as by calculating texture data values based on a texture depicted in the set of diffusion-generated views of the 3D digital object. In some cases, the texture data generation computing system provides the modified texture data object, or a textured 3D digital object based on the modified texture data object, to an additional computing system that is configured to modify a digital graphical environment based on the modified texture data object or the textured 3D digital object. In some cases, the texture data generation computing system performs multiple passes of texture-generation techniques based on pairs of multi-view digital images, such as multiple pairs of multi-view rendered digital images with corresponding multi-view diffusion-generated digital images.

These illustrative embodiments are mentioned not to limit or define the disclosure, but to provide examples to aid understanding thereof. Additional embodiments are discussed in the Detailed Description, and further description is provided there.

As discussed above, prior techniques for generating texture data for three-dimensional (“3D”) digital objects are inefficient, relying on extensive manual effort. Such manual effort is often provided by human specialists, such as highly skilled specialists who are trained in graphical design, 3D texture mapping, or other skill sets that are related to generating texture data. In addition, utilizing manual effort is costly, and may require a relatively large expenditure of financial resources (e.g., payment for the highly skilled specialists) and computing resources (e.g., individual computing workstations for the highly skilled specialists). Additionally or alternatively, contemporary approaches for generating texture data have attempted to generate two-dimensional (“2D”) images that could be applied to various regions, e.g., “stitched” regions, on a 3D digital object. However, contemporary approaches using stitched 2D images may fail to eliminate visual boundaries between regions. For example, a 3D digital object generated using a contemporary texture map with visual boundaries may have a poor appearance in digital graphical environments, such as a poor appearance that includes visible lines in inappropriate locations on the 3D digital object or inconsistent colors at the edges of the stitched 2D images.

Certain embodiments described herein provide for a texture data generation computing system that generates texture data for 3D digital objects based on one or more pairs of multi-view digital images, such as an image pair including a multi-view rendered digital image and a corresponding multi-view diffusion-generated digital image. In this example, the multi-view rendered digital image depicts multiple rendered views of an untextured 3D digital object, e.g., the 3D digital object lacks texture data. A trained neural network included in the texture data generation computing system, such as a diffusion image generation model, is configured to generate the multi-view diffusion-generated digital image based on the multi-view rendered digital image, or a combination of the multi-view rendered digital image with additional data. In some cases, the trained neural network generates the multi-view diffusion-generated digital image based on the multi-view rendered digital image combined with appearance input data that describes a requested visual appearance for the 3D digital object. In addition, the trained neural network determines one or more sets of cross-frame attention features for the multi-view diffusion-generated digital image, such as a particular cross-frame attention feature set for each diffusion-generated view corresponding to a particular rendered view. In some cases, determining the sets of cross-frame attention features improves consistency among the multiple diffusion-generated views, such as by improving consistent calculation of data values (e.g., color data values, brightness data values) that depict the appearance of the 3D digital object in the diffusion-generated views. 
Based on the multi-view diffusion-generated digital image, the texture data generation computing system generates or modifies a texture data object to include texture data indicating the appearance of the diffusion-generated views, such as a texture map that indicates color or other texture values calculated from the diffusion-generated views. In some cases, generating or modifying a texture data object based on the multi-view diffusion-generated digital image improves consistency of the generated texture data, such as by improving consistent calculation of texture data values included in the texture data object. For example, the texture data object can include texture elements (“texels”) having texture data values (e.g., texel color values). In addition, the texture data generation computing system modifies, or causes a modification to, one or more digital graphical environments based on the texture data object. For example, the texture data generation computing system provides the texture data object to one or more additional computing systems that are configured to generate, for presentation in a digital graphical environment, at least one textured 3D digital object having the texture described by the texture data object. Additionally or alternatively, the texture data generation computing system generates at least one textured 3D digital object having the texture described by the texture data object and provides the textured 3D digital object to one or more additional computing systems that are configured to modify a digital graphical environment to include the textured 3D digital object. In some cases, the example texture data generation computing system provides the texture data object with improved appearance, e.g., as compared to contemporary approaches using stitched 2D images, while reducing expenditure of resources, e.g., computing or financial resources related to manual efforts to generate texture data.

The following examples are provided to introduce certain embodiments of the present disclosure. A texture data generation computing system receives 3D mesh data describing a 3D digital object, and appearance input data describing a requested visual appearance of the 3D digital object, such as text data indicating a requested color, shininess, or other appearance characteristics. Based on the 3D mesh data, a rendering engine included in the texture data generation computing system generates a 2D multi-view rendered digital image that includes a set of multiple rendered views of the 3D digital object. Each rendered view in the set depicts the 3D digital object from a particular viewpoint, such that each rendered view depicts the 3D digital object from a different viewing angle and from a similar viewing distance. In a first pass of texture-generation techniques by the texture data generation computing system, the multi-view rendered digital image depicts the 3D digital object as untextured, e.g., without texture data.
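As an illustration only, the viewpoint arrangement described above (different viewing angles, similar viewing distance) and the packing of several views into one 2D multi-view image could be sketched as follows. The function names `orbit_viewpoints` and `tile_boxes`, the circular orbit, the fixed elevation, and the square grid layout are all assumptions made for this sketch; the disclosure does not specify how viewpoints are chosen or how views are arranged within the image.

```python
import math

def orbit_viewpoints(num_views, distance, elevation_deg=20.0):
    """Return camera positions evenly spaced in azimuth around the origin.

    Assumption: the object is centered at the origin. Every camera sits at
    the same distance, so views differ in angle but not in distance.
    """
    elevation = math.radians(elevation_deg)
    positions = []
    for i in range(num_views):
        azimuth = 2.0 * math.pi * i / num_views
        x = distance * math.cos(elevation) * math.cos(azimuth)
        y = distance * math.sin(elevation)
        z = distance * math.cos(elevation) * math.sin(azimuth)
        positions.append((x, y, z))
    return positions

def tile_boxes(num_views, columns, tile_size):
    """Return one (left, top, right, bottom) pixel box per view, row by row."""
    boxes = []
    for i in range(num_views):
        row, col = divmod(i, columns)
        left, top = col * tile_size, row * tile_size
        boxes.append((left, top, left + tile_size, top + tile_size))
    return boxes

# Hypothetical example: four views packed into a 2 x 2 grid of 256-pixel tiles.
cameras = orbit_viewpoints(num_views=4, distance=2.5)
boxes = tile_boxes(num_views=4, columns=2, tile_size=256)
```

Each box marks where one rendered view would sit inside the multi-view image, which gives every view a fixed region that later stages can refer to.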

Continuing with this example, a trained diffusion image generation model included in the texture data generation computing system generates a 2D multi-view diffusion-generated digital image that includes an additional set of multiple diffusion-generated views of the 3D digital object having the requested visual appearance. For example, the trained diffusion image generation model generates the multi-view diffusion-generated digital image based on a combination of the appearance input data with the multi-view rendered digital image. In some cases, the trained diffusion image generation model applies one or more diffusion image generation techniques to generate the multi-view diffusion-generated digital image, such as diffusion techniques to denoise a noisy image. Examples of diffusion techniques can include stable diffusion, blended diffusion, or other diffusion techniques to generate images. Based on the multiple rendered views, the trained diffusion image generation model modifies corresponding noisy regions of a noisy image, such as modifications via one or more denoising techniques. In addition, each of the modified noisy regions corresponding to a particular rendered view is modified (e.g., iterative denoising modifications) to depict a particular corresponding diffusion-generated view. For example, each rendered view in the set of multiple rendered views corresponds to a particular noisy region and a particular diffusion-generated view in the set of multiple diffusion-generated views. In some cases, the trained diffusion image generation model determines multiple cross-frame attention feature sets that describe relationships among features (e.g., image features) of the multiple diffusion-generated views or corresponding noisy regions, such that each cross-frame attention feature set corresponds to a particular diffusion-generated view and corresponding noisy region. 
In addition, the trained diffusion image generation model generates each of the cross-frame attention feature sets based on features of additional diffusion-generated views from the set, such that for a particular diffusion-generated view, the trained diffusion image generation model generates the corresponding cross-frame attention feature set based on image features of one or more additional diffusion-generated views or additional noisy regions that exclude the particular diffusion-generated view and particular corresponding noisy region.

Based on the multi-view diffusion-generated digital image, the example texture data generation computing system generates or modifies a texture data object, such as a texture map. For example, the texture data generation computing system calculates one or more texture data values that describe the visual appearance depicted in one or more of the multiple diffusion-generated views in the multi-view diffusion-generated digital image. The texture data values are calculated based on the visual appearance of each of the multiple diffusion-generated views. In addition, the calculated texture data values can describe texture for various regions of the 3D digital object, such as the regions depicted in the multiple rendered views from various viewpoints of the untextured 3D digital object. In some cases, the texture data generation computing system modifies one or more texels in the texture data object based on the calculated texture data values.

Continuing with this example, the texture data generation computing system modifies, or causes a modification to, one or more digital graphical environments based on the generated or modified texture data object. For example, the texture data generation computing system provides the texture data object to one or more additional computing systems. Responsive to receiving the texture data object, the one or more additional computing systems are configured to generate at least one textured 3D digital object having the texture described by the texture data object, such as for presentation in a digital graphical environment. Additionally or alternatively, the texture data generation computing system generates at least one textured 3D digital object having the texture described by the texture data object and provides the textured 3D digital object to one or more additional computing systems. Responsive to receiving the textured 3D digital object, the one or more additional computing systems are configured to modify a digital graphical environment to include the textured 3D digital object.

Certain embodiments described herein provide improvements to techniques for generating texture data for 3D digital objects and improvements for computing systems for generating texture data. For example, a texture data generation computing system described herein applies particular rules to determine features of multiple image regions in a digital image, such as noisy regions in a noisy image or regions depicting respective views in a multi-view digital image (e.g., rendered or diffusion-generated). Additionally or alternatively, a texture data generation computing system described herein generates, for each particular image region in a digital image, a respective set of cross-frame attention features by applying additional particular rules to determine a set or subset of additional image regions from which the respective set of cross-frame attention features is determined. By applying the particular rules or the additional particular rules, a texture data generation computing system described herein generates or modifies multiple data structures related to a computer-implemented field of generating textured 3D digital objects, such as texture maps, texels, data values describing 3D digital objects, or other data structures or data values related to generating textured 3D digital objects. In some cases, the application of these rules by the texture data generation computing system achieves an improved technological result, such as improving consistency of appearance (or image data depicting appearance) among multiple image regions in a digital image, such as improved consistency among multiple diffusion-generated views in a multi-view diffusion-generated digital image. 
Additionally or alternatively, the application of these rules by the texture data generation computing system achieves an improved technological result by reducing expenditure of resources for generating texture data, such as reducing expenditure of time, financial, and computing resources related to manual efforts to generate texture data. Furthermore, the application of these rules by the texture data generation computing system achieves an improved outcome in a technical field, such as an improvement in visual appearance in a technical field of generating textured 3D digital objects.

In some cases, the described techniques for generating texture data improve efficiency of a computing system that implements one or more of the techniques, such as reducing usage of computing resources as compared to contemporary techniques for sequentially determining texture data for multiple portions of a 3D digital object. For example, a texture data generation computing system described herein determines texture data for multiple portions of a 3D digital object by utilizing memory and processing resources to analyze a particular pair of a multi-view rendered digital image and a multi-view diffusion-generated digital image. The memory and processing resources used by the described texture data generation computing system are reduced as compared to a contemporary computing system configured for sequential analysis of multiple images of the portions of the 3D digital object, which expends additional memory and processing resources to analyze at least one additional image for every additional portion (e.g., view) of the 3D digital object.

Referring now to the drawings, FIG. 1 is a diagram depicting an example of a computing environment in which texture data for 3D digital objects is generated, such as a computing environment 100. The computing environment 100 includes a texture data generation computing system 110. In addition, the texture data generation computing system 110 includes a rendering engine 120 and a neural network module 140. In some cases, the computing environment 100 includes one or more additional computing systems, such as an additional computing device 190 that includes a user interface 195. In addition, one or more of the texture data generation computing system 110, the additional computing device 190, and additional computing systems included in the computing environment 100 are configured to exchange data via one or more computing networks, such as a local or global area network. FIG. 1 depicts the texture data generation computing system 110 as including the rendering engine 120 and the neural network module 140, but other implementations are possible. For example, a texture data generation computing system could be configured to communicate with one or more of an external rendering engine or an external neural network, e.g., external components that are implemented by one or more additional computing systems.

In some cases, the additional computing device 190 is configured to implement one or more digital graphical environments that are capable of presenting, e.g., to a user of the additional computing device 190, one or more 3D digital objects. For example, the additional computing device 190 configures at least one display device, such as a display device included in the user interface 195, to display image data describing a digital graphical environment (or a local instance thereof), such as a development environment for 3D digital objects, an interactive game environment, a VR collaboration environment, or other types of digital graphical environments. In addition, the additional computing device 190 is configured to implement the one or more digital graphical environments based on information received from (or provided to) the texture data generation computing system 110. For example, the texture data generation computing system 110 could receive, from the additional computing device 190, request data 193 that describes a requested 3D digital object having a requested appearance, such as request data provided via an input device included in the user interface 195. Based on the request data 193, the texture data generation computing system 110 generates one or more texture data objects, such as a texture data object 115, that include texture data describing the requested appearance. In addition, the texture data generation computing system 110 provides, to the additional computing device 190, one or more of the texture data object 115 or a 3D digital object having the texture described by the texture data object 115. Responsive to receiving the texture data object 115 or the 3D digital object, the additional computing device 190 can modify the digital graphical environment (or local instance thereof) to include the 3D digital object having the texture described by the texture data object 115.

In the computing environment 100, the texture data generation computing system 110 generates the texture data object 115 based on one or more pairs of multi-view images, such as a multi-view image pair that includes a multi-view image with rendered image data and an additional multi-view image with diffusion-generated image data, e.g., image data generated via one or more trained neural network models. In some cases, the texture data generation computing system 110 generates one or more of the multi-view images based on a visual appearance described by the request data 193. In addition, the texture data generation computing system 110 generates the multi-view image pair based on cross-frame attention features, e.g., cross-frame attention features identified via one or more trained neural network models. Based on the one or more multi-view image pairs and the cross-frame attention features, the texture data generation computing system 110 generates (or modifies) the texture data object 115 to include one or more texture data values, such as texture data values that describe the requested appearance from the request data 193.

In FIG. 1, the texture data generation computing system 110 receives one or more of a 3D mesh 105 or appearance input data 107. In some cases, the 3D mesh 105 includes data describing a 3D digital object, such as data describing multiple triangles (or other polygon types suitable for 3D mesh data) that are included in a surface of the 3D digital object. In addition, the appearance input data 107 includes data, such as text data, describing a requested appearance for the 3D digital object described by the 3D mesh 105. In some cases, the texture data generation computing system 110 generates or otherwise identifies one or more of the 3D mesh 105 or the appearance input data 107 based on the request data 193. For instance, the request data 193 could indicate one or more of a particular 3D digital object or a requested appearance for the particular 3D digital object. As an example, the texture data generation computing system 110 determines that the request data 193 indicates a particular 3D digital object resembling an apple. Based on the request data 193, the texture data generation computing system 110 identifies the 3D mesh 105, such as by determining that the 3D mesh 105 is a triangle mesh for a 3D digital object resembling an apple. Continuing with this example, the texture data generation computing system 110 also determines that the request data 193 further indicates, for the apple object, a requested appearance of “red with a green spot” and “shiny.” Based on the request data 193, the texture data generation computing system 110 generates (or otherwise determines) the appearance input data 107. For example, the texture data generation computing system 110 can generate the appearance input data 107 to include text data describing “red,” “green spot,” and “shiny.” In FIG. 1, the appearance input data 107 includes text data that provides a verbal description of the requested appearance, but other implementations are possible. 
For example, a texture data generation computing system could generate (or otherwise determine) appearance input data that includes text data, a 2D image provided with request data (e.g., an image provided by a user of the additional computing device 190), a 2D image selected from a texture data repository, or other types of data that describe visual appearance. In some cases, the appearance input data 107 excludes texture data structures configured to be included on a surface of the 3D digital object, e.g., texels that can be applied during rendering of the 3D mesh 105.

In the computing environment 100, the texture data generation computing system 110 generates, via the rendering engine 120, a first multi-view digital image based on the 3D mesh 105. For example, the rendering engine 120 generates multiple views of the 3D digital object described by the 3D mesh 105. In addition, the rendering engine 120 renders (e.g., generates) 2D images based on the views, such as a respective 2D rendered image for each particular view of the multiple views. In some cases, the multiple views are selected at different viewpoints, such that each particular view depicts a different portion of the 3D digital object (e.g., top portions, bottom portions, side portions). Additionally or alternatively, the multiple views are selected at viewpoints having different viewing angles of the 3D digital object and same (or similar) distances from the 3D digital object, such that each particular view depicts a different portion of the 3D digital object from a same (or similar) viewpoint distance.
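As an illustration of viewpoints that share a viewing distance but differ in viewing angle, the following minimal sketch places camera positions on a sphere around the object. The function name `sample_viewpoints` and the Fibonacci-sphere spacing are assumptions made for this example; the disclosure does not specify how viewpoints are chosen.

```python
import math

def sample_viewpoints(num_views, distance):
    """Return camera positions on a sphere of radius `distance` around the origin.

    Hypothetical helper: uses Fibonacci-sphere spacing (an assumption) so that
    the views cover top, side, and bottom portions of the object.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    positions = []
    for i in range(num_views):
        # y runs from near +1 (top) to near -1 (bottom).
        y = 1.0 - 2.0 * (i + 0.5) / num_views
        ring_radius = math.sqrt(1.0 - y * y)
        theta = golden_angle * i
        positions.append((
            distance * ring_radius * math.cos(theta),
            distance * y,
            distance * ring_radius * math.sin(theta),
        ))
    return positions

# Twenty-five viewpoints, all at the same distance from the object.
views = sample_viewpoints(25, distance=3.0)
```

Each returned position would be paired with a camera oriented toward the origin, which is where this sketch assumes the 3D digital object is centered.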

In FIG. 1, the rendering engine 120 (or another component of the texture data generation computing system 110) generates a multi-view rendered image 125 that is based on a combination of the 2D rendered images. In some cases, the multi-view rendered image 125 is a 2D composite image that includes the 2D rendered images (or a subset thereof) for the multiple views. In addition, the multi-view rendered image 125 includes the 2D rendered images arranged as a grid, such that each image region of the multi-view rendered image 125 depicts a particular view from the multiple views. In addition, the multi-view rendered image 125 includes particular portions of the 2D rendered images, such as a cropped portion that includes the respective view of each 2D rendered image and omits additional (e.g., background) portions of each 2D rendered image. Continuing with the above example of a 3D digital object resembling an apple, the rendering engine 120 renders twenty-five views of the 3D mesh 105 (e.g., different viewpoint angles at a same or similar viewpoint distance). Based on the example twenty-five rendered views, the multi-view rendered image 125 includes a 5×5 array of the twenty-five views, each particular region of the array depicting a respective view of the apple object. Additional examples of a multi-view rendered image generated by a texture data generation computing system can include grids (or other organizational formats) of an n×n array of views (e.g., sixteen views in a 4×4 array), an n×m array of views (e.g., twenty-four views in a 4×6 array), irregular arrays (e.g., rows or columns having various lengths), or other types of presentations for multiple views of a 3D digital object.
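The grid arrangement described above can be sketched as follows. This is a simplified illustration that treats each rendered view as a small 2D list of pixel values; `compose_grid` and `region_of` are hypothetical names introduced here, not components named in the disclosure.

```python
def compose_grid(views, rows, cols):
    """Tile equally sized views (lists of pixel rows) into one composite image."""
    if len(views) != rows * cols:
        raise ValueError("expected rows * cols views")
    view_h = len(views[0])
    composite = []
    for grid_row in range(rows):
        for y in range(view_h):
            line = []
            for grid_col in range(cols):
                # Append row y of the view that occupies this grid cell.
                line.extend(views[grid_row * cols + grid_col][y])
            composite.append(line)
    return composite

def region_of(index, cols, view_h, view_w):
    """Return (top, left, bottom, right) of the region holding view `index`."""
    grid_row, grid_col = divmod(index, cols)
    top, left = grid_row * view_h, grid_col * view_w
    return top, left, top + view_h, left + view_w

# Twenty-five toy 2x2 "views", each filled with its own view index.
views = [[[i, i], [i, i]] for i in range(25)]
grid = compose_grid(views, rows=5, cols=5)
top, left, bottom, right = region_of(7, cols=5, view_h=2, view_w=2)
```

The `region_of` mapping is what lets a later stage associate each region of the composite with one particular view.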

Based on the multi-view rendered image 125, the texture data generation computing system 110 generates, via the neural network module 140, a second multi-view digital image. For example, the neural network module 140 includes one or more models trained to generate 2D digital images, such as a trained diffusion image generation model 150 (also referred to herein as the “diffusion model 150”). The trained diffusion model 150 generates a multi-view diffusion-generated image 155, based on one or more of the multi-view rendered image 125 or the appearance input data 107. For example, the trained diffusion model 150 determines, based on the appearance input data 107, one or more visual appearance characteristics that are requested, such as the example appearance described by “red,” “green spot,” and “shiny.” In addition, the trained diffusion model 150 modifies a noisy image, such as via a denoising image generation technique, based on a combination of data included in the multi-view rendered image 125 and the appearance input data 107. For example, the trained diffusion model 150 generates the multi-view diffusion-generated image 155 by modifying the noisy image (e.g., via iterative denoising operations) to include images of the multiple views depicted in the multi-view rendered image 125 having the visual appearance indicated by the appearance input data 107. In FIG. 1, the trained diffusion model 150 is a particular trained diffusion model, but other implementations are possible, such as a combination of multiple diffusion image generation models (or multiple types of diffusion image generation models).

In some cases, the multi-view diffusion-generated image 155 is a 2D image that includes the 2D diffusion-generated views that correspond to the multiple views depicted in the multi-view rendered image 125. Continuing with the above example of the requested apple and appearance, the multi-view diffusion-generated image 155 includes multiple diffusion-generated views of the apple object having the requested appearance of “red with a green spot” and “shiny.” Corresponding to the example twenty-five rendered views included in the multi-view rendered image 125, the multi-view diffusion-generated image 155 includes an additional 5×5 array of twenty-five diffusion-generated views, each particular region of the additional array depicting a respective diffusion-generated view of the apple object having an appearance of shiny and red with a green spot. In this example, various views in the additional array can depict various portions of the apple object's appearance, such as a first view from a first viewing angle in which the green spot is visible and a second view from a second viewing angle in which the green spot is occluded (e.g., not visible).

In FIG. 1, the trained diffusion model 150 generates the multi-view diffusion-generated image 155 based on cross-frame attention features, such as image features identified from one or more of the multi-view rendered image 125 or the noisy image from which the multi-view diffusion-generated image 155 is generated (e.g., features identified during iterative denoising operations). Examples of image features can include vector representations (or other digital representations, including digital representations not intended for human interpretation) that describe relationships among pixels (or other elements) in a digital image, such as mathematical relationships among pixels in the multi-view rendered image 125 or the noisy image. In some cases, the trained diffusion model 150 determines (or otherwise receives) a set of cross-frame attention features that describe image features among the multiple views, or a subset of the multiple views, from the images 125 or 155. For example, the trained diffusion model 150 determines, for each region of the noisy image, a corresponding view from the multiple rendered views depicted in the multi-view rendered image 125. In addition, the trained diffusion model 150 determines, for each particular region of the noisy image, a respective set of cross-frame attention features. For example, during iterative denoising of the noisy image, the trained diffusion model 150 determines the respective set of cross-frame attention features for each particular region based on one or more additional regions of the noisy image. In some cases, such as during one or more initial denoising iterations (e.g., a startup phase), the respective set of cross-frame attention features is determined based on one or more additional rendered views depicted in the multi-view rendered image 125 (e.g., excluding the rendered view corresponding to the particular noisy region).

Continuing with the example twenty-five rendered views in the multi-view rendered image 125, the trained diffusion model 150 determines a first noisy region of the noisy image corresponding to a first rendered view and an additional noisy region corresponding to each of the additional twenty-four rendered views. In addition, the trained diffusion model 150 calculates, for the first noisy region, a respective set of cross-frame attention features from some or all of the additional noisy regions (e.g., excluding the first noisy region).
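The exclusion of a region's own features when computing its cross-frame attention features can be sketched as below. This is a minimal illustration using scaled dot-product attention with identity projections, which is an assumption; the disclosure does not state the attention formula, and a trained model would apply learned query, key, and value projections. The name `cross_frame_attention` is hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_frame_attention(regions, target):
    """Attend from the target region to the features of all other regions.

    regions: list of regions, each a list of feature vectors (lists of floats).
    Returns one attended feature vector per feature vector in the target region.
    """
    dim = len(regions[target][0])
    # Keys and values come from every region except the target itself.
    context = [vec for i, region in enumerate(regions) if i != target for vec in region]
    attended = []
    for query in regions[target]:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in context]
        weights = softmax(scores)
        attended.append([sum(w * value[d] for w, value in zip(weights, context)) for d in range(dim)])
    return attended

# Three toy regions with one 2D feature each; region 0 attends to regions 1 and 2 only.
regions = [[[1.0, 0.0]], [[0.0, 1.0]], [[0.0, 1.0]]]
features = cross_frame_attention(regions, target=0)
```

In the toy input, the attended feature for region 0 is built entirely from the features of regions 1 and 2, so none of region 0's own feature values appear in the result.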

In some implementations, generating the diffusion-generated views in the multi-view diffusion-generated image 155 based on cross-frame attention features improves consistency of appearance among the diffusion-generated views. For example, denoising each particular region of the noisy image based on cross-frame attention features from additional regions of the noisy image can generate more consistent data values that describe the diffusion-generated views (or iterations thereof) that are depicted in the noisy image. In some implementations, generating the diffusion-generated views in the multi-view diffusion-generated image 155 based on cross-frame attention features calculated from one or more multi-view images improves the efficiency with which computing resources are used (e.g., reduced usage of processing or memory resources) for generating the diffusion-generated views. For example, generating a multi-view diffusion-generated image during a particular denoising operation can provide a visual appearance for multiple views of a 3D digital object more efficiently as compared to generating a particular visual appearance for a particular view of the 3D digital object based on multiple denoising operations, e.g., applying a denoising operation to each view individually.

Based on the multi-view diffusion-generated image 155, the texture data generation computing system 110 generates or modifies the texture data object 115. For example, the texture data generation computing system 110 calculates one or more texture data values that describe the visual appearance depicted in the multi-view diffusion-generated image 155 across the multiple diffusion-generated views. In addition, the texture data generation computing system 110 modifies the texture data object 115 to include the texture data values. In some cases, the texture data object 115 is a texture map, such as a 2D digital image that includes texture data specialized for application to a 3D digital object. Examples of texture data included in a texture map can include texels that indicate, for instance, a color or other texture characteristics that can be included on a surface of a 3D digital object to provide a particular appearance of the digital object. In FIG. 1, the texture data generation computing system 110 modifies the texture data object 115 to include texels (or other texture data structures) that include the texture data values calculated from the multi-view diffusion-generated image 155. In some cases, the texture data generation computing system 110 calculates the one or more texture data values based on one or more blending techniques. An example of a first blending technique can include averaging data values from a set (or subset) of the multiple diffusion-generated views in the multi-view diffusion-generated image 155. An example of a second blending technique can include identifying, from the multiple diffusion-generated views in the multi-view diffusion-generated image 155, one or more diffusion-generated views that are similar to a portion of the 3D mesh 105, such as a particular diffusion-generated view that is within a similarity threshold to a normal (e.g., perpendicular) of a particular triangle in the 3D mesh 105. 
Additional blending techniques (or combinations of techniques) could be used by the texture data generation computing system 110 to calculate the texture data values or to modify the texture data object 115 based on the calculated texture data values.
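The two example blending techniques can be sketched for a single texel as follows. This is an illustrative simplification: each view contributes one RGB sample for the texel, view directions and the triangle normal are assumed to be unit vectors, and the cosine-based similarity test with a `threshold` of 0.5 is an assumption rather than a value from the disclosure. The function names are hypothetical.

```python
def average_blend(samples):
    """First technique: average the per-view RGB samples observed for one texel."""
    count = len(samples)
    return tuple(sum(sample[c] for sample in samples) / count for c in range(3))

def normal_aligned_blend(samples, view_dirs, normal, threshold=0.5):
    """Second technique: keep only views whose direction is close to the triangle normal.

    view_dirs: unit vectors pointing from the surface toward each camera.
    normal: unit normal of the triangle that contains the texel.
    Returns None when no view passes the similarity threshold.
    """
    kept = []
    for sample, view_dir in zip(samples, view_dirs):
        cosine = sum(a * b for a, b in zip(view_dir, normal))
        if cosine >= threshold:
            kept.append(sample)
    return average_blend(kept) if kept else None

# One texel seen by two views: the first faces the triangle, the second grazes it.
samples = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
view_dirs = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
normal = (0.0, 0.0, 1.0)

averaged = average_blend(samples)
aligned = normal_aligned_blend(samples, view_dirs, normal)
```

Averaging mixes the two samples, while the normal-aligned variant keeps only the sample from the view that faces the triangle head-on.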

In some implementations, the texture data generation computing system 110 repeats one or more techniques for generating or modifying the texture data object 115. For example, in a first pass of multi-view texture generation techniques by the texture data generation computing system 110, the rendering engine 120 can create a first multi-view rendered image, e.g., the image 125, based on rendered views of the 3D mesh 105 having no texture data applied, or a default texture data applied (e.g., a default color). In addition, the trained diffusion model 150 can create a second multi-view diffusion-generated image, e.g., the image 155, based on the multiple rendered views depicting no texture, or the default texture. Furthermore, the texture data generation computing system 110 can perform a first modification to the texture data object 115 based on the texture data values calculated from the multi-view diffusion-generated image 155. In a second pass of the multi-view texture generation techniques by the texture data generation computing system 110, the rendering engine 120 can create a third multi-view rendered image that is based on additional rendered views of the 3D mesh 105 having the first modified texture data from the texture data object 115. In addition, the trained diffusion model 150 can create a fourth multi-view diffusion-generated image based on the additional rendered views depicting the first modified texture data. Furthermore, the texture data generation computing system 110 can perform a second modification to the texture data object 115 based on additional texture data values calculated from the fourth multi-view diffusion-generated image. In a third pass of the multi-view texture generation techniques by the texture data generation computing system 110, the rendering engine 120 can create a sampling set of additional rendered views of the 3D mesh 105 having the second modified texture data from the texture data object 115. 
In some cases, the texture data generation computing system 110 can select from the sampling set a particular additional rendered view that lacks the second modified texture data, e.g., the particular additional rendered view was not included in the first or third multi-view rendered images. In addition, the trained diffusion model 150 can create one or more additional diffusion-generated views that depict an additional visual appearance for the particular additional rendered view. In some cases, the texture data generation computing system 110 could perform additional passes of the multi-view texture generation techniques (or portions of the techniques), such as to refine the modified texture data in the texture data object 115 or to create texture data for additional portions of the 3D digital object that lack texture data.
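The control flow of the first and second passes can be sketched as a loop in which each pass renders with the texture produced by the previous pass. The callables `render_views`, `generate_views`, and `blend_into` are hypothetical stand-ins for the rendering engine, the trained diffusion model, and the texture update; the toy implementations below exist only to make the sketch runnable. The third pass, which selects views lacking texture data, is not modeled here.

```python
def run_texture_passes(mesh, appearance, render_views, generate_views, blend_into, num_passes=2):
    """Repeat render -> diffusion generation -> texture update for several passes."""
    texture = None  # The first pass renders the mesh without texture data.
    for _ in range(num_passes):
        rendered = render_views(mesh, texture)            # multi-view rendered image
        generated = generate_views(rendered, appearance)  # multi-view diffusion-generated image
        texture = blend_into(texture, generated)          # modified texture data object
    return texture

# Toy stand-ins (assumptions) that record what each pass receives.
seen_textures = []

def render_views(mesh, texture):
    seen_textures.append(texture)
    return [(mesh, texture)]

def generate_views(rendered, appearance):
    return [(view, appearance) for view in rendered]

def blend_into(texture, generated):
    # Count how many modifications have been applied to the texture.
    return (texture or 0) + 1

final_texture = run_texture_passes("apple_mesh", "shiny red", render_views, generate_views, blend_into)
```

In this toy run the first pass renders with no texture and the second pass renders with the result of the first modification, mirroring the ordering described above.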

In the computing environment 100, the texture data generation computing system 110 provides the texture data object 115 to one or more additional computing systems, such as the additional computing device 190. In addition, the one or more additional computing systems are configured to modify at least one digital graphical environment based on the texture data object 115. For example, the additional computing device 190 receives the texture data object 115 from the texture data generation computing system 110. Responsive to receiving the texture data object 115, the additional computing device 190 modifies one or more digital graphical environments. For example, the additional computing device 190 generates a textured 3D digital object that is based on a combination of the 3D mesh 105 and the texture data object 115, e.g., the example apple object having a texture that is based on the requested visual appearance described by the appearance input data 107. In addition, the additional computing device 190 modifies one or more digital graphical environments to include the textured 3D digital object, such as modifying a VR collaboration environment to include the example apple object with the requested visual appearance. FIG. 1 describes the textured 3D digital object as being generated by the additional computing device 190 based on the texture data object 115, but other implementations are possible. For example, a texture data generation computing system could generate a textured 3D digital object based on a combination of a texture data object and a 3D mesh associated with the texture data object. In this example, the texture data generation computing system could provide the textured 3D digital object to one or more additional computing systems, such as additional computing systems configured to implement one or more digital graphical environments.

In some implementations, a texture data generation computing system includes at least one neural network that is trained to generate one or more images depicting a visual appearance of a particular 3D digital object. In some cases, the one or more images are generated based on a multi-view rendered image of the particular 3D digital object and appearance input data that indicates a requested visual appearance of the particular 3D digital object. In addition, the one or more images include diffusion-generated views of the particular 3D digital object, such as diffusion-generated views that each depict a combination of a particular view from the multi-view rendered image having a texture that is based on appearance input data, e.g., depicting the requested visual appearance. In some implementations, the trained neural network includes at least one trained diffusion image generation model. Additionally or alternatively, the trained neural network, e.g., via the included trained diffusion image generation model, generates or otherwise receives a set of cross-frame attention features for at least one rendered view in the multi-view rendered image of the particular 3D digital object. For example, the trained diffusion image generation model determines, for each particular rendered view in the multi-view rendered image, cross-frame attention features that describe relationships among the additional rendered views (or a subset thereof) in the multi-view rendered image.

In some cases, the trained diffusion image generation model improves consistency among the diffusion-generated views of the particular 3D digital object by utilizing one or more of the multi-view rendered image or the cross-frame attention features. For example, based on the cross-frame attention features, the trained diffusion image generation model can calculate image data with high consistency across the diffusion-generated views, such as image data that depicts a consistent visual appearance, e.g., to a human viewer, of the particular 3D digital object. In addition, based on a combination of the multi-view rendered image and the cross-frame attention features, the trained diffusion image generation model can calculate consistent image data across the diffusion-generated views simultaneously (or nearly simultaneously), e.g., all of the diffusion-generated views are modified during a particular application of diffusion-generation techniques by the trained model. In some cases, determining diffusion-generated views simultaneously (or nearly simultaneously) improves consistency by reducing or eliminating changes to sequentially calculated image data, e.g., “drift” of data values calculated over sequential views. 
Examples of consistent image data can include color data that is similar among multiple diffusion-generated views depicting similar portions of the particular 3D digital object, brightness data that is similar among multiple diffusion-generated views depicting object portions at a similar angle (e.g., lower brightness in an interior of a cardboard box object), color data having a visually coherent gradient among multiple diffusion-generated views depicting object portions with dissimilar colors (e.g., smooth color transitions across a leaf object, sharp color transitions across a beach ball object), or other types of image data depicting a consistent visual appearance among multiple diffusion-generated views depicting portions of the particular 3D digital object.

In some cases, the trained diffusion image generation model improves efficiency of one or more computing resources (e.g., processing resources, memory resources) by utilizing one or more of the multi-view rendered image or the cross-frame attention features. For example, based on the multi-view rendered image, the trained diffusion image generation model can determine the diffusion-generated views for all of the rendered views simultaneously (or nearly simultaneously), e.g., all of the diffusion-generated views are modified during a particular application of diffusion-generation techniques by the trained model. In some cases, determining diffusion-generated views simultaneously (or nearly simultaneously) for multiple portions of the particular 3D digital object reduces usage of computing resources as compared to contemporary approaches for determining diffusion-generated views sequentially for multiple portions of a 3D digital object.

FIG. 2 depicts an example of a texture data generation computing system 210 that is configured to generate texture data for 3D digital objects. The texture data generation computing system 210 includes one or more of a rendering engine 220 or a neural network module 240. In addition, the neural network module includes at least one diffusion image generation model, such as a trained diffusion image generation model 250 (also referred to herein as the “diffusion model 250”). The diffusion model 250 is trained to generate 2D digital images depicting multiple diffusion-generated views of a visual appearance for a 3D digital object. Based on the 2D digital images from the trained diffusion model 250, the texture data generation computing system 210 generates texture data, such as a texture data object 215, describing the visual appearance depicted in the multiple diffusion-generated views. In some cases, the texture data generation computing system 210 is configured to perform multiple passes of texture-generation techniques, such as multiple passes to generate multiple sets of diffusion-generated views or texture data. In some implementations, the texture data generation computing system 210 is configured to communicate with one or more additional computing systems, such as the additional computing device 190 described in regard to FIG. 1. For example, the texture data generation computing system 210 can provide the texture data object 215, or a textured 3D digital object having the texture described by the texture data object 215, to an additional computing system that is configured to modify a digital graphical environment based on the texture data object 215 or the textured 3D digital object.

In FIG. 2, the texture data generation computing system 210 receives one or more of a 3D mesh 205 or appearance input data 207. In some cases, the 3D mesh 205 includes data describing a 3D digital object, such as triangle data describing a surface of the 3D digital object. In addition, the appearance input data 207 includes data describing a requested appearance for the 3D digital object, such as text data describing the requested appearance. In FIG. 2, the appearance input data 207 excludes texture data structures configured to be included on a surface of the 3D digital object, e.g., texels that can be applied during rendering of the 3D mesh 205. In some cases, the texture data generation computing system 210 generates or otherwise identifies one or more of the 3D mesh 205 or the appearance input data 207 based on request data received from an additional computing system, such as an additional computing system configured to implement a digital graphical environment.

In the texture data generation computing system 210, the rendering engine 220 generates multiple multi-view rendered digital images based on the 3D mesh 205. For example, the rendering engine 220 generates a first multi-view rendered image 225 that includes a first set of multiple rendered views of the 3D digital object described by the 3D mesh 205. In some cases, the first multi-view rendered image 225 is a 2D composite image in which the first set of rendered views are arranged in an array or another suitable arrangement. In addition, the first multi-view rendered image 225 depicts the 3D digital object having no texture, or having a default texture (e.g., default values which are not based on the appearance input data 207). In some cases, the rendering engine 220 generates the first multi-view rendered image 225 during a first pass of texture-generation techniques performed by the texture data generation computing system 210, such as a first pass to generate initial texture data.

Based on the first multi-view rendered image 225, the texture data generation computing system 210 generates a mask image 233. In some cases, the mask image 233 is a 2D image depicting a modification of the first set of multiple rendered views, such as a modification that depicts the first set of views in black and white or greyscale. In addition, the mask image 233 has one or more image characteristics, e.g., image size or image resolution, that are based on the first multi-view rendered image 225. For example, responsive to determining that the first multi-view rendered image 225 has an image size of 2000×2000 pixels, the texture data generation computing system 210 generates the mask image 233 having an image size of 2000×2000 pixels. In addition, responsive to determining that the first multi-view rendered image 225 includes twenty-five rendered views arranged in a 5×5 array, the texture data generation computing system 210 generates the mask image 233 having twenty-five mask regions arranged in a 5×5 array, such that each particular rendered view has a corresponding mask region.
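As a non-limiting illustration of the correspondence between rendered views and mask regions, the following Python sketch computes the pixel bounds of each region in a composite image. The function name `grid_regions`, the row-major ordering, and the equal-sized rectangular regions are assumptions made for this example and are not taken from the description above.

```python
from typing import List, Tuple

# (left, top, right, bottom); right and bottom are exclusive pixel bounds.
Box = Tuple[int, int, int, int]


def grid_regions(width: int, height: int, rows: int, cols: int) -> List[Box]:
    """Split a width x height composite image into rows x cols equal regions.

    Regions are returned in row-major order, so index r * cols + c is the
    region in row r and column c. Equal-sized cells are an assumption.
    """
    if width % cols or height % rows:
        raise ValueError("image size must divide evenly into the grid")
    cell_w, cell_h = width // cols, height // rows
    return [
        (c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h)
        for r in range(rows)
        for c in range(cols)
    ]


# A 2000x2000 composite holding a 5x5 array of views, as in the example above.
regions = grid_regions(2000, 2000, rows=5, cols=5)
```

Under these assumptions, each of the twenty-five regions covers 400×400 pixels, and the same index can be used to look up a rendered view and its mask region.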

Based on the first multi-view rendered image 225, the texture data generation computing system 210 generates a noisy image 235. In some cases, the noisy image 235 is a 2D image that depicts digital noise. An example of digital noise for a 2D digital image can include pixel characteristics (e.g., color, brightness) that are determined via a Gaussian distribution (e.g., Gaussian noise). An additional example of digital noise for a 2D digital image can include vector characteristics, such as data values that are determined via a Gaussian distribution and included in a vector representation (e.g., feature space) of a 2D digital image. In some cases, the noisy image 235 is associated with a vector representation. For example, the texture data generation computing system 210 can generate the noisy image 235 as a solid white image having an image size or resolution based on the first multi-view rendered image 225, e.g., an image size of 2000×2000 pixels. In addition, the texture data generation computing system 210 can generate a vector representation associated with the noisy image 235 that includes digital noise, e.g., modified vector values that are determined via a Gaussian distribution, such that the noisy vector representation indicates a white image that has Gaussian noise. In some cases, the vector representation (or noisy vector representation) is generated by or stored in the neural network module 240 or the trained diffusion image generation model 250.
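As a non-limiting illustration, the following Python sketch pairs a solid white image with a vector of values drawn from a Gaussian distribution. The helper names, the single-channel pixel layout, the vector length, and the seeded generator are assumptions made for this example; the description above does not specify how the vector representation is laid out.

```python
import random
from typing import List


def solid_white_image(width: int, height: int) -> List[List[int]]:
    """Return a single-channel image in which every pixel is white (255)."""
    return [[255] * width for _ in range(height)]


def gaussian_noise_vector(length: int, seed: int = 0) -> List[float]:
    """Return `length` samples from a standard normal distribution.

    A seeded generator is used so the example is reproducible.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


# A small stand-in for a 2000x2000 image and its noisy vector representation.
image = solid_white_image(8, 8)
noisy_vector = gaussian_noise_vector(16, seed=42)
```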

In the texture data generation computing system 210, the trained diffusion model 250 generates multiple multi-view diffusion-generated digital images. In FIG. 2, the multi-view diffusion-generated images are generated based on one or more of the mask image 233, the noisy image 235, the appearance input data 207, or one or more multi-view rendered digital images from the rendering engine 220. In some cases, the multi-view diffusion-generated images are generated based on one or more cross-frame attention features, such as one or more cross-frame attention feature sets determined during diffusion image generation techniques performed by the trained diffusion model 250. For example, the trained diffusion model 250 generates a first multi-view diffusion-generated image 255 that includes a first set of multiple diffusion-generated views of the 3D digital object having a visual appearance based on the appearance input data 207. In addition, the trained diffusion model 250 generates one or more sets of cross-frame attention features, such as cross-frame attention feature sets 265. In some cases, such as during the first pass of texture-generation techniques, the trained diffusion model 250 generates one or more of the first multi-view diffusion-generated image 255 or the cross-frame attention feature sets 265 based on receiving one or more of the mask image 233 or the appearance input data 207 as input.

In some implementations, the trained diffusion model 250 generates the first multi-view diffusion-generated image 255 by modifying the noisy image 235 or the associated noisy vector representation, such as modifications via iterative denoising operations. During one or more iterations of the denoising operations, the trained diffusion model 250 determines a set of cross-frame attention features for one or more diffusion-generated views that are being generated, e.g., cross-frame attention feature sets corresponding to noisy regions of the noisy image 235. In some cases, the trained diffusion model 250 generates or modifies the cross-frame attention feature sets 265 for each denoising iteration (or a subset of denoising iterations) during the diffusion image generation techniques, such as by modifying one or more layers of the trained diffusion model 250 to include one or more current cross-frame attention features calculated during a current denoising iteration.

In the texture data generation computing system 210, the trained diffusion model 250 determines a respective set in the cross-frame attention feature sets 265 for each particular diffusion-generated view in the first multi-view diffusion-generated image 255. For example, the trained diffusion model 250 determines, for a particular noisy region of the noisy image 235, a corresponding cross-frame attention feature set in the sets 265. In addition, the corresponding cross-frame attention feature set includes data describing one or more relationships among additional mask regions corresponding to additional noisy regions of the noisy image 235. Based on the corresponding cross-frame attention feature set, the trained diffusion model 250 modifies the particular noisy region to include image features that are based, at least in part, on additional image features of the additional mask regions or the additional noisy regions. In some cases, such as during one or more initial denoising iterations (e.g., a startup phase), the corresponding cross-frame attention feature set is determined based on one or more additional rendered views depicted in the first multi-view rendered image 225 (e.g., excluding a particular rendered view corresponding to the particular noisy region).
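As a non-limiting illustration of a per-view feature set that describes the additional views, the following Python sketch computes, for each view, a softmax-weighted combination of the feature vectors of all other views. This is a simplified stand-in and not the trained model's attention layers: it assumes one feature vector per view, uses the raw features as queries, keys, and values without learned projections, and uses scaled dot-product scoring. The names `softmax` and `cross_frame_attention` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_frame_attention(features: List[Vector]) -> List[Vector]:
    """For each view, mix the features of the *other* views.

    Weights come from a softmax over scaled dot products between the
    current view's features and each other view's features.
    """
    if len(features) < 2:
        raise ValueError("at least two views are required")
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for i, query in enumerate(features):
        # Exclude the view itself, so the result describes additional views only.
        others = [f for j, f in enumerate(features) if j != i]
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in others]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[d] for w, v in zip(weights, others)) for d in range(dim)]
        )
    return attended


# Three views with two-dimensional features; views 0 and 1 are identical.
attended = cross_frame_attention([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

In this toy input, the result for view 0 is weighted toward view 1, whose features match its own, which is the kind of cross-view relationship a consistency mechanism could exploit.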

In FIG. 2, the first multi-view diffusion-generated image 255 is a 2D image that includes 2D diffusion-generated views corresponding to the multiple rendered views depicted in the first multi-view rendered image 225, such as twenty-five diffusion-generated views arranged in a 5×5 array, such that each particular diffusion-generated view has a corresponding mask region in the mask image 233 and a corresponding rendered view in the first multi-view rendered image 225. In addition, the first multi-view diffusion-generated image 255 has one or more image characteristics that are based on the first multi-view rendered image 225, the mask image 233, or the noisy image 235, such as an image size of 2000×2000 pixels. In some cases, the trained diffusion model 250 generates the first multi-view diffusion-generated image 255 during the first pass of texture-generation techniques performed by the texture data generation computing system 210. For example, based on a combination of the appearance input data 207 with the first multi-view rendered image 225 (or the mask image 233), the first set of multiple diffusion-generated views in the first multi-view diffusion-generated image 255 depicts the 3D digital object having an initial texture that is based on the visual appearance described by the appearance input data 207.

Based on the first multi-view diffusion-generated image 255, the texture data generation computing system 210 performs a first modification to the texture data object 215. For example, the texture data generation computing system 210 calculates first texture data values that describe the initial texture appearance depicted in the first multi-view diffusion-generated image 255 across the first set of multiple diffusion-generated views. In addition, the texture data generation computing system 210 modifies the texture data object 215 to include texels (or other texture data structures) that are based on the first texture data values calculated from the first multi-view diffusion-generated image 255. In some cases, the texture data generation computing system 210 performs the first modification to the texture data object 215 during the first pass of texture-generation techniques performed by the texture data generation computing system 210. For example, based on a first blending technique for averaging data values from a set (or subset) of the multiple diffusion-generated views in the first multi-view diffusion-generated image 255, the texture data generation computing system 210 calculates the first texture data values. In addition, based on the first pass techniques, the first modified texture data object 215 includes the first texture data values that describe the initial texture depicted in the first multi-view diffusion-generated image 255.
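As a non-limiting illustration of a blending technique that averages data values across views, the following Python sketch computes one color per texel as the mean of the colors observed for that texel in the views where it is visible. The mapping from texel identifiers to per-view color samples, the RGB tuple layout, and the name `blend_by_average` are assumptions made for this example; projecting view pixels onto texels is outside its scope.

```python
from typing import Dict, List, Tuple

Color = Tuple[float, float, float]  # (red, green, blue), each in [0, 1]


def blend_by_average(samples: Dict[int, List[Color]]) -> Dict[int, Color]:
    """Map each texel id to the mean of the colors observed for it.

    `samples[texel_id]` holds one color per view in which the texel is
    visible. Texels with no observations are left out of the result.
    """
    blended: Dict[int, Color] = {}
    for texel_id, colors in samples.items():
        if not colors:
            continue  # not visible in any view; left for a later pass
        count = len(colors)
        blended[texel_id] = (
            sum(c[0] for c in colors) / count,
            sum(c[1] for c in colors) / count,
            sum(c[2] for c in colors) / count,
        )
    return blended


texels = blend_by_average({
    0: [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],  # seen in two views
    1: [(0.2, 0.4, 0.6)],                   # seen in one view
    2: [],                                  # seen in no view
})
```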

In FIG. 2, the texture data generation computing system 210 performs one or more additional passes of texture-generation techniques, such as a second pass that is based on the first modified texture data object 215. For example, the rendering engine 220 generates a second multi-view rendered image 223 that includes a second set of multiple rendered views of the 3D digital object described by the 3D mesh 205. In addition, the second multi-view rendered image 223 depicts the 3D digital object having the initial texture described by the first modified texture data object 215. In addition, the second multi-view rendered image 223 is a 2D composite image in which the second set of rendered views are arranged in an array or another suitable arrangement. In some cases, the rendering engine 220 generates the second multi-view rendered image 223 during a second pass of texture-generation techniques performed by the texture data generation computing system 210, such as a second pass to generate refined texture data.

In the texture data generation computing system 210, the multiple multi-view rendered images 225 and 223 are generated based on a particular set of views for the 3D mesh 205, such that the multi-view rendered images 225 and 223 depict a same set of views for the 3D digital object having different textures, e.g., no texture or default texture in the first multi-view rendered image 225, the initial texture from the first modified texture data object 215 in the second multi-view rendered image 223. In some cases, the particular set of views includes views selected at different viewpoints, such that each particular view depicts a different portion of the 3D digital object. Additionally or alternatively, the selected different viewpoints have different viewing angles of the 3D digital object and same (or similar) distances from the 3D digital object, such that each view in the particular set of views depicts a different portion of the 3D digital object from a same (or similar) viewpoint distance. Based on the multi-view rendered images 225 and 223 depicting the same set of views, the images 225 and 223 have some image characteristics that are the same, such as a same image size of 2000×2000 pixels and a same view arrangement of a 5×5 array. In FIG. 2, the multi-view rendered images 225 and 223 each depict a respective set of twenty-five rendered views, in which the depicted textures are different between the respective sets.
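As a non-limiting illustration of viewpoints that share a distance from the object while differing in viewing angle, the following Python sketch places camera positions on a sphere centered on the object. The Fibonacci-spiral placement, the function name `viewpoints_on_sphere`, and the radius value are assumptions made for this example; the description above does not specify how the viewpoints are chosen.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def viewpoints_on_sphere(count: int, radius: float) -> List[Vec3]:
    """Return `count` camera positions spread over a sphere of `radius`.

    A Fibonacci spiral is used so the positions are roughly evenly spaced.
    The object is assumed to sit at the origin.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(count):
        z = 1.0 - 2.0 * (i + 0.5) / count      # height in (-1, 1)
        ring = math.sqrt(1.0 - z * z)          # radius of the circle at height z
        theta = golden_angle * i
        points.append(
            (radius * ring * math.cos(theta),
             radius * ring * math.sin(theta),
             radius * z)
        )
    return points


# Twenty-five viewpoints, one per cell of a 5x5 composite image.
cameras = viewpoints_on_sphere(25, radius=3.0)
```

Reusing the same list of positions for the first and second passes would yield two composites that depict the same set of views with different textures.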

In some cases, during the second pass of texture-generation techniques performed by the texture data generation computing system 210, the texture data generation computing system 210 modifies the noisy image 235 or generates an additional version of the noisy image 235. During the second pass, the texture data generation computing system 210 generates or modifies the additional noisy image 235 as an additional solid white image having an image size or resolution based on the second multi-view rendered image 223, e.g., an image size of 2000×2000 pixels. In addition, the texture data generation computing system 210 generates or modifies an additional noisy vector representation associated with the additional noisy image 235.

Based on one or more of the second multi-view rendered image 223, the additional noisy image 235, or the appearance input data 207, the trained diffusion model 250 generates a second multi-view diffusion-generated image 253 that includes a second set of multiple diffusion-generated views of the 3D digital object having a visual appearance based on the second multi-view rendered image 223 or a combination of the second multi-view rendered image 223 and the appearance input data 207. In addition, the trained diffusion model 250 generates or modifies one or more sets of cross-frame attention features, such as modifying the cross-frame attention feature sets 265. In some cases, the trained diffusion model 250 generates the second multi-view diffusion-generated image 253 by modifying the additional noisy image 235 or the associated additional noisy vector representation, such as modifications via iterative denoising operations. In addition, the trained diffusion model 250 determines a respective set in the cross-frame attention feature sets 265 for each particular diffusion-generated view in the second multi-view diffusion-generated image 253. For example, the trained diffusion model 250 determines, for a particular noisy region of the additional noisy image 235, a corresponding cross-frame attention feature set in the sets 265. In some cases, such as during the second pass of texture-generation techniques, the trained diffusion model 250 generates or modifies one or more of the second multi-view diffusion-generated image 253 or the cross-frame attention feature sets 265 based on receiving one or more of the second multi-view rendered image 223 or the appearance input data 207 as input.

In FIG. 2, the second multi-view diffusion-generated image 253 is a 2D image that includes 2D diffusion-generated views corresponding to the multiple rendered views depicted in the second multi-view rendered image 223, such as twenty-five diffusion-generated views arranged in a 5×5 array, such that each particular diffusion-generated view has a corresponding rendered view in the second multi-view rendered image 223. In addition, the second multi-view diffusion-generated image 253 has one or more image characteristics that are based on the second multi-view rendered image 223 or the additional noisy image 235, such as an image size of 2000×2000 pixels. In some cases, the trained diffusion model 250 generates the second multi-view diffusion-generated image 253 during the second pass of texture-generation techniques performed by the texture data generation computing system 210. For example, based on a combination of the appearance input data 207 with the second multi-view rendered image 223, the second set of multiple diffusion-generated views in the second multi-view diffusion-generated image 253 depicts the 3D digital object having a refined texture that is based on a combination of the visual appearance described by the appearance input data 207 and the initial texture that is depicted in the second multi-view rendered image 223.

Based on the second multi-view diffusion-generated image 253, the texture data generation computing system 210 performs a second modification to the texture data object 215. For example, the texture data generation computing system 210 calculates second texture data values that describe the refined texture appearance depicted in the second multi-view diffusion-generated image 253 across the second set of multiple diffusion-generated views. In addition, the texture data generation computing system 210 modifies the texture data object 215 to include texels (or other texture data structures) that are based on the second texture data values calculated from the second multi-view diffusion-generated image 253. In some cases, the texture data generation computing system 210 performs the second modification to the texture data object 215 during the second pass of texture-generation techniques performed by the texture data generation computing system 210. For example, based on a second blending technique for identifying a similarity between a particular diffusion-generated view and a particular triangle in the 3D mesh 205, the texture data generation computing system 210 calculates the second texture data values. In some cases, the second blending technique includes identifying, from the multiple diffusion-generated views in the second multi-view diffusion-generated image 253, one or more particular diffusion-generated views that are within a similarity threshold to a normal (e.g., perpendicular) of the particular triangle in the 3D mesh 205. In addition, based on the second pass techniques, the second modified texture data object 215 includes the second texture data values that describe the refined texture depicted in the second multi-view diffusion-generated image 253.
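As a non-limiting illustration of selecting views that are within a similarity threshold to a triangle's normal, the following Python sketch computes the normal of a triangle and keeps the views whose direction toward the camera has a cosine similarity to that normal at or above a threshold. The use of cosine similarity, the counter-clockwise vertex winding, the threshold value, and the function names are assumptions made for this example.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(dot(v, v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return (v[0] / length, v[1] / length, v[2] / length)


def triangle_normal(v0: Vec3, v1: Vec3, v2: Vec3) -> Vec3:
    """Unit normal of a triangle with counter-clockwise vertex order."""
    return normalize(cross(sub(v1, v0), sub(v2, v0)))


def views_facing_triangle(normal: Vec3, view_dirs: List[Vec3],
                          threshold: float) -> List[int]:
    """Indices of views whose surface-to-camera direction has cosine
    similarity with `normal` of at least `threshold`."""
    return [i for i, d in enumerate(view_dirs)
            if dot(normal, normalize(d)) >= threshold]


normal = triangle_normal((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
selected = views_facing_triangle(
    normal,
    [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0), (0.0, 0.0, -1.0)],
    threshold=0.7,
)
```

Under these assumptions, only views that see the triangle close to head-on contribute texture data values for it, which avoids sampling from views in which the triangle is seen at a grazing angle.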

In FIG. 2, the texture data generation computing system 210 performs an additional pass of texture-generation techniques, such as a third pass that is based on the second modified texture data object 215. For example, the rendering engine 220 generates a sampling set of rendered digital images, such as sample rendered images 227. In the texture data generation computing system 210, the sample rendered images 227 include multiple digital images, each of which depicts a rendered view of the 3D digital object described by the 3D mesh 205. In addition, the sample rendered images 227 depict the 3D digital object having the refined texture described by the second modified texture data object 215. In some cases, the set of rendered digital images in the sample rendered images 227 includes sampling rendered images that each depict a respective rendered view (e.g., the set excludes multi-view rendered images). In addition, the rendering engine 220 or the texture data generation computing system 210 selects, for inclusion in the sample rendered images 227, sampling rendered views that are different from the particular set of views for the 3D mesh 205 from which the multiple multi-view rendered images 225 and 223 are generated. For example, the rendering engine 220 may generate a group of multiple potential viewpoints for the sample rendered images 227. In some cases, the rendering engine 220 may modify the group of potential viewpoints to exclude one or more potential viewpoints that are within a threshold similarity of the particular set of views for the 3D mesh 205, such as a potential viewpoint having a viewing angle that is within a similarity threshold of an additional viewing angle in the particular set of views.
Additionally or alternatively, the rendering engine 220 may modify the group of potential viewpoints to include one or more potential viewpoints that lack the second texture data values included in the second modified texture data object 215, such as a potential viewpoint in which the refined texture from the second pass is omitted from the 3D mesh 205. In some cases, the rendering engine 220 generates the sample rendered images 227 during the third pass of texture-generation techniques performed by the texture data generation computing system 210, such as a third pass to generate infilled texture data, e.g., additional texture data generated to infill regions of the 3D digital object for which initial texture or refined texture was not generated during the first or second passes.
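As a non-limiting illustration of excluding potential viewpoints that are too similar to the views already used, the following Python sketch drops any candidate whose viewing direction is within a minimum angle of an existing viewing direction. Measuring similarity as the angle between direction vectors, the 10-degree threshold, and the function names are assumptions made for this example.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def angle_between_deg(a: Vec3, b: Vec3) -> float:
    """Angle in degrees between two non-zero direction vectors."""
    dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    len_a = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    len_b = math.sqrt(b[0] ** 2 + b[1] ** 2 + b[2] ** 2)
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cosine = max(-1.0, min(1.0, dot / (len_a * len_b)))
    return math.degrees(math.acos(cosine))


def filter_candidates(candidates: List[Vec3], existing: List[Vec3],
                      min_angle_deg: float) -> List[Vec3]:
    """Keep candidates at least `min_angle_deg` away from every existing view."""
    return [
        c for c in candidates
        if all(angle_between_deg(c, e) >= min_angle_deg for e in existing)
    ]


existing_views = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
candidate_views = [(1.0, 0.05, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]
kept = filter_candidates(candidate_views, existing_views, min_angle_deg=10.0)
```

In this example, the first candidate is under three degrees from an existing view and is excluded, while the other two are retained as sampling viewpoints.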

In some cases, during the third pass of texture-generation techniques performed by the texture data generation computing system 210, the texture data generation computing system 210 modifies the noisy image 235 or generates an additional version of the noisy image 235. During the third pass, the texture data generation computing system 210 generates or modifies the additional noisy image 235 as an additional solid white image having an image size or resolution based on a particular rendered image of the sample rendered images 227, e.g., an image size that matches an additional size of the particular rendered image. In addition, the texture data generation computing system 210 generates or modifies an additional noisy vector representation associated with the additional noisy image 235. In some cases, the texture data generation computing system 210 could generate respective noisy images or respective noisy vector representations for one or more of the sample rendered images 227, such as respective noisy images for a subset of the sampling rendered images having a threshold value of untextured appearance.

Based on one or more of the sample rendered images 227, the additional noisy image 235, or the appearance input data 207, the trained diffusion model 250 generates at least one additional diffusion-generated image 257. In FIG. 2, the additional diffusion-generated image 257 includes an additional diffusion-generated view of the 3D digital object having a visual appearance based on the refined texture described in the second modified texture data object 215, such as refined texture visible in additional rendered views depicted in the sample rendered images 227. In addition, the at least one additional diffusion-generated image 257 depicts the 3D digital object having an infilled texture that is based on a combination of multiple regions of the refined texture described in the second modified texture data object 215, e.g., multiple regions of refined texture visible in multiple images of the sample rendered images 227.

In some cases, the trained diffusion model 250 generates a respective diffusion-generated image for each rendered image in the sample rendered images 227 (or a subset thereof), such as respective diffusion-generated images that are generated sequentially. In addition, the trained diffusion model 250 generates or modifies one or more additional cross-frame attention features for the additional diffusion-generated image 257, such as additional cross-frame attention features indicating relationships among sequential respective diffusion-generated images. In some cases, limiting sequentially generated cross-frame attention features to the third pass by the texture data generation computing system 210 improves consistency of appearance among the sequentially diffusion-generated images while reducing impact on computing efficiency, e.g., reducing computing resource expenditure for sequentially generated diffusion-generated images.

Based on the at least one additional diffusion-generated image 257, the texture data generation computing system 210 performs a third modification to the texture data object 215. For example, the texture data generation computing system 210 calculates third texture data values that describe the infilled texture appearance depicted in the additional diffusion-generated image 257. In addition, the texture data generation computing system 210 modifies the texture data object 215 to include texels (or other texture data structures) that are based on the third texture data values calculated from the at least one additional diffusion-generated image 257. In some cases, the texture data generation computing system 210 performs the third modification to the texture data object 215 during the third pass of texture-generation techniques performed by the texture data generation computing system 210. In addition, based on the third pass techniques, the third modified texture data object 215 includes the third texture data values that describe the infilled texture depicted in the additional diffusion-generated image 257.

FIG. 3 is a diagram depicting examples of one or more data structures described herein, such as data structures related to generating texture data for 3D digital objects. FIG. 3 includes diagrammatic examples of a multi-view rendered image 325, a noisy image 335, a multi-view diffusion-generated image 355, and cross-frame attention feature sets 365. In some cases, the example data structures are generated by a texture data generation computing system, such as the texture data generation computing system 210. For example, the multi-view rendered image 325 is generated by a rendering engine, such as the rendering engine 220. In addition, one or more of the multi-view diffusion-generated image 355 or the cross-frame attention feature sets 365 are generated by a diffusion image generation model, such as the trained diffusion model 250. In addition, the noisy image 335 is generated by one or more of a texture data generation computing system or a neural network module, such as the texture data generation computing system 210 or the neural network module 240. The data structures depicted in FIG. 3 are diagrammatic examples to aid understanding of the techniques described herein. However, other implementations of the described data structures are possible, including data structures not intended for human interpretation.

In FIG. 3, the multi-view rendered image 325 includes multiple rendered views of an example 3D digital object, such as a jack-o-lantern object. In addition, the multi-view rendered image 325 is a 2D composite image of the multiple rendered views. For example, the multi-view rendered image 325 includes a 5×5 array of twenty-five rendered views, each particular region of the array depicting a respective rendered view of the jack-o-lantern object. In addition, the multi-view rendered image 325 includes regions depicting a first rendered view 325a, a second rendered view 325b, a third rendered view 325c, and additional regions depicting additional rendered views. In the multi-view rendered image 325, the multiple rendered views, including the rendered views 325a, 325b, and 325c, depict viewpoints having different viewing angles of the jack-o-lantern object and same (or similar) distances from the jack-o-lantern object, such that each particular rendered view depicts a different portion of the jack-o-lantern object from a same (or similar) viewpoint distance. In some cases, the multiple rendered views, including the views 325a, 325b, and 325c, are generated based on rendered views of a 3D mesh having no texture data applied, such as an untextured 3D mesh for the jack-o-lantern object.

In FIG. 3, the multi-view diffusion-generated image 355 includes multiple diffusion-generated views that are generated based on, at least, the multiple rendered views in the multi-view rendered image 325. In some cases, the multi-view diffusion-generated image 355 is generated based on a combination of some or all of the multi-view rendered image 325, a mask image generated based on the multi-view rendered image 325, the noisy image 335, the cross-frame attention feature sets 365, or appearance input data. In addition, the multi-view diffusion-generated image 355 is a 2D image that includes 2D diffusion-generated views that correspond to the multiple rendered views depicted in the multi-view rendered image 325, each of the 2D diffusion-generated views having a visual appearance, such as a visual appearance that corresponds to appearance input data or texture data included in a texture data object (such as the texture data object 215 or modified versions thereof). For example, the multi-view diffusion-generated image 355 includes a 5×5 array of twenty-five diffusion-generated views, each particular region of the array depicting a respective diffusion-generated view of the jack-o-lantern object having a visual appearance based on a requested visual appearance, such as “orange with a green stem” and “lighted from within.” In addition, the multi-view diffusion-generated image 355 includes regions depicting a first diffusion-generated view 355a corresponding to the first rendered view 325a, a second diffusion-generated view 355b corresponding to the second rendered view 325b, a third diffusion-generated view 355c corresponding to the third rendered view 325c, and additional regions depicting additional diffusion-generated views corresponding to additional respective rendered views of the multi-view rendered image 325.

In some implementations, the multi-view diffusion-generated image 355 is generated based on the noisy image 335 or the cross-frame attention feature sets 365. In some cases, the noisy image 335 is a 2D image that depicts digital noise, such as Gaussian noise. In addition, the noisy image 335 is associated with a vector representation, such as a noisy vector representation. In FIG. 3, the noisy image 335 is generated based on the multi-view rendered image 325. For example, if the multi-view rendered image 325 has an image size of 2000×2000 pixels, the noisy image 335 is generated (or modified) having an image size of 2000×2000 pixels. In addition, the noisy image 335 includes noisy regions corresponding to the rendered views in the multi-view rendered image 325, such as a 5×5 array of twenty-five noisy regions corresponding to the twenty-five rendered views. For example, the noisy image 335 includes a first noisy region 335a corresponding to the first rendered view 325a, a second noisy region 335b corresponding to the second rendered view 325b, a third noisy region 335c corresponding to the third rendered view 325c, and additional noisy regions corresponding to additional respective rendered views of the multi-view rendered image 325. In FIG. 3, the diffusion-generated views in the multi-view diffusion-generated image 355 are generated via one or more modifications (e.g., iterative denoising operations) to corresponding noisy regions in the noisy image 335. For example, the first diffusion-generated view 355a is generated based on the corresponding first noisy region 335a, the second diffusion-generated view 355b is generated based on the corresponding second noisy region 335b, the third diffusion-generated view 355c is generated based on the corresponding third noisy region 335c, and additional diffusion-generated views in the multi-view diffusion-generated image 355 are generated based on corresponding additional noisy regions in the noisy image 335. FIG. 3 depicts the noisy image 335 as including visual noise that is visible to a human, but other implementations are possible. For example, a noisy image could depict Gaussian noise, white noise, or other types of noise. Additionally or alternatively, a noisy image could depict an image (e.g., a solid white image, a solid black image, a solid color image) that is associated with a noisy vector representation, e.g., a vector representation of Gaussian noise, white noise, or other types of noise.

In some implementations, the multi-view diffusion-generated image 355 is generated based on the cross-frame attention feature sets 365. In some cases, each cross-frame attention feature set in the sets 365 describes image features, or relationships among image features, in a corresponding image region. In FIG. 3, each cross-frame attention feature set in the sets 365 corresponds to a particular region in the multi-view diffusion-generated image 355, each region depicting a particular diffusion-generated view. For example, the cross-frame attention feature sets 365 include a first cross-frame attention feature set 365a corresponding to the first diffusion-generated view 355a, a second cross-frame attention feature set 365b corresponding to the second diffusion-generated view 355b, a third cross-frame attention feature set 365c corresponding to the third diffusion-generated view 355c, and additional cross-frame attention feature sets corresponding to additional respective diffusion-generated views of the multi-view diffusion-generated image 355. In addition, each cross-frame attention feature set in the sets 365 is generated based on additional regions, e.g., not including the corresponding region, from one or more of the multi-view rendered image 325, a mask image corresponding to the multi-view rendered image 325, or the noisy image 335. For example, the first cross-frame attention feature set 365a includes features that are determined based on additional features of additional image regions, such as cross-frame attention features of the noisy regions 335b and 335c or corresponding mask regions, or cross-frame attention features of the rendered views 325b and 325c, e.g., excluding the corresponding noisy region 335a or the corresponding rendered view 325a. In addition, the second cross-frame attention feature set 365b includes features determined based on additional features of additional image regions, such as cross-frame attention features of the noisy regions 335a and 335c or the rendered views 325a and 325c, e.g., excluding the corresponding noisy region 335b or the corresponding rendered view 325b. Furthermore, the third cross-frame attention feature set 365c includes features determined based on additional features of additional image regions, such as cross-frame attention features of the noisy regions 335a and 335b or the rendered views 325a and 325b, e.g., excluding the corresponding noisy region 335c or the corresponding rendered view 325c. In some implementations, generating the diffusion-generated views in the multi-view diffusion-generated image 355 based on cross-frame attention features improves consistency of appearance among the diffusion-generated views, e.g., the diffusion-generated views 355a, 355b, and 355c have improved visual consistency based on the respective cross-frame attention feature sets 365a, 365b, and 365c.
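The idea that each view's feature set is built from the other views, and not from the view itself, can be illustrated with a toy attention computation. This is a minimal sketch under strong assumptions: the function name `cross_frame_attention` is hypothetical, the raw per-view features serve as queries, keys, and values with no learned projections, and scaled dot-product softmax attention is a stand-in for whatever mechanism the trained diffusion model actually uses.

```python
import numpy as np

def cross_frame_attention(features):
    """features: (V, N, D) array of N feature tokens per view, V >= 2.
    For each view, attend only over tokens from the OTHER views.
    Returns an array of the same shape. Illustrative only."""
    num_views, _, dim = features.shape
    out = np.empty_like(features, dtype=np.float64)
    for v in range(num_views):
        queries = features[v]                                   # (N, D)
        others = np.concatenate(
            [features[u] for u in range(num_views) if u != v],  # exclude view v
            axis=0,
        )                                                       # ((V-1)*N, D)
        scores = queries @ others.T / np.sqrt(dim)
        scores = scores - scores.max(axis=1, keepdims=True)     # numerical stability
        weights = np.exp(scores)
        weights = weights / weights.sum(axis=1, keepdims=True)  # softmax over other views
        out[v] = weights @ others
    return out
```

Because every output token is a weighted mix of tokens from the remaining views, information about the other views' appearance flows into each view, which is one plausible route to the cross-view consistency described above.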

FIG. 4 is a flow chart depicting an example of a process 400 for generating texture data for 3D digital objects, such as via one or more pairs of multi-view digital images. In some embodiments, such as described in regards to FIGS. 1-3, a computing device executing a texture data generation computing system implements operations described in FIG. 4, by executing suitable program code. For illustrative purposes, the process 400 is described with reference to the examples depicted in FIGS. 1-3. Other implementations, however, are possible.

At block 410, the process 400 involves receiving, by a texture data generation computing system, one or more of appearance input data and 3D mesh data. In some cases, the 3D mesh data describes a 3D digital object. For example, the 3D mesh data can describe triangles that define a surface of the 3D digital object, such as vertices, planes, normals, or other triangle characteristics. In some cases, the appearance input data describes a visual appearance of the 3D digital object, such as a requested visual appearance provided by a user of the texture data generation computing system, e.g., a user of an additional computing device in communication with the texture data generation computing system. For example, the appearance input data can include text data, a 2D digital image (e.g., a digital photograph), or other types of non-texture data describing a requested visual appearance. In addition, the appearance input data can exclude texture data indicating texture characteristics that can be included on a surface of the 3D digital object, such as excluding texels, a texture map, or other texture data structures that are configured to be applied during rendering. For example, the texture data generation computing system 210 receives one or more of the 3D mesh 205 or the appearance input data 207. In addition, the appearance input data 207 includes text data describing a requested visual appearance and excludes texture data structures configured for application to the 3D mesh 205 during rendering.

At block 420, the process 400 involves rendering a first multi-view rendered digital image, such as by a rendering engine included in the texture data generation computing system. In addition, the first multi-view rendered digital image includes a first set of multiple rendered views depicting the 3D digital object, such as rendered views that exclude the visual appearance described by the appearance input data. For example, the rendering engine 220 generates, based on the 3D mesh 205, the multi-view rendered image 225 including a set of multiple rendered views. In some cases, the first set of multiple rendered views are untextured, such as rendered views of the 3D digital object having no texture applied, or a default texture that is uncorrelated with the appearance input data. For example, the rendering engine renders the first set of multiple rendered views based on the 3D mesh data having no texture data applied during rendering. In some cases, the first multi-view rendered digital image is a 2D composite image in which the first set of multiple rendered views are arranged in an array or other suitable arrangement.

At block 430, the process 400 involves generating a second multi-view diffusion-generated digital image, such as by a trained diffusion image generation model included in the texture data generation computing system. In addition, the second multi-view diffusion-generated digital image includes a second set of multiple diffusion-generated views depicting the 3D digital object, such as diffusion-generated views that depict an initial texture of the 3D digital object. In some cases, the trained diffusion image generation model generates the second multi-view diffusion-generated digital image based on a combination of the first multi-view rendered digital image and the appearance input data. For example, the trained diffusion model 250 generates the multi-view diffusion-generated image 255 based on a combination of the multi-view rendered image 225 and the appearance input data 207. In some cases, the trained diffusion image generation model generates the second set of multiple diffusion-generated views based on one or more cross-frame attention features, such as respective sets of cross-frame attention features for each view in the second set of multiple diffusion-generated views. In some cases, the second multi-view diffusion-generated digital image is a 2D image generated via the trained diffusion image generation model, in which the second set of multiple diffusion-generated views are arranged in an array or other suitable arrangement, which corresponds to the arrangement of the first set of multiple rendered views.

At block 440, the process 400 involves modifying a texture data object, such as a first modification that is performed by the texture data generation computing system, to describe the initial texture depicted in the second multi-view diffusion-generated digital image. In some cases, the first modification to the texture data object includes calculating one or more first texture data values that describe the initial texture. In addition, the first modification to the texture data object includes modifying one or more data structures of the texture data object, such as texels or other texture data structures, based on the first texture data values. For example, the texture data generation computing system 210 calculates one or more first texture data values that describe the initial texture depicted in the multi-view diffusion-generated image 255. In addition, the texture data generation computing system 210 modifies the texture data object 215 to include texels (or other texture data structures) that are based on the calculated first texture data values. In some cases, the texture data object is, or includes, a texture map. In some cases, the texture data generation computing system calculates the first texture data values via one or more blending techniques, such as a first blending technique involving averaging data values from some or all diffusion-generated views from the second set of multiple diffusion-generated views, or a second blending technique involving identifying, from the second set of multiple diffusion-generated views, one or more diffusion-generated views that are similar to a particular triangle in the 3D mesh data.
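The averaging-based blending technique mentioned above can be sketched as follows. This is an illustrative example, not the disclosed implementation: it assumes that each diffusion-generated view has already been projected into texture space, so that `colors[v, t]` holds the color that view `v` observes for texel `t` and `visible[v, t]` marks whether view `v` sees that texel at all. The function name `blend_average` and both arrays are hypothetical.

```python
import numpy as np

def blend_average(colors, visible):
    """Average per-texel colors over the views in which each texel is visible.

    colors:  (V, T, 3) float array of per-view texel colors (assumed input).
    visible: (V, T) bool array, True where a view observes the texel.
    Returns (texels, covered): a (T, 3) array of blended colors and a (T,)
    bool array marking texels seen by at least one view.
    """
    weights = visible.astype(np.float64)[..., None]   # (V, T, 1)
    total = (colors * weights).sum(axis=0)            # (T, 3) sum over visible views
    count = weights.sum(axis=0)                       # (T, 1) number of visible views
    covered = count[:, 0] > 0
    texels = np.zeros_like(total)
    texels[covered] = total[covered] / count[covered]
    return texels, covered
```

Texels that no view observes are left at zero and flagged in `covered`, which gives a later pass a simple way to find regions that still lack texture data.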

At block 450, the process 400 involves providing, by the texture data generation computing system, the first modified texture data object to at least one additional computing component. Based on the first modified texture data object, the additional computing component is configured to render at least one 3D digital object, such as the 3D digital object described by the 3D mesh data received by the texture data generation computing system. In some cases, the at least one additional computing component is a computing component that is included in the texture data generation computing system. For example, the texture data generation computing system 210 provides the first modified texture data object 215 to the rendering engine 220, such as to perform one or more additional passes of texture-generation techniques. Responsive to receiving the first modified texture data object 215, the rendering engine 220 renders at least one additional view of the 3D digital object described by the 3D mesh 205, such as rendered views included in the multi-view rendered image 223. In some cases, the at least one additional computing component is a computing component that is included in one or more additional computing systems. For example, the texture data generation computing system 110 provides the modified texture data object 115 to the additional computing device 190. Responsive to receiving the modified texture data object 115, the additional computing device 190 modifies a digital graphical environment (or a local instance thereof) to include one or more 3D digital objects having the texture described by the texture data object 115.

FIG. 5 is a flow chart depicting an example of a process 500 for generating refined texture data or infilled texture data for 3D digital objects, such as via multiple pairs of multi-view digital images. In some embodiments, such as described in regards to FIGS. 1-4, a computing device executing a texture data generation computing system implements operations described in FIG. 5, by executing suitable program code. For illustrative purposes, the process 500 is described with reference to the examples depicted in FIGS. 1-4. Other implementations, however, are possible.

In some implementations, one or more operations involved in the process 500 are performed by a texture data generation computing system, such as the example texture data generation computing system described in regard to FIG. 4. For example, the texture data generation computing system can generate the first modified texture data object as described in regard to block 440. In addition, the texture data generation computing system can provide the first modified texture data object to an additional computing component, such as the rendering engine, as described in regard to block 450. In some cases, one or more operations described in regard to FIG. 4 are associated with a first pass of texture-generation techniques performed by the example texture data generation computing system, such as a first pass to generate initial texture data. In some cases, the example texture data generation computing system is configured to perform one or more additional passes of texture-generation techniques, such as one or more of a second pass to generate refined texture data or a third pass to generate infilled texture data. For example, the example texture data generation computing system described in FIG. 4 (or an additional texture data generation computing system) can be configured to perform one or more additional operations described in regard to FIG. 5, such as subsequent to one or more operations described in regard to FIG. 4.

At block 510, the process 500 involves rendering a third multi-view rendered digital image, such as by the rendering engine included in the example texture data generation computing system (or another rendering engine). In addition, the third multi-view rendered digital image includes a third set of multiple rendered views depicting the 3D digital object described by the 3D mesh data. In some cases, the rendering engine generates the third multi-view rendered digital image based on the first modified texture data object. In addition, the third set of multiple rendered views depict the 3D digital object having the initial texture described by the first modified texture data object. For example, based on a combination of the first modified texture data object 215 and the 3D mesh 205, the rendering engine 220 generates the multi-view rendered image 223 including a set of multiple rendered views. In addition, the rendered views in the multi-view rendered image 223 depict the 3D digital object described by the 3D mesh 205 having the initial texture described by the first modified texture data object 215. In some cases, the first multi-view rendered digital image and the third multi-view rendered digital image are generated based on a particular set of viewpoints, such as a set of viewpoints determined by the rendering engine.

At block 520, the process 500 involves generating a fourth multi-view diffusion-generated digital image, such as by the trained diffusion image generation model included in the texture data generation computing system (or another trained diffusion image generation model). In addition, the fourth multi-view diffusion-generated digital image includes a fourth set of multiple diffusion-generated views depicting the 3D digital object, such as diffusion-generated views that depict a refined texture of the 3D digital object. In some cases, the trained diffusion image generation model generates the fourth multi-view diffusion-generated digital image based on a combination of the third multi-view rendered digital image and the appearance input data. For example, the trained diffusion model 250 generates the multi-view diffusion-generated image 253 based on a combination of the multi-view rendered image 223 and the appearance input data 207. In some cases, the trained diffusion image generation model generates the fourth set of multiple diffusion-generated views based on one or more cross-frame attention features (or modified cross-frame attention features), such as respective sets of cross-frame attention features for each view in the fourth set of multiple diffusion-generated views. In some cases, the fourth multi-view diffusion-generated digital image is a 2D image generated via the trained diffusion image generation model, in which the fourth set of multiple diffusion-generated views are arranged in an array or other suitable arrangement, which corresponds to the arrangement of the third set of multiple rendered views.

At block 530, the process 500 involves modifying the texture data object, such as a second modification to the first modified texture data object that is performed by the texture data generation computing system, to describe the refined texture depicted in the fourth multi-view diffusion-generated digital image. In some cases, the second modification to the texture data object includes calculating one or more second texture data values that describe the refined texture. In addition, the second modification to the texture data object includes modifying one or more data structures of the texture data object, such as the texels or other data structures, based on the second texture data values. For example, the texture data generation computing system 210 calculates one or more second texture data values that describe the refined texture depicted in the multi-view diffusion-generated image 253. In addition, the texture data generation computing system 210 modifies (or further modifies) the first modified texture data object 215 to include texels (or other texture data structures) that are based on the calculated second texture data values. In some cases, the texture data generation computing system calculates the second texture data values via one or more blending techniques, such as one or more blending techniques described in regard to block 440. In addition, the texture data generation computing system could calculate the second texture data values based on a blending technique that is the same as, or different from, a blending technique used for calculating the first texture data values. For example, during the first pass of texture-generation techniques, the texture data generation computing system 210 performs the first modification to the texture data object 215 based on a first blending technique for averaging data values from the first multi-view diffusion-generated image 255. In addition, during the second pass of texture-generation techniques, the texture data generation computing system 210 performs the second modification to the texture data object 215 based on identifying, from the second multi-view diffusion-generated image 253, one or more particular diffusion-generated views that are within a similarity threshold to a normal of a particular triangle in the 3D mesh 205.
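One way to read "within a similarity threshold to a normal of a particular triangle" is as a cosine-similarity test between the triangle normal and each view's direction. The sketch below takes that reading as an assumption; the function name `views_facing_triangle`, the convention that `view_dirs` point from the object toward each camera, and the default threshold of 0.8 are all hypothetical.

```python
import numpy as np

def views_facing_triangle(normal, view_dirs, threshold=0.8):
    """Return indices of views whose direction is similar to a triangle normal.

    normal:    (3,) triangle normal (need not be unit length).
    view_dirs: (V, 3) directions from the object toward each camera (assumed).
    threshold: minimum cosine similarity for a view to be selected (assumed).
    """
    n = normal / np.linalg.norm(normal)
    d = view_dirs / np.linalg.norm(view_dirs, axis=1, keepdims=True)
    cosine = d @ n                       # (V,) cosine similarity per view
    return np.flatnonzero(cosine >= threshold)
```

Texture values for the triangle could then be taken only from the selected views, which tend to see the triangle head-on and with less foreshortening than oblique views.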

In some implementations, one or more operations described in regard to FIG. 5, such as blocks 510, 520, or 530, are associated with a second pass of texture-generation techniques performed by the example texture data generation computing system, such as a second pass to generate refined texture data.

At block 540, the process 500 involves rendering a sampling set of multiple rendered views, such as a sampling set of digital images rendered by the rendering engine in the example texture data generation computing system (or another rendering engine). In some cases, each rendered view in the sampling set is depicted by a respective digital image, such that the sampling set of digital images excludes multi-view rendered images. In addition, each rendered view in the sampling set depicts the 3D digital object described by the 3D mesh data. For example, the rendering engine 220 generates the sample rendered images 227, each of which depicts a respective rendered view of the 3D digital object described by the 3D mesh 205. In some cases, the rendering engine generates the sampling set of multiple rendered views based on the second modified texture data object. For example, each rendered view in the sampling set depicts the 3D digital object having the refined texture described by the second modified texture data object 215. For example, based on a combination of the second modified texture data object and the 3D mesh 205, the rendering engine 220 generates the sample rendered images 227. In addition, the rendered views in the sample rendered images 227 depict the 3D digital object described by the 3D mesh 205 having the refined texture described by the second modified texture data object 215.

In some cases, the rendering engine or the texture data generation computing system generates the sampling set of multiple rendered views based on a group of multiple potential viewpoints. For example, the rendering engine selects, for inclusion in the sampling set, rendered views that are different (e.g., different viewing angles) from the views included in the first or third multi-view rendered digital images. In addition, the rendering engine selects, for inclusion in the sampling set of multiple rendered views, a particular rendered view based on a determination that the particular rendered view lacks texture data, e.g., omits the refined texture described by the second modified texture data object and the initial texture described by the first modified texture data object. For example, the rendering engine 220 generates the sample rendered images 227 based on a group of potential viewpoints modified to include one or more potential viewpoints that lack the second texture data values included in the second modified texture data object 215.

At block 550, the process 500 involves selecting, such as by the rendering engine or the texture data generation computing system, at least one rendered view from the sampling set of multiple rendered views. In addition, the at least one rendered view is identified, such as by the rendering engine or the texture data generation computing system, as lacking the refined texture from the second modified texture data object, or other texture data. For example, the texture data generation computing system determines that the rendered view, or respective digital image depicting the rendered view, depicts the 3D digital object as having at least a portion of untextured surface, e.g., the second modified texture data object omits texture data for the portion of the rendered surface. In addition, the texture data generation computing system determines that the portion of untextured surface in the rendered view fulfills a threshold value, e.g., the rendered view lacks texture data on a threshold portion of the surface visible in the view. For example, the texture data generation computing system 210 identifies, from the sample rendered images 227, one or more sampling rendered images that depict rendered views having a threshold value of untextured appearance. In addition, the texture data generation computing system 210 generates, for the identified sampling rendered images, respective noisy images or noisy vector representations.
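The threshold test described above can be sketched as a per-view fraction of visible surface pixels that lack texture. This is an illustrative example under assumptions: the function name `select_untextured_views`, the boolean per-view masks, and the default threshold of 0.2 are hypothetical, and the text does not specify how the untextured portion is actually measured.

```python
import numpy as np

def select_untextured_views(object_masks, textured_masks, threshold=0.2):
    """Return indices of sampled views whose untextured surface fraction
    meets the threshold.

    object_masks:   list of (H, W) bool arrays, True where the object is visible.
    textured_masks: list of (H, W) bool arrays, True where texture data exists.
    threshold:      minimum untextured fraction of the visible surface (assumed).
    """
    selected = []
    for index, (obj, tex) in enumerate(zip(object_masks, textured_masks)):
        surface = obj.sum()
        if surface == 0:
            continue  # the object is not visible in this view
        untextured = np.logical_and(obj, ~tex).sum()
        if untextured / surface >= threshold:
            selected.append(index)
    return selected
```

Only the selected views would then be passed on for infilling, so views that are already fully textured are not regenerated.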

At block 560, the process 500 involves generating, such as by the trained diffusion image generation model, at least one additional diffusion-generated digital image, such as a respective additional diffusion-generated digital image for each rendered view selected from the sampling set. In addition, the additional diffusion-generated digital image depicts an additional diffusion-generated view depicting the 3D digital object, such as an additional diffusion-generated view that depicts an additional texture of the 3D digital object. For example, the additional diffusion-generated view depicts the 3D digital object having an infilled texture, such as a texture that is based on a combination of multiple regions of refined texture visible in the sampling set of multiple rendered views. In some cases, the trained diffusion image generation model generates the at least one additional diffusion-generated digital image based on a combination of the at least one selected rendered view with refined texture from the sampling set of multiple rendered views. For example, the trained diffusion model 250 generates the additional diffusion-generated image 257 based on a combination of a particular sampling rendered image from the sample rendered images 227 and refined texture visible in additional rendered images from the sample rendered images 227. In some cases, the trained diffusion image generation model generates the additional diffusion-generated digital image based on one or more cross-frame attention features (or modified cross-frame attention features), such as a respective cross-frame attention feature set for each additional diffusion-generated digital image corresponding to each rendered view selected for generation from the sampling set. In some cases, the additional diffusion-generated digital image is a 2D image generated via the trained diffusion image generation model, such as a particular diffusion-generated digital image depicting a particular diffusion-generated view, e.g., excluding multi-view diffusion-generated images.

At block 570, the process 500 involves modifying the texture data object, such as a third modification to the second modified texture data object performed by the texture data generation computing system, to describe the additional texture depicted in the at least one additional diffusion-generated digital image. In some cases, the third modification to the texture data object includes calculating one or more third texture data values that describe the additional texture, such as the infilled texture from the additional diffusion-generated digital image. In addition, the third modification to the texture data object includes modifying one or more data structures of the texture data object, such as the texels or other data structures, based on the third texture data values. For example, the texture data generation computing system 210 calculates one or more third texture data values that describe the infilled texture appearance depicted in the additional diffusion-generated image 257. In addition, the texture data generation computing system 210 modifies (or further modifies) the second modified texture data object 215 to include texels (or other texture data structures) that are based on the calculated third texture data values.

In some implementations, one or more operations described in regard to FIG. 5, such as blocks 540, 550, 560, or 570, are associated with a third pass of texture-generation techniques performed by the example texture data generation computing system, such as a third pass to generate infilled texture data.

FIG. 6 is a flow chart depicting an example of a process 600 for generating texture data for 3D digital objects based on one or more cross-frame attention features, such as cross-frame attention features determined via a trained diffusion image generation model. In some embodiments, such as described in regards to FIGS. 1-5, a computing device executing a texture data generation computing system implements operations described in FIG. 6, by executing suitable program code. For illustrative purposes, the process 600 is described with reference to the examples depicted in FIGS. 1-5. Other implementations, however, are possible.

In some implementations, one or more operations involved in the process 600 are performed by a texture data generation computing system, such as the example texture data generation computing system described in regard to FIGS. 4 and/or 5. For example, the example diffusion image generation model included in the texture data generation computing system, such as described in regard to at least blocks 430 or 520, can be configured to determine one or more sets of cross-frame attention features. In some cases, one or more operations described in regard to the process 600 can be implemented by the example diffusion image generation model in regard to one or more operations of processes 400 and/or 500, such as generating one or more diffusion-generated digital images based on determination of one or more cross-frame attention feature sets, as described in regard to one or more of blocks 430, 520, or 560.

At block 610, the process 600 involves generating or modifying at least one mask image, such as by the example texture data generation computing system (or another texture data generation computing system). In addition, the at least one mask image is a 2D digital image generated by the texture data generation computing system (or a component thereof). In some cases, the at least one mask image is generated based on a multi-view rendered image, such as the first multi-view rendered digital image described in regard to block 420 or the third multi-view rendered digital image described in regard to block 520. For example, the texture data generation computing system 210 generates the mask image 233 based on the multi-view rendered image 225. In some cases, the at least one mask image has multiple mask regions, each of which corresponds to a respective view in the multi-view rendered image on which the mask image is based. For example, the mask image 233 has twenty-five mask regions respectively corresponding to the twenty-five rendered views in the multi-view rendered image 225. In addition, the at least one mask image has one or more image characteristics that are based on the multi-view rendered image or sampling rendered image. For example, responsive to determining that the multi-view rendered image 225 has an image size of 2000×2000 pixels, the texture data generation computing system 210 generates the mask image 233 having an image size of 2000×2000 pixels.

At block 620, the process 600 involves generating or modifying at least one noisy image, such as by the example texture data generation computing system (or another texture data generation computing system). In addition, the at least one noisy image is a 2D digital image generated by the texture data generation computing system (or a component thereof). In some cases, the at least one noisy image depicts digital noise or is associated with a noisy vector representation, such as a Gaussian distribution of digital noise. In some cases, the at least one noisy image has multiple noisy regions, each of which corresponds to a respective mask region in the mask image and to a respective view in the multi-view rendered image on which the mask image is based. For example, the texture data generation computing system 210 generates the noisy image 235 based on one or more of the mask image 233 or the multi-view rendered image 225. In addition, the noisy image 335 includes the noisy regions 335a, 335b, and 335c, each respectively corresponding to the rendered views 325a, 325b, and 325c in the multi-view rendered image 325. In some cases, the at least one noisy image has one or more image characteristics that are based on the multi-view rendered image or sampling rendered image corresponding to the at least one mask image. For example, responsive to determining that one or more of the multi-view rendered image 225 or the mask image 233 has an image size of 2000×2000 pixels, the texture data generation computing system 210 generates the noisy image 235 having an image size of 2000×2000 pixels.
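The size matching between the rendered image, the mask image, and the noisy image can be sketched as below. The text does not say what the mask encodes, so treating it as a foreground mask (True wherever a pixel differs from a uniform background value) is an assumption of this sketch, as are the function name `make_mask_and_noise` and the use of standard Gaussian noise drawn with NumPy.

```python
import numpy as np

def make_mask_and_noise(rendered, background=0, seed=0):
    """Build a mask image and a noisy image matching a rendered image's size.

    rendered:   (H, W, C) multi-view rendered image.
    background: value of background pixels in every channel (assumed).
    Returns (mask, noise): an (H, W) bool foreground mask and an (H, W, C)
    array of Gaussian noise with the same height and width as the input.
    """
    mask = np.any(rendered != background, axis=-1)   # True on object pixels
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(rendered.shape)      # same size as rendered image
    return mask, noise
```

Because both outputs share the rendered image's height and width, a region that holds a given view in the rendered image covers the same pixel range in the mask and in the noise.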

At block 630, the process 600 involves determining or modifying one or more sets of cross-frame attention features, such as by the example diffusion image generation model (or another diffusion image generation model). In some cases, the diffusion image generation model determines a respective set of cross-frame attention features for each particular noisy region in the noisy image, or for each particular diffusion-generated view included in a multi-view diffusion-generated image generated based on the noisy image, such as the multi-view diffusion-generated images described in regard to one or more of blocks 430 or 520. For example, the trained diffusion model 250 determines the set of cross-frame attention features 265 corresponding to one or more of the multi-view diffusion-generated image 255 or the noisy image 235. In addition, the trained diffusion model 250 determines a respective set, in the cross-frame attention feature sets 265, for each particular diffusion-generated view in the multi-view diffusion-generated image 255 or each particular noisy region in the noisy image 235. In addition, each respective cross-frame attention feature set for a particular noisy region is generated (or modified) based on cross-frame features of one or more additional rendered views from the multi-view rendered digital image, or additional noisy regions from the corresponding noisy image. For example, the first cross-frame attention feature set 365a is generated for the corresponding diffusion-generated view 355a and corresponding noisy region 335a. In addition, the first cross-frame attention feature set 365a is generated based on the image features of the noisy regions 335b, 335c, and additional corresponding noisy regions in the noisy image 335 (or the corresponding diffusion-generated views 355b, 355c, and additional diffusion-generated views in the multi-view diffusion-generated image 355).
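One way to picture a cross-frame attention feature set is as attention in which the features of one view query the features of the other views. The sketch below is an assumption-laden illustration, not the model described in the specification: it uses raw features as queries, keys, and values (no learned projections), scaled dot-product softmax weighting, and a hypothetical function name `cross_frame_attention`. It requires at least two views.

```python
import math


def cross_frame_attention(views):
    """For each view, attend from its tokens to the tokens of the other views.

    `views` is a list of views; each view is a list of feature vectors.
    Assumptions (not from the source): queries, keys, and values are the raw
    features, and scaled dot-product softmax attention is used.
    """
    dim = len(views[0][0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for i, view in enumerate(views):
        # Keys and values come from the additional views, not from view i.
        context = [tok for j, other in enumerate(views) if j != i for tok in other]
        attended = []
        for query in view:
            scores = [
                scale * sum(q * k for q, k in zip(query, key)) for key in context
            ]
            peak = max(scores)  # subtract the max for numerical stability
            weights = [math.exp(s - peak) for s in scores]
            total = sum(weights)
            attended.append([
                sum(w * value[d] for w, value in zip(weights, context)) / total
                for d in range(dim)
            ])
        outputs.append(attended)
    return outputs


views = [
    [[1.0, 0.0], [0.0, 1.0]],  # toy features for a first view
    [[1.0, 1.0], [0.5, 0.5]],  # toy features for a second view
    [[0.0, 2.0], [2.0, 0.0]],  # toy features for a third view
]
features = cross_frame_attention(views)
print(len(features), len(features[0]), len(features[0][0]))
```

Each output vector is a weighted average of features from the other views, which is the sense in which a feature set for one view describes the additional views.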

At block 640, the process 600 involves modifying one or more of the noisy regions in the at least one noisy image, such as by the diffusion image generation model. In some cases, the diffusion image generation model modifies each particular one of the noisy regions based on the respective set of cross-frame attention features for the particular noisy region. In some cases, the diffusion image generation model generates, from each particular modified noisy region, a respective diffusion-generated view in the multi-view diffusion-generated image. For example, the trained diffusion model 250 generates one or more of the multi-view diffusion-generated images 255 or 253 based on one or more modifications (e.g., iterative modifications) to the noisy image 235. In addition, the trained diffusion model 250 generates each particular diffusion-generated view in the multi-view diffusion-generated images 255 or 253 via respective modifications to the corresponding noisy regions. For instance, the diffusion-generated view 355a in the multi-view diffusion-generated image 355 is generated based on one or more modifications to the corresponding noisy region 335a. In addition, the diffusion-generated views 355b and 355c are respectively generated based on respective one or more modifications to the corresponding noisy regions 335b and 335c. In some cases, each particular modified noisy region depicts a texture, such as an initial texture or a refined texture, that is based on a set of cross-frame attention features corresponding to the particular noisy region.

In some implementations, one or more operations related to the process 600 are repeated, such as for iterative modifications to a noisy region by the example diffusion image generation model. For instance, the example diffusion image generation model could repeat modifications to the noisy image until some or all (e.g., a threshold quantity) of the noisy regions include respective diffusion-generated views depicting the initial texture or refined texture. In some cases, the example diffusion image generation model could repeat modifications to the noisy image until the respective diffusion-generated views have a particular image quality, such as fulfilling a threshold value for image resolution, a threshold quantity of iterations, or other example threshold data values for determining a particular quality for a diffusion-generated digital image.
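The repetition described above amounts to a loop with two stopping conditions: a threshold quantity of finished regions, or a threshold quantity of iterations. The sketch below shows only that control flow. The `step` and `is_done` callables, the halving update, and the 0.15 tolerance are hypothetical stand-ins for illustration and do not represent a diffusion model or any quality measure from the specification.

```python
def refine_until_done(regions, step, is_done, min_done, max_iters):
    """Repeat modifications until enough regions are done or iterations run out.

    `step` modifies one region and `is_done` judges one region. Both are
    hypothetical callables supplied by the caller. Returns the final regions
    and the number of iterations performed.
    """
    for iteration in range(1, max_iters + 1):
        regions = [step(region) for region in regions]
        done = sum(1 for region in regions if is_done(region))
        if done >= min_done:
            return regions, iteration
    return regions, max_iters


def halve(region):
    """Toy stand-in for one denoising modification: shrink every value."""
    return [0.5 * value for value in region]


def small_enough(region):
    """Toy stand-in for a quality check on one region."""
    return max(abs(value) for value in region) < 0.15


noisy_regions = [[0.8, -0.4], [0.2, 0.1], [1.6, 0.0]]
final_regions, iterations = refine_until_done(
    noisy_regions, halve, small_enough, min_done=2, max_iters=10
)
print(iterations)
```

Passing the stopping rules in as parameters keeps the loop independent of how a given implementation defines a finished region, which the text leaves open.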

Any suitable computing system or group of computing systems can be used for performing the operations described herein. For example, FIG. 7 is a block diagram depicting a computing system that can be configured to implement a texture data generation computing system, according to certain embodiments.

The depicted example of a computing system 701 includes one or more processors 702 communicatively coupled to one or more memory devices 704. The processor 702 executes computer-executable program code or accesses information stored in the memory device 704. Examples of processor 702 include a microprocessor, an application-specific integrated circuit (“ASIC”), a field-programmable gate array (“FPGA”), or other suitable processing device. The processor 702 can include any number of processing devices, including one.

The memory device 704 includes any suitable non-transitory computer-readable medium for storing multi-view digital image pairs 725, the texture data object 215, the diffusion model 250, the cross-frame attention feature sets 265, and other received or determined values or data objects. In FIG. 7, the multi-view digital image pairs 725 includes at least one pair (or other quantity of corresponding multi-view images) of a multi-view rendered image and a corresponding multi-view diffusion-generated image, such as the multi-view rendered image 125 and the multi-view diffusion-generated image 155, the multi-view rendered image 225 and the multi-view diffusion-generated image 255, the multi-view rendered image 223 and the multi-view diffusion-generated image 253, or other pairs of multi-view digital images described herein. The computer-readable medium can include any electronic, optical, magnetic, or other storage device capable of providing a processor with computer-readable instructions or other program code. Non-limiting examples of a computer-readable medium include a magnetic disk, a memory chip, a ROM, a RAM, an ASIC, optical storage, magnetic tape or other magnetic storage, or any other medium from which a processing device can read instructions. The instructions may include processor-specific instructions generated by a compiler or an interpreter from code written in any suitable computer-programming language, including, for example, C, C++, C#, Visual Basic, Java, Python, Perl, JavaScript, and ActionScript.

The computing system 701 may also include a number of external or internal devices such as input or output devices. For example, the computing system 701 is shown with an input/output (“I/O”) interface 708 that can receive input from input devices or provide output to output devices. A bus 706 can also be included in the computing system 701. The bus 706 can communicatively couple one or more components of the computing system 701.

The computing system 701 executes program code that configures the processor 702 to perform one or more of the operations described above with respect to FIGS. 1-6. The program code includes operations related to, for example, one or more of the multi-view digital image pairs 725, the texture data object 215, the diffusion model 250, the cross-frame attention feature sets 265, or other suitable applications or memory structures that perform one or more operations described herein. The program code may be resident in the memory device 704 or any suitable computer-readable medium and may be executed by the processor 702 or any other suitable processor. In some embodiments, the program code described above, the multi-view digital image pairs 725, the texture data object 215, the diffusion model 250, and the cross-frame attention feature sets 265 are stored in the memory device 704, as depicted in FIG. 7. In additional or alternative embodiments, one or more of the multi-view digital image pairs 725, the texture data object 215, the diffusion model 250, the cross-frame attention feature sets 265, or the program code described above are stored in one or more memory devices accessible via a data network, such as a memory device accessible via a cloud service.

The computing system 701 depicted in FIG. 7 also includes at least one network interface 710. The network interface 710 includes any device or group of devices suitable for establishing a wired or wireless data connection to one or more data networks 712. Non-limiting examples of the network interface 710 include an Ethernet network adapter, a modem, and/or the like. A remote computing system 715 is connected to the computing system 701 via the data networks 712, and the remote computing system 715 can perform some of the operations described herein, such as rendering multi-view rendered digital images or sampling rendered digital images. The computing system 701 is able to communicate with one or more additional computing systems, such as the remote computing system 715 and the additional computing device 190, using the network interface 710.

Numerous specific details are set forth herein to provide a thorough understanding of the claimed subject matter. However, those skilled in the art will understand that the claimed subject matter may be practiced without these specific details. In other instances, methods, apparatuses, or systems that would be known by one of ordinary skill have not been described in detail so as not to obscure claimed subject matter.

Unless specifically stated otherwise, it is appreciated that throughout this specification discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” and “identifying” or the like refer to actions or processes of a computing device, such as one or more computers or a similar electronic computing device or devices, that manipulate or transform data represented as physical electronic or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the computing platform.

The system or systems discussed herein are not limited to any particular hardware architecture or configuration. A computing device can include any suitable arrangement of components that provides a result conditioned on one or more inputs. Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computing system from a general purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.

Embodiments of the methods disclosed herein may be performed in the operation of such computing devices. The order of the blocks presented in the examples above can be varied—for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.

The use of “adapted to” or “configured to” herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps. Additionally, the use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.

While the present subject matter has been described in detail with respect to specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, it should be understood that the present disclosure has been presented for purposes of example rather than limitation, and does not preclude inclusion of such modifications, variations, and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06T G06T15/4 G06T5/70 G06T17/205 G06T2207/20084 G06T2210/36

Patent Metadata

Filing Date

August 26, 2024

Publication Date

February 26, 2026

Inventors

Romain Rouffet

Vladimir Kim

Valentin Deschaintre

Thibault Groueix

Rosalie Martin

Duygu Ceylan Aksit

Chun-Hao Huang
