Semi-generative artificial intelligence modelling between domains is provided. The method comprises receiving a source image of an object in a first domain and diffusing the source image through a source diffusion model to generate a first Gaussian distribution in the first domain. Embeddings are generated from metadata that provides constraints for image reconstruction. The embeddings are fed into dual diffusion implicit bridges. A sample from the first Gaussian distribution is mapped, through the dual diffusion implicit bridges, to a second Gaussian distribution in a second domain. The second Gaussian distribution is then reverse diffused through a target diffusion model to generate a target image of the object in the second domain in accordance with the metadata.
Legal claims defining the scope of protection, as filed with the USPTO.
. A computer-implemented method for semi-generative artificial intelligence modelling between domains, the method comprising:
. The method of, wherein the first domain comprises three-dimensional CAD (computer assisted drawing) image data.
. The method of, wherein the second domain comprises real world image data.
. The method of, wherein the target image comprises a photorealistic image.
. The method of, wherein the metadata comprises at least one of:
. The method of, wherein the metadata comprises information provided by an artificial intelligence design parser that cross-references text and specifications against images and video.
. The method of, wherein the source diffusion model and target diffusion model comprise Schrödinger bridges.
. A system for semi-generative artificial intelligence modelling between domains, the system comprising:
. The system of, wherein the first domain comprises three-dimensional CAD (computer assisted drawing) image data.
. The system of, wherein the second domain comprises real world image data.
. The system of, wherein the target image comprises a photorealistic image.
. The system of, wherein the metadata comprises at least one of:
. The system of, wherein the metadata comprises information provided by an artificial intelligence design parser that cross-references text and specifications against images and video.
. The system of, wherein the source diffusion model and target diffusion model comprise Schrödinger bridges.
. A computer program product for semi-generative artificial intelligence modelling between domains, the computer program product comprising:
. The computer program product of, wherein the first domain comprises three-dimensional CAD (computer assisted drawing) image data.
. The computer program product of, wherein the second domain comprises real world image data.
. The computer program product of, wherein the metadata comprises at least one of:
. The computer program product of, wherein the metadata comprises information provided by an artificial intelligence design parser that cross-references text and specifications against images and video.
. The computer program product of, wherein the source diffusion model and target diffusion model comprise Schrödinger bridges.
Complete technical specification and implementation details from the patent document.
The present disclosure relates generally to artificial intelligence systems, and more specifically to controlling randomized noise in generative AI models.
Object detection systems implementing artificial intelligence are trained to accurately detect target objects using large realistic visual data sets, which have high accuracy, specificity, and diversity. High accuracy is achieved by reducing biases from human labelers. High specificity is achieved by capturing different images of the target object in various environmental conditions. High diversity is achieved by including various images of the target object from various view angles and perspectives. However, developing large realistic visual data sets with high accuracy, specificity, and diversity is a challenge, especially using conventional methods of manually taking pictures of the target objects using cameras, and having a human operator label each image with a ground truth target object class. These limitations have slowed the development and deployment of large scale object detection systems.
An illustrative embodiment provides a method for semi-generative artificial intelligence modelling between domains. The method comprises receiving a source image of an object in a first domain and diffusing the source image through a source diffusion model to generate a first Gaussian distribution in the first domain. Embeddings are generated from metadata that provides constraints for image reconstruction. The embeddings are fed into dual diffusion implicit bridges. A sample from the first Gaussian distribution is mapped, through the dual diffusion implicit bridges, to a second Gaussian distribution in a second domain. The second Gaussian distribution is then reverse diffused through a target diffusion model to generate a target image of the object in the second domain in accordance with the metadata.
Another illustrative embodiment provides a system for semi-generative artificial intelligence modelling between domains. The system comprises a storage device that stores program instructions and one or more processors operably connected to the storage device and configured to execute the program instructions to cause the system to: receive a source image of an object in a first domain; diffuse the source image through a source diffusion model to generate a first Gaussian distribution in the first domain; generate embeddings from metadata, wherein the metadata provides constraints for image reconstruction; feed the embeddings into dual diffusion implicit bridges; sample from the first Gaussian distribution; map, through the dual diffusion implicit bridges, the sample from the first Gaussian distribution to a second Gaussian distribution in a second domain; and reverse diffuse the second Gaussian distribution through a target diffusion model to generate a target image of the object in the second domain in accordance with the metadata.
Another illustrative embodiment provides a computer program product for semi-generative artificial intelligence modelling between domains. The computer program product comprises a computer-readable storage medium having program instructions embodied thereon to perform the steps of: receiving a source image of an object in a first domain; diffusing the source image through a source diffusion model to generate a first Gaussian distribution in the first domain; generating embeddings from metadata, wherein the metadata provides constraints for image reconstruction; feeding the embeddings into dual diffusion implicit bridges; sampling from the first Gaussian distribution; mapping, through the dual diffusion implicit bridges, the sample from the first Gaussian distribution to a second Gaussian distribution in a second domain; and reverse diffusing the second Gaussian distribution through a target diffusion model to generate a target image of the object in the second domain in accordance with the metadata.
The features and functions can be achieved independently in various embodiments of the present disclosure or may be combined in yet other embodiments in which further details can be seen with reference to the following description and drawings.
The illustrative embodiments recognize and take into account that generative artificial intelligence (AI) models lack accuracy and validation layers to verify integrity of generated content.
The illustrative embodiments provide a method of semi-generative AI modeling that adds an additional layer of accuracy and fidelity to content generated by generative AI models. By utilizing metadata from design, production, and inspection data in manufacturing and production environments, the illustrative embodiments control the output of the generative AI models through Dual-Fusion techniques.
is a block diagram of a semi-generative system depicted in accordance with an illustrative embodiment. Semi-generative system utilizes data from a variety of sources, including real images and video of artifacts, three-dimensional (3D) computer assisted design (CAD) data, two-dimensional (2D) CAD drawings, and alternative signal modalities.
Real images and video of artifacts act as ground truth data for real domain dataset generation. 3D CAD data provides another domain for image data. 3D CAD data can be derived from engineering Product Data Management (PDM) and is used for synthetic domain dataset generation.
2D CAD drawings provide production and design engineering metadata (i.e., any data that is not a 3D object). Alternative signal modalities provide additional metadata, such as textual data, audio data, or graph representations describing real images and video. The metadata provides constraints to ensure that AI generated images conform to desired characteristics. When converting image data from one domain to another, the constraints exert control over the output domain (e.g., a specific type of object viewed from a specific angle). Such constraints help ensure that the output image is as close as possible to the original input image in all respects.
The data and metadata from real images and video of artifacts, three-dimensional (3D) computer assisted design (CAD) data, two-dimensional (2D) CAD drawings, and alternative signal modalities are fed into unsupervised/semi-supervised model generator, which comprises an end-to-end object detection system that identifies objects using structured data and unstructured data. Images are cropped and converted to latent space and then clustered according to visual similarity. Clusters are then matched to features of a provided example and labeled.
The different modalities of data are then fed by unsupervised/semi-supervised model generator into dual fusion pipeline.
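The clustering and labeling step can be sketched as follows. This is a minimal illustration, assuming the cropped images have already been converted to latent vectors (the encoder is not shown) and using k-means for "clustered according to visual similarity" and cosine similarity for "matched to features of a provided example"; the document does not name either technique, and the function names and example labels are hypothetical.

```python
import numpy as np

def kmeans(latents, k, iters=50, seed=0):
    """Lloyd's k-means on latent vectors of shape (n, d); returns (assignments, centroids)."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: start from a random latent, then repeatedly
    # add the latent that is farthest from every centroid chosen so far.
    centroids = [latents[rng.integers(len(latents))]]
    for _ in range(1, k):
        dists = np.min([((latents - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(latents[dists.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        # Assign each latent to its nearest centroid (squared Euclidean distance).
        d2 = ((latents[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = latents[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return assign, centroids

def label_clusters(centroids, example_features):
    """Label each cluster with the example whose feature vector is most cosine-similar."""
    names = list(example_features)
    ex = np.stack([np.asarray(example_features[n], dtype=float) for n in names])
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    e = ex / np.linalg.norm(ex, axis=1, keepdims=True)
    return [names[i] for i in (c @ e.T).argmax(axis=1)]

# Toy latents for two visually distinct part types (values are illustrative).
rng = np.random.default_rng(1)
latents = np.vstack([rng.normal([5.0, 0.0], 0.3, size=(20, 2)),
                     rng.normal([0.0, 5.0], 0.3, size=(20, 2))])
assign, cents = kmeans(latents, k=2)
names = label_clusters(cents, {"bracket": [1.0, 0.0], "panel": [0.0, 1.0]})
crop_labels = [names[a] for a in assign]
```

Labeling whole clusters rather than individual crops is what lets a single provided example label many images at once, which is the point of the semi-supervised step.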
AI design parser reads and interprets information from production references (e.g., engineering drawings) about the limitations of the target object to be generated, and matches that information against real images and video to provide additional constraints. AI design parser is able to provide information that might be missing from the alternative signal modalities. AI design parser can cross-reference text, geometry, engineering specifications, and mathematics against real images and video.
Dual fusion pipeline comprises a source diffusion model that is trained on a first domain of data (e.g., 3D CAD). Source diffusion model uses a source Schrödinger bridge to generate a first Gaussian distribution in the first domain from image data. (See). A diffusion Schrödinger bridge seeks a stochastic process that connects two probability distributions in the most probable way within given constraints.
Dual fusion pipeline also comprises a target diffusion model that is trained on a second domain of data (e.g., real world). Target diffusion model uses target Schrödinger bridge for reverse diffusion to generate image data in the second domain from a second Gaussian distribution.
Dual diffusion implicit bridges provide translation between first Gaussian distribution and second Gaussian distribution by mapping from the first domain to the second domain. (See). Dual diffusion implicit bridges are a back-to-back concatenation of a source Schrödinger bridge (from the source domain to latent space) and a target Schrödinger bridge (from latent space to the target domain). Dual diffusion implicit bridges can incorporate embeddings generated from metadata provided by 2D CAD drawings, alternative signal modalities, and AI design parser. The metadata represented by the embeddings provides constraints during translation between the Gaussian distributions of the different domains.
The dual fusion pipeline has the advantage of producing images with better (lower) FID (Fréchet Inception Distance) scores than other models. FID is a measure of the quality of images created by generative models. FID compares the distribution of generated images with the distribution of a set of real images, which function as ground truth. A lower FID score indicates greater similarity between generated and real images. Diffusion models can achieve lower FID scores than other types of models, such as Generative Adversarial Networks (GANs).
The output from the dual fusion pipeline is fed into generative AI realistic synthetic data generator, which generates synthetic data for use by object detection models.
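FID fits a Gaussian to feature vectors of the real images and another to those of the generated images, then takes the Fréchet distance between the two Gaussians: ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½). A minimal sketch, assuming the features (conventionally Inception-v3 activations) have already been extracted; the network is not included and the function names are illustrative.

```python
import numpy as np

def _sqrtm_psd(mat):
    """Symmetric square root of a positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})."""
    diff = mu1 - mu2
    root1 = _sqrtm_psd(sigma1)
    # Tr((sigma1 sigma2)^{1/2}) equals the trace of the PSD root of root1 @ sigma2 @ root1,
    # which avoids taking the square root of a non-symmetric matrix.
    cross = _sqrtm_psd(root1 @ sigma2 @ root1)
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(cross))

def fid_from_features(real_feats, gen_feats):
    """FID from feature arrays of shape (n_images, n_features)."""
    return frechet_distance(real_feats.mean(axis=0), np.cov(real_feats, rowvar=False),
                            gen_feats.mean(axis=0), np.cov(gen_feats, rowvar=False))
```

For example, `frechet_distance(np.zeros(2), np.eye(2), np.array([3.0, 0.0]), np.diag([4.0, 1.0]))` gives 9 + (1 + 4 − 2·2) + (1 + 1 − 2·1) = 10, and identical distributions give 0.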
Semi-generative system can be implemented in software, hardware, firmware, or a combination thereof. When software is used, the operations performed by semi-generative system can be implemented in program code configured to run on hardware, such as a processor unit. When firmware is used, the operations performed by semi-generative system can be implemented in program code and data and stored in persistent memory to run on a processor unit. When hardware is employed, the hardware can include circuits that operate to perform the operations in semi-generative system.
In the illustrative examples, the hardware can take a form selected from at least one of a circuit system, an integrated circuit, an application specific integrated circuit (ASIC), a programmable logic device, or some other suitable type of hardware configured to perform a number of operations. With a programmable logic device, the device can be configured to perform the number of operations. The device can be reconfigured at a later time or can be permanently configured to perform the number of operations. Programmable logic devices include, for example, a programmable logic array, a programmable array logic, a field programmable logic array, a field programmable gate array, and other suitable hardware devices. Additionally, the processes can be implemented in organic components integrated with inorganic components and can be comprised entirely of organic components excluding a human being. For example, the processes can be implemented as circuits in organic semiconductors.
Computer system is a physical hardware system and includes one or more data processing systems. When more than one data processing system is present in computer system, those data processing systems are in communication with each other using a communications medium. The communications medium can be a network. The data processing systems can be selected from at least one of a computer, a server computer, a mobile device such as a tablet computer, or some other suitable data processing system.
As depicted, computer system includes a number of processor units that are capable of executing program code implementing processes in the illustrative examples. As used herein, a processor unit in the number of processor units is a hardware device and is comprised of hardware circuits, such as those on an integrated circuit, that respond to and process instructions and program code that operate a computer. When a number of processor units execute program code for a process, the number of processor units is one or more processor units that can be on the same computer or on different computers. In other words, the process can be distributed between processor units on the same or different computers in a computer system. Further, the number of processor units can be of the same type or different types of processor units. For example, a number of processor units can be selected from at least one of a single core processor, a dual-core processor, a multi-processor core, a general-purpose central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), or some other type of processor unit.
depicts a diagram illustrating a diffusion model with which the illustrative embodiment can be implemented. Diffusion model is an example of the first and second diffusion models in.
Diffusion model is trained by progressively adding noise to an image in stages (i.e., X1, X2, etc.) and then reconstructing that image (reverse diffusion) from each stage. As the diffusion model learns, it is able to add more and more noise to the original image and still successfully reconstruct that original image. In the final stage (Z) of training, the diffusion model generates a random Gaussian distribution of noise from which it can reconstruct the original image.
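The staged noising described above is commonly implemented with a fixed variance schedule that lets any stage be sampled in closed form directly from the original image. A minimal sketch, assuming a DDPM-style linear schedule (the document does not name a schedule); the function names are illustrative. During training, a network would be taught to predict the returned noise `eps` from the noisy stage so that it can run the process in reverse.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-stage noise variances beta_1..beta_T (assumed linear schedule)."""
    return np.linspace(beta_start, beta_end, num_steps)

def noise_to_stage(x0, t, alpha_bar, rng):
    """Sample stage t directly from the clean image x0:
    x_t = sqrt(alpha_bar[t]) * x0 + sqrt(1 - alpha_bar[t]) * eps, with eps ~ N(0, I).
    Returns the noisy stage and the noise a denoiser would be trained to predict."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps

betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)   # fraction of signal variance kept after each stage
rng = np.random.default_rng(0)
image = np.full((64, 64), 5.0)        # toy "image" far from zero mean
x_mid, _ = noise_to_stage(image, 100, alpha_bar, rng)
x_final, _ = noise_to_stage(image, 999, alpha_bar, rng)  # final stage: nearly pure N(0, 1) noise
```

Because `alpha_bar` decays toward zero, the last stage retains almost none of the image, which matches the description of the final stage (Z) as random Gaussian noise.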
For the diffusion process in generative modeling, the goal is to design algorithms that transform a given prior distribution P_prior into a given data distribution P_data.
X_t is interpreted as a noising process that perturbs the data distribution P_data, and P_prior is the invariant distribution of the noising process. The process gradually adds noise; if this is done in small steps over a long enough time horizon T > 0, X_T gets close to P_prior = N(0, 1/α), which has a mean of 0 and a standard deviation of 1/√α.
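The convergence of the noising process to P_prior can be checked numerically. A minimal sketch, assuming the noising process is the Ornstein-Uhlenbeck SDE dX_t = -αX_t dt + √2 dB_t (a standard choice whose invariant distribution is N(0, 1/α); the document's own equation is not recoverable from this text), simulated with the Euler-Maruyama scheme. The toy data distribution and parameter values are illustrative.

```python
import numpy as np

def simulate_ou(x0, alpha, T, dt, rng):
    """Euler-Maruyama for the assumed noising SDE dX_t = -alpha X_t dt + sqrt(2) dB_t."""
    x = np.array(x0, dtype=float)
    for _ in range(int(round(T / dt))):
        x = x - alpha * x * dt + np.sqrt(2.0 * dt) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(0)
data = rng.normal(3.0, 0.1, size=20_000)   # toy "data distribution" P_data
xT = simulate_ou(data, alpha=2.0, T=5.0, dt=0.01, rng=rng)
# xT.mean() should be near 0 and xT.var() near 1/alpha = 0.5
```

With α = 2 and T = 5 the memory of the starting point decays like e^{-αT} ≈ 5·10⁻⁵, so the samples forget P_data; the small remaining gap from variance 0.5 comes from the finite step size.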
For reverse diffusion, the process can in principle (approximately) sample from P_data by first sampling from P_prior and then reversing the dynamics of X_t. This time-reversal operation yields another diffusion process with an explicit drift and diffusion matrix, in which the drift depends on ∇log p_t, where p_t is the density of X_t.
In practice, the reverse process is implemented as a Markov chain obtained by discretizing the diffusion process.
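For reference, a standard form of the forward process and its time reversal is shown below, assuming the Ornstein-Uhlenbeck noising process whose invariant distribution is N(0, 1/α) (the document's own equations are not recoverable from this text). Here B_t and W_t are Brownian motions, Y_t = X_{T-t} is the time-reversed process, and the score ∇log p_t is what a neural network approximates in practice.

```latex
\begin{aligned}
\text{forward:}\quad & \mathrm{d}X_t = -\alpha X_t\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t,
  && X_0 \sim P_{\mathrm{data}},\\
\text{reverse:}\quad & \mathrm{d}Y_t = \bigl[\alpha Y_t + 2\,\nabla \log p_{T-t}(Y_t)\bigr]\mathrm{d}t + \sqrt{2}\,\mathrm{d}W_t,
  && Y_0 \sim p_T \approx P_{\mathrm{prior}}.
\end{aligned}
```

The reverse drift is the negated forward drift plus the squared diffusion coefficient (here 2) times the score, and the diffusion matrix stays √2·I.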
The main limitation of this approach to generative modeling is that it requires a large number of steps, so that the distribution reached by the forward dynamics is close to P_prior, and small enough step sizes, so that the neural network approximation holds.
Adding a Schrödinger bridge allows a significant reduction in the number of steps needed for score-based generative modeling.
Given a reference diffusion X_t with path distribution ℙ describing the process that adds noise to the data, the goal is to find a path distribution π* with marginals π*_0 = P_data and π*_T = P_prior that minimizes the Kullback-Leibler divergence KL(π* | ℙ) between π* and ℙ.
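The structure of this objective is easiest to see in a discrete, static analogue. When only the joint law of (X_0, X_T) is optimized (the dynamic problem is known to reduce to this static one), minimizing KL(π | ℙ) under the two marginal constraints is solved by iterative proportional fitting, also called Sinkhorn scaling, which alternately rescales the coupling to match P_data and P_prior. The sketch assumes a 1-D grid, an Ornstein-Uhlenbeck reference kernel, and toy distributions; it illustrates the Schrödinger bridge objective, not the patent's training procedure, and the function name is hypothetical.

```python
import numpy as np

def static_schrodinger_bridge(mu, nu, ref_kernel, iters=2000):
    """Iterative proportional fitting (Sinkhorn scaling).

    Returns the coupling pi of (X_0, X_T) with marginals mu and nu that minimizes
    KL(pi || R), where R = diag(mu) @ ref_kernel is the reference joint law.
    The minimizer has the form diag(u) @ R @ diag(v)."""
    R = mu[:, None] * ref_kernel
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(iters):
        u = mu / (R @ v)     # rescale so the X_0 marginal matches mu (P_data)
        v = nu / (R.T @ u)   # rescale so the X_T marginal matches nu (P_prior)
    return u[:, None] * R * v[None, :]

# Toy setup on a 1-D grid (all values illustrative).
x = np.linspace(-3.0, 3.0, 61)
mu = np.exp(-(x - 1.5) ** 2 / 0.18) + np.exp(-(x + 1.5) ** 2 / 0.18)  # bimodal "data"
mu /= mu.sum()
alpha, T = 1.0, 0.5
nu = np.exp(-alpha * x ** 2 / 2.0)     # discretized N(0, 1/alpha) prior
nu /= nu.sum()
# Reference kernel: OU transition x_i -> x_j over time T (Gaussian, renormalized on the grid).
mean_T = x[:, None] * np.exp(-alpha * T)
var_T = (1.0 - np.exp(-2.0 * alpha * T)) / alpha
K = np.exp(-(x[None, :] - mean_T) ** 2 / (2.0 * var_T))
K /= K.sum(axis=1, keepdims=True)
pi = static_schrodinger_bridge(mu, nu, K)
```

The result is the coupling closest to the reference noising dynamics that still starts exactly at the data distribution and ends exactly at the prior, which is what lets a bridge reach P_prior without relying on a long forward run.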
depicts a diagram illustrating an example of image translation between domains in accordance with an illustrative embodiment. This image-to-image translation relies on source diffusion model and target diffusion model, which are trained independently on separate domains A and B, respectively. In the present example, domain A comprises CAD image data, and domain B comprises real world images. Training source diffusion model and target diffusion model independently on each domain allows identification of the data distribution P_data of each domain.
Diffusion by source diffusion model produces Gaussian distribution, which represents latent encodings for domain A. These latent encodings of the source images, obtained with the source diffusion model, are then fed into target diffusion model to construct target images. In this manner, the Gaussian distribution of domain A is translated into the Gaussian distribution of domain B, which contains the latent encodings for domain B. Target diffusion model then performs reverse diffusion on that Gaussian distribution to construct the target image. Therefore, in the present example, a CAD image in domain A is transformed into a photorealistic image in domain B.
The processes of encoding source images and decoding to generate target images are defined via ordinary differential equations (ODEs). Because these ODE mappings are deterministic and invertible, the process is cycle consistent up to the discretization errors of the ODE solvers.
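The encode-then-decode translation and its cycle consistency can be illustrated with a toy in which each domain is a one-dimensional Gaussian, so the optimal noise predictor is known in closed form and stands in for each independently trained model. The sketch uses deterministic DDIM steps (a discretization of the probability-flow ODE) and a linear noise schedule; these choices, the domain parameters, and all function names are illustrative assumptions rather than the patent's implementation.

```python
import numpy as np

def alpha_bars(num_steps=1000):
    """Cumulative signal fractions; index 0 is the clean image (assumed linear beta schedule)."""
    betas = np.linspace(1e-4, 0.02, num_steps)
    return np.concatenate([[1.0], np.cumprod(1.0 - betas)])

def gaussian_eps(x, abar, mean, std):
    """Exact noise prediction E[eps | x_t] for 1-D data N(mean, std^2).
    Stands in for an independently trained per-domain diffusion model."""
    var_t = abar * std ** 2 + (1.0 - abar)
    return np.sqrt(1.0 - abar) * (x - np.sqrt(abar) * mean) / var_t

def ddim_step(x, abar_from, abar_to, mean, std):
    """One deterministic DDIM step (a discretized probability-flow ODE) between noise levels."""
    eps = gaussian_eps(x, abar_from, mean, std)
    x0_hat = (x - np.sqrt(1.0 - abar_from) * eps) / np.sqrt(abar_from)
    return np.sqrt(abar_to) * x0_hat + np.sqrt(1.0 - abar_to) * eps

def encode(x, abar, mean, std):
    """Image -> latent: integrate the ODE forward from t = 0 to t = T."""
    for t in range(len(abar) - 1):
        x = ddim_step(x, abar[t], abar[t + 1], mean, std)
    return x

def decode(z, abar, mean, std):
    """Latent -> image: integrate the ODE backward from t = T to t = 0."""
    for t in range(len(abar) - 1, 0, -1):
        z = ddim_step(z, abar[t], abar[t - 1], mean, std)
    return z

abar = alpha_bars()
domain_a = {"mean": 0.0, "std": 0.5}   # toy stand-in for domain A (CAD)
domain_b = {"mean": 1.0, "std": 2.0}   # toy stand-in for domain B (real world)
x_src = np.array([-0.5, 0.0, 0.5])     # -1, 0, +1 standard deviations in domain A
latent = encode(x_src, abar, **domain_a)
x_tgt = decode(latent, abar, **domain_b)                            # close to [-1.0, 1.0, 3.0]
x_back = decode(encode(x_tgt, abar, **domain_b), abar, **domain_a)  # close to x_src
```

A point that sits z standard deviations from the domain-A mean lands about z standard deviations from the domain-B mean, because both models share the same latent space. The round trip returns close to the input but not exactly, reflecting the finite number of ODE steps and the fact that the final noise level is not exactly pure noise.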
depicts a diagram illustrating semi-generative AI image generation using metadata constraints in accordance with an illustrative embodiment. In this example, source diffusion model generates Gaussian distribution from source 3D CAD image in the CAD domain. This first Gaussian distribution is translated to a second Gaussian distribution in the real world domain by dual diffusion implicit bridges.
For the second half of the process (from latent space to the second domain), more metadata is added to help guide the reverse diffusion into the real world domain. Examples of added metadata include clustering of data, prompt embedding (e.g., black color), material (e.g., steel), and 2D drawing information. The metadata acts as constraints on the Schrödinger bridge of the target diffusion model, which generates a photorealistic image in the real world domain from the second Gaussian distribution. The constraints provide control that ensures not only that the resulting target image shows the same type of object as the source image, but also that the object faces the same direction, is viewed at the same angle, and so on. Such precision and specificity are important for applications such as generating synthetic training data for object detection models. Therefore, the metadata helps ensure that the output is as close as possible to the original input.
depicts an example of image decoding from a 3D CAD model to a photorealistic model in accordance with an illustrative embodiment.
depicts a flowchart illustrating a process for semi-generative artificial intelligence modelling between domains in accordance with an illustrative embodiment. Process can be implemented in semi-generative system in.
Process begins by receiving a source image of an object in a first domain (operation). The first domain might comprise three-dimensional CAD (computer assisted drawing) image data, and the source image might comprise a CAD image. Process diffuses the source image through a source diffusion model to generate a first Gaussian distribution in the first domain (operation).
Process then generates embeddings from metadata, wherein the metadata provides constraints for image reconstruction (operation). The metadata can comprise at least one of clustering of data, prompt embedding, segmentation mask, two-dimensional drawing information, text description of the target object, audio description of the target object, graph representation of the target object, material, or background. The metadata might also comprise information provided by an artificial intelligence design parser that cross-references text and specifications against images and video.
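The document does not specify how metadata becomes embeddings. As a hypothetical stand-in for a learned text or metadata encoder, the sketch below maps each metadata field to a deterministic unit vector built from hashed, field-namespaced tokens, giving a fixed-shape array that could be fed to a conditioning mechanism. The field names mirror the metadata types listed above; everything else (function names, dimension, hashing scheme) is an illustrative assumption.

```python
import hashlib
import numpy as np

def _token_vector(token, dim):
    """Deterministic pseudo-random unit vector per token (stand-in for a learned encoder)."""
    seed = int.from_bytes(hashlib.sha256(token.encode("utf-8")).digest()[:8], "little")
    vec = np.random.default_rng(seed).standard_normal(dim)
    return vec / np.linalg.norm(vec)

def embed_metadata(metadata, dim=64):
    """Turn {field: value} metadata into one unit-norm embedding row per non-empty field."""
    rows = []
    for field in sorted(metadata):                  # fixed field order -> reproducible layout
        tokens = str(metadata[field]).lower().split()
        if not tokens:
            continue
        # Tokens are namespaced by field, so "steel" as a material differs from "steel" in a prompt.
        vec = sum(_token_vector(f"{field}={tok}", dim) for tok in tokens)
        rows.append(vec / np.linalg.norm(vec))
    return np.stack(rows)

metadata = {
    "prompt": "black",
    "material": "steel",
    "background": "hangar floor",
    "text_description": "L-shaped mounting bracket",
}
embeddings = embed_metadata(metadata)   # shape (4, 64), rows ordered by field name
```

A real system would replace the hashing with trained encoders (for example, a text encoder for descriptions and prompts), but the interface, metadata in and a stack of constraint vectors out, stays the same.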
The embeddings are fed into dual diffusion implicit bridges (operation).
Process samples from the first Gaussian distribution (operation) and maps, through the dual diffusion implicit bridges, the sample from the first Gaussian distribution to a second Gaussian distribution in a second domain (operation). The second domain might comprise real world image data.
Process then reverse diffuses the second Gaussian distribution through a target diffusion model to generate a target image of the object in the second domain in accordance with the metadata (operation). When the second domain comprises real world image data, the target image generated by the reverse diffusion of the second Gaussian distribution comprises a photorealistic image.
Process then ends.
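The flowchart's operations can be wired together as a small orchestration skeleton. This is hypothetical: the class and parameter names are not from the source, and each stage is injected as a callable so that any source model, metadata encoder, bridge, and target model can be plugged in. Passing the embeddings to both the bridge and the target reverse diffusion reflects the description of metadata constraining the second half of the translation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping

@dataclass
class SemiGenerativePipeline:
    """Hypothetical wiring of the process; each stage is supplied as a callable."""
    source_diffuse: Callable[[Any], Any]                # source image -> sample of first Gaussian
    embed: Callable[[Mapping[str, Any]], Any]           # metadata -> embeddings
    bridge: Callable[[Any, Any], Any]                   # (sample, embeddings) -> second Gaussian
    target_reverse_diffuse: Callable[[Any, Any], Any]   # (latent, embeddings) -> target image

    def run(self, source_image, metadata):
        sample_a = self.source_diffuse(source_image)    # diffuse source image, take the sample
        embeddings = self.embed(metadata)               # generate embeddings from metadata
        latent_b = self.bridge(sample_a, embeddings)    # map through the implicit bridges
        return self.target_reverse_diffuse(latent_b, embeddings)  # reverse diffuse in domain B

# Toy wiring with arithmetic stand-ins, only to show the data flow.
toy = SemiGenerativePipeline(
    source_diffuse=lambda image: image - 10,
    embed=lambda meta: meta["shift"],
    bridge=lambda latent, emb: 2 * latent,
    target_reverse_diffuse=lambda latent, emb: latent + emb,
)
result = toy.run(12, {"shift": 1})   # (12 - 10) * 2 + 1 = 5
```

Keeping the stages as injected callables makes it straightforward to swap in a differently trained source or target model without touching the orchestration.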
Turning now to, an illustration of a block diagram of a data processing system is depicted in accordance with an illustrative embodiment. Data processing system may be used to implement computer system in. In this illustrative example, data processing system includes communications framework, which provides communications between processor unit, memory, persistent storage, communications unit, input/output (I/O) unit, and display. In this example, communications framework takes the form of a bus system.
Unknown
November 27, 2025