Image Transformation with a Hybrid Autoencoder and Generative Adversarial Network Machine Learning Architecture

Published October 13, 2020

Assignee not available in USPTO data we have

Technical Abstract

Patent Claims

20 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A system comprising: an encoder artificial neural network (ANN) configured to receive an input image patch and produce a feature vector therefrom, wherein the encoder ANN has been trained with a first plurality of domain training images such that an output image patch visually resembling the input image patch is configured to be generated from the feature vector; and a generator ANN configured to receive the feature vector and produce a generated image patch from the feature vector, wherein the generator ANN has been trained with feature vectors derived from the first plurality of domain training images and a second plurality of generative training images such that the generated image patch visually resembles the input image patch but is constructed of newly-generated image elements visually resembling one or more image patches from the second plurality of generative training images.

Plain English translation pending...

Claim 2

Original Legal Text

2. The system of claim 1, wherein visual content of each of the second plurality of generative training images adhere to a common theme.

Plain English Translation

This claim narrows the system of claim 1, in which an encoder neural network turns an input image patch into a feature vector and a generator neural network builds a new image patch from that vector. Here, the visual content of every image in the second set of training images (the generative training images) follows a common theme. Because the generator constructs its output from newly generated elements resembling those images, the generated patch still resembles the input patch but is made of elements in that theme.

Claim 3

Original Legal Text

3. The system of claim 2, wherein the common theme is one of flowers, eyes, stars, galaxies, skulls, numbers, cartoons, or sunsets.

Plain English Translation

This claim narrows claim 2 by listing the possible common themes for the generative training images: flowers, eyes, stars, galaxies, skulls, numbers, cartoons, or sunsets. For example, with flowers as the theme, the generator would rebuild an input image patch out of newly generated elements that look like flowers while the result still resembles the input.

Claim 4

Original Legal Text

4. The system of claim 1, wherein each of the first plurality of domain training images contain a representation of a human face.

Plain English Translation

This claim narrows claim 1 by requiring that every image in the first set of training images (the domain training images) contains a representation of a human face. The encoder is therefore trained on face images, so that a patch resembling the input can be generated from the feature vector it produces. The claim concerns image transformation rather than facial recognition: the generator still builds its output from elements resembling the generative training images.

Claim 5

Original Legal Text

5. The system of claim 1, wherein the feature vector has between 16 and 2048 elements.

Plain English Translation

This claim narrows claim 1 by fixing the size of the feature vector that the encoder produces and the generator consumes: it has between 16 and 2048 elements. The feature vector is the compact numerical representation of the input image patch from which the generator builds the generated image patch.

Claim 6

Original Legal Text

6. The system of claim 1, wherein the input image patch is one of a set of input image patches cropped from an image such that the set of input image patches is configured to be combined to form 80% or more of the image.

Plain English Translation

This claim narrows claim 1 by describing where the input image patch comes from. The patch is one of a set of patches cropped from a single image, and the set is chosen so that, combined, the patches form 80% or more of that image. Each patch in the set can then be passed through the encoder and generator of claim 1, so that most of the image is transformed patch by patch.
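The 80% condition can be checked mechanically. The sketch below is a minimal illustration and not the patent's method: it assumes axis-aligned rectangular patches described by hypothetical `(x, y, width, height)` tuples in pixel units, and it counts a pixel once even when several patches overlap it.

```python
from typing import Iterable, Tuple

Patch = Tuple[int, int, int, int]  # hypothetical layout: (x, y, width, height)


def coverage_fraction(image_w: int, image_h: int, patches: Iterable[Patch]) -> float:
    """Return the fraction of image pixels covered by at least one patch."""
    covered = [[False] * image_w for _ in range(image_h)]
    for x, y, w, h in patches:
        # Clip each patch to the image bounds before marking its pixels.
        for row in range(max(0, y), min(image_h, y + h)):
            for col in range(max(0, x), min(image_w, x + w)):
                covered[row][col] = True
    total = sum(sum(row) for row in covered)
    return total / (image_w * image_h)


def meets_threshold(image_w: int, image_h: int, patches: Iterable[Patch],
                    threshold: float = 0.8) -> bool:
    """True when the patches together form `threshold` or more of the image."""
    return coverage_fraction(image_w, image_h, patches) >= threshold
```

A per-pixel mask is slow for large images but makes the handling of overlap obvious; a production version would likely use an array library instead.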

Claim 7

Original Legal Text

7. The system of claim 6, wherein size and location within the image of the input image patch is randomly selected.

Plain English Translation

This claim narrows claim 6 by specifying how the input image patch is chosen: both its size and its location within the image are selected at random. The set of patches cropped from the image can therefore vary in size and position rather than following a fixed grid, while still combining to form 80% or more of the image as claim 6 requires.
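Random selection of a patch's size and location can be sketched in a few lines. The claim says only that both are randomly selected; the uniform distribution and the `min_size` and `max_size` bounds below are assumptions added for illustration.

```python
import random
from typing import Optional, Tuple


def random_patch(image_w: int, image_h: int,
                 min_size: int = 16, max_size: int = 128,
                 rng: Optional[random.Random] = None) -> Tuple[int, int, int, int]:
    """Pick a random (x, y, width, height) patch that lies fully inside the image.

    min_size and max_size are hypothetical bounds, not values from the claims.
    """
    rng = rng or random.Random()
    # Cap both bounds at the image dimensions so the patch always fits.
    w = rng.randint(min(min_size, image_w), min(max_size, image_w))
    h = rng.randint(min(min_size, image_h), min(max_size, image_h))
    # Choose the top-left corner so the patch stays inside the image.
    x = rng.randint(0, image_w - w)
    y = rng.randint(0, image_h - h)
    return x, y, w, h
```

Passing a seeded `random.Random` instance makes the selection reproducible, which helps when the same crops must be revisited.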

Claim 8

Original Legal Text

8. The system of claim 1, wherein the input image patch is from a frame of a multi-frame video.

Plain English Translation

This claim narrows claim 1 by specifying the source of the input image patch: it is taken from a frame of a multi-frame video. The encoder and generator of claim 1 can therefore be applied to video content, with patches drawn from video frames instead of still images.
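When the input is video, each frame can be treated as an image from which patches are cropped. The sketch below is illustrative only: the `Frame` representation (rows of grayscale pixel values) is hypothetical, and cropping the same box from every frame is an assumption, since the claim states only that the patch comes from a frame.

```python
from typing import Iterator, List, Sequence, Tuple

Frame = List[List[float]]  # hypothetical: one grayscale frame as rows of pixel values


def crop(frame: Frame, x: int, y: int, w: int, h: int) -> Frame:
    """Return the w-by-h patch whose top-left corner is at column x, row y."""
    return [row[x:x + w] for row in frame[y:y + h]]


def patches_from_video(frames: Sequence[Frame],
                       box: Tuple[int, int, int, int]) -> Iterator[Tuple[int, Frame]]:
    """Yield (frame_index, patch) pairs, cropping the same box from each frame."""
    x, y, w, h = box
    for index, frame in enumerate(frames):
        yield index, crop(frame, x, y, w, h)


# Three synthetic frames, 4 rows by 6 columns, whose pixel value equals the frame index.
video = [[[float(i)] * 6 for _ in range(4)] for i in range(3)]
patches = list(patches_from_video(video, (1, 1, 3, 2)))
```

Each yielded patch could then be handed to an encoder and generator in turn, and the frame index kept so results can be written back in order.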

Claim 9

Original Legal Text

9. The system of claim 1, wherein the first plurality of domain training images consists of photorealistic images.

Plain English Translation

This claim narrows claim 1 by requiring that every image in the first set of training images (the domain training images) is photorealistic. The encoder is therefore trained on photorealistic images. The claim places no such restriction on the second set of generative training images that shapes the look of the generator's output.

Claim 10

Original Legal Text

10. A computer-implemented method comprising: obtaining, from a memory, an input image patch; applying, by a processor, an encoder artificial neural network (ANN) to the input image patch, wherein the encoder ANN is configured to produce a feature vector from the input image patch, wherein the encoder ANN has been trained with a first plurality of domain training images such that an output image patch visually resembling the input image patch is configured to be generated from the feature vector; applying, by the processor, a generator ANN to the feature vector, wherein the generator ANN is configured to produce a generated image patch from the feature vector, wherein the generator ANN has been trained with feature vectors derived from the first plurality of domain training images and a second plurality of generative training images such that the generated image patch visually resembles the input image patch but is constructed of newly-generated image elements visually resembling one or more image patches from the second plurality of generative training images; and storing, in the memory, the generated image patch.

Plain English Translation

This invention relates to image processing using artificial neural networks (ANNs) to generate new image patches that visually resemble an input image patch while being built from elements that resemble a separate training dataset. The process begins by obtaining an input image patch from memory. A processor applies an encoder ANN to this patch to produce a feature vector. The encoder has been trained on a first set of domain training images so that an output patch visually resembling the input can be generated from the feature vector. The processor then applies a generator ANN to the feature vector to produce a generated image patch. The generator has been trained with feature vectors derived from the domain training images and with a second set of generative training images. As a result, the generated patch visually resembles the input patch but is constructed of newly generated image elements that resemble patches from the generative training images. Finally, the generated image patch is stored in memory.
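The data flow of the method (obtain a patch, encode it, generate from the feature vector, store the result) can be sketched as follows. This is a toy illustration under stated assumptions: `LinearLayer` is a hypothetical stand-in with random, untrained weights where the claim requires trained ANNs, the memory is a plain dictionary, and the 8x8 patch size is arbitrary.

```python
import random
from typing import Callable, Dict, List

Vector = List[float]


class LinearLayer:
    """Placeholder for a trained ANN: one dense layer with random weights."""

    def __init__(self, n_in: int, n_out: int, rng: random.Random) -> None:
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                        for _ in range(n_out)]

    def __call__(self, values: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, values)) for row in self.weights]


def transform_patch(memory: Dict[str, Vector],
                    encoder: Callable[[Vector], Vector],
                    generator: Callable[[Vector], Vector],
                    src_key: str, dst_key: str) -> Vector:
    patch = memory[src_key]                # obtain the input image patch from memory
    feature_vector = encoder(patch)        # encoder: patch -> feature vector
    generated = generator(feature_vector)  # generator: feature vector -> generated patch
    memory[dst_key] = generated            # store the generated image patch in memory
    return generated


rng = random.Random(0)
patch_pixels = 8 * 8   # hypothetical 8x8 grayscale patch, flattened
feature_size = 16      # assumed; claim 14 allows between 16 and 2048 elements
encoder = LinearLayer(patch_pixels, feature_size, rng)
generator = LinearLayer(feature_size, patch_pixels, rng)

memory = {"input_patch": [rng.random() for _ in range(patch_pixels)]}
generated_patch = transform_patch(memory, encoder, generator,
                                  "input_patch", "generated_patch")
```

With untrained weights the output is meaningless as an image; the point is only the order of operations and the shapes passed between the two networks.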

Claim 11

Original Legal Text

11. The computer-implemented method of claim 10, wherein visual content of each of the second plurality of generative training images adhere to a common theme.

Plain English Translation

This claim narrows the method of claim 10 in the same way that claim 2 narrows the system of claim 1. The visual content of every image in the second set of training images (the generative training images) follows a common theme. Because the generator was trained with those images, the generated image patch resembles the input patch but is built from newly generated elements in that theme.

Claim 12

Original Legal Text

12. The computer-implemented method of claim 11, wherein the common theme is one of flowers, eyes, stars, galaxies, skulls, numbers, cartoons, or sunsets.

Plain English Translation

This claim narrows claim 11 by listing the possible common themes for the generative training images: flowers, eyes, stars, galaxies, skulls, numbers, cartoons, or sunsets. The generated image patch is then constructed of newly generated elements that resemble images of the chosen theme.

Claim 13

Original Legal Text

13. The computer-implemented method of claim 10, wherein each of the first plurality of domain training images contains a representation of a human face.

Plain English Translation

This claim narrows the method of claim 10 by requiring that every image in the first set of training images (the domain training images) contains a representation of a human face. The encoder applied in the method has therefore been trained on face images. The claim does not describe facial recognition or a classifier that separates faces from other objects; the method still transforms an input patch into a generated patch.

Claim 14

Original Legal Text

14. The computer-implemented method of claim 10, wherein the feature vector has between 16 and 2048 elements.

Plain English Translation

This claim narrows the method of claim 10 by fixing the size of the feature vector: the vector that the encoder produces from the input image patch, and that the generator turns into the generated image patch, has between 16 and 2048 elements.

Claim 15

Original Legal Text

15. The computer-implemented method of claim 10, wherein the input image patch is one of a set of input image patches cropped from an image such that the set of input image patches is configured to be combined to form 80% or more of the image.

Plain English Translation

This claim narrows the method of claim 10 by describing where the input image patch comes from. The patch is one of a set of patches cropped from a single image, and the set is chosen so that, combined, the patches form 80% or more of that image. The claim does not state whether the patches overlap or in what order they are processed.

Claim 16

Original Legal Text

16. The computer-implemented method of claim 15, wherein size and location within the image of the input image patch is randomly selected.

Plain English Translation

This claim narrows claim 15 by specifying how the input image patch is chosen: both its size and its location within the image are selected at random. The claim does not state any limits on the random selection, such as minimum or maximum patch sizes.

Claim 17

Original Legal Text

17. The computer-implemented method of claim 10, wherein the input image patch is from a frame of a multi-frame video.

Plain English Translation

This claim narrows the method of claim 10 by specifying the source of the input image patch: it is taken from a frame of a multi-frame video. The encoding, generating, and storing steps of claim 10 are then applied to that patch, so the method can be used to transform video content.

Claim 18

Original Legal Text

18. The computer-implemented method of claim 10, wherein the first plurality of domain training images consists of photorealistic images.

Plain English Translation

This claim narrows the method of claim 10 by requiring that every image in the first set of training images (the domain training images) is photorealistic. The encoder applied in the method has therefore been trained on photorealistic images, while the second set of generative training images is not restricted in this way.

Claim 19

Original Legal Text

19. A system comprising: a first plurality of domain training images; a second plurality of generative training images; an autoencoder including: an encoder artificial neural network (ANN) configured to receive an input image patch from an image of the first plurality of domain training images and produce a first feature vector therefrom, and a decoder ANN configured to receive the first feature vector and produce an output image patch therefrom, wherein the encoder ANN and the decoder ANN are trained based on a first loss function that calculates a first difference between the input image patch and the output image patch; a generative adversarial network including: a generator ANN configured to receive the first feature vector and produce a generated image patch from the first feature vector, and a discriminator ANN configured to receive the generated image patch and a particular generative training image of the second plurality of generative training images, and provide classifications thereof predicting whether the generated image patch belongs to the second plurality of generative training images, wherein the discriminator ANN is trained based on a second loss function that calculates a second difference between a classification of the generated image patch and a classification of the particular generative training image; and wherein the encoder ANN is also configured to receive the generated image patch and produce a second feature vector therefrom, and wherein the generator ANN is trained based on a third loss function that calculates a third difference between (i) the classification of the generated image patch and (ii) a fourth difference between the first feature vector and the second feature vector.

Plain English Translation

This claim describes the training system behind the image transformation: an autoencoder combined with a generative adversarial network (GAN), together with two sets of training images (domain training images and generative training images). The autoencoder includes an encoder ANN that receives an input image patch from a domain training image and produces a first feature vector, and a decoder ANN that produces an output image patch from that vector. Both are trained with a first loss function that measures the difference between the input patch and the output patch. The GAN includes a generator ANN that produces a generated image patch from the first feature vector, and a discriminator ANN that receives the generated patch and a generative training image and classifies each, predicting whether the generated patch belongs to the generative training images. The discriminator is trained with a second loss function that measures the difference between its classification of the generated patch and its classification of the generative training image. The encoder also receives the generated patch and produces a second feature vector from it. The generator is trained with a third loss function that takes the difference between the discriminator's classification of the generated patch and the difference between the first and second feature vectors. Training the generator this way ties the generated patch both to the look of the generative training images and to the content of the input patch.
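The three loss functions can be written down compactly. The claim fixes only which quantities each loss compares, not the exact formulas, so everything concrete below is an assumption: mean squared difference as the distance measure, discriminator scores where a higher value means "belongs to the generative training images", and a hypothetical `feature_weight` that balances the two terms of the generator loss.

```python
from typing import List

Vector = List[float]


def mean_squared_difference(a: Vector, b: Vector) -> float:
    """Assumed distance measure; the claim does not name one."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def autoencoder_loss(input_patch: Vector, output_patch: Vector) -> float:
    """First loss: difference between the input patch and the decoder's output patch."""
    return mean_squared_difference(input_patch, output_patch)


def discriminator_loss(score_generated: float, score_real: float) -> float:
    """Second loss: difference between the two classifications.

    It decreases as the real generative training image scores higher and the
    generated patch scores lower.
    """
    return score_generated - score_real


def generator_loss(score_generated: float, first_features: Vector,
                   second_features: Vector, feature_weight: float = 1.0) -> float:
    """Third loss: feature-vector difference minus the classification score.

    It decreases as the discriminator rates the generated patch higher and as
    re-encoding the generated patch lands closer to the original feature vector.
    """
    feature_difference = mean_squared_difference(first_features, second_features)
    return feature_weight * feature_difference - score_generated
```

In a full training loop each loss would be minimized with respect to its own networks' weights (encoder and decoder, discriminator, generator); that optimization step is omitted here.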

Claim 20

Original Legal Text

20. The system of claim 19, wherein the input image patch is one of a set of input image patches cropped from the image such that the set of input image patches is configured to be combined to form 80% or more of the image.

Plain English Translation

This claim narrows the training system of claim 19 by describing where the input image patch comes from. The patch is one of a set of patches cropped from a domain training image, and the set is chosen so that, combined, the patches form 80% or more of that image. The encoder is therefore trained on patches that together make up most of each training image.

Patent Metadata

Filing Date

Unknown

Publication Date

October 13, 2020

Inventors

Jason Salavon
