Training visual language grounding models using separation loss

Published: April 9, 2024

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

An example system includes a processor to receive a randomly generated alpha-map, a pair of training images, and a pair of training texts associated with the pair of training images. The processor is to generate a blended image based on the randomly generated alpha-map and the pair of training images. The processor is to train a visual language grounding model, using a separation loss, to separate the blended image into a pair of heatmaps identifying the portions of the blended image that correspond to each of the training images.
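The blending step and a separation objective can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: images are single-channel grids of floats, and the names `random_alpha_map`, `blend`, and `separation_loss` are hypothetical. The abstract does not give a formula for the separation loss; the mean squared error against the alpha-map used here is an assumption chosen for simplicity.

```python
import random


def random_alpha_map(height, width, rng):
    """Return a height x width grid of blending weights drawn uniformly from [0, 1)."""
    return [[rng.random() for _ in range(width)] for _ in range(height)]


def blend(alpha, image_a, image_b):
    """Per-pixel convex combination: alpha * a + (1 - alpha) * b."""
    return [
        [a * pa + (1.0 - a) * pb for a, pa, pb in zip(a_row, row_a, row_b)]
        for a_row, row_a, row_b in zip(alpha, image_a, image_b)
    ]


def separation_loss(heatmap_a, heatmap_b, alpha):
    """ASSUMED form: mean squared error of each predicted heatmap against its
    blending weight (alpha for image A, 1 - alpha for image B)."""
    total, count = 0.0, 0
    for ha_row, hb_row, a_row in zip(heatmap_a, heatmap_b, alpha):
        for ha, hb, a in zip(ha_row, hb_row, a_row):
            total += (ha - a) ** 2 + (hb - (1.0 - a)) ** 2
            count += 2
    return total / count


rng = random.Random(0)                      # fixed seed so the sketch is repeatable
alpha = random_alpha_map(2, 3, rng)
image_a = [[1.0] * 3 for _ in range(2)]     # toy "white" image
image_b = [[0.0] * 3 for _ in range(2)]     # toy "black" image
blended = blend(alpha, image_a, image_b)
```

In this sketch a model that predicted `alpha` for the first text and `1 - alpha` for the second would reach a loss of zero, which is the sense in which the heatmaps "separate" the blended image back into its two sources.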

Patent Claims

14 claims

Legal claims defining the scope of protection, as filed with the USPTO.

2. The system of claim 1, wherein the training texts comprise natural free-form texts.

3. The system of claim 1, wherein the visual language grounding model comprises an encoder to generate image encodings based on the alpha-map and the pair of training images, a text conditioner to generate a plurality of text attenuated image encodings based on the image encodings and the pair of training texts, and a decoder to convert the text attenuated image encodings into heatmaps.

4. The system of claim 3, wherein the text conditioner comprises a Bidirectional Encoder Representations from Transformers (BERT) model.

5. The system of claim 4, wherein the text conditioner comprises a plurality of projection modules coupled to the BERT model.

6. The system of claim 1, wherein the visual language grounding model is trained using an unconditioned adversary loss.

7. The system of claim 1, comprising a separately trained detector-based weakly supervised grounding (WSG) network, wherein the separately trained detector-based WSG network is to generate bounding box scores based on a received image and the trained visual language grounding model is to generate a first heatmap based on the received image, wherein the bounding box scores are converted to a second heatmap by assigning each bounding box score to the pixels of its bounding box, and wherein the first heatmap and the second heatmap are averaged to generate a combined heatmap.

10. The computer-implemented method of claim 8, wherein training the visual language grounding model comprises calculating a separation loss for each of the pair of training images as a main training objective.

11. The computer-implemented method of claim 10, wherein training the visual language grounding model comprises calculating an image-to-text loss for text and image feature distribution alignment.

12. The computer-implemented method of claim 8, wherein training the visual language grounding model comprises calculating a negative texts loss based on a third received training text that is unrelated to the pair of training images.

13. The computer-implemented method of claim 8, wherein training the visual language grounding model comprises calculating an unconditioned adversary loss to decrease overfitting on artifacts.

17. The computer program product of claim 15, further comprising program code executable by the processor to calculate a separation loss for each of the pair of training images as a main training objective.

18. The computer program product of claim 15, further comprising program code executable by the processor to calculate an image-to-text loss for text and image feature distribution alignment.

19. The computer program product of claim 15, further comprising program code executable by the processor to calculate a negative texts loss based on a third received training text that is unrelated to the pair of training images.

20. The computer program product of claim 15, further comprising program code executable by the processor to calculate an unconditioned adversary loss to decrease overfitting on artifacts.
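The heatmap combination described in claim 7 lends itself to a short illustration. The sketch below is only one possible reading of that claim: the function names, the `(x0, y0, x1, y1, score)` box layout with exclusive right and bottom edges, and the choice to keep the highest score where boxes overlap are all assumptions, since the claim does not specify them.

```python
def boxes_to_heatmap(boxes, height, width):
    """Assign each box score to the pixels inside that box.

    boxes: iterable of (x0, y0, x1, y1, score); x1 and y1 are exclusive.
    Overlapping boxes keep the highest score (an assumption, not from the claim).
    """
    heat = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1, score in boxes:
        # Clip the box to the image bounds before filling.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                heat[y][x] = max(heat[y][x], score)
    return heat


def average_heatmaps(first, second):
    """Pixel-wise mean of two heatmaps of equal size."""
    return [
        [(a + b) / 2.0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first, second)
    ]


# Toy inputs: one detector box covering the two top-left pixels of a 2 x 3 image,
# and a flat heatmap standing in for the grounding model's output.
second_heatmap = boxes_to_heatmap([(0, 0, 2, 1, 0.8)], height=2, width=3)
first_heatmap = [[0.4] * 3 for _ in range(2)]
combined = average_heatmaps(first_heatmap, second_heatmap)
```

Pixels inside the box end up with a higher combined value than pixels outside it, so the detector's evidence sharpens the grounding model's heatmap without replacing it.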

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F G06N G06T

Patent Metadata

Filing Date

August 26, 2021

Publication Date

April 9, 2024
