US-10783400

Generating searchable text for documents portrayed in a repository of digital images utilizing orientation and text prediction neural networks

Published: September 22, 2020

Assignee: not available in the USPTO data

Inventors: not available in the USPTO data

Technical Abstract

The present disclosure relates to generating computer searchable text from digital images that depict documents utilizing an orientation neural network and/or text prediction neural network. For example, one or more embodiments detect digital images that depict documents, identify the orientation of the depicted documents, and generate computer searchable text from the depicted documents in the detected digital images. In particular, one or more embodiments train an orientation neural network to identify the orientation of a depicted document in a digital image. Additionally, one or more embodiments train a text prediction neural network to analyze a depicted document in a digital image to generate computer searchable text from the depicted document. By utilizing the identified orientation of the depicted document before analyzing the depicted document with a text prediction neural network, the disclosed systems can efficiently and accurately generate computer searchable text for a digital image that depicts a document.

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method, comprising: identifying a digital image comprising a depiction of a document; utilizing an orientation neural network to detect an orientation of the document within the digital image; cropping a word box from the digital image, wherein the word box comprises a portion of the depiction of the document; and utilizing a text prediction neural network trained with synthetic training data to generate computer searchable text for the portion of the depiction of the document based on the word box and the detected orientation of the document.
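The steps of claim 1 can be sketched as a small pipeline. This is only an illustration under stated assumptions: the three callables (`detect_orientation`, `find_word_boxes`, `predict_text`) are hypothetical stand-ins for the orientation neural network, a word box detector, and the text prediction neural network; images are plain row-major pixel grids; and orientations are restricted to multiples of 90 degrees. Applying the detected orientation before cropping is one possible reading of "based on the word box and the detected orientation", not necessarily the claimed ordering.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale pixels, row-major (assumption)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def rotate90_cw(image: Image) -> Image:
    """Rotate a row-major image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def upright(image: Image, orientation_deg: int) -> Image:
    """Undo a detected clockwise rotation of 0, 90, 180 or 270 degrees."""
    if orientation_deg % 90 != 0:
        raise ValueError("this sketch only handles multiples of 90 degrees")
    turns = (4 - (orientation_deg // 90) % 4) % 4
    for _ in range(turns):
        image = rotate90_cw(image)
    return image


def crop(image: Image, box: Box) -> Image:
    """Cut a word box out of the image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def searchable_text(
    image: Image,
    detect_orientation: Callable[[Image], int],    # hypothetical orientation model
    find_word_boxes: Callable[[Image], List[Box]], # hypothetical word box detector
    predict_text: Callable[[Image], str],          # hypothetical text prediction model
) -> List[str]:
    """Correct the orientation, crop each word box, and predict its text."""
    fixed = upright(image, detect_orientation(image))
    return [predict_text(crop(fixed, box)) for box in find_word_boxes(fixed)]
```

Passing the models in as callables keeps the sketch independent of any particular network architecture, which the claim itself does not specify.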

2. The method as recited in claim 1, wherein identifying the digital image comprising the depiction of the document comprises analyzing each digital image in a repository of digital images utilizing a document detection neural network trained to identify digital images portraying documents comprising text.

3. The method as recited in claim 1, further comprising training the orientation neural network by: analyzing a training document utilizing the orientation neural network to predict an orientation of the training document; and comparing the predicted orientation of the training document with a ground truth orientation of the training document.
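The comparison step in claim 3 is typically expressed as a loss function. The claim does not name one, so the following is a sketch under assumptions: orientation is treated as a four-way classification (0, 90, 180, 270 degrees), the network is assumed to output one raw score (logit) per class, and cross-entropy is used as the comparison. The names `ORIENTATIONS` and `orientation_loss` are illustrative.

```python
import math
from typing import List, Sequence

ORIENTATIONS = (0, 90, 180, 270)  # assumed label set, in degrees


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities, shifting by the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def orientation_loss(logits: Sequence[float], ground_truth_deg: int) -> float:
    """Cross-entropy between the predicted orientation and the ground truth."""
    probs = softmax(logits)
    return -math.log(probs[ORIENTATIONS.index(ground_truth_deg)])
```

A lower loss means the predicted orientation agrees more strongly with the ground truth; a training loop would adjust the network parameters to reduce it.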

4. The method as recited in claim 3, further comprising generating the training document by: identifying an initial document at a known orientation; and rotating the initial document to generate the training document and the ground truth orientation of the training document.
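Claim 4 describes producing labeled training documents by rotating a document whose orientation is already known. A minimal sketch, assuming the initial document is upright, images are row-major pixel grids, and rotations are limited to multiples of 90 degrees (the claim does not restrict the angle); `make_orientation_example` is a hypothetical helper name:

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixels, row-major (assumption)


def rotate90_cw(image: Image) -> Image:
    """Rotate a row-major image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def make_orientation_example(upright_doc: Image, rng: random.Random) -> Tuple[Image, int]:
    """Rotate an upright document by a random quarter turn.

    Returns the rotated training document together with its ground truth
    orientation in degrees, which is known because we chose the rotation.
    """
    turns = rng.randrange(4)
    rotated = upright_doc
    for _ in range(turns):
        rotated = rotate90_cw(rotated)
    return rotated, turns * 90
```

Because the label comes from the rotation that was applied, no manual annotation is needed for these examples.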

5. The method as recited in claim 1, further comprising training the text prediction neural network with the synthetic training data by: receiving the synthetic training data, wherein the synthetic training data comprises a synthetic training digital image comprising a ground truth text label corresponding to the synthetic training digital image; utilizing the text prediction neural network on the synthetic training digital image to predict text depicted by the synthetic training digital image; and comparing the predicted text depicted by the synthetic training digital image with the ground truth text label corresponding to the synthetic training digital image.

6. The method as recited in claim 5, further comprising generating the synthetic training digital image by: identifying a corpus of words and a set of fonts; selecting a word from the corpus of words and a font from the set of fonts; applying the font to the word to generate a modified word; and generating the synthetic training digital image such that the synthetic training digital image portrays the modified word.

7. The method as recited in claim 6, further comprising generating the synthetic training digital image by: identifying a set of distortions; selecting a distortion from the set of distortions; and applying the distortion to the word to generate the modified word.

8. The method as recited in claim 6, wherein the corpus of words and the synthetic training digital image corresponds to a first language and further comprising generating an additional synthetic training digital image corresponding to an additional language by utilizing an additional corpus of words corresponding to the additional language.
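The selection steps in claims 6 through 8 can be sketched as sampling a word, a font, and a distortion for each language corpus. This sketch covers only the selection step: it produces a specification record and leaves the actual rendering of the image to a rasterizer that is not shown. The `SyntheticSpec` record, the `sample_specs` function, and uniform random selection are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class SyntheticSpec:
    """What one synthetic training image should portray (hypothetical record)."""
    language: str
    word: str        # doubles as the ground truth text label
    font: str
    distortion: str


def sample_specs(
    corpora: Dict[str, Sequence[str]],  # language -> corpus of words
    fonts: Sequence[str],
    distortions: Sequence[str],
    per_language: int,
    rng: random.Random,
) -> List[SyntheticSpec]:
    """Pick a word, font, and distortion for each synthetic image to generate."""
    specs = []
    for language, corpus in corpora.items():
        for _ in range(per_language):
            specs.append(
                SyntheticSpec(
                    language=language,
                    word=rng.choice(corpus),
                    font=rng.choice(fonts),
                    distortion=rng.choice(distortions),
                )
            )
    return specs
```

Supporting an additional language then amounts to adding another corpus to `corpora`, since the word itself serves as the ground truth label.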

9. The method as recited in claim 1, further comprising indexing the digital image by associating a token with the digital image, the token comprising the computer searchable text.
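One common way to associate tokens with images, as in claim 9, is an inverted index that maps each token to the images containing it. The claim does not specify a data structure, so this is a sketch under assumptions: tokens are lowercased word characters, and a multi-word query matches only images that contain every token. The `ImageTextIndex` class is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Set


class ImageTextIndex:
    """Inverted index from text tokens to image identifiers (illustrative)."""

    def __init__(self) -> None:
        self._by_token: Dict[str, Set[str]] = defaultdict(set)

    def add(self, image_id: str, text: str) -> None:
        """Associate every token of the recognized text with the image."""
        for token in re.findall(r"\w+", text.lower()):
            self._by_token[token].add(image_id)

    def search(self, query: str) -> Set[str]:
        """Return the images whose text contains all tokens in the query."""
        tokens = re.findall(r"\w+", query.lower())
        if not tokens:
            return set()
        matches = [self._by_token.get(token, set()) for token in tokens]
        return set.intersection(*matches)
```

For example, after adding the recognized text for two images, `search("total")` returns every image whose text contains that word, while `search("invoice total")` narrows the result to images containing both.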

10. The method as recited in claim 1, further comprising: utilizing the computer searchable text to identify a document category corresponding to the digital image comprising the depiction of the document; and providing the digital image to a user associated with the document category.

11. A non-transitory computer readable storage medium comprising instructions that, when executed by at least one processor, cause a computing device to: identify a digital image comprising a depiction of a document; utilize an orientation neural network to detect an orientation of the document within the digital image; crop a word box from the digital image, wherein the word box comprises a portion of the depiction of the document; and utilize a text prediction neural network trained with synthetic training data to generate computer searchable text for the portion of the depiction of the document based on the word box and the detected orientation of the document.

12. The non-transitory computer readable storage medium of claim 11, wherein the instructions, when executed by the at least one processor, cause the computing device to identify the digital image comprising the depiction of the document by analyzing each digital image in a repository of digital images utilizing a document detection neural network trained to identify digital images portraying documents comprising text.

13. The non-transitory computer readable storage medium of claim 11, further comprising instructions that, when executed by the at least one processor, cause the computing device to train the orientation neural network by: analyzing a training document utilizing the orientation neural network to predict an orientation of the training document; and comparing the predicted orientation of the training document with a ground truth orientation of the training document.

14. The non-transitory computer readable storage medium of claim 13, further comprising instructions that, when executed by the at least one processor, cause the computing device to generate the training document by: identifying an initial document at a known orientation; and rotating the initial document to generate the training document and the ground truth orientation of the training document.

15. The non-transitory computer readable storage medium of claim 11, further comprising instructions that, when executed by the at least one processor, cause the computing device to train the text prediction neural network with the synthetic training data by: receiving the synthetic training data, wherein the synthetic training data comprises a synthetic training digital image comprising a ground truth text label corresponding to the synthetic training digital image; utilizing the text prediction neural network on the synthetic training digital image to predict text depicted by the synthetic training digital image; and comparing the predicted text depicted by the synthetic training digital image with the ground truth text label corresponding to the synthetic training digital image.

16. The non-transitory computer readable storage medium of claim 15, further comprising instructions that, when executed by the at least one processor, cause the computing device to generate the synthetic training digital image by: identify a corpus of words, a set of fonts, and a set of distortions; selecting a word from the corpus of words, a font from the set of fonts, and a distortion from the set of distortions; applying the font and the distortion to the word to generate a modified word; and generating the synthetic training digital image such that the synthetic training digital image portrays the modified word.

17. A system comprising: at least one processor; and a non-transitory computer readable storage medium comprising instructions that, when executed by the at least one processor, cause the system to: identify a digital image comprising a depiction of a document; utilize an orientation neural network to detect an orientation of the document within the digital image; crop a word box from the digital image, wherein the word box comprises a portion of the depiction of the document; and utilize a text prediction neural network trained with synthetic training data to generate computer searchable text for the portion of the depiction of the document based on the word box and the detected orientation of the document.

18. The system of claim 17, further comprising instructions that, when executed by the at least one processor, cause the system to train the orientation neural network by: analyzing a training document utilizing the orientation neural network to predict an orientation of the training document; and comparing the predicted orientation of the training document with a ground truth orientation of the training document.

19. The system of claim 17, further comprising instructions that, when executed by the at least one processor, cause the system to train the text prediction neural network with the synthetic training data by: receiving the synthetic training data, wherein the synthetic training data comprises a synthetic training digital image comprising a ground truth text label corresponding to the synthetic training digital image; utilizing the text prediction neural network on the synthetic training digital image to predict text depicted by the synthetic training digital image; and comparing the predicted text depicted by the synthetic training digital image with the ground truth text label corresponding to the synthetic training digital image.

20. The system of claim 19, further comprising instructions that, when executed by the at least one processor, cause the system to generate the synthetic training digital image by: identifying a corpus of words, a set of fonts, and a set of distortions; selecting a word from the corpus of words, a font from the set of fonts, and a distortion from the set of distortions; applying the font and the distortion to the word to generate a modified word; and generating the synthetic training digital image such that the synthetic training digital image portrays the modified word.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06V, G06F

Patent Metadata

Filing Date

December 18, 2018

Publication Date

September 22, 2020
