Content Tagging

Published: March 23, 2021

Assignee: not available in the USPTO data we have

Inventors: Xiaoyu Wang, Ning Xu, Ning Zhang, Vitor R. Carvalho, Jia Li

Technical Abstract

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method of image tagging on a mobile device comprising: receiving, by one or more processors, a plurality of weights for a deep convolutional neural network (DCNN), wherein the plurality of weights are floating point weights compressed to weight indices; accessing image data for a first image; initiating, by the mobile device, processing of the image data using the DCNN executed by the one or more processors, the DCNN comprising at least a first subgraph and a second subgraph, the first subgraph comprising at least a first layer and a second layer; in response to initiating processing of the image data, decompressing a first set of weight indices; processing the image data using at least the first layer of the first subgraph to generate first intermediate output data and using a corresponding first set of floating point weights as decompressed from the weight indices; processing the first intermediate output data using at least the second layer of the first subgraph to generate first subgraph output data; and in response to determining that processing of each layer associated with the first intermediate output data is completed, causing the first intermediate data to be deleted from the mobile device.

2. The method of claim 1, further comprising: capturing, using an image sensor of the mobile device, the first image; processing the first image as captured by the image sensor to generate a file comprising the image data at a first pixel resolution associated with a pixel height and a pixel width; and storing the file in a second memory of the mobile device.

3. The method of claim 2 wherein the first layer comprises a convolutional layer; and wherein processing the image data using at least the first layer of the first subgraph comprises convolving at least a first kernel with the image data, wherein the first kernel comprises a kernel pixel height less than the pixel height and a kernel pixel width less than the pixel width.

4. The method of claim 3 wherein the first intermediate output data comprises a plurality of matrixes, each matrix of the plurality of output matrixes generated by convolving an associated kernel of a plurality of kernels of the first layer with the image data; wherein the plurality of kernels comprises the first kernel.

5. The method of claim 2 further comprising: generating a plurality of output values from the DCNN, each output value associated with a corresponding tag; comparing each output value of the plurality of output values with a corresponding threshold; and assigning one or more tags to the first image based on the comparison of each output value associated with the corresponding tag to the corresponding threshold.

6. The method of claim 5 further comprising: capturing a plurality of metadata associated with the first image; processing the assigned one or more tags and the plurality of metadata using a natural language processor to generate a set of extended visual search tags.

7. The method of claim 6 further comprising storing the first image and the set of visual search tags with a plurality of images, each having associated extended visual search tags, in a memory of the mobile device.

8. The method of claim 7 further comprising: receiving, via an input device, a first search term; and generating a set of search results by comparing the first search term with the associated extended visual search tags in the memory of the mobile device.

9. The method of claim 1 further comprising: processing the first subgraph output data using a first layer of the second subgraph to generate second intermediate data; processing the second intermediate data using a second layer of the second subgraph to generate second subgraph output data; and in response to determining that processing of each layer associated with the second intermediate data is completed, causing the second intermediate data to be deleted from the mobile device.

10. The method of claim 9 further comprising: processing the second subgraph output data using a fully connected layer converted to a convolutional layer to generate a dense prediction score map.

11. The method of claim 10 wherein the fully connected layer generates the dense prediction score map using an associated output from each convolution layer.

12. The method of claim 10 further comprising subsampling the dense prediction score map using a max-pooling operation to generate a plurality of output recognition scores, each output recognition score associated with one or more corresponding tags.

13. The method of claim 1 wherein the plurality of weights are 16 bit weights.

14. The device of claim 1 wherein the one or more processors are further configured to: generate a plurality of output values from the DCNN, each output value associated with a corresponding tag; compare each output value of the plurality of output values with a corresponding threshold; and assign one or more tags to the first image based on the comparison of each output value associated with the corresponding tag to the corresponding threshold; capture a plurality of metadata associated with the first image; process the assigned one or more tags and the plurality of metadata using a natural language processor to generate a set of extended visual search tags; store the first image and the set of visual search tags with a plurality of images, each having associated extended visual search tags, in a memory of the mobile device; receive, from an input device of the mobile device, a first search term; and generate a set of search results by comparing the first search term with the associated extended visual search tags in the memory of the mobile device.

15. The method of claim 1 wherein the weight indices are each represented by 8 bits or more and the floating point weights are each represented by 32 or more bits.

16. A mobile device for image tagging comprising: a memory; an image sensor coupled to the memory; and one or more processors coupled to the memory and configured to: receive, by one or more processors of the mobile device, a plurality of weights for a deep convolutional neural network (DCNN), wherein the plurality of weights are floating point weights compressed to weight indices; access image data for a first image; initiate processing of the image data using the DCNN executed by the one or more processors, the DCNN comprising at least a first subgraph and a second subgraph, the first subgraph comprising at least a first layer and a second layer; in response to initiating processing of the image data, decompress a first set of weight indices; process the image data using at least the first layer of the first subgraph to generate first intermediate output data and using a corresponding first set of floating point weights as decompressed from the weight indices; process the first intermediate output data using at least the second layer of the first subgraph to generate first subgraph output data; and in response to a determination that each layer reliant on the first intermediate output data has completed processing, immediately delete the first intermediate data from the mobile device.

17. The device of claim 16 wherein the one or more processors are further configured to: process the first image as captured by the image sensor to generate a file comprising the image data at a first pixel resolution associated with a pixel height and a pixel width; and store the file in a second memory of the mobile device; wherein the first layer comprises a convolutional layer; and wherein processing the image data using at least the first layer of the first subgraph comprises convolving at least a first kernel with the image data, wherein the first kernel comprises a kernel pixel height less than the pixel height and a kernel pixel width less than the pixel width; and wherein the first intermediate output data comprises a plurality of matrixes, each matrix of the plurality of output matrixes generated by convolving an associated kernel of a plurality of kernels of the first layer with the image data.

18. A non-transitory storage medium comprising instructions that, when executed by one or more processors of a mobile device, cause the mobile device to perform operations for local image tagging, the operations comprising: receiving, by one or more processors of the mobile device, a plurality of weights for a deep convolutional neural network (DCNN), wherein the plurality of weights are floating point weights compressed to weight indices; capturing, via an image sensor of the mobile device, the first image; processing the first image as captured by the image sensor to generate a file comprising the image data at a first pixel resolution associated with a pixel height and a pixel width; and storing the file in a second memory of the mobile device; accessing image data for a first image; decompressing, by the mobile device, a first set of weight indices in response to initiation of the processing of the image data; initiating processing of the image data using the DCNN executed by the one or more processors, the DCNN comprising at least a first subgraph and a second subgraph, the first subgraph comprising at least a first layer and a second layer; processing, and using a corresponding first set of floating point weights as decompressed from the weight indices, the image data using at least the first layer of the first subgraph to generate first intermediate output data; processing the first intermediate output data using at least the second layer of the first subgraph to generate first subgraph output data; and in response to a determination that each layer reliant on the first intermediate output data has completed processing, immediately deleting the first intermediate data from the mobile device.

19. The non-transitory storage medium of claim 18 wherein the instructions further cause the device to perform operations comprising: processing, by the mobile device, the first subgraph output data using a first layer of the second subgraph to generate second intermediate data; processing, by the mobile device, the second intermediate data using a second layer of the second subgraph to generate second subgraph output data; and in response to determining that processing of each layer associated with the second intermediate data is completed, causing the second intermediate data to be deleted from the mobile device; and processing the second subgraph output data using a fully connected layer converted to a convolutional layer to generate a dense prediction score map.

20. The non-transitory storage medium of claim 19 wherein the instructions further cause the device to perform operations comprising subsampling the dense prediction score map using a max-pooling operation to generate a plurality of output recognition scores, each output recognition score associated with one or more corresponding tags.
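
Claims 1, 13, and 15 describe floating point weights that are compressed to weight indices and decompressed when processing begins. The claims do not say how the indices map back to floats, so the following is only a minimal sketch that assumes a shared codebook lookup with 8-bit indices and 32-bit floats. The names compress_weights, decompress_weights, and codebook are hypothetical and do not come from the patent.

```python
from array import array


def compress_weights(weights, codebook):
    """Replace each float weight with the index of its nearest codebook entry."""
    if len(codebook) > 256:
        raise ValueError("an 8-bit index can address at most 256 codebook entries")
    return bytes(
        min(range(len(codebook)), key=lambda i: abs(codebook[i] - w))
        for w in weights
    )


def decompress_weights(indices, codebook):
    """Look each index up in the codebook to recover single-precision floats."""
    return array("f", (codebook[i] for i in indices))


# Hypothetical 4-entry codebook; a real one would be derived from the trained model.
codebook = [-0.5, 0.0, 0.5, 1.0]
layer_weights = [0.48, -0.6, 0.1, 0.9]

indices = compress_weights(layer_weights, codebook)  # one byte per weight
restored = decompress_weights(indices, codebook)     # floats handed to the layer
```

Under this assumption the stored model holds one byte per weight plus the codebook, and the full-width floats only exist once a layer is about to run.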
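
Claims 1, 9, 16, and 18 delete intermediate output data once every layer that relies on it has finished. One way to picture this is a consumer count per buffer, as in the sketch below. The run_layers function, the layer tuples, and the keep parameter are illustrative assumptions, not the patent's implementation.

```python
def run_layers(layers, image, keep=("image",)):
    """Run layers in order and free each buffer after its last consumer finishes.

    layers: list of (output_name, input_names, fn) tuples in execution order.
    keep:   buffer names that are never freed (assumed here to be the input image).
    """
    # Count how many layers still need each buffer.
    remaining = {}
    for _, input_names, _ in layers:
        for name in input_names:
            remaining[name] = remaining.get(name, 0) + 1

    buffers = {"image": image}
    freed = []
    for output_name, input_names, fn in layers:
        buffers[output_name] = fn(*(buffers[name] for name in input_names))
        for name in input_names:
            remaining[name] -= 1
            # Delete as soon as no later layer relies on this buffer.
            if remaining[name] == 0 and name not in keep:
                del buffers[name]
                freed.append(name)
    return buffers, freed


# Toy graph: conv1 feeds two layers, so it is freed only after both have run.
layers = [
    ("conv1", ["image"], lambda x: [v * 2 for v in x]),
    ("conv2", ["conv1"], lambda x: [v + 1 for v in x]),
    ("branch", ["conv1"], lambda x: [v - 1 for v in x]),
    ("merge", ["conv2", "branch"], lambda a, b: [p + q for p, q in zip(a, b)]),
]
buffers, freed = run_layers(layers, [1, 2, 3])
```

In this toy graph the only buffers left at the end are the input image and the final output, which is the memory behavior the claims are aiming at.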
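
Claims 5 through 8 compare each output value with a per-tag threshold, store the resulting tags with the image, and match a search term against the stored tags. The sketch below assumes a "greater than or equal" comparison and exact, case-insensitive tag matching. It leaves out the natural language processing step of claim 6, and every name and value in it is hypothetical.

```python
def assign_tags(scores, thresholds):
    """Keep every tag whose score meets or exceeds that tag's own threshold."""
    return sorted(tag for tag, score in scores.items() if score >= thresholds[tag])


def search(index, term):
    """Return the ids of images whose stored tags contain the search term."""
    needle = term.strip().lower()
    return sorted(image_id for image_id, tags in index.items() if needle in tags)


# Hypothetical network outputs and per-tag thresholds.
scores = {"dog": 0.91, "beach": 0.40, "sunset": 0.75}
thresholds = {"dog": 0.80, "beach": 0.50, "sunset": 0.70}
tags = assign_tags(scores, thresholds)

# On-device index: image id -> set of tags stored with that image.
index = {"img_001": set(tags), "img_002": {"beach", "surf"}}
hits = search(index, "Sunset")
```

Per-tag thresholds let a tag the network tends to over-predict demand a higher score than a rare one, which a single global cutoff cannot do.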
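
Claims 10 through 12, 19, and 20 apply a fully connected layer as a convolution to produce a dense prediction score map, then max-pool that map into recognition scores. The sketch below assumes a single-channel feature map, one flat weight vector per tag, and a global max over each score map. The claims do not fix any of these details, and the function names are hypothetical.

```python
def fc_as_conv(feature_map, fc_weights, k):
    """Slide a k x k fully connected layer over a 2-D feature map.

    fc_weights[c] is a flat list of k*k weights for tag c.
    Returns one dense score map per tag.
    """
    height, width = len(feature_map), len(feature_map[0])
    score_maps = []
    for weights in fc_weights:
        score_map = []
        for i in range(height - k + 1):
            row = []
            for j in range(width - k + 1):
                # Flatten the k x k patch in the same order as the FC weights.
                patch = [feature_map[i + di][j + dj]
                         for di in range(k) for dj in range(k)]
                row.append(sum(p * w for p, w in zip(patch, weights)))
            score_map.append(row)
        score_maps.append(score_map)
    return score_maps


def global_max_pool(score_maps):
    """Reduce each dense score map to a single recognition score."""
    return [max(max(row) for row in score_map) for score_map in score_maps]


feature_map = [[1, 0, 2],
               [0, 3, 0],
               [1, 0, 1]]
fc_weights = [[1, 1, 1, 1],    # hypothetical weights for tag 0
              [1, 0, 0, -1]]   # hypothetical weights for tag 1
score_maps = fc_as_conv(feature_map, fc_weights, k=2)
recognition_scores = global_max_pool(score_maps)
```

Because the same weights are reused at every position, the network can score inputs larger than the size the fully connected layer was trained on, and the max-pooling step keeps the strongest response per tag.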

Patent Metadata

Filing Date: Unknown

Publication Date: March 23, 2021

Inventors: Xiaoyu Wang, Ning Xu, Ning Zhang, Vitor R. Carvalho, Jia Li
