10956793

Content Tagging

Published: March 23, 2021
Assignee: not available in USPTO data we have

Patent Claims
20 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method of image tagging on a mobile device comprising: receiving, by one or more processors, a plurality of weights for a deep convolutional neural network (DCNN), wherein the plurality of weights are floating point weights compressed to weight indices; accessing image data for a first image; initiating, by the mobile device; processing of the image data using the DCNN executed by the one or more processors, the DCNN comprising at least a first subgraph and a second subgraph, the first subgraph comprising at least a first layer and a second layer; in response to initiating processing of the image data, decompressing a first set of weight indices; processing the image data using at least the first layer of the first subgraph to generate first intermediate output data and using a corresponding first set of floating point weights as decompressed from the weight indices; processing the first intermediate output data using at least the second layer of the first subgraph to generate first subgraph output data; and in response to determining that processing of each layer associated with the first intermediate output data is completed, causing the first intermediate data to be deleted from the mobile device.

Plain English Translation

This claim covers running deep convolutional neural network (DCNN) image tagging on a resource-constrained mobile device. The device receives the DCNN weights as floating point weights compressed to weight indices, then accesses image data for a first image. The device initiates processing of that image data with the DCNN, which contains at least a first subgraph and a second subgraph; the first subgraph has at least a first layer and a second layer. In response to initiating processing, a first set of weight indices is decompressed into the corresponding floating point weights. The first layer of the first subgraph processes the image data with these decompressed weights to generate first intermediate output data, and the second layer processes that intermediate output to produce the first subgraph output data. Once every layer associated with the first intermediate output data has finished processing, the intermediate data is deleted from the mobile device, which limits how much memory the computation holds at once.
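
The flow of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the codebook lookup used for decompression, the `toy_layer` function, the `readers_left` bookkeeping, and the use of weight indices for the second layer are all assumptions made for illustration, since the claim does not specify them.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def decompress(indices: List[int], codebook: List[float]) -> Vector:
    """Look up each weight index in a codebook of floating point weights."""
    return [codebook[i] for i in indices]


def toy_layer(data: Vector, weights: Vector) -> Vector:
    """Stand-in for a DCNN layer: element-wise multiply followed by ReLU."""
    return [max(0.0, x * w) for x, w in zip(data, weights)]


def run_first_subgraph(
    image_data: Vector,
    layer1_indices: List[int],
    layer2_indices: List[int],
    codebook: List[float],
) -> Tuple[Vector, Dict[str, Vector]]:
    buffers: Dict[str, Vector] = {}
    # Weights are decompressed only once processing has been initiated.
    layer1_weights = decompress(layer1_indices, codebook)
    buffers["intermediate_1"] = toy_layer(image_data, layer1_weights)
    # Hypothetical bookkeeping: how many layers still need each buffer.
    readers_left = {"intermediate_1": 1}

    layer2_weights = decompress(layer2_indices, codebook)
    output = toy_layer(buffers["intermediate_1"], layer2_weights)
    readers_left["intermediate_1"] -= 1

    # Delete the intermediate data once every dependent layer has finished.
    if readers_left["intermediate_1"] == 0:
        del buffers["intermediate_1"]
    return output, buffers


codebook = [-0.5, 0.0, 0.5, 1.0]
output, live_buffers = run_first_subgraph(
    image_data=[2.0, 4.0, 6.0],
    layer1_indices=[3, 2, 0],  # decompresses to 1.0, 0.5, -0.5
    layer2_indices=[2, 2, 2],  # decompresses to 0.5, 0.5, 0.5
    codebook=codebook,
)
print(output)        # [1.0, 1.0, 0.0]
print(live_buffers)  # {}
```

Counting the remaining readers of each buffer is one simple way to decide when intermediate data can be released; a real runtime could derive the same information from the network graph.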

Claim 2

Original Legal Text

2. The method of claim 1 , further comprising: capturing, using an image sensor of the mobile device, the first image; processing the first image as captured by the image sensor to generate a file comprising the image data at a first pixel resolution associated with a pixel height and a pixel width; and storing the file in a second memory of the mobile device.

Claim 3

Original Legal Text

3. The method of claim 2 wherein the first layer comprises a convolutional layer; and wherein processing the image data using at least the first layer of the first subgraph comprises convolving at least a first kernel with the image data, wherein the first kernel comprises a kernel pixel height less than the pixel height and a kernel pixel width less than the pixel width.

Claim 4

Original Legal Text

4. The method of claim 3 wherein the first intermediate output data comprises a plurality of matrixes, each matrix of the plurality of output matrixes generate by convolving an associated kernel of a plurality of kernels of the first layer with the image data; wherein the plurality of kernels comprises the first kernel.
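
Claims 3 and 4 describe convolving kernels that are smaller than the image, with each kernel producing one output matrix. The Python sketch below shows that operation under stated assumptions: it uses a "valid" sliding window with stride 1 and no kernel flip (cross-correlation, as is common in neural network libraries), none of which the claims specify. The function name and sample values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def convolve2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image and sum the element-wise products."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    # Claim 3 requires the kernel to be smaller than the image in both dimensions.
    if k_h >= img_h or k_w >= img_w:
        raise ValueError("kernel must be smaller than the image")
    out: Matrix = []
    for r in range(img_h - k_h + 1):
        row = []
        for c in range(img_w - k_w + 1):
            acc = 0.0
            for i in range(k_h):
                for j in range(k_w):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
kernels = [
    [[1.0, 0.0], [0.0, -1.0]],  # diagonal difference
    [[1.0, 1.0], [1.0, 1.0]],   # 2x2 box sum
]
# One output matrix per kernel, as in claim 4.
feature_maps = [convolve2d_valid(image, k) for k in kernels]
print(feature_maps[0])  # [[-4.0, -4.0], [-4.0, -4.0]]
print(feature_maps[1])  # [[12.0, 16.0], [24.0, 28.0]]
```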

Claim 5

Original Legal Text

5. The method of claim 2 further comprising: generating a plurality of output values from the DCNN, each output value associated with a corresponding tag; comparing each output value of the plurality of output values with a corresponding threshold; and assigning one or more tags to the first image based on the comparison of each output value associated with the corresponding tag to the corresponding threshold.

Plain English Translation

This claim, which depends on claim 2, adds threshold-based tag assignment to the DCNN image tagging method. The DCNN generates a plurality of output values, each associated with a corresponding tag. Each output value is compared with a threshold corresponding to its tag, and one or more tags are assigned to the first image based on those comparisons. The claim does not specify the direction of the comparison; a typical reading is that a tag is assigned when its output value meets or exceeds its threshold, which filters out low-confidence tags. Tags assigned this way can support image organization, search, and retrieval.
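
The comparison step of claim 5 can be sketched in a few lines of Python. The function name, the tag names, the scores, and the "meets or exceeds" rule are assumptions for illustration; the claim only requires that assignment be based on comparing each output value with its corresponding threshold.

```python
from typing import Dict, List


def assign_tags(output_values: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Keep each tag whose output value meets or exceeds its own threshold."""
    return [tag for tag, value in output_values.items() if value >= thresholds[tag]]


# Hypothetical DCNN outputs and per-tag thresholds.
output_values = {"dog": 0.91, "beach": 0.40, "sunset": 0.75}
thresholds = {"dog": 0.50, "beach": 0.60, "sunset": 0.70}

tags = assign_tags(output_values, thresholds)
print(tags)  # ['dog', 'sunset']
```

Using a separate threshold per tag, rather than one global cutoff, lets tags with noisier scores be held to a stricter standard.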

Claim 6

Original Legal Text

6. The method of claim 5 further comprising: capturing a plurality of metadata associated with the first image; processing the assigned one or more tags and the plurality of metadata using a natural language processor to generate a set of extended visual search tags.

Plain English Translation

This claim, which depends on claim 5, extends the assigned tags with contextual metadata to support visual search. The method captures a plurality of metadata associated with the first image; the claim does not enumerate the metadata, but timestamps, geolocation, and device settings are plausible examples. A natural language processor then processes both the tags assigned by the threshold comparison of claim 5 and the metadata to generate a set of extended visual search tags. The extended tags link what the image shows with the context in which it was captured, which gives a search more descriptors to match against than the DCNN tags alone.

Claim 7

Original Legal Text

7. The method of claim 6 further comprising storing the first image and the set of visual search tags with a plurality of images, each having associated extended visual search tags, in a memory of the mobile device.

Claim 8

Original Legal Text

8. The method of claim 7 further comprising: receiving, via an input device, a first search term; and generating a set of search results by comparing the first search term with the associated extended visual search tags in the memory of the mobile device.
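
Claims 7 and 8 describe storing images with their extended visual search tags on the device and matching a search term against those tags. A minimal Python sketch follows; the in-memory dictionary, the file names, the tags, and the case-insensitive exact match are all assumptions, since the claims do not say how the comparison is performed.

```python
from typing import Dict, List, Set


def search_images(library: Dict[str, Set[str]], search_term: str) -> List[str]:
    """Return the images whose extended tags contain the search term."""
    needle = search_term.strip().lower()
    return sorted(
        name
        for name, tags in library.items()
        if needle in {tag.lower() for tag in tags}
    )


# Hypothetical on-device library: image file name -> extended visual search tags.
library = {
    "IMG_0001.jpg": {"dog", "beach", "summer"},
    "IMG_0002.jpg": {"cat", "sofa"},
    "IMG_0003.jpg": {"Dog", "park"},
}

print(search_images(library, "dog"))     # ['IMG_0001.jpg', 'IMG_0003.jpg']
print(search_images(library, "sunset"))  # []
```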

Claim 9

Original Legal Text

9. The method of claim 1 further comprising: processing the first subgraph output data using a first layer of the second subgraph to generate second intermediate data; processing the second intermediate data using a second layer of the second subgraph to generate second subgraph output data; and in response to determining that processing of each layer associated with the second intermediate data is completed, causing the second intermediate data to be deleted from the mobile device.

Claim 10

Original Legal Text

10. The method of claim 9 further comprising: processing the second subgraph output data using a fully connected layer converted to a convolutional layer to generate a dense prediction score map.
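
Claim 10 refers to a fully connected layer converted to a convolutional layer. One common way to read this, shown in the Python sketch below, is that the fully connected weights are reshaped into a kernel the size of the layer's original input, so that sliding the kernel over a larger input yields a map of scores instead of a single score. The claim does not confirm this construction; the function name, the shapes, and the values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def fc_as_conv(feature_map: Matrix, fc_weights: List[float], k_h: int, k_w: int) -> Matrix:
    """Apply fully connected weights as a k_h x k_w kernel slid over the input."""
    # Reshape the flat weight vector into a 2D kernel, row by row.
    kernel = [fc_weights[i * k_w:(i + 1) * k_w] for i in range(k_h)]
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            sum(
                feature_map[r + i][c + j] * kernel[i][j]
                for i in range(k_h)
                for j in range(k_w)
            )
            for c in range(w - k_w + 1)
        ]
        for r in range(h - k_h + 1)
    ]


fc_weights = [1.0, 0.0, 0.0, 1.0]  # trained for a flattened 2x2 input

# On a 2x2 input the result is the ordinary fully connected dot product.
single = fc_as_conv([[1.0, 2.0], [3.0, 4.0]], fc_weights, 2, 2)
print(single)  # [[5.0]]

# On a larger 3x3 input the same weights produce a 2x2 score map.
dense = fc_as_conv([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]], fc_weights, 2, 2)
print(dense)  # [[6.0, 8.0], [12.0, 14.0]]
```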

Claim 11

Original Legal Text

11. The method of claim 10 wherein the fully connected layer generates the dense prediction score map using an associated output from each convolution layer.

Plain English Translation

This claim, which depends on claim 10, concerns how the dense prediction score map is generated. In claim 10, the second subgraph output data is processed by a fully connected layer that has been converted to a convolutional layer, producing a dense prediction score map. Claim 11 adds that the fully connected layer generates this map using an associated output from each convolution layer. A dense prediction score map assigns scores to spatial locations of the input rather than a single score for the whole image. The claim does not specify how the convolution layer outputs are combined, and it names no particular application.

Claim 12

Original Legal Text

12. The method of claim 10 further comprising subsampling the dense prediction score map using a max-pooling operating to generate a plurality of output recognition scores, each output recognition score associated with one or more corresponding tags.
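
The subsampling step of claim 12 can be sketched as a max-pooling pass over the dense prediction score map. The Python example below assumes one score grid per tag and pools over the whole grid (global max pooling); the claim does not fix the pooling window, and the tag names and scores are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def max_pool_scores(score_map: Dict[str, Matrix]) -> Dict[str, float]:
    """Reduce each tag's score grid to its single highest score."""
    return {tag: max(max(row) for row in grid) for tag, grid in score_map.items()}


# Hypothetical dense prediction score map: one 2x2 grid of scores per tag.
score_map = {
    "dog": [[0.10, 0.85], [0.30, 0.20]],
    "beach": [[0.05, 0.10], [0.15, 0.40]],
}

recognition_scores = max_pool_scores(score_map)
print(recognition_scores)  # {'dog': 0.85, 'beach': 0.4}
```

Taking the maximum means a tag scores highly if it is detected strongly anywhere in the image, regardless of position.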

Claim 13

Original Legal Text

13. The method of claim 1 wherein the plurality of weights are 16 bit weights.

Plain English Translation

This claim, which depends on claim 1, specifies that the plurality of weights are 16 bit weights. Claim 1 describes these weights as floating point weights compressed to weight indices, so claim 13 fixes the bit width of those weights at 16. A 16 bit representation takes half the storage of a 32 bit one, which suits devices with constrained memory. The claim does not describe how the 16 bit weights are produced, and it recites no rounding, clipping, dynamic precision adjustment, or error correction step.

Claim 14

Original Legal Text

14. The device of claim 1 wherein the one or more processors are further configured to comprising: generate a plurality of output values from the DCNN, each output value associated with a corresponding tag; compare each output value of the plurality of output values with a corresponding threshold; and assign one or more tags to the first image based on the comparison of each output value associated with the corresponding tag to the corresponding threshold; capture a plurality of metadata associated with the first image; process the assigned one or more tags and the plurality of metadata using a natural language processor to generate a set of extended visual search tags; store the first image and the set of visual search tags with a plurality of images, each having associated extended visual search tags, in a memory of the mobile device; receive, from an input device of the mobile device, a first search term; and generate a set of search results by comparing the first search term with the associated extended visual search tags in the memory of the mobile device.

Claim 15

Original Legal Text

15. The method of claim 1 wherein the weight indices are each represented by 8 bits or more and the floating point weights are each represented by 32 or more bits.
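
The bit widths in claim 15 imply a storage saving that a short calculation makes concrete. The Python sketch below assumes the indices point into a shared codebook with one floating point entry per possible index value; the claim does not state this, and the weight count is a hypothetical figure chosen for the arithmetic.

```python
from typing import Tuple


def weight_storage_bytes(num_weights: int, index_bits: int = 8, float_bits: int = 32) -> Tuple[int, int]:
    """Return (uncompressed, compressed) storage in bytes for the weights."""
    # Assumed codebook: one floating point entry for every possible index value.
    codebook_entries = 2 ** index_bits
    uncompressed = num_weights * float_bits // 8
    compressed = num_weights * index_bits // 8 + codebook_entries * float_bits // 8
    return uncompressed, compressed


uncompressed, compressed = weight_storage_bytes(1_000_000)
print(uncompressed)  # 4000000
print(compressed)    # 1001024
```

With 8 bit indices and 32 bit floats, the compressed form is close to a quarter of the uncompressed size, because the codebook overhead is small next to the per-weight cost.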

Claim 16

Original Legal Text

16. A mobile device for image tagging comprising: a memory; an image sensor coupled to the memory; and one or more processors coupled to the memory and configured to: receive, by one or more processors of the mobile device, a plurality of weights for a deep convolutional neural network (DCNN), wherein the plurality of weights are floating point weights compressed to weight indices; access image data for a first image; initiate processing of the image data using the DCNN executed by the one or more processors, the DCNN comprising at least a first subgraph and a second subgraph, the first subgraph comprising at least a first layer and a second layer; in response to initiating processing of the image data, decompressing a first set of weight indices; process the image data using at least the first layer of the first subgraph to generate first intermediate output data and using a corresponding first set of floating point weights as decompressed from the weight indices; process the first intermediate output data using at least the second layer of the first subgraph to generate first subgraph output data; and in response to a determination that each layer reliant on the first intermediate output data have completed processing, immediately delete the first intermediate data from the mobile device.

Claim 17

Original Legal Text

17. The device of claim 16 wherein the one or more processors are further configured to: further comprising: process the first image as captured by the image sensor to generate a file comprising the image data at a first pixel resolution associated with a pixel height and a pixel width; and store the file in a second memory of the mobile device; wherein the first layer comprises a convolutional layer; and wherein processing the image data using at least the first layer of the first subgraph comprises convolving at least a first kernel with the image data, wherein the first kernel comprises a kernel pixel height less than the pixel height and a kernel pixel width less than the pixel width; and wherein the first intermediate output data comprises a plurality of matrixes; each matrix of the plurality of output matrixes generate by convolving an associated kernel of a plurality of kernels of the first layer with the image data.

Claim 18

Original Legal Text

18. A non-transitory storage medium comprising instructions that, when executed by one or more processors of a mobile device, cause the mobile device to perform operations for local image tagging, the operations comprising: receive, by one or more processors of the mobile device, a plurality of weights for a deep convolutional neural network (DCNN), wherein the plurality of weights are floating point weights compressed to weight indices; capturing, via an image sensor of the mobile device, the first image; processing the first image as captured by the image sensor to generate a file comprising the image data at a first pixel resolution associated with a pixel height and a pixel width; and storing the file in a second memory of the mobile device, accessing image data for a first image; decompressing, by the mobile device, a first set of weight indices in response to initiation of the processing of the image data; initiating processing of the image data using the DCNN executed by the one or more processors, the DCNN comprising at least a first subgraph and a second subgraph, the first subgraph comprising at least a first layer and a second layer; processing, and using a corresponding first set of floating point weights as decompressed from the weight indices, the image data using at least the first layer of the first subgraph to generate first intermediate output data; processing the first intermediate output data using at least the second layer of the first subgraph to generate first subgraph output data; and in response to a determination that each layer reliant on the first intermediate output data have completed processing, immediately deleting the first intermediate data from the mobile device.

Claim 19

Original Legal Text

19. The non-transitory storage medium of claim 18 wherein the instructions further cause the device to perform operations comprising: processing, by the mobile device, the first subgraph output data using a first layer of the second subgraph to generate second intermediate data; processing, by the mobile device, the second intermediate data using a second layer of the second subgraph to generate second subgraph output data; and in response to determining that processing of each layer associated with the second intermediate data is completed, causing the second intermediate data to be deleted from the mobile device; and processing the second subgraph output data using a fully connected layer converted to a convolutional layer to generate a dense prediction score map.

Claim 20

Original Legal Text

20. The non-transitory storage medium of claim 19 wherein the instructions further cause the device to perform operations comprising subsampling the dense prediction score map using a max-pooling operating to generate a plurality of output recognition scores, each output recognition score associated with one or more corresponding tags.

Patent Metadata

Filing Date

Unknown

Publication Date

March 23, 2021

Inventors

Xiaoyu Wang
Ning Xu
Ning Zhang
Vitor R. Carvalho
Jia Li

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “CONTENT TAGGING” (10956793). https://patentable.app/patents/10956793
