Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A computing system configured to perform image retrieval, the computing system comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store: a machine-learned image descriptor model configured to determine a set of keypoint descriptors for an input image, the machine-learned image descriptor model comprising: a feature extraction portion configured to extract a plurality of local feature descriptors from the input image; and an attention portion configured to determine a plurality of attention scores respectively for the plurality of local feature descriptors; and instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising: obtaining a query image; processing the query image using the feature extraction portion of the machine-learned image descriptor model to obtain, as an output of the feature extraction portion, a first plurality of local feature descriptors for the query image; processing the first plurality of local feature descriptors for the query image using the attention portion of the machine-learned image descriptor model to obtain, as an output of the attention portion, a first plurality of attention scores respectively for the first plurality of local feature descriptors; and determining a first set of keypoint descriptors for the query image based at least in part on the first plurality of attention scores and the first plurality of local feature descriptors, wherein the first set of keypoint descriptors correspond to a subset of the first plurality of local features.
This invention relates to image retrieval systems and addresses the problem of generating robust image descriptors for accurate retrieval. A computing system is described that includes one or more processors and computer-readable media storing a machine-learned image descriptor model. The model has two main parts: a feature extraction portion and an attention portion. The feature extraction portion extracts local feature descriptors from an input image. The attention portion takes those local feature descriptors and produces one attention score per descriptor, indicating that descriptor's relative importance. In operation, the system obtains a query image, processes it with the feature extraction portion to obtain local feature descriptors, and feeds those descriptors into the attention portion to obtain the corresponding attention scores. It then determines a set of keypoint descriptors for the query image based on both the attention scores and the local feature descriptors; the keypoint descriptors correspond to a subset of the local features. This refined set of keypoint descriptors is used for image retrieval.
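The flow described above (extract local descriptors, score them, keep a subset) can be sketched as follows. This is a minimal illustration, not the patented model: the patch-based extractor, the contrast-based score, and the threshold rule are all hypothetical stand-ins chosen so the example runs without a trained network.

```python
from typing import List, Sequence, Tuple

Descriptor = Tuple[float, ...]


def extract_local_descriptors(image: Sequence[Sequence[float]]) -> List[Descriptor]:
    """Hypothetical stand-in for the feature extraction portion.

    Emits one 2-D descriptor (mean, contrast) per non-overlapping 2x2 patch.
    """
    descriptors = []
    for r in range(0, len(image) - 1, 2):
        for c in range(0, len(image[0]) - 1, 2):
            patch = [image[r][c], image[r][c + 1],
                     image[r + 1][c], image[r + 1][c + 1]]
            descriptors.append((sum(patch) / 4.0, max(patch) - min(patch)))
    return descriptors


def attention_scores(descriptors: Sequence[Descriptor]) -> List[float]:
    """Hypothetical stand-in for the attention portion: score = patch contrast."""
    return [d[1] for d in descriptors]


def select_keypoint_descriptors(descriptors: Sequence[Descriptor],
                                scores: Sequence[float],
                                threshold: float) -> List[Descriptor]:
    """Keep the subset of descriptors whose score exceeds an assumed threshold."""
    return [d for d, s in zip(descriptors, scores) if s > threshold]


query_image = [
    [0.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 0.8, 0.2],
    [0.5, 0.5, 0.0, 1.0],
    [0.5, 0.5, 1.0, 0.0],
]
local = extract_local_descriptors(query_image)    # one descriptor per patch
scores = attention_scores(local)                  # one score per descriptor
keypoints = select_keypoint_descriptors(local, scores, threshold=0.5)
```

Here the two flat patches receive a score of zero and are dropped, so only the two textured patches survive as keypoint descriptors. The claim itself only requires that the selection be based at least in part on the attention scores; the threshold is one possible rule.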
2. The computing system of claim 1, wherein the operations further comprise: comparing the first set of keypoint descriptors for the query image to a plurality of reference sets of keypoint descriptors respectively associated with a plurality of database images, wherein the reference set of keypoint descriptors for each database image was generated based at least in part on respective reference local feature descriptors and reference attention scores respectively produced for such database image by the feature extraction portion and the attention portion of the machine-learned image descriptor model, and wherein the reference set of keypoint descriptors for each database image corresponds to a subset of the respective reference local feature descriptors determined for such database image.
This claim adds a comparison step to the system of claim 1. After the query image's keypoint descriptors have been determined, they are compared to a plurality of reference sets of keypoint descriptors, one set per database image. Each reference set was produced by the same machine-learned image descriptor model: the feature extraction portion generated reference local feature descriptors for the database image, the attention portion generated reference attention scores for them, and the reference set corresponds to a subset of those reference local feature descriptors. Because query and database images pass through the same model, their keypoint descriptors lie in the same descriptor space and can be compared directly. Restricting both sides to attention-selected descriptors focuses the comparison on salient features rather than on every local feature in the image.
3. The computing system of claim 2, wherein the operations further comprise: identifying one or more matching images based at least in part on said comparing the first set of keypoint descriptors for the query image to the plurality of reference sets of keypoint descriptors respectively associated with the plurality of database images; retrieving the one or more matching images; and providing the one or more matching images as a result responsive to the query image.
This claim completes the retrieval loop started in claim 2. Based at least in part on the comparison between the query image's keypoint descriptors and the reference sets of keypoint descriptors for the database images, the system identifies one or more matching images, retrieves them, and provides them as a result responsive to the query image. The keypoint descriptors are those produced by the machine-learned image descriptor model of claim 1 (local feature descriptors selected using attention scores), not descriptors from a separate hand-crafted method. The claim does not specify how a match is decided from the comparison. Typical applications include visual search engines and content-based image retrieval systems.
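The identify, retrieve, and provide steps can be sketched as follows. The match rule is an assumption made for illustration: a query descriptor counts as matched when some reference descriptor lies within `max_distance`, and an image matches when it collects at least `min_matches` such descriptors. The names `reference_sets`, `image_store`, and both thresholds are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Descriptor = Tuple[float, ...]


def count_matches(query: Sequence[Descriptor],
                  reference: Sequence[Descriptor],
                  max_distance: float) -> int:
    """Count query descriptors that have a reference descriptor within max_distance."""
    return sum(
        1 for q in query
        if any(math.dist(q, r) <= max_distance for r in reference)
    )


def retrieve(query: Sequence[Descriptor],
             reference_sets: Dict[str, List[Descriptor]],
             image_store: Dict[str, str],
             max_distance: float = 0.25,
             min_matches: int = 2) -> List[str]:
    """Identify matching images, then return them best match first."""
    scored = [(count_matches(query, ref, max_distance), image_id)
              for image_id, ref in reference_sets.items()]
    matching = sorted((pair for pair in scored if pair[0] >= min_matches),
                      reverse=True)
    return [image_store[image_id] for _, image_id in matching]


query = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.2)]
reference_sets = {
    "tower": [(0.1, 0.0), (1.0, 0.9), (0.5, 0.3)],
    "bridge": [(0.0, 0.1), (3.0, 3.0)],
    "beach": [(5.0, 5.0)],
}
image_store = {"tower": "tower.jpg", "bridge": "bridge.jpg", "beach": "beach.jpg"}

results = retrieve(query, reference_sets, image_store)
```

With these toy values only the "tower" image collects enough matched descriptors, so `results` contains a single entry.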
4. The computing system of claim 2, wherein the reference set of keypoint descriptors for each database image comprises a predefined number of the respective local feature descriptors produced for such database image with the highest attention scores.
The invention relates to a computing system for image retrieval or matching, addressing the challenge of efficiently and accurately identifying relevant images from a database based on visual similarity. The system processes database images to generate local feature descriptors, which are numerical representations of distinctive keypoints within each image. These descriptors are then evaluated using an attention mechanism that assigns attention scores to each descriptor, indicating their relative importance or relevance for distinguishing the image. The system selects a predefined number of descriptors with the highest attention scores to form a compact yet informative reference set for each database image. This reference set is stored and later used to compare against query images during retrieval, improving efficiency by reducing computational overhead while maintaining accuracy. The attention-based selection ensures that the most discriminative features are retained, enhancing the system's ability to match visually similar images. The invention may be applied in applications such as image search, object recognition, or content-based image retrieval.
5. The computing system of claim 2, wherein said comparing the first set of keypoint descriptors for the query image to the plurality of reference sets of keypoint descriptors respectively associated with the plurality of database images comprises performing a nearest neighbors search.
This claim narrows the comparison step of claim 2: comparing the query image's keypoint descriptors to the reference sets of keypoint descriptors is performed as a nearest neighbors search. For a query keypoint descriptor, the search finds the reference keypoint descriptors closest to it in descriptor space, and the database images that own those descriptors become candidates for a match. The claim does not specify a distance measure, an index structure, or whether the search is exact or approximate. Nearest neighbors search is a common choice for large databases because it can be served from an index rather than by comparing every pair of descriptors, which suits applications such as visual search engines, object recognition, and augmented reality.
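A nearest neighbors search over reference keypoint descriptors can be sketched as a brute-force scan. This is an illustrative assumption: the claim names no distance measure or index, so Euclidean distance, the flat list index, and the vote-counting step are hypothetical choices.

```python
import heapq
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Descriptor = Tuple[float, ...]
IndexEntry = Tuple[str, Descriptor]  # (image id, reference keypoint descriptor)


def build_index(reference_sets: Dict[str, List[Descriptor]]) -> List[IndexEntry]:
    """Flatten all reference sets into one searchable list."""
    return [(image_id, d)
            for image_id, descs in reference_sets.items()
            for d in descs]


def nearest_neighbors(index: Sequence[IndexEntry],
                      query_descriptor: Descriptor,
                      k: int) -> List[IndexEntry]:
    """Return the k index entries closest to the query descriptor (linear scan)."""
    return heapq.nsmallest(k, index,
                           key=lambda entry: math.dist(entry[1], query_descriptor))


def vote(index: Sequence[IndexEntry],
         query_descriptors: Sequence[Descriptor],
         k: int = 1) -> List[Tuple[str, int]]:
    """Each query descriptor votes for the images that own its k nearest neighbors."""
    votes: Counter = Counter()
    for q in query_descriptors:
        for image_id, _ in nearest_neighbors(index, q, k):
            votes[image_id] += 1
    return votes.most_common()


reference_sets = {
    "a": [(0.0, 0.0), (1.0, 1.0)],
    "b": [(4.0, 4.0), (5.0, 5.0)],
}
index = build_index(reference_sets)
ranking = vote(index, [(0.1, 0.0), (0.9, 1.2), (4.2, 3.9)])
```

A linear scan costs one distance computation per stored descriptor for every query descriptor, so a larger database would likely swap `nearest_neighbors` for a tree-based or approximate index while keeping the same interface.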
6. The computing system of claim 2, wherein said comparing the first set of keypoint descriptors for the query image to the plurality of reference sets of keypoint descriptors respectively associated with the plurality of database images comprises comparing the first set of keypoint descriptors for the query image to a plurality of reduced-dimensionality reference sets of keypoint descriptors respectively associated with the plurality of database images.
This invention relates to image search and retrieval systems, specifically improving efficiency in matching query images to a database of reference images using keypoint descriptors. The problem addressed is the computational cost of comparing high-dimensional keypoint descriptors between a query image and a large database of reference images, which can be impractical for real-time applications. The system processes a query image by extracting a first set of keypoint descriptors, which are compact numerical representations of distinctive features in the image. These descriptors are then compared to a plurality of reference sets of keypoint descriptors associated with database images. To reduce computational complexity, the reference sets are pre-processed into reduced-dimensionality versions, meaning they are compressed or simplified while retaining essential discriminative information. This dimensionality reduction allows faster and more efficient similarity comparisons between the query descriptors and the reference descriptors. The comparison process involves matching the query image's keypoint descriptors against these reduced-dimensionality reference sets, enabling rapid identification of similar images in the database. This approach balances accuracy and speed, making it suitable for applications requiring quick image retrieval, such as visual search engines, augmented reality, or automated image tagging. The dimensionality reduction step ensures that the system remains scalable even with large databases.
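Comparing against reduced-dimensionality reference sets can be sketched with a fixed linear projection. The 2x4 matrix `PROJECTION` below is hypothetical; the claim does not say how the reduction is obtained, and in practice such a matrix might be learned from data (for example with PCA, which is an assumption here). The sketch also assumes the query descriptor is projected the same way before comparison.

```python
import math
from typing import Dict, List, Sequence, Tuple

Descriptor = Tuple[float, ...]

# Hypothetical projection from 4 dimensions down to 2.
PROJECTION = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]


def reduce_descriptor(descriptor: Sequence[float],
                      projection: Sequence[Sequence[float]] = PROJECTION) -> Descriptor:
    """Multiply the descriptor by the projection matrix."""
    return tuple(sum(w * x for w, x in zip(row, descriptor)) for row in projection)


def reduce_reference_sets(
        reference_sets: Dict[str, List[Descriptor]]) -> Dict[str, List[Descriptor]]:
    """Offline step: store every reference descriptor in reduced form."""
    return {image_id: [reduce_descriptor(d) for d in descs]
            for image_id, descs in reference_sets.items()}


def closest_image(query_descriptor: Sequence[float],
                  reduced_sets: Dict[str, List[Descriptor]]) -> str:
    """Query-time step: project the query, then compare in the reduced space."""
    q = reduce_descriptor(query_descriptor)
    return min(reduced_sets,
               key=lambda image_id: min(math.dist(q, r)
                                        for r in reduced_sets[image_id]))


reference_sets = {
    "a": [(1.0, 1.0, 0.0, 0.0)],
    "b": [(0.0, 0.0, 1.0, 1.0)],
}
reduced_sets = reduce_reference_sets(reference_sets)
best = closest_image((0.9, 1.1, 0.1, 0.0), reduced_sets)
```

Each stored descriptor shrinks from four values to two, which halves both the storage and the per-comparison arithmetic in this toy case.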
7. The computing system of claim 2, further comprising a database that stores the plurality of reference sets of keypoint descriptors respectively associated with the plurality of database images.
This claim adds a storage component to the system of claim 2: a database that stores the plurality of reference sets of keypoint descriptors, each set associated with its own database image. Because the reference sets are stored, they can be computed once per database image and reused, so that at query time only the query image needs to be processed by the model. The claim does not specify how the database is organized or indexed. Applications include object recognition, augmented reality, and visual search engines.
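One way to store reference sets keyed by database image is sketched below using Python's built-in `sqlite3` module. The table name, the column names, and the JSON encoding of descriptors are hypothetical choices; the claim only requires that a database store the reference sets.

```python
import json
import sqlite3
from typing import List, Optional, Sequence, Tuple

Descriptor = Tuple[float, ...]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store with one row per database image."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reference_sets ("
        "image_id TEXT PRIMARY KEY, descriptors TEXT NOT NULL)"
    )
    return conn


def put_reference_set(conn: sqlite3.Connection,
                      image_id: str,
                      descriptors: Sequence[Descriptor]) -> None:
    """Store the precomputed keypoint descriptors for one database image."""
    conn.execute(
        "INSERT OR REPLACE INTO reference_sets VALUES (?, ?)",
        (image_id, json.dumps([list(d) for d in descriptors])),
    )
    conn.commit()


def get_reference_set(conn: sqlite3.Connection,
                      image_id: str) -> Optional[List[Descriptor]]:
    """Load the stored reference set, or None if the image is unknown."""
    row = conn.execute(
        "SELECT descriptors FROM reference_sets WHERE image_id = ?",
        (image_id,),
    ).fetchone()
    return None if row is None else [tuple(d) for d in json.loads(row[0])]


conn = open_store()
put_reference_set(conn, "tower", [(0.1, 0.2), (0.3, 0.4)])
loaded = get_reference_set(conn, "tower")
missing = get_reference_set(conn, "unknown")
```

Storing the sets this way means the model runs once per database image at indexing time, and query-time work is limited to processing the query image and reading rows.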
8. The computing system of claim 1, wherein the feature extraction portion of the machine-learned image descriptor model comprises a convolutional neural network.
This claim narrows claim 1 by specifying the architecture of the feature extraction portion: it comprises a convolutional neural network (CNN). As in claim 1, the machine-learned image descriptor model has a feature extraction portion, which extracts a plurality of local feature descriptors from the input image, and an attention portion, which determines an attention score for each of those descriptors. A CNN applies stacked convolutional layers to an image and produces feature maps in which each spatial location summarizes a region of the input, so each location of an output feature map can serve as a local feature descriptor. The claim does not specify the network's depth, layer types, or descriptor dimensionality. The keypoint descriptors used for retrieval are then chosen from these CNN-derived local feature descriptors using the attention scores.
9. The computing system of claim 1, wherein the attention portion of the machine-learned image descriptor model comprises a convolutional neural network.
This claim narrows claim 1 by specifying the architecture of the attention portion: it comprises a convolutional neural network (CNN). The attention portion receives the local feature descriptors output by the feature extraction portion and produces one attention score per descriptor; those scores determine which descriptors are kept as keypoint descriptors. The claim does not specify the size of the network or its layers, and it does not require the feature extraction portion to be a CNN as well (that is the subject of claim 8). Learning the scores with a network, rather than fixing a hand-written rule, lets the model learn from training data which local features are relevant for retrieval.
10. The computing system of claim 1, wherein determining the first set of keypoint descriptors for the query image based at least in part on the first plurality of attention scores and the first plurality of local feature descriptors comprises selecting, as the first set of keypoint descriptors, a predefined number of the plurality of local feature descriptors with the highest attention scores.
This invention relates to computing systems for image processing, specifically improving keypoint descriptor selection in image matching or recognition tasks. The problem addressed is the inefficiency of traditional methods that rely solely on local feature descriptors without considering their relevance or importance, leading to suboptimal performance in tasks like object recognition or image retrieval. The system processes a query image by first extracting a plurality of local feature descriptors from the image. These descriptors represent distinctive features at various keypoints in the image. The system then generates a plurality of attention scores, which quantify the importance or relevance of each local feature descriptor. These attention scores are derived from a learned attention mechanism, which may be trained to prioritize features that are more discriminative or contextually significant. To determine the final set of keypoint descriptors for the query image, the system selects a predefined number of local feature descriptors that have the highest attention scores. This selection process ensures that only the most relevant and informative features are retained, improving the accuracy and efficiency of subsequent image matching or recognition tasks. The attention-based selection mechanism allows the system to adaptively focus on the most important features, enhancing performance in applications such as object detection, image retrieval, or scene understanding.
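Selecting a predefined number of descriptors with the highest attention scores is a top-k selection, sketched below with the standard-library `heapq.nlargest`. The descriptor and score values are made-up illustration data, and the function name is hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple

Descriptor = Tuple[float, ...]


def top_k_by_attention(descriptors: Sequence[Descriptor],
                       scores: Sequence[float],
                       k: int) -> List[Descriptor]:
    """Return the k descriptors with the highest attention scores, best first."""
    if len(descriptors) != len(scores):
        raise ValueError("expected exactly one attention score per descriptor")
    # Pair each score with its position so the descriptor can be looked up afterwards.
    ranked = heapq.nlargest(k, zip(scores, range(len(scores))))
    return [descriptors[i] for _, i in ranked]


descriptors = [(0.1, 0.2), (0.9, 0.4), (0.3, 0.3), (0.7, 0.8)]
scores = [0.05, 0.90, 0.40, 0.75]
keypoints = top_k_by_attention(descriptors, scores, k=2)
```

If an image yields fewer than `k` local descriptors, this sketch simply returns all of them; how a real system handles that case is not stated in the claim.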
11. The computing system of claim 1, wherein processing the query image using the feature extraction portion of the machine-learned image descriptor model comprises: generating an image pyramid that includes respective versions of the query image at different sizes; and respectively processing, with the feature extraction portion, the respective versions of the query image at the different sizes to obtain the first plurality of local feature descriptors.
The invention relates to a computing system for image processing, specifically for generating local feature descriptors from a query image using a machine-learned image descriptor model. The claim addresses feature extraction at multiple scales, which matters when the same object appears at different sizes in the query image and in database images. The system first generates an image pyramid, which consists of multiple versions of the query image at different sizes. Each version in the pyramid is then processed by the feature extraction portion of the machine-learned model, and the descriptors obtained from all versions together form the first plurality of local feature descriptors. The attention portion and keypoint selection of claim 1 then operate on this multi-scale pool of descriptors. The claim does not specify the number of pyramid levels or the scale factors. Multi-scale extraction of this kind is useful in applications such as visual search and augmented reality.
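The pyramid step can be sketched as follows. The scale factors, the nearest-neighbor resize, and the per-pixel `toy_extractor` are hypothetical stand-ins; a real system would resize with an image library and run the trained feature extraction portion on each level.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]
Descriptor = Tuple[float, ...]


def resize_nearest(image: Sequence[Sequence[float]], scale: float) -> Image:
    """Resize a 2-D image by nearest-neighbor sampling."""
    h, w = len(image), len(image[0])
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    return [[image[min(h - 1, int(r / scale))][min(w - 1, int(c / scale))]
             for c in range(new_w)]
            for r in range(new_h)]


def build_pyramid(image: Sequence[Sequence[float]],
                  scales: Sequence[float]) -> List[Tuple[float, Image]]:
    """Produce one version of the image per scale factor."""
    return [(s, resize_nearest(image, s)) for s in scales]


def extract_multiscale(image: Sequence[Sequence[float]],
                       scales: Sequence[float],
                       extractor: Callable[[Image], List[Descriptor]]
                       ) -> List[Tuple[float, Descriptor]]:
    """Run the extractor on every pyramid level and pool the descriptors."""
    pooled = []
    for scale, level in build_pyramid(image, scales):
        pooled.extend((scale, d) for d in extractor(level))
    return pooled


def toy_extractor(image: Image) -> List[Descriptor]:
    """Hypothetical extractor: one 1-D descriptor per pixel."""
    return [(value,) for row in image for value in row]


query_image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = build_pyramid(query_image, scales=(0.5, 1.0, 2.0))
pooled = extract_multiscale(query_image, (0.5, 1.0, 2.0), toy_extractor)
```

Each pooled descriptor is tagged with the scale it came from, which lets later stages map a selected keypoint back to a location in the original image if needed.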
12. The computing system of claim 1, wherein the machine-learned image descriptor model has been trained using training images that have only image-level annotations.
This claim narrows claim 1 by specifying how the machine-learned image descriptor model was trained: using training images that have only image-level annotations. An image-level annotation labels the image as a whole (for example, with a category) and says nothing about individual regions, objects, or keypoints within it. Region-level and keypoint-level annotations are time-consuming and expensive to obtain, so this form of weak supervision makes large training sets practical. The consequence is that the model is never told which local features are important; the attention portion must learn to score local feature descriptors from the image-level signal alone. The claim does not specify the loss function or training procedure (claim 13 adds a two-part training process). The resulting keypoint descriptors are used for image retrieval as described in claim 1.
13. The computing system of claim 1, wherein the machine-learned image descriptor model has been trained through performance of a two-part training process, the two-part training process comprising: a first training process to train the feature extraction portion of the machine-learned image descriptor model using a first loss function; and a second training process to train the attention portion of the machine-learned image descriptor model using a second loss function.
This invention relates to a computing system that employs a machine-learned image descriptor model for processing images. The model is designed to extract features from images and apply attention mechanisms to enhance descriptive accuracy. The key innovation lies in a two-part training process for the model. The first training phase focuses on the feature extraction portion, using a first loss function to optimize its ability to capture relevant image features. The second training phase targets the attention portion, employing a second loss function to refine the model's capacity to prioritize important image regions. This dual-training approach ensures that the model effectively balances feature extraction and attention mechanisms, improving overall image description performance. The system is particularly useful in applications requiring precise image analysis, such as object recognition, scene understanding, or automated image tagging. The training process enhances the model's ability to generate accurate and contextually relevant image descriptors, addressing challenges in traditional single-phase training methods that may overlook the interplay between feature extraction and attention.
14. A computer-implemented method to train a machine-learned image descriptor model, the method comprising: performing, by a computing system comprising one or more computing devices, a first training process, the first training process comprising, for each of one or more first training images: processing, by the computing system, the first training image using a feature extraction portion of the machine-learned image descriptor model to obtain, as an output of the feature extraction portion, a first plurality of local feature descriptors for the first training image; evaluating, by the computing system, a first loss function based at least in part on the first plurality of local feature descriptors for the first training image; and training, by the computing system, the feature extraction portion of the machine-learned image descriptor model based at least in part on the first loss function; and performing, by the computing system, a second training process, the second training process comprising, for each of one or more second training images: processing, by the computing system, the second training image using the feature extraction portion of the machine-learned image descriptor model to obtain, as an output of the feature extraction portion, a second plurality of local feature descriptors for the first training image; processing the second plurality of local feature descriptors for the second training image using an attention portion of the machine-learned image descriptor model to obtain, as an output of the attention portion, a plurality of attention scores respectively for the second plurality of local feature descriptors; evaluating, by the computing system, a second loss function based at least in part on the plurality of attention scores for the second training image; and training, by the computing system, the attention portion of the machine-learned image descriptor model based at least in part on the second loss function.
The invention relates to training a machine-learned image descriptor model for extracting and processing local feature descriptors from images. The model includes a feature extraction portion and an attention portion. The feature extraction portion processes training images to generate local feature descriptors, which are then evaluated using a first loss function to train the feature extraction portion. The attention portion further processes these descriptors to produce attention scores, which are evaluated using a second loss function to train the attention portion. The training involves two distinct processes: a first process focuses on optimizing the feature extraction portion by refining local feature descriptors, while a second process enhances the attention portion by adjusting attention scores based on the descriptors. This two-stage training approach improves the model's ability to accurately describe and prioritize relevant features in images, addressing challenges in image recognition and retrieval tasks where distinguishing salient features is critical. The method leverages computational systems to iteratively refine both the feature extraction and attention mechanisms, ensuring robust performance in various image analysis applications.
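The two training processes can be sketched on a toy model. Everything specific here is an assumption for illustration: the scalar descriptors, the cross-entropy losses on an image-level label, the attention-weighted sum, and the finite-difference gradients (used in place of backpropagation to keep the example short). The sketch also holds the feature parameters fixed during the second process, which the claim does not require.

```python
import math
from typing import Callable, List, Sequence, Tuple

Example = Tuple[List[List[float]], int]  # (image patches, image-level label 0 or 1)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def descriptors(feat: Sequence[float], patches: Sequence[Sequence[float]]) -> List[float]:
    """Toy feature extraction portion: one scalar descriptor per patch."""
    return [math.tanh(feat[0] * p[0] + feat[1] * p[1]) for p in patches]


def attention(att: Sequence[float], descs: Sequence[float]) -> List[float]:
    """Toy attention portion: one non-negative score per descriptor (softplus)."""
    return [math.log1p(math.exp(att[0] * d + att[1])) for d in descs]


def cross_entropy(logit: float, label: int) -> float:
    p = sigmoid(logit)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def first_loss(feat: Sequence[float], data: Sequence[Example]) -> float:
    """Assumed first loss: classify each image from its mean descriptor."""
    total = 0.0
    for patches, label in data:
        descs = descriptors(feat, patches)
        total += cross_entropy(sum(descs) / len(descs), label)
    return total / len(data)


def second_loss(att: Sequence[float], feat: Sequence[float],
                data: Sequence[Example]) -> float:
    """Assumed second loss: classify from the attention-weighted descriptor sum."""
    total = 0.0
    for patches, label in data:
        descs = descriptors(feat, patches)
        weights = attention(att, descs)
        total += cross_entropy(sum(w * d for w, d in zip(weights, descs)), label)
    return total / len(data)


def descend(params: Sequence[float], loss: Callable[[List[float]], float],
            steps: int, lr: float, eps: float = 1e-5) -> List[float]:
    """Gradient descent using central finite differences for the gradient."""
    params = list(params)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            hi = params[:i] + [params[i] + eps] + params[i + 1:]
            lo = params[:i] + [params[i] - eps] + params[i + 1:]
            grad.append((loss(hi) - loss(lo)) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


data = [
    ([[1.0, 0.0], [0.8, 0.1], [0.0, 0.1]], 1),
    ([[0.0, 1.0], [0.1, 0.9], [0.1, 0.0]], 0),
]
feat0, att0 = [0.1, 0.1], [0.0, 0.0]

# First training process: only the feature extraction parameters change.
feat = descend(feat0, lambda f: first_loss(f, data), steps=100, lr=0.5)
# Second training process: only the attention parameters change.
att = descend(att0, lambda a: second_loss(a, feat, data), steps=100, lr=0.05)
```

Both losses use only the image-level label, so no patch is ever marked as important by hand; the attention parameters are shaped solely by how the weighted sum affects the image-level prediction.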
15. The computer-implemented method of claim 14, wherein the feature extraction portion of the machine-learned image descriptor model comprises a convolutional neural network.
This invention relates to computer-implemented methods for processing images using machine-learned models, specifically focusing on feature extraction. The method addresses the challenge of efficiently and accurately extracting meaningful features from images for tasks such as classification, recognition, or retrieval. Traditional approaches often struggle with computational efficiency or the ability to capture complex patterns in image data. The method involves a machine-learned image descriptor model that includes a feature extraction portion. This portion is responsible for analyzing input images and identifying key features that represent the image's content. The feature extraction is performed using a convolutional neural network (CNN), a type of deep learning architecture known for its effectiveness in processing grid-like data such as images. The CNN applies a series of convolutional layers to detect hierarchical features, from low-level edges and textures to high-level object representations. This extracted feature set is then used for downstream tasks, such as generating image descriptors or enabling image-based queries. By leveraging a CNN for feature extraction, the method improves the accuracy and robustness of image analysis compared to traditional techniques. The CNN's ability to learn and adapt to diverse image patterns makes it particularly suitable for applications requiring high precision, such as medical imaging, autonomous systems, or content-based image retrieval. The method ensures that the extracted features are both discriminative and computationally efficient, addressing the limitations of prior art in image processing.
16. The computer-implemented method of claim 14, wherein training, by the computing system, the feature extraction portion of the machine-learned image descriptor model based at least in part on the first loss function comprises backpropagating, by the computing system, the first loss function through the feature extraction portion of the machine-learned image descriptor model.
This claim narrows the training method of claim 14 by specifying how the feature extraction portion is trained: the first loss function is backpropagated through the feature extraction portion of the machine-learned image descriptor model. As set out in claim 14, the model has a feature extraction portion, which produces local feature descriptors for a training image, and an attention portion, which produces attention scores for those descriptors. In the first training process, the first loss function is evaluated based at least in part on the local feature descriptors of each first training image. Backpropagation computes the gradient of that loss with respect to the parameters of the feature extraction portion, and the parameters are adjusted to reduce the loss. The attention portion is trained separately, in the second training process of claim 14, using a second loss function based at least in part on the attention scores. The claim does not specify the form of the first loss function or the optimizer.
17. The computer-implemented method of claim 14, wherein the attention portion of the machine-learned image descriptor model comprises a convolutional neural network.
The invention relates to machine-learned image descriptor models, specifically focusing on improving attention mechanisms within these models. The problem addressed is the need for more efficient and accurate image feature extraction, particularly in tasks requiring selective focus on relevant image regions. The solution involves integrating a convolutional neural network (CNN) into the attention portion of the model. This CNN enhances the model's ability to dynamically prioritize and process important image regions, improving performance in tasks like object recognition, image classification, and scene understanding. The attention mechanism, now powered by the CNN, adaptively weights different parts of the image, allowing the model to focus on salient features while suppressing irrelevant or noisy data. This approach leverages the CNN's spatial hierarchy and feature extraction capabilities to refine attention maps, leading to more precise and context-aware image representations. The overall system combines the attention module with other components of the image descriptor model, such as feature extraction and encoding layers, to produce robust and discriminative image descriptors. The use of a CNN in the attention portion ensures that the model can handle complex visual scenes and adapt to varying image conditions, making it suitable for applications in computer vision, autonomous systems, and multimedia analysis.
18. The computer-implemented method of claim 14, wherein training, by the computing system, the attention portion of the machine-learned image descriptor model based at least in part on the second loss function comprises backpropagating, by the computing system, the second loss function through the attention portion of the machine-learned image descriptor model.
This invention relates to machine learning techniques for training image descriptor models, particularly those incorporating attention mechanisms. The problem addressed is improving the training efficiency and performance of attention-based models in image processing tasks. Traditional training methods may struggle with optimizing attention mechanisms effectively, leading to suboptimal feature extraction and descriptor generation. The invention describes a computer-implemented method for training an attention portion of a machine-learned image descriptor model. The model processes input images to generate descriptors, which are compact representations of image features. The attention portion dynamically focuses on relevant image regions, enhancing descriptor quality. Training involves a second loss function that evaluates the attention mechanism's performance. The key innovation is backpropagating this loss function specifically through the attention portion, allowing targeted optimization of attention weights. This approach ensures the attention mechanism learns to prioritize salient image regions more effectively, improving descriptor accuracy and robustness. The method may also include preprocessing input images, generating initial descriptors, and refining them using the attention mechanism. The second loss function could measure descriptor similarity, attention weight distribution, or other relevant metrics. By isolating the attention portion during backpropagation, the training process becomes more efficient, avoiding unnecessary updates to other model components. This technique is applicable to various image-based tasks, such as retrieval, classification, or object detection, where attention mechanisms are critical for performance.
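One way to picture backpropagating a loss through the attention portion alone is the toy sketch below. The local descriptors are treated as fixed inputs. The attention score of each descriptor is `softplus(w . f)`, the scores weight a pooled prediction, and only the attention weights `w` receive gradient updates. The scoring function, the squared-error second loss, and every name here are assumptions for illustration, not the patent's formulation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Numerically stable logistic function; the derivative of softplus."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, v, features):
    """Attention-weighted prediction: sum_n softplus(w . f_n) * (v . f_n)."""
    return sum(softplus(dot(w, f)) * dot(v, f) for f in features)


def second_loss(w, v, features, target):
    """Hypothetical second loss: squared error of the weighted prediction."""
    return (predict(w, v, features) - target) ** 2


def attention_step(w, v, features, target, lr=0.05):
    """One gradient step on the attention weights w only.

    The descriptors in `features` and the head v are left untouched, so the
    loss is backpropagated through the attention portion alone.
    """
    dL_dpred = 2.0 * (predict(w, v, features) - target)
    grad = [0.0] * len(w)
    for f in features:
        dL_dscore = dL_dpred * dot(v, f)   # d pred / d score_n = v . f_n
        dscore_dz = sigmoid(dot(w, f))     # d softplus(z) / dz
        for j, f_j in enumerate(f):
            grad[j] += dL_dscore * dscore_dz * f_j
    return [w_j - lr * g_j for w_j, g_j in zip(w, grad)]


if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 1.0]]
    v, w, target = [1.0, 0.0], [0.0, 0.0], 2.0
    print("loss before:", second_loss(w, v, features, target))
    for _ in range(200):
        w = attention_step(w, v, features, target)
    print("loss after: ", second_loss(w, v, features, target))
```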
19. The computer-implemented method of claim 14, further comprising: after performing the first and the second training processes, using, by the computing system, the machine-learned image descriptor model to perform image retrieval.
This invention relates to machine learning-based image retrieval systems. The problem addressed is improving the accuracy and efficiency of retrieving relevant images from a database based on a query image. The machine-learned image descriptor model is trained in two processes: a first training process that trains the feature extraction portion using a first loss function, and a second training process that trains the attention portion using a second loss function. After both training processes have been performed, the model is used to perform image retrieval. When a query image is provided, the system extracts its descriptors using the trained model. The system then compares these descriptors against the descriptors of images in a database, ranking the database images by similarity to the query. The claim itself does not specify how the comparison or ranking is carried out; it requires only that retrieval use the model after both training processes are complete.
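The retrieval step can be sketched as follows, under assumptions the claim does not state: each image is represented by a list of keypoint descriptors, a query descriptor counts as matched when its nearest database descriptor lies within a Euclidean distance threshold, and database images are ranked by match count. The function names, the threshold `max_dist`, and the scoring rule are all hypothetical.

```python
import math


def l2(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_count(query_desc, db_desc, max_dist=0.5):
    """Count query descriptors whose nearest database descriptor is close."""
    count = 0
    for q in query_desc:
        if db_desc and min(l2(q, d) for d in db_desc) <= max_dist:
            count += 1
    return count


def rank_database(query_desc, database, max_dist=0.5):
    """Return (image_name, match_count) pairs, best match first.

    `database` maps an image name to that image's list of descriptors.
    """
    scored = [
        (name, match_count(query_desc, descs, max_dist))
        for name, descs in database.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    query = [[0.0, 0.0], [1.0, 1.0]]
    database = {
        "a": [[0.0, 0.1], [1.0, 0.9]],
        "b": [[5.0, 5.0]],
        "c": [[0.0, 0.2]],
    }
    for name, score in rank_database(query, database):
        print(name, score)
```

A production system would typically replace the brute-force nearest-neighbor search with an approximate index and might add geometric verification of the matches. Both are outside the scope of this sketch.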
20. One or more non-transitory computer-readable media that collectively store: a machine-learned image descriptor model configured to determine a set of keypoint descriptors for an input image, the machine-learned image descriptor model comprising: a feature extraction portion configured to extract a plurality of local feature descriptors from the input image; an attention portion configured to determine a plurality of attention scores respectively for the plurality of local feature descriptors; and a keypoint selection portion configured to select a plurality of keypoint descriptors from the plurality of local feature descriptors based at least in part on the plurality of attention scores, the plurality of keypoint descriptors corresponding to a subset of the plurality of local feature descriptors.
This invention relates to computer vision and machine learning, specifically improving image descriptor models for keypoint detection. The problem addressed is the inefficiency and inaccuracy of traditional methods in identifying and describing keypoints in images, which are critical for tasks like object recognition, image matching, and 3D reconstruction. The solution involves a machine-learned image descriptor model that processes an input image to extract keypoint descriptors more effectively. The model includes three main components: a feature extraction portion that analyzes the input image to generate multiple local feature descriptors, an attention portion that calculates attention scores for these descriptors to prioritize relevant features, and a keypoint selection portion that filters the local descriptors based on the attention scores to produce a refined set of keypoint descriptors. This approach ensures that only the most informative features are retained, improving accuracy and computational efficiency. The attention mechanism dynamically adjusts the importance of features, allowing the model to adapt to different image contexts. This method enhances performance in applications requiring precise keypoint detection, such as augmented reality, robotics, and autonomous navigation.
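The keypoint selection portion described above can be sketched as a top-k filter over the attention scores. This is a minimal example under the assumption that selection keeps the `k` highest-scoring descriptors. The claim only requires that the selected descriptors be a subset chosen based at least in part on the scores, so a score threshold would be another valid reading. The function name and the parameter `k` are hypothetical.

```python
def select_keypoints(descriptors, attention_scores, k):
    """Keep the k local descriptors with the highest attention scores.

    Returns (index, descriptor) pairs ordered from highest to lowest score,
    so the caller can recover which image locations were selected.
    """
    if len(descriptors) != len(attention_scores):
        raise ValueError("need exactly one attention score per descriptor")
    if k < 0:
        raise ValueError("k must be non-negative")
    order = sorted(
        range(len(attention_scores)),
        key=lambda i: attention_scores[i],
        reverse=True,
    )
    return [(i, descriptors[i]) for i in order[:k]]


if __name__ == "__main__":
    descs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    scores = [0.1, 0.9, 0.4, 0.7]
    for index, desc in select_keypoints(descs, scores, k=2):
        print(index, desc)
```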
May 12, 2020