The invention relates to a real-time object detection and 3D localization method based on a single frame image. The method comprises the following steps: S1: inputting a 2D RGB image; S2: performing feature extraction on the 2D RGB image, extracting features from a deep network and a shallow network respectively; S3: carrying out 2D object detection and applying the result to subsequent modules; S4: estimating the vertices, instance-level depth and center point of a 3D-box respectively; S5: adding a regularization term for maintaining horizontal locality to the prediction of the center point of the 3D-box, to constrain and optimize that prediction; and S6: outputting a 2D RGB image with a 3D-box tag by combining the predictions of all modules. The invention increases the speed of model training convergence and the accuracy of 3D object detection and localization, and meets the accuracy requirements of an Advanced Driver Assistance System (ADAS) at a low hardware cost.
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
3. The real-time object detection and 3D localization method based on a single frame image according to claim 2, wherein in formula (1), when depth distances of two targets are similar and the targets are more adjacent on horizontal, a weight sij will be greater; and when depth distance of the target pair is larger or horizontal distance difference of the target pair is greater, the weight sij will be smaller.
This claim describes how the pairwise weight sij in formula (1) of claim 2 behaves. The method localizes objects in 3D from a single 2D image, which is difficult when objects are closely spaced or lie at different depths. The weight between two detected targets i and j is adjusted according to their spatial relationship: when the two targets have similar depth distances and are close together horizontally, the weight is larger; when they are farther apart in depth or horizontally, the weight is smaller. Neighboring objects therefore influence each other's predictions strongly, while distant or well-separated objects interfere little. Formula (1) is not reproduced here, but the weighting appears to belong to the regularization term of step S5, which maintains horizontal locality and constrains the prediction of the 3D-box center point. Intended applications include autonomous navigation, robotics, and augmented reality.
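The qualitative behavior of the weight in claim 3 can be illustrated with a minimal Python sketch. Formula (1) is not reproduced in this text, so the Gaussian-style form below, the function name `pair_weight`, and the scale parameters `sigma_u` and `sigma_z` are assumptions chosen only to reproduce the stated behavior; they are not the patent's formula.

```python
import math


def pair_weight(u_i, z_i, u_j, z_j, sigma_u=50.0, sigma_z=5.0):
    """Hypothetical weight s_ij between targets i and j.

    u_* : horizontal image position of each target (pixels, assumed)
    z_* : depth of each target (metres, assumed)
    sigma_u, sigma_z : illustrative scale parameters, not from the patent.
    """
    du = (u_i - u_j) / sigma_u  # normalised horizontal difference
    dz = (z_i - z_j) / sigma_z  # normalised depth difference
    # The weight shrinks as either difference grows and equals 1
    # when the two targets coincide.
    return math.exp(-(du * du + dz * dz))


# Two targets that are close horizontally and at similar depth
near = pair_weight(100.0, 10.0, 110.0, 10.5)
# Same depths, but far apart horizontally
far_horizontal = pair_weight(100.0, 10.0, 400.0, 10.5)
# Same horizontal positions, but far apart in depth
far_depth = pair_weight(100.0, 10.0, 110.0, 40.0)
```

In this sketch `near` is larger than both `far_horizontal` and `far_depth`, matching the ordering the claim describes.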
4. The real-time object detection and 3D localization method based on a single frame image according to claim 2, wherein the loss function of target confidence is a combination of a softmax function and a cross entropy; and the loss function of a 2D-Box is calculated by an L1 distance loss function.
This claim specifies the loss functions used to train the 2D detection stage. The method detects and localizes objects in 3D space from a single 2D image, rather than relying on multiple frames or additional sensors, for applications such as autonomous driving, robotics, and augmented reality. The loss function for target confidence combines a softmax function with cross-entropy, a standard formulation for classification. The loss function for the 2D bounding box (2D-Box) is an L1 distance loss: it penalizes the sum of absolute differences between the predicted and ground-truth box coordinates, not the Euclidean (L2) distance. Together these two losses train the classification and 2D localization whose results are passed to the subsequent 3D modules.
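The two losses named in claim 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the `(x1, y1, x2, y2)` box layout, and the absence of any weighting between the two losses are choices made here, not details given in the claim.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of softmax(logits) against a single true class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    # -log(softmax(logits)[target_index])
    return log_sum_exp - logits[target_index]


def l1_box_loss(pred_box, gt_box):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))


# Two equally scored classes give a confidence loss of log(2).
conf_loss = softmax_cross_entropy([0.0, 0.0], 0)

# Boxes as (x1, y1, x2, y2); the layout is an assumption.
box_loss = l1_box_loss([0.0, 0.0, 10.0, 10.0], [1.0, 2.0, 9.0, 10.0])
```

Here `box_loss` is 1 + 2 + 1 + 0 = 4, the sum of absolute differences, whereas a Euclidean distance on the same boxes would be the square root of 6.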
6. The real-time object detection and 3D localization method based on a single frame image according to claim 1, wherein the 3D-box will be represented by a 3D center point of an object and coordinate points of 8 vertices of the 3D-box.
This invention relates to real-time object detection and 3D localization using a single frame image. The method addresses the challenge of accurately identifying and determining the three-dimensional position and orientation of objects in a scene from a single two-dimensional image, which is critical for applications like autonomous driving, robotics, and augmented reality. The system processes an input image to detect objects and generate a 3D bounding box around each detected object. The 3D bounding box is defined by a 3D center point representing the object's central position and eight vertex coordinates that outline the box's boundaries in three-dimensional space. This representation allows for precise spatial localization of objects, including their dimensions, orientation, and position relative to the camera or sensor capturing the image. The method leverages deep learning techniques to analyze the image and predict the 3D box parameters. It may involve convolutional neural networks or other machine learning models trained on annotated datasets to estimate the 3D center point and vertex coordinates from the 2D image features. The approach ensures real-time performance, making it suitable for dynamic environments where rapid object detection and localization are essential. By providing a compact yet comprehensive 3D representation of objects, the method enables applications such as obstacle avoidance, path planning, and object tracking in autonomous systems. The invention improves upon traditional methods by eliminating the need for multiple frames or additional sensors, relying solely on a single image for accurate 3D localization.
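The representation in claim 6, a 3D center point plus eight vertices, can be sketched as a small Python data structure. The claim does not say how the vertices are obtained; the helper `box_from_pose`, the `(length, height, width)` size ordering, the vertical y axis, and the yaw parameter are all assumptions used here only to produce a consistent example.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point3 = Tuple[float, float, float]


@dataclass
class Box3D:
    center: Point3          # 3D center point of the object
    vertices: List[Point3]  # coordinates of the 8 box corners


def box_from_pose(center: Point3, size: Point3, yaw: float) -> Box3D:
    """Hypothetical helper: build the 8 corners from center, size and yaw."""
    cx, cy, cz = center
    length, height, width = size
    c, s = math.cos(yaw), math.sin(yaw)
    vertices = []
    for dx in (-length / 2, length / 2):
        for dy in (-height / 2, height / 2):
            for dz in (-width / 2, width / 2):
                # Rotate the horizontal offsets about the vertical (y) axis,
                # then translate by the center.
                x = cx + c * dx + s * dz
                z = cz - s * dx + c * dz
                vertices.append((x, cy + dy, z))
    return Box3D(center=center, vertices=vertices)


box = box_from_pose(center=(1.0, 1.5, 20.0), size=(4.0, 1.5, 1.8), yaw=0.3)
```

Because the corners are placed symmetrically, their mean coincides with the center point, which gives a simple consistency check between the two parts of the representation.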
9. The real-time target detection and 3D positioning method based on a single frame image according to claim 8, wherein the embedded system is Jetson AGX Xavier.
This invention relates to real-time target detection and 3D positioning using a single frame image, leveraging an embedded system for processing. The method addresses the challenge of accurately detecting and localizing objects in three-dimensional space from a single image, which is computationally intensive and often requires multiple frames or specialized hardware. The solution employs an embedded system, specifically the Jetson AGX Xavier, to perform the necessary computations efficiently. The embedded system processes the input image to detect targets and determine their 3D positions, utilizing algorithms optimized for real-time performance. The system integrates hardware-accelerated processing to handle the computational load while maintaining low latency. This approach enables applications such as autonomous navigation, robotics, and augmented reality, where real-time 3D positioning is critical. The use of a single frame reduces dependency on sequential data, improving robustness in dynamic environments. The embedded system's capabilities ensure that the method operates within the constraints of power and processing limitations typical in portable or embedded devices. The invention provides a balance between accuracy and efficiency, making it suitable for deployment in resource-constrained environments.
10. The real-time target detection and 3D positioning method based on a single frame image according to claim 1, wherein the instance-level depth information is data obtained by predicting depth zg of the center point of the 3D-box through an instance-level depth prediction module, i.e., after a feature map is divided into grids, the depth prediction module only predicts a target depth of a grid having a distance from an instance less than a distance threshold σscope.
This invention relates to real-time target detection and 3D positioning using a single frame image, addressing the challenge of accurately determining the depth and spatial position of objects in 3D space from 2D images. The method improves upon traditional approaches by incorporating instance-level depth prediction to enhance precision in 3D positioning. The system processes an input image by generating a feature map, which is then divided into grids. A depth prediction module operates on these grids, focusing only on those within a specified distance threshold from an object instance. This selective prediction ensures that depth information is accurately estimated for relevant regions while ignoring irrelevant areas, improving computational efficiency and accuracy. The depth prediction module outputs the depth (zg) of the center point of a 3D bounding box (3D-box) for each detected object. This instance-level depth data is used to refine the 3D positioning of targets, enabling real-time applications such as autonomous navigation, robotics, and augmented reality. The method avoids the need for multi-frame processing or additional sensors, relying solely on a single image for depth estimation. By combining target detection with instance-level depth prediction, the invention achieves faster and more reliable 3D positioning, making it suitable for dynamic environments where real-time performance is critical. The selective depth prediction approach reduces computational overhead while maintaining high accuracy in 3D object localization.
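The grid selection in claim 10 can be illustrated with a short Python sketch: only grid cells closer to an instance than the threshold σscope predict a depth. The claim does not define the distance measure or the reference point of the instance, so the Euclidean distance between cell centers and an instance center in grid units, along with the name `cells_in_scope`, are assumptions.

```python
import math


def cells_in_scope(grid_h, grid_w, instance_center, sigma_scope):
    """Return (row, col) of grid cells that would predict the instance depth.

    instance_center : (x, y) of the instance in grid units (assumed).
    sigma_scope     : distance threshold, in grid units here (assumed).
    """
    gx, gy = instance_center
    active = []
    for row in range(grid_h):
        for col in range(grid_w):
            cell_x, cell_y = col + 0.5, row + 0.5  # center of this cell
            # Keep the cell only if it is strictly closer than the threshold.
            if math.hypot(cell_x - gx, cell_y - gy) < sigma_scope:
                active.append((row, col))
    return active


# A 4x4 grid with one instance centered in cell (1, 1)
tight = cells_in_scope(4, 4, (1.5, 1.5), 1.0)
wider = cells_in_scope(4, 4, (1.5, 1.5), 1.1)
```

With the tighter threshold only the cell containing the instance is active; widening it slightly adds the four directly adjacent cells, while the remaining cells make no depth prediction for this instance.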
June 25, 2021
April 2, 2024