In various examples, systems and methods are described that generate scene flow in 3D space through simplifying the 3D LiDAR data to “2.5D” optical flow space (e.g., x, y, and depth flow). For example, LiDAR range images may be used to generate 2.5D representations of depth flow information between frames of LiDAR data, and two or more range images may be compared to generate depth flow information, and messages may be passed—e.g., using a belief propagation algorithm—to update pixel values in the 2.5D representation. The resulting images may then be used to generate 2.5D motion vectors, and the 2.5D motion vectors may be converted back to 3D space to generate a 3D scene flow representation of an environment around an autonomous machine.
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
4. The processor of claim 1, further comprising processing circuitry to perform one or more operations based at least in part on the scene flow.
This invention relates to computer vision systems for analyzing dynamic scenes, particularly for estimating scene flow—the 3D motion of objects and surfaces within a scene. The technology addresses challenges in accurately tracking moving elements in real-world environments, which is critical for applications like autonomous navigation, robotics, and augmented reality. Traditional methods often struggle with occlusions, varying lighting conditions, or complex motion patterns, leading to errors in motion estimation. The system includes a processor with specialized processing circuitry designed to compute scene flow from input data, such as images or sensor readings. The circuitry processes this data to derive motion vectors representing the movement of objects or surfaces in 3D space. Additionally, the processor includes further circuitry to perform downstream tasks based on the computed scene flow, such as object tracking, collision avoidance, or scene reconstruction. The system may integrate multiple sensors, including cameras and depth sensors, to enhance accuracy. By leveraging advanced algorithms, the invention improves motion estimation in dynamic environments, enabling more reliable decision-making in automated systems. The technology is particularly useful in scenarios requiring real-time analysis, such as self-driving cars or industrial automation.
5. The processor of claim 1, wherein the at least two sequential range images are generated using corresponding LiDAR point clouds.
This claim specifies the source of the range images used for scene flow estimation. The at least two sequential range images are generated using corresponding LiDAR point clouds, so each range image is a 2D, image-like representation of one LiDAR capture in which pixels carry depth information, and consecutive range images correspond to different points in time. Working on range images rather than raw point clouds lets the 3D problem be handled in a “2.5D” space (x, y, and depth flow): two or more range images are compared to generate depth flow information between frames. The resulting scene flow is relevant to applications such as autonomous vehicles, drones, and industrial robotics.
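As a rough illustration of how a LiDAR point cloud can be turned into a range image, the sketch below uses a spherical projection. The projection model, the function name `point_cloud_to_range_image`, the image size, and the vertical field of view are all assumptions made for this example; the claim itself does not say how the range image is generated.

```python
import math


def point_cloud_to_range_image(points, width=8, height=4,
                               fov_up_deg=15.0, fov_down_deg=-15.0):
    """Project (x, y, z) points into a height x width grid of ranges.

    Hypothetical spherical projection: columns index azimuth, rows index
    elevation. Empty pixels hold None. When several points fall into the
    same pixel, the closest return is kept.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    image = [[None] * width for _ in range(height)]
    for x, y, z in points:
        rng = math.sqrt(x * x + y * y + z * z)
        if rng == 0.0:
            continue  # a point at the sensor origin has no direction
        azimuth = math.atan2(y, x)        # in [-pi, pi]
        elevation = math.asin(z / rng)    # in [-pi/2, pi/2]
        if not (fov_down <= elevation <= fov_up):
            continue  # outside the assumed vertical field of view
        # Map azimuth to a column and elevation to a row (row 0 is the top).
        col = int((azimuth + math.pi) / (2.0 * math.pi) * width) % width
        row = int((fov_up - elevation) / (fov_up - fov_down) * height)
        row = min(row, height - 1)
        if image[row][col] is None or rng < image[row][col]:
            image[row][col] = rng
    return image


cloud = [(10.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 0.0, 10.0)]
range_image = point_cloud_to_range_image(cloud)
```

In this toy input the first two points share a pixel, so only the nearer one survives, and the third point lies outside the assumed field of view and is dropped.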
6. The processor of claim 1, wherein, when a pair of another one or more adjacent pixels are determined not to correspond to a same object, data corresponding to the pair of the another one or more adjacent pixels is prevented from being passed.
This claim narrows the step in which data is passed between pixels to refine the depth flow estimate. As the abstract describes, messages may be passed between pixels of the 2.5D representation (for example, using a belief propagation algorithm) to update pixel values. Under this claim, the processor determines whether a pair of adjacent pixels corresponds to the same object. If the pixels are determined not to correspond to the same object, data corresponding to that pair is prevented from being passed. The effect is that motion information from one object does not leak across an object boundary and influence pixels that belong to a different object. The claim does not specify how object correspondence is determined.
8. The method of claim 7, wherein the first LiDAR range image and the second LiDAR range image are generated using data representative of one or more LiDAR point clouds.
LiDAR systems generate point clouds representing spatial data, but processing these point clouds for applications like autonomous navigation or 3D mapping requires efficient conversion into range images. Range images simplify point cloud data into 2D representations while preserving depth information, but traditional methods often struggle with noise, sparsity, or computational inefficiency. This invention addresses these challenges by generating a first and second LiDAR range image from one or more LiDAR point clouds. The method involves transforming the point cloud data into range images, where each pixel in the range image corresponds to a depth value derived from the point cloud. The first and second range images may be generated from the same or different point clouds, depending on the application. The process may include filtering, interpolation, or other preprocessing steps to enhance data quality. By converting point clouds into range images, the method enables faster processing, improved noise reduction, and better compatibility with existing 2D image processing algorithms. This approach is particularly useful in autonomous vehicles, robotics, and environmental mapping, where real-time data interpretation is critical. The invention ensures accurate depth perception while maintaining computational efficiency, overcoming limitations of traditional point cloud processing techniques.
9. The method of claim 7, wherein the first LiDAR range image is converted to a same coordinate system as the second LiDAR range image based at least in part on calculated ego-motion.
This invention relates to LiDAR-based systems for aligning range images from different scans to improve spatial accuracy. The problem addressed is the misalignment of LiDAR range images captured at different times or from different perspectives, which can lead to errors in 3D mapping, object detection, or autonomous navigation. The solution involves converting a first LiDAR range image to the same coordinate system as a second LiDAR range image using calculated ego-motion data. Ego-motion refers to the movement of the LiDAR sensor itself, which is determined by tracking changes in position and orientation between scans. By applying this motion data, the system corrects for discrepancies caused by sensor movement, ensuring that the range images are properly aligned in a shared coordinate frame. This alignment process enhances the accuracy of subsequent applications, such as 3D reconstruction, obstacle detection, or path planning in autonomous vehicles or robotics. The method may also involve preprocessing steps like noise filtering or point cloud segmentation to improve alignment quality. The invention is particularly useful in dynamic environments where sensor motion introduces significant misalignment between consecutive scans.
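The coordinate conversion described above can be sketched as a rigid transform applied to the 3D points behind the first range image. This is a minimal example under stated assumptions: ego-motion is simplified to a yaw rotation plus a translation, and the function name `to_second_frame` and its parameters are hypothetical rather than taken from the patent.

```python
import math


def to_second_frame(points, yaw_rad, translation):
    """Express frame-1 points in the sensor frame of frame 2.

    Assumption: (yaw_rad, translation) is the pose of the sensor at
    frame 2, measured in frame-1 coordinates. Each point is mapped by
    p2 = R(yaw)^T * (p1 - t), the inverse of that pose.
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    converted = []
    for x, y, z in points:
        dx, dy, dz = x - tx, y - ty, z - tz
        # Transpose of a rotation about the z axis by yaw_rad.
        converted.append((c * dx + s * dy, -s * dx + c * dy, dz))
    return converted


# The sensor drove 2 m forward between frames, so a static point that was
# 10 m ahead in frame 1 is 8 m ahead in frame 2.
static_point = to_second_frame([(10.0, 0.0, 0.0)], 0.0, (2.0, 0.0, 0.0))
```

After this step, any remaining difference between the two frames can be attributed to objects moving in the scene rather than to the motion of the sensor.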
10. The method of claim 7, wherein the first LiDAR range image and the second LiDAR range image each represent at least one of reflectivity information, intensity information, depth information, time of flight (ToF) information, return information, or classification information.
This invention relates to LiDAR (Light Detection and Ranging) systems, specifically methods for processing and analyzing LiDAR range images to improve object detection, mapping, or environmental perception. The core problem addressed is the need for more detailed and accurate data extraction from LiDAR sensors, which are widely used in autonomous vehicles, robotics, and other applications requiring precise spatial awareness. The method involves capturing and processing at least two LiDAR range images, where each image contains multiple types of data. These images may include reflectivity information (how much light is reflected back), intensity information (strength of the returned signal), depth information (distance to objects), time of flight (ToF) information (time taken for light to return), return information (multiple returns from a single pulse), or classification information (categorization of detected objects). By analyzing these different data types together, the system can enhance object recognition, distinguish between different surfaces, and improve overall scene understanding. The method may also involve comparing or combining the first and second LiDAR range images to refine measurements, reduce noise, or detect changes over time. This approach allows for more robust and reliable environmental perception, which is critical for applications like autonomous navigation, obstacle avoidance, and 3D mapping. The use of multiple data types in each image ensures that the system can adapt to varying conditions and provide more comprehensive insights into the scanned environment.
11. The method of claim 7, wherein the one or more motion vectors are calculated in two-dimensional (2D) space, and the method further comprises converting the one or more motion vectors to 3D space to generate one or more 3D motion vectors.
This claim covers converting motion vectors estimated on LiDAR range images into 3D space. The motion vectors are first calculated in two-dimensional (2D) space, that is, on the range images, where they describe the displacement of pixels between the first and second LiDAR range images. These 2D motion vectors are then converted into 3D motion vectors. The conversion may use the depth information carried by the range images to map each 2D vector back into 3D coordinates. This approach suits systems where motion estimation is more convenient to perform in image space but 3D motion is required for downstream tasks such as object tracking.
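One way to picture the 2D-to-3D conversion is to back-project the start and end of each motion vector and take the difference. The sketch below assumes a spherical range-image model with continuous pixel coordinates and a flow of the form (du, dv, d_range); the function names, image size, and field of view are assumptions for illustration, not details from the claims.

```python
import math


def pixel_to_point(u, v, rng, width, height, fov_up, fov_down):
    """Back-project a range-image pixel (u, v) with range rng to (x, y, z)."""
    azimuth = (u / width) * 2.0 * math.pi - math.pi
    elevation = fov_up - (v / height) * (fov_up - fov_down)
    horizontal = rng * math.cos(elevation)
    return (horizontal * math.cos(azimuth),
            horizontal * math.sin(azimuth),
            rng * math.sin(elevation))


def flow_25d_to_3d(u, v, rng, flow, width=8, height=4,
                   fov_up_deg=15.0, fov_down_deg=-15.0):
    """Convert a 2.5D motion vector (du, dv, d_range) into a 3D vector.

    The start pixel and the displaced pixel are both back-projected, and
    the 3D motion vector is the difference between the two points.
    """
    du, dv, d_rng = flow
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    start = pixel_to_point(u, v, rng, width, height, fov_up, fov_down)
    end = pixel_to_point(u + du, v + dv, rng + d_rng,
                         width, height, fov_up, fov_down)
    return tuple(e - s for s, e in zip(start, end))


# A pixel looking straight ahead whose range grows by 1.5 m, with no image
# motion, corresponds to a point moving 1.5 m away along the x axis.
vector_3d = flow_25d_to_3d(4, 2, 10.0, (0.0, 0.0, 1.5))
```

Applying this to every pixel with a valid motion vector would yield a per-point 3D motion field for the pair of frames.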
12. The method of claim 11, wherein the one or more 3D motion vectors represent a scene flow representation between the first LiDAR range image and the second LiDAR range image.
This invention relates to LiDAR-based scene flow estimation, addressing the challenge of accurately determining 3D motion vectors between consecutive LiDAR range images to model dynamic environments. The method involves processing a first LiDAR range image and a second LiDAR range image to compute 3D motion vectors that represent the scene flow between them. Scene flow refers to the 3D motion of points in the scene, capturing both translational and rotational movements. The computed motion vectors are used to align the second LiDAR range image with the first, enabling accurate tracking of objects and environmental changes over time. This approach enhances applications such as autonomous navigation, obstacle detection, and dynamic mapping by providing a precise representation of motion in 3D space. The method leverages LiDAR data, which offers high-resolution depth information, to improve the accuracy and robustness of motion estimation compared to traditional 2D optical flow techniques. By analyzing the 3D structure of the scene, the system can distinguish between static and moving objects, improving situational awareness in real-time applications. The invention is particularly useful in autonomous vehicles and robotics, where understanding dynamic environments is critical for safe and efficient operation.
13. The method of claim 7, wherein the data corresponding to the at least one pixel of the one or more pixels is passed using a belief propagation algorithm.
This invention relates to image processing, specifically to methods for analyzing and interpreting pixel data in images. The problem addressed is the need for efficient and accurate techniques to process and propagate information about pixel data, particularly in applications like image segmentation, object recognition, or scene understanding. The method involves analyzing one or more pixels in an image, where each pixel has associated data that may include color, intensity, or other features. The key innovation is the use of a belief propagation algorithm to process and pass this pixel data. Belief propagation is a message-passing algorithm commonly used in probabilistic graphical models, such as Markov random fields, to infer the most likely configuration of variables given observed data. In this context, it helps propagate information about pixel data across the image, improving the accuracy of tasks like segmentation or classification. The method may involve constructing a graphical model where pixels are represented as nodes, and edges between nodes encode dependencies or relationships between pixels. The belief propagation algorithm then iteratively updates beliefs (probabilities or likelihoods) about the state of each pixel based on messages exchanged with neighboring pixels. This approach allows the system to incorporate contextual information from surrounding pixels, leading to more robust and coherent interpretations of the image. The technique is particularly useful in scenarios where pixel data is noisy or ambiguous, as the belief propagation algorithm helps resolve uncertainties by leveraging global or local consistency constraints. Applications include medical imaging, autonomous vehicle perception, and computer vision systems where reliable pixel-level
18. The system of claim 15, wherein the operations further comprise performing one or more operations based at least in part on the scene flow representation.
A system for analyzing dynamic scenes, such as those captured by cameras or sensors, generates a scene flow representation that encodes motion and structure information. The system processes input data, such as image sequences or sensor readings, to extract features and compute a scene flow representation that describes how objects and surfaces move within the scene. This representation includes both the 3D structure of the scene and the motion vectors of objects, enabling accurate tracking and prediction of dynamic elements. The system may use machine learning models, such as neural networks, to refine the scene flow representation and improve accuracy. Additionally, the system performs further operations based on the scene flow representation, such as object detection, trajectory prediction, or collision avoidance. These operations leverage the motion and structure data to enhance applications like autonomous navigation, robotics, or video analysis. The system may also integrate additional data sources, such as depth sensors or inertial measurement units, to improve the robustness of the scene flow representation. By analyzing the scene flow, the system enables real-time decision-making in dynamic environments, improving safety and efficiency in applications like self-driving cars, drones, or industrial automation.
19. The system of claim 15, wherein the at least two sequential range images are generated using corresponding LiDAR point clouds.
This invention relates to a system for processing sequential range images derived from LiDAR point clouds to enhance object detection and tracking in autonomous navigation or robotic applications. The system addresses the challenge of accurately identifying and tracking objects in dynamic environments where traditional camera-based systems may fail due to lighting conditions or occlusions. The core innovation involves generating multiple sequential range images from LiDAR point clouds, where each range image represents depth information captured at different time intervals. These range images are processed to extract features, such as object shapes, positions, and movements, by comparing the sequential data. The system may also incorporate additional processing steps, such as filtering noise, aligning the range images, or applying machine learning models to improve detection accuracy. By leveraging LiDAR's ability to provide high-resolution depth data, the system enables robust object tracking even in low-visibility scenarios. The sequential range images are generated by converting LiDAR point clouds into two-dimensional depth maps, which are then analyzed to detect changes between frames, allowing for real-time object identification and motion estimation. This approach enhances the reliability of autonomous systems in environments where visual sensors may be less effective.
20. The system of claim 15, wherein, when a pair of one or more other adjacent pixels are determined not to correspond to a same object, at least one message is prevented from being passed between the pair of the one or more other adjacent pixels.
This invention relates to image processing systems that analyze pixel relationships to determine object boundaries. The problem addressed is the need to accurately identify distinct objects in an image by preventing incorrect message passing between pixels that do not belong to the same object. In such systems, pixels exchange messages to infer object boundaries, but this can lead to errors when messages are passed between pixels from different objects. The system includes a processor configured to process an image by analyzing pixel adjacency and object correspondence. When evaluating a pair of adjacent pixels, the system determines whether they correspond to the same object. If they do not, the system prevents message passing between them. This ensures that boundary information is not incorrectly propagated across object edges, improving the accuracy of object segmentation. The system may also include additional features such as dynamic message filtering based on pixel attributes or learned thresholds to further refine boundary detection. The overall approach enhances the reliability of object detection in image processing applications.
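The gating rule can be illustrated with a small min-sum belief propagation example on a single row of pixels. Everything specific here is an assumption for the sketch: the depth-difference test in `same_object`, the threshold, the absolute-difference smoothness cost, and the reading of labels as indices of candidate depth flow values. The claim only requires that messages not be passed between adjacent pixels that are determined not to correspond to the same object.

```python
def same_object(depth_a, depth_b, threshold=0.5):
    """Hypothetical same-object test: neighbours with similar depth."""
    return abs(depth_a - depth_b) < threshold


def gated_min_sum(data_cost, depths, smooth_weight=1.0, iterations=5):
    """Min-sum belief propagation on a 1D row of pixels.

    data_cost[i][l] is the cost of assigning label l to pixel i. Messages
    are created only for neighbouring pairs that pass same_object(), so
    no information crosses an assumed object boundary.
    """
    n, k = len(data_cost), len(data_cost[0])
    msgs = {}  # msgs[(src, dst)] is the message from pixel src to pixel dst
    for i in range(n - 1):
        if same_object(depths[i], depths[i + 1]):
            msgs[(i, i + 1)] = [0.0] * k
            msgs[(i + 1, i)] = [0.0] * k
    for _ in range(iterations):
        updated = {}
        for (src, dst) in msgs:
            # Data cost at src plus all messages into src except from dst.
            base = list(data_cost[src])
            for (a, b), m in msgs.items():
                if b == src and a != dst:
                    base = [x + y for x, y in zip(base, m)]
            updated[(src, dst)] = [
                min(base[ls] + smooth_weight * abs(ls - ld) for ls in range(k))
                for ld in range(k)
            ]
        msgs = updated
    labels = []
    for i in range(n):
        belief = list(data_cost[i])
        for (a, b), m in msgs.items():
            if b == i:
                belief = [x + y for x, y in zip(belief, m)]
        labels.append(min(range(k), key=belief.__getitem__))
    return labels


costs = [[0.0, 5.0], [1.0, 0.8], [5.0, 0.0], [0.8, 1.0]]
gated = gated_min_sum(costs, [10.0, 10.1, 20.0, 20.1])    # boundary in the middle
ungated = gated_min_sum(costs, [10.0, 10.0, 10.0, 10.0])  # one object throughout
```

With the depth jump between the second and third pixels, the result is `[0, 0, 1, 1]`: each weakly constrained pixel follows the confident neighbour on its own side of the boundary. With uniform depths, messages cross the middle and the result becomes `[0, 1, 1, 1]`.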
August 2, 2021
April 9, 2024