Provided herein are methods and systems for implementing three-dimensional perception in an autonomous robotic system comprising an end-to-end neural network architecture that directly consumes large-scale raw sparse point cloud data and performs such tasks as object localization, boundary estimation, object classification, and segmentation of individual shapes or fused complete point cloud shapes.
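As an illustration of what directly consuming a raw point cloud can look like, the following is a minimal sketch of an order-invariant point feature in the style of PointNet, which the claims name as one possible deep learning model. It is not the patented architecture: the function name `global_point_feature`, the single shared layer, and the weights `W` and `b` are hypothetical and chosen only for illustration. The idea shown is that every point passes through the same layer and the results are max-pooled, so the output does not depend on the order of the points.

```python
import numpy as np

def global_point_feature(points, W, b):
    """Return one feature vector for an unordered set of 3D points.

    points: (N, 3) array of x, y, z coordinates.
    W, b:   hypothetical weights of a single shared layer, shapes (3, F) and (F,).
    """
    # Apply the same linear layer and ReLU to every point independently.
    per_point = np.maximum(points @ W + b, 0.0)   # shape (N, F)
    # Max pooling over the point axis is symmetric, so point order is irrelevant.
    return per_point.max(axis=0)                  # shape (F,)
```

Shuffling the rows of `points` leaves the returned vector unchanged, which is why no voxelization or fixed ordering is required before the data reaches the network.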
Legal claims defining the scope of protection, as filed with the USPTO.
1. A computer-implemented method of implementing three-dimensional perception in an autonomous robotic system, the method comprising:
   a) receiving, at a processor, two-dimensional image data from an optical camera;
   b) generating, by the processor, an attention region in the two-dimensional image data, the attention region marking an object of interest;
   c) receiving, at the processor, three-dimensional depth data from a depth sensor, the depth data corresponding to the image data;
   d) extracting, by the processor, a three-dimensional frustum from the depth data corresponding to the attention region; and
   e) applying, by the processor, a deep learning model to the frustum to:
      i) generate and regress an oriented three-dimensional boundary for the object of interest; and
      ii) classify the object of interest based on a combination of features from the attention region of the two-dimensional image data and the three-dimensional depth data within and around the regressed boundary.
2. The method of claim 1, wherein the classification of the object of interest is further based on at least one of the two-dimensional image data and the three-dimensional depth data.
3. The method of claim 1, wherein the autonomous robotic system is an autonomous vehicle.
4. The method of claim 1, wherein the two-dimensional image data is RGB image data.
5. The method of claim 1, wherein the two-dimensional image data is IR image data.
6. The method of claim 1, wherein the depth sensor comprises a LiDAR.
7. The method of claim 6, wherein the three-dimensional depth data comprises a sparse point cloud.
8. The method of claim 1, wherein the depth sensor comprises a stereo camera or a time-of-flight sensor.
9. The method of claim 8, wherein the three-dimensional depth data comprises a dense depth map.
10. The method of claim 1, wherein the deep learning model comprises a PointNet.
11. The method of claim 1, wherein the deep learning model comprises a three-dimensional convolutional neural network on voxelized volumetric grids of the point cloud in frustum.
12. The method of claim 1, wherein the deep learning model comprises a two-dimensional convolutional neural network on bird's eye view projection of the point cloud in frustum.
13. The method of claim 1, wherein the deep learning model comprises a recurrent neural network on the sequence of the three-dimensional points from close to distant.
14. The method of claim 1, wherein the classifying comprises semantic classification to apply a category label to the object of interest.
15. An autonomous robotic system comprising: an optical camera, a depth sensor, a memory, and at least one processor configured to:
   a) receive two-dimensional image data from the optical camera;
   b) generate an attention region in the two-dimensional image data, the attention region marking an object of interest;
   c) receive three-dimensional depth data from the depth sensor, the depth data corresponding to the image data;
   d) extract a three-dimensional frustum from the depth data corresponding to the attention region; and
   e) apply a deep learning model to the frustum to:
      i) generate and regress an oriented three-dimensional boundary for the object of interest; and
      ii) classify the object of interest based on a combination of features from the attention region of the two-dimensional image data and the three-dimensional depth data within and around the regressed boundary.
16. The system of claim 15, wherein the classification of the object of interest is further based on at least one of the two-dimensional image data and the three-dimensional depth data.
17. The system of claim 15, wherein the autonomous robotic system is an autonomous vehicle.
18. The system of claim 15, wherein the two-dimensional image data is RGB image data.
19. The system of claim 15, wherein the two-dimensional image data is IR image data.
20. The system of claim 15, wherein the depth sensor comprises a LiDAR.
21. The system of claim 20, wherein the three-dimensional depth data comprises a sparse point cloud.
22. The system of claim 15, wherein the depth sensor comprises a stereo camera or a time-of-flight sensor.
23. The system of claim 22, wherein the three-dimensional depth data comprises a dense depth map.
24. The system of claim 15, wherein the deep learning model comprises a PointNet.
25. The system of claim 15, wherein the deep learning model comprises a three-dimensional convolutional neural network on voxelized volumetric grids of the point cloud in frustum.
26. The system of claim 15, wherein the deep learning model comprises a two-dimensional convolutional neural network on bird's eye view projection of the point cloud in frustum.
27. The system of claim 15, wherein the deep learning model comprises a recurrent neural network on the sequence of the three-dimensional points from close to distant.
28. The system of claim 15, wherein the classifying comprises semantic classification to apply a category label to the object of interest.
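Steps a) through d) of the independent claims can be pictured with a short sketch. The code below is one plausible reading under stated assumptions, not the method as implemented by the applicant: it assumes the depth data is already a point cloud expressed in the camera frame (z pointing forward), that the camera follows a pinhole model with a known 3x3 intrinsic matrix `K`, and that the attention region is an axis-aligned pixel box. The function name `extract_frustum` and the `min_depth` parameter are hypothetical. A point belongs to the frustum when its image projection falls inside the attention region.

```python
import numpy as np

def extract_frustum(points, K, box, min_depth=1e-6):
    """Select the 3D points whose image projection lies inside a 2D box.

    points: (N, 3) array in the camera frame, z pointing forward (assumed).
    K:      (3, 3) pinhole intrinsic matrix (assumed known from calibration).
    box:    (u_min, v_min, u_max, v_max) attention region in pixels.
    """
    u_min, v_min, u_max, v_max = box

    # Discard points at or behind the camera plane before dividing by depth.
    in_front = points[:, 2] > min_depth
    candidates = points[in_front]

    # Pinhole projection: [u*z, v*z, z] = K @ [x, y, z].
    uvw = candidates @ K.T
    u = uvw[:, 0] / uvw[:, 2]
    v = uvw[:, 1] / uvw[:, 2]

    # Keep the points whose pixel coordinates fall inside the attention region.
    inside = (u >= u_min) & (u <= u_max) & (v >= v_min) & (v <= v_max)
    return candidates[inside]


# Illustrative values only: a toy camera and four points.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
cloud = np.array([[0.0, 0.0, 5.0],      # projects to pixel (50, 50)
                  [1.0, 0.0, 5.0],      # projects to pixel (70, 50)
                  [0.0, 0.0, -5.0],     # behind the camera
                  [0.25, -0.25, 5.0]])  # projects to pixel (55, 45)
frustum_points = extract_frustum(cloud, K, (40, 40, 60, 60))
```

The returned subset is what step e) would hand to the deep learning model. With a LiDAR, the points would first need to be transformed into the camera frame using the sensor extrinsics, which this sketch leaves out.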
August 31, 2018
November 3, 2020