Legal claims defining the scope of protection, as filed with the USPTO.
1. An object detection method, comprising: acquiring a scene image of a scene; acquiring a three-dimensional point cloud corresponding to the scene; segmenting the scene image into a plurality of sub-regions; merging the plurality of sub-regions according to image similarities among the sub-regions and three-dimensional point similarities among the sub-regions to generate a plurality of region proposals, wherein a three-dimensional point similarity between any two sub-regions is determined by calculating distances between three-dimensional points in a three-dimensional point sub-cloud corresponding to one of said two sub-regions and three-dimensional points in a three-dimensional point sub-cloud corresponding to another of said two sub-regions, and the three-dimensional point similarity is inversely proportional to the calculated distances; wherein when the image similarity of the two sub-regions is greater than an image similarity threshold, and the three-dimensional point similarity of the two sub-regions is greater than a three-dimensional point similarity threshold, merging the two sub-regions; and performing object detection on the plurality of region proposals to determine a target object to be detected in the scene image.
2. The method of claim 1, wherein acquiring the three-dimensional point cloud corresponding to the scene, comprises: scanning the scene by a simultaneous localization and mapping (SLAM) system to generate the three-dimensional point cloud corresponding to the scene.
3. The method of claim 1, wherein merging the plurality of sub-regions according to image similarities among the sub-regions and three-dimensional point similarities among the sub-regions to generate the plurality of region proposals, comprises: acquiring a first sub-image to a nth sub-image of the scene image and a first three-dimensional point sub-cloud to a nth three-dimensional point sub-cloud of the three-dimensional point cloud corresponding to a first sub-region to a nth sub-region, wherein n is a positive integer greater than one; and merging n sub-regions according to the first sub-image to the nth sub-image and the first three-dimensional point sub-cloud to the nth three-dimensional point sub-cloud to form the plurality of region proposals.
4. The method of claim 3, wherein merging the n sub-regions according to the first sub-image to the nth sub-image and the first three-dimensional point sub-cloud to the nth three-dimensional point sub-cloud to form the plurality of region proposals, comprises: acquiring an ith sub-region and a jth sub-region in the n sub-regions, wherein i and j are positive integers less than or equal to n; generating an image similarity between the ith sub-region and the jth sub-region according to an ith sub-image of the ith sub-region and a jth sub-image of the jth sub-region; generating, according to an ith three-dimensional point sub-cloud of the ith sub-region and a jth three-dimensional point sub-cloud of the jth sub-region, a three-dimensional point similarity between the ith sub-region and the jth sub-region; and merging the ith sub-region and the jth sub-region according to the image similarity and the three-dimensional point similarity.
5. The method of claim 4, wherein said generating an image similarity between the ith sub-region and the jth sub-region according to an ith sub-image of the ith sub-region and a jth sub-image of the jth sub-region comprises: calculating a structural similarity, a cosine similarity, mutual information, a color similarity, or a histogram similarity between the ith sub-image of the ith sub-region and the jth sub-image of the jth sub-region to generate the image similarity between the ith sub-region and the jth sub-region.
6. The method of claim 4, wherein said generating, according to an ith three-dimensional point sub-cloud of the ith sub-region and a jth three-dimensional point sub-cloud of the jth sub-region, a three-dimensional point similarity between the ith sub-region and the jth sub-region comprises: calculating a distance between each three-dimensional point in the ith three-dimensional point sub-cloud of the ith sub-region and each three-dimensional point in jth three-dimensional point sub-cloud of the jth sub-region to determine the three-dimensional point similarity between the ith sub-region and the jth sub-region.
7. The method of claim 1, wherein the scene image is segmented by one of a watershed segmentation algorithm, a pyramid segmentation algorithm, and a mean shift segmentation algorithm.
8. A terminal device, comprising: a memory, a processor, and computer programs stored in the memory and executable by the processor, wherein when the processor executes the computer programs, an object detection method is implemented, the object detection method comprising: acquiring a scene image of a scene; acquiring a three-dimensional point cloud corresponding to the scene; segmenting the scene image into a plurality of sub-regions; merging the plurality of sub-regions according to image similarities among the sub-regions and three-dimensional point similarities among the sub-regions to generate a plurality of region proposals, wherein a three-dimensional point similarity between any two sub-regions is determined by calculating distances between three-dimensional points in a three-dimensional point sub-cloud corresponding to one of said two sub-regions and three-dimensional points in a three-dimensional point sub-cloud corresponding to another of said two sub-regions, and the three-dimensional point similarity is inversely proportional to the calculated distances; wherein when the image similarity of the two sub-regions is greater than an image similarity threshold, and the three-dimensional point similarity of the two sub-regions is greater than a three-dimensional point similarity threshold, merging the two sub-regions; and performing object detection on the plurality of region proposals to determine a target object to be detected in the scene image.
9. The terminal device according to claim 8, wherein acquiring the three-dimensional point cloud corresponding to the scene, comprises: scanning the scene by a simultaneous localization and mapping (SLAM) system to generate the three-dimensional point cloud corresponding to the scene.
10. The terminal device according to claim 8, wherein merging the plurality of sub-regions according to image similarities among the sub-regions and three-dimensional point similarities among the sub-regions to generate the plurality of region proposals, comprises: acquiring a first sub-image to a nth sub-image of the scene image and a first three-dimensional point sub-cloud to a nth three-dimensional point sub-cloud of the three-dimensional point cloud corresponding to a first sub-region to a nth sub-region, wherein n is a positive integer greater than one; and merging n sub-regions according to the first sub-image to the nth sub-image and the first three-dimensional point sub-cloud to the nth three-dimensional point sub-cloud to form the plurality of region proposals.
11. The terminal device according to claim 10, wherein merging the n sub-regions according to the first sub-image to the nth sub-image and the first three-dimensional point sub-cloud to the nth three-dimensional point sub-cloud to form the plurality of region proposals, comprises: acquiring an ith sub-region and a jth sub-region in the n sub-regions, wherein i and j are positive integers less than or equal to n; generating an image similarity between the ith sub-region and the jth sub-region according to an ith sub-image of the ith sub-region and a jth sub-image of the jth sub-region; generating, according to an ith three-dimensional point sub-cloud of the ith sub-region and a jth three-dimensional point sub-cloud of the jth sub-region, a three-dimensional point similarity between the ith sub-region and the jth sub-region; and merging the ith sub-region and the jth sub-region according to the image similarity and the three-dimensional point similarity.
12. The terminal device of claim 11, wherein said generating an image similarity between the ith sub-region and the jth sub-region according to an ith sub-image of the ith sub-region and a jth sub-image of the jth sub-region comprises: calculating a structural similarity, a cosine similarity, mutual information, a color similarity, or a histogram similarity between the ith sub-image of the ith sub-region and the jth sub-image of the jth sub-region to generate the image similarity between the ith sub-region and the jth sub-region.
13. The terminal device of claim 11, wherein said generating, according to an ith three-dimensional point sub-cloud of the ith sub-region and a jth three-dimensional point sub-cloud of the jth sub-region, a three-dimensional point similarity between the ith sub-region and the jth sub-region comprises: calculating a distance between each three-dimensional point in the ith three-dimensional point sub-cloud of the ith sub-region and each three-dimensional point in jth three-dimensional point sub-cloud of the jth sub-region to determine the three-dimensional point similarity between the ith sub-region and the jth sub-region.
14. A non-transitory computer readable storage medium, storing computer programs therein, wherein when the computer programs are executed by a processor, an object detection method is implemented, the object detection method comprising: acquiring a scene image of a scene; acquiring a three-dimensional point cloud corresponding to the scene; segmenting the scene image into a plurality of sub-regions; merging the plurality of sub-regions according to image similarities among the sub-regions and three-dimensional point similarities among the sub-regions to generate a plurality of region proposals, wherein a three-dimensional point similarity between any two sub-regions is determined by calculating distances between three-dimensional points in a three-dimensional point sub-cloud corresponding to one of said two sub-regions and three-dimensional points in a three-dimensional point sub-cloud corresponding to another of said two sub-regions, and the three-dimensional point similarity is inversely proportional to the calculated distances; wherein when the image similarity of the two sub-regions is greater than an image similarity threshold, and the three-dimensional point similarity of the two sub-regions is greater than a three-dimensional point similarity threshold, merging the two sub-regions; and performing object detection on the plurality of region proposals to determine a target object to be detected in the scene image.
15. The non-transitory computer readable storage medium according to claim 14, wherein acquiring the three-dimensional point cloud corresponding to the scene, comprises: scanning the scene by a simultaneous localization and mapping (SLAM) system to generate the three-dimensional point cloud corresponding to the scene.
16. The non-transitory computer readable storage medium according to claim 14, wherein merging the plurality of sub-regions according to image similarities among the sub-regions and three-dimensional point similarities among the sub-regions to generate the plurality of region proposals, comprises: acquiring a first sub-image to a nth sub-image of the scene image and a first three-dimensional point sub-cloud to a nth three-dimensional point sub-cloud of the three-dimensional point cloud corresponding to a first sub-region to a nth sub-region, wherein n is a positive integer greater than one; and merging n sub-regions according to the first sub-image to the nth sub-image and the first three-dimensional point sub-cloud to the nth three-dimensional point sub-cloud to form the plurality of region proposals.
17. The non-transitory computer readable storage medium according to claim 16, wherein merging the n sub-regions according to the first sub-image to the nth sub-image and the first three-dimensional point sub-cloud to the nth three-dimensional point sub-cloud to form the plurality of region proposals, comprises: acquiring an ith sub-region and a jth sub-region in the n sub-regions, wherein i and j are positive integers less than or equal to n; generating an image similarity between the ith sub-region and the jth sub-region according to an ith sub-image of the ith sub-region and a jth sub-image of the jth sub-region; generating, according to an ith three-dimensional point sub-cloud of the ith sub-region and a jth three-dimensional point sub-cloud of the jth sub-region, a three-dimensional point similarity between the ith sub-region and the jth sub-region; and merging the ith sub-region and the jth sub-region according to the image similarity and the three-dimensional point similarity.
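The merging criterion recited in claims 1, 4, 5 and 6 can be illustrated with a minimal sketch. It is not the patentee's implementation, and several choices below are assumptions made only for illustration: the region layout (a dict with hypothetical `pixels` and `points` keys), the use of the mean pairwise Euclidean distance as the "calculated distances", the histogram-intersection form of the histogram similarity, the threshold values, and the union-find grouping that lets pairwise merges chain together.

```python
import math
from itertools import combinations


def point_similarity(cloud_a, cloud_b, eps=1e-9):
    """3D point similarity, inversely proportional to the mean pairwise distance.

    Assumption: the claims do not say how the pairwise distances are
    aggregated; the mean is used here.
    """
    dists = [math.dist(p, q) for p in cloud_a for q in cloud_b]
    mean_dist = sum(dists) / len(dists)
    return 1.0 / max(mean_dist, eps)  # eps guards against division by zero


def histogram_similarity(pixels_a, pixels_b, bins=8):
    """Image similarity as the intersection of normalized grayscale histograms.

    Assumption: one of the options listed in claim 5, in its simplest form.
    Pixel values are integers in [0, 255]; the result lies in [0, 1].
    """
    def hist(pixels):
        counts = [0] * bins
        for v in pixels:
            counts[min(v * bins // 256, bins - 1)] += 1
        return [c / len(pixels) for c in counts]

    return sum(min(a, b) for a, b in zip(hist(pixels_a), hist(pixels_b)))


def merge_regions(regions, image_thresh, point_thresh):
    """Group sub-regions whose two similarities both exceed their thresholds."""
    parent = list(range(len(regions)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(regions)), 2):
        img_sim = histogram_similarity(regions[i]["pixels"], regions[j]["pixels"])
        pt_sim = point_similarity(regions[i]["points"], regions[j]["points"])
        # Merge only when BOTH similarities are above their thresholds.
        if img_sim > image_thresh and pt_sim > point_thresh:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(regions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data: regions 0 and 1 are dark and spatially close; region 2 is bright and far away.
regions = [
    {"pixels": [10, 12, 14, 11], "points": [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)]},
    {"pixels": [13, 15, 9, 10], "points": [(0.2, 0.0, 1.0), (0.3, 0.0, 1.0)]},
    {"pixels": [200, 210, 220, 205], "points": [(5.0, 5.0, 9.0), (5.1, 5.0, 9.0)]},
]
proposals = merge_regions(regions, image_thresh=0.5, point_thresh=1.0)
print(proposals)  # [[0, 1], [2]]
```

Each inner list holds the indices of the sub-regions that form one region proposal. In the claimed method, an object detector would then be run on those proposals; that step is outside the scope of this sketch.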
May 6, 2025