10600210

Data Processing Systems for Real-Time Camera Parameter Estimation

Published March 24, 2020
Assignee: not available in the USPTO data

Patent Claims
20 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A system comprising: one or more computer processors configured as a neural network; and memory storing computer-executable instructions that, when executed by the one or more computer processors, cause the one or more computer processors to perform operations comprising: determining semantic keypoints for an environment; determining semantic keypoints for an image in an image sequence; determining person keypoints for the environment; determining person keypoints for the image in the image sequence; matching each of the semantic keypoints for the image to a respective semantic keypoint for the environment; matching each of the person keypoints for the image to a respective person keypoint for the environment; generating a homography based on the matching of each of the semantic keypoints for the image to the respective semantic keypoint for the environment and the matching of each of the person keypoints for the image to the respective person keypoint for the environment; decomposing the homography into intrinsic parameter estimates and extrinsic parameter estimates; refining each image in the image sequence by applying outlier rejection and particle filtering to each image in the image sequence to generate refined intrinsic parameter estimates and refined extrinsic parameter estimates; and determining a camera parameter based on the refined intrinsic parameter estimates and the refined extrinsic parameter estimates.

Plain English Translation

The system relates to computer vision and neural networks, specifically to estimating camera parameters in scenes that contain both static structure and moving people. The problem addressed is accurately determining a camera's position and orientation (extrinsic parameters) and its internal settings, such as focal length (intrinsic parameters), when some of the available reference points in the scene move. Using one or more processors configured as a neural network, the system determines semantic keypoints (distinctive fixed features of the environment, such as landmarks) and person keypoints (positions of people), both for the environment and for an image in an image sequence. Each semantic keypoint in the image is matched to a semantic keypoint in the environment, and each person keypoint in the image is matched to a person keypoint in the environment. A homography (a projective transformation between two planes) is generated from both sets of matches and then decomposed into intrinsic and extrinsic parameter estimates. These estimates are refined across the image sequence by outlier rejection (discarding unreliable matches or estimates) and particle filtering (a probabilistic technique for tracking a quantity over time), producing refined intrinsic and extrinsic parameter estimates from which the final camera parameter is determined. Using people as additional reference points is intended to make the estimate more robust in dynamic scenes, supporting applications such as augmented reality and tracking.
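The claim leaves open how the homography is computed from the matched keypoints. A common technique is the direct linear transform (DLT), sketched below under the assumptions that the environment is planar and that semantic and person matches are simply stacked into one set of correspondences; `estimate_homography` is a hypothetical name. Production code would usually normalize coordinates first (Hartley normalization) and wrap the solve in a robust estimator such as RANSAC.

```python
import numpy as np

def estimate_homography(img_pts, env_pts):
    """Direct linear transform: find H (3x3) with env ~ H @ img (homogeneous).

    img_pts, env_pts: (N, 2) matched points, N >= 4, assumed to lie on one
    plane. Semantic and person correspondences can be stacked together.
    """
    img_pts = np.asarray(img_pts, dtype=float)
    env_pts = np.asarray(env_pts, dtype=float)
    if len(img_pts) < 4 or img_pts.shape != env_pts.shape:
        raise ValueError("need at least 4 matched point pairs of equal shape")
    rows = []
    for (x, y), (u, v) in zip(img_pts, env_pts):
        # Each correspondence gives two linear constraints on the 9 entries of H.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    A = np.array(rows)
    # H is the right singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale (and sign)
```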

Claim 2

Original Legal Text

2. The system of claim 1 , wherein determining the camera parameter comprises determining a camera focal length based on the refined intrinsic parameter estimates.

Plain English Translation

Building on claim 1, the camera parameter determined by the system includes the camera's focal length, which is computed from the refined intrinsic parameter estimates. Because those estimates have already passed through outlier rejection and particle filtering across the image sequence, the resulting focal length is less sensitive to errors in any single frame. An accurate focal length matters because it determines how the camera's projection model maps 3D points in the environment to 2D image coordinates, which downstream tasks such as object tracking and augmented reality depend on.

Claim 3

Original Legal Text

3. The system of claim 2 , wherein determining the camera focal length based on the refined intrinsic parameter estimates comprises: determining a first focal length based on a first algebraic function; determining a second focal length based on a second algebraic function; and determining the camera focal length based on a comparison of the first focal length and the second focal length.

Plain English Translation

This claim specifies how the focal length of claim 2 is computed from the refined intrinsic parameter estimates. A first focal length is determined using a first algebraic function, and a second focal length is determined using a second algebraic function. The camera focal length is then determined based on a comparison of the two values. Computing the focal length in two independent ways and comparing the results acts as a consistency check: if one algebraic function is poorly conditioned or yields an implausible value for a particular frame, the comparison can detect this instead of passing a bad focal length on to later stages. Precise focal length estimates are important wherever an accurate camera model is needed, for example in 3D reconstruction, object tracking, and augmented reality.
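The claim does not name the two algebraic functions. One standard pair, used here purely as an assumption, comes from a plane-to-image homography $H \sim K[r_1\ r_2\ t]$ with $K = \mathrm{diag}(f, f, 1)$ (square pixels, principal point at the image origin): the columns $r_1, r_2$ of a rotation are orthogonal and have equal norm, and each condition can be solved for $f^2$. The comparison rule below (average when the two agree within a tolerance, otherwise reject the frame) is hypothetical, as are the function names. Here H maps environment-plane coordinates to image pixels; a homography estimated in the other direction would be inverted first.

```python
import math
import numpy as np

def focal_candidates(H):
    """Two closed-form focal-length estimates from a plane-to-image homography.

    Assumes H ~ K [r1 r2 t] with K = diag(f, f, 1). Returns (f1, f2); an entry
    is None when its constraint is degenerate or gives a non-positive f^2.
    """
    H = np.asarray(H, dtype=float)
    a, b = H[:, 0], H[:, 1]  # proportional to K r1 and K r2

    def safe_sqrt(num, den):
        if abs(den) < 1e-12:
            return None
        f_sq = num / den
        return math.sqrt(f_sq) if f_sq > 0 else None

    # Function 1: r1 . r2 = 0.
    f1 = safe_sqrt(-(a[0] * b[0] + a[1] * b[1]), a[2] * b[2])
    # Function 2: |r1| = |r2|.
    f2 = safe_sqrt(a[0] ** 2 + a[1] ** 2 - b[0] ** 2 - b[1] ** 2,
                   b[2] ** 2 - a[2] ** 2)
    return f1, f2

def focal_from_homography(H, rel_tol=0.05):
    """Hypothetical comparison rule: average the two estimates if they agree."""
    f1, f2 = focal_candidates(H)
    if f1 is None or f2 is None:
        return f1 if f2 is None else f2  # use whichever one is valid, if any
    if abs(f1 - f2) <= rel_tol * max(f1, f2):
        return 0.5 * (f1 + f2)
    return None  # the two functions disagree: treat the frame as unreliable
```

Both ratios are invariant to the unknown scale of H, which is why a homography estimated only up to scale is sufficient.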

Claim 4

Original Legal Text

4. The system of claim 1 , wherein the semantic keypoints for the environment and the semantic keypoints for the image in the image sequence each have a fixed location; and wherein the person keypoints for the environment and the person keypoints for the image in the image sequence each have a movable location.

Plain English Translation

This claim distinguishes the two kinds of keypoints used in claim 1. Semantic keypoints, both in the environment and in the images, have fixed locations: they correspond to static features of the environment and therefore provide stable reference points for aligning an image with the environment. Person keypoints, both in the environment and in the images, have movable locations: they correspond to people whose positions change from frame to frame. Using both kinds together lets the system draw on the additional reference points that people provide while still anchoring the alignment to static structure. This can help in frames where the visible fixed features alone are too few or too poorly distributed to determine the camera parameters reliably, for example in augmented reality overlays or navigation tasks that must track both static and moving elements.

Claim 5

Original Legal Text

5. The system of claim 1 , wherein decomposing the homography into a matrix comprises decomposing the homography into an extrinsic parameter matrix and wherein determining the camera parameter based on the matrix comprises determining a camera pose based on the extrinsic parameter matrix.

Plain English Translation

This claim specifies that the homography of claim 1 is decomposed into an extrinsic parameter matrix and that the camera pose is determined from that matrix. A homography is a projective transformation between two planes; here it relates the image to a planar reference in the environment. When the camera's intrinsic parameters are known, the homography can be factored into those intrinsic parameters and an extrinsic component that encodes the camera's rotation and translation relative to the reference plane. Reading the camera pose (its orientation and position) from this extrinsic parameter matrix gives the camera's viewpoint for each image. The resulting pose can be used in further tasks such as scene reconstruction, object localization, or motion tracking, including real-time applications like augmented reality and robotics.
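A minimal sketch of the decomposition step, assuming the intrinsic matrix K is already known, that H maps environment-plane points $(X, Y, 1)$ on the plane $Z = 0$ to image pixels, and that the scale is fixed by requiring unit-length rotation columns. `pose_from_homography` is a hypothetical name, and the rotation it returns is only approximately orthogonal when H is noisy.

```python
import numpy as np

def pose_from_homography(H, K):
    """Recover (R, t) from H ~ K [r1 r2 t] for an environment plane Z = 0."""
    M = np.linalg.solve(K, np.asarray(H, dtype=float))  # K^-1 H = lambda [r1 r2 t]
    # Unknown scale lambda: make r1 and r2 unit length on average.
    scale = 2.0 / (np.linalg.norm(M[:, 0]) + np.linalg.norm(M[:, 1]))
    if M[2, 2] < 0:  # choose the sign that puts the plane in front of the camera
        scale = -scale
    r1, r2, t = scale * M[:, 0], scale * M[:, 1], scale * M[:, 2]
    r3 = np.cross(r1, r2)  # third rotation column completes a right-handed frame
    return np.column_stack([r1, r2, r3]), t
```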

Claim 6

Original Legal Text

6. The system of claim 1 , wherein the refined extrinsic parameter estimates are represented as an extrinsic parameter matrix, wherein the refined intrinsic parameter estimates are represented as an intrinsic parameter matrix, and wherein determining the camera parameter based on the on the refined intrinsic parameter estimates and the refined extrinsic parameter estimates comprises: determining a rotation matrix based on the extrinsic parameter matrix; determining a translation matrix based on the extrinsic parameter matrix; orthogonalizing the rotation matrix using a singular value decomposition; and determining the camera parameter based on applying non-linear least-squares minimization to the orthogonalized rotation matrix and the translation matrix.

Plain English Translation

This claim describes how the final camera parameter is computed from the refined estimates of claim 1. The refined extrinsic estimates are represented as an extrinsic parameter matrix and the refined intrinsic estimates as an intrinsic parameter matrix. A rotation matrix and a translation matrix are extracted from the extrinsic parameter matrix. Because a rotation matrix estimated from noisy data is generally not exactly orthogonal, it is orthogonalized using a singular value decomposition (SVD), which replaces it with the nearest valid rotation. The camera parameter is then determined by applying non-linear least-squares minimization to the orthogonalized rotation matrix and the translation matrix, jointly adjusting rotation and translation to reduce the remaining error. The intrinsic parameters (e.g., focal length and principal point) and the extrinsic parameters (position and orientation) together form the camera model used by applications such as 3D reconstruction, augmented reality, and robotic vision.
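The SVD orthogonalization step can be sketched as follows: for $R_{\text{approx}} = U S V^\top$, the closest rotation in the Frobenius norm is $U\,\mathrm{diag}(1, 1, \det(UV^\top))\,V^\top$. The function name is hypothetical, and the subsequent non-linear least-squares refinement is not shown; a general-purpose solver such as SciPy's `scipy.optimize.least_squares` is one option for that stage.

```python
import numpy as np

def nearest_rotation(R_approx):
    """Project a noisy 3x3 matrix onto the closest proper rotation (det = +1)."""
    U, _, Vt = np.linalg.svd(np.asarray(R_approx, dtype=float))
    # Flip the direction of the smallest singular vector if U @ Vt is a reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt
```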

Claim 7

Original Legal Text

7. The system of claim 1 , wherein generating the homography based on the matching of each of the semantic keypoints for the image to the respective semantic keypoint for the environment and the matching of each of the person keypoints for the image to the respective person keypoint for the environment comprises: generating a first homography based on the matching of each of the semantic keypoints for the image to the respective semantic keypoint for the environment; projecting the semantic keypoints for the image and the person keypoints for the image to a coordinate system of the environment, associating each projected semantic keypoint for the image with a nearest semantic keypoint for the environment based on the coordinate system of the environment; associating each projected person keypoint for the image with a nearest person keypoint for the environment based on the coordinate system of the environment; and generating a second homography based on the associations of the projected semantic keypoints for the image with the semantic keypoints for the environment and the projected person keypoints for the image with the person keypoints for the environment.

Plain English Translation

This claim details how the homography of claim 1 is generated in two stages. First, a homography is computed from the matches between semantic keypoints in the image and semantic keypoints in the environment. Using this first homography, both the semantic keypoints and the person keypoints of the image are projected into the environment's coordinate system. Each projected semantic keypoint is then associated with the nearest semantic keypoint of the environment, and each projected person keypoint with the nearest person keypoint of the environment. Finally, a second homography is generated from these nearest-neighbor associations. The first stage relies only on fixed semantic keypoints to obtain an initial alignment; the second stage uses that alignment to establish correspondences for the movable person keypoints as well, so the second homography draws on both static and dynamic reference points. This makes image-to-environment registration more robust in scenes where people are present.
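The projection and nearest-neighbor association steps can be sketched as below, assuming the first homography maps image coordinates to environment coordinates and that semantic and person keypoints are associated separately (each call handles one keypoint class). The helper names and the optional `max_dist` gate are hypothetical; the resulting index pairs would then be fed to a homography solver to obtain the second homography.

```python
import numpy as np

def project(H, pts):
    """Apply a 3x3 homography to (N, 2) points."""
    pts = np.asarray(pts, dtype=float)
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ np.asarray(H, dtype=float).T
    return ph[:, :2] / ph[:, 2:3]

def associate_nearest(projected, env_pts, max_dist=np.inf):
    """Index of the nearest environment keypoint for each projected keypoint.

    Returns -1 for points whose nearest neighbour is farther than max_dist.
    """
    env_pts = np.asarray(env_pts, dtype=float)
    # Pairwise distances, shape (num_projected, num_env).
    d = np.linalg.norm(projected[:, None, :] - env_pts[None, :, :], axis=2)
    idx = d.argmin(axis=1)
    idx[d[np.arange(len(idx)), idx] > max_dist] = -1
    return idx
```

For example, with a first homography that shifts by (10, 5), image person keypoints (0, 0) and (20, 20) project to (10, 5) and (30, 25), and each is associated with whichever environment person keypoint lies closest to it.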

Claim 8

Original Legal Text

8. The system of claim 1 , wherein each of the person keypoints for the environment corresponds to a projection of a center of mass of a person onto a ground plane of the environment.

Plain English Translation

This claim specifies what a person keypoint in the environment represents: the projection of a person's center of mass onto the environment's ground plane. Representing each person by a single point on the ground plane, rather than by individual body parts, gives a location that stays relatively stable as the person's posture changes or as parts of the body are occluded. Because all such points lie on the same ground plane, they fit naturally into the plane-to-plane homography of claim 1. Accurate ground-plane positions of people are also useful in their own right in applications such as crowd management, safety monitoring, or human-robot interaction.
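The claim does not say how the center of mass is obtained. As a minimal illustration only, assume 3D joint positions in environment coordinates (with the ground plane at z = 0) are available from some tracking source, and approximate the center of mass as an optionally weighted mean of the joints. The function name and the per-joint weights are hypothetical.

```python
def ground_projected_com(joints, weights=None):
    """Approximate a person's center of mass and project it onto z = 0.

    joints: list of (x, y, z) positions in environment coordinates.
    weights: optional per-joint mass fractions (hypothetical); an equal-weight
    mean is used by default. Returns the (x, y) ground-plane keypoint.
    """
    if not joints:
        raise ValueError("no joints given")
    if weights is None:
        weights = [1.0] * len(joints)
    total = sum(weights)
    cx = sum(w * j[0] for w, j in zip(weights, joints)) / total
    cy = sum(w * j[1] for w, j in zip(weights, joints)) / total
    # Orthogonal projection onto the plane z = 0 simply drops the height.
    return (cx, cy)
```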

Claim 9

Original Legal Text

9. A non-transitory computer-readable medium storing computer-executable instructions for: training a neural network to locate and identify keypoints in an image sequence corresponding to keypoints in an environment, wherein each keypoint of the keypoints in the environment has a known location; matching each keypoint of the keypoints in the image sequence to a respective keypoint of the keypoints in the environment; generating a homography for each image in the image sequence based on the matching of the keypoints in the image sequence to the keypoints in the environment; and determining a camera parameter based on the homography.

Plain English Translation

This claim covers a non-transitory computer-readable medium storing instructions for estimating camera parameters from an image sequence. A neural network is trained to locate and identify keypoints in the image sequence that correspond to keypoints in an environment, where each environment keypoint has a known location. Each keypoint found in the image sequence is then matched to its counterpart in the environment. A homography is generated for each image in the sequence from these matches; it describes the geometric transformation between the image plane and the environment. Finally, a camera parameter, such as the camera's position and orientation (extrinsic parameters) or its focal length (an intrinsic parameter), is determined from the homography. The approach combines learned keypoint detection with geometric constraints, which can improve camera localization in applications like augmented reality, robotics, and autonomous navigation.

Claim 10

Original Legal Text

10. The non-transitory computer-readable medium of claim 9 , wherein the keypoints in the image sequence and the keypoints in the environment comprise semantic keypoints having fixed locations and person keypoints having movable locations, wherein the person keypoints are determined using a tracking system.

Plain English Translation

This claim refines claim 9 by dividing the keypoints, both in the image sequence and in the environment, into two kinds. Semantic keypoints have fixed locations and correspond to static features of the environment. Person keypoints have movable locations and correspond to people, whose positions change over time; these are determined using a tracking system. Distinguishing fixed from movable keypoints lets the system treat static structure as a stable reference while still using people, whose current locations the tracking system supplies, as additional correspondences when aligning images with the environment. The claim does not specify how the tracking system works; techniques such as pose estimation or object detection are examples of ways to monitor person keypoints over time.

Claim 11

Original Legal Text

11. The non-transitory computer-readable medium of claim 10 , wherein generating the homography for each image in the image sequence based on the matching of the keypoints in the image sequence to the keypoints in the environment comprises: generating a first homography based on the matching of each of the semantic keypoints of the keypoints in the image sequence to a respective semantic keypoint of the keypoints in the environment; projecting the semantic keypoints of the keypoints in the image sequence and the person keypoints of the keypoints in the image sequence to a coordinate system of the environment; associating each projected semantic keypoint of the keypoints in the image sequence with a nearest semantic keypoint of the keypoints in the environment based on the coordinate system of the environment; associating each projected person keypoint of the keypoints in the image sequence with a nearest person keypoint of the keypoints in the environment based on the coordinate system of the environment; and generating a second homography based on the associations of the projected semantic keypoints of the keypoints in the image sequence with the semantic keypoints of the keypoints in the environment and the projected person keypoints of the keypoints in the image sequence with the person keypoints of the keypoints in the environment.

Plain English Translation

This claim applies, to the computer-readable medium of claim 10, the same two-stage homography generation that claim 7 describes for the system. The keypoints are divided into semantic keypoints (fixed features such as landmarks) and person keypoints (positions of people). A first homography is generated by matching the semantic keypoints in the image sequence to semantic keypoints in the environment. Using it, both the semantic and the person keypoints from the image sequence are projected into the environment's coordinate system. Each projected semantic keypoint is associated with the nearest semantic keypoint in the environment, and each projected person keypoint with the nearest person keypoint in the environment. A second homography is then generated from these associations, refining the alignment between the image sequence and the environment. Because the second homography draws on both semantic and person correspondences, the mapping is more robust in dynamic or complex scenes, which matters for applications such as augmented reality, robotics, and autonomous navigation.

Claim 12

Original Legal Text

12. The non-transitory computer-readable medium of claim 10 , wherein each of the person keypoints in the environment corresponds to a projection of a center of a mass of a person onto a ground plane of the environment.

Plain English Translation

This claim specifies, for the computer-readable medium of claim 10, that each person keypoint in the environment is the projection of a person's center of mass onto the environment's ground plane. Locating people precisely in 3D space from 2D images is difficult, particularly when several people are present or when occlusions occur. Reducing each person to a single ground-plane point gives a stable reference location that is less affected by changes in pose or partial visibility than detections of individual body parts, and it places all person keypoints on the environment's ground plane. These keypoints can then be matched against person keypoints found in images or video frames, supporting consistent tracking in dynamic or cluttered scenes. This is useful in applications like surveillance, augmented reality, and autonomous navigation, where accurate human positioning is critical.

Claim 13

Original Legal Text

13. The non-transitory computer-readable medium of claim 9 , wherein each of the semantic keypoints of the keypoints in the environment is associated with one of an intersection of two or more lines configured in the environment, an endpoint of a line configured in the environment, or a corner formed by two or more lines configured in the environment.

Plain English Translation

This claim specifies which features serve as semantic keypoints in the environment: each is associated with an intersection of two or more lines configured in the environment, an endpoint of such a line, or a corner formed by two or more such lines. Line-based features like these suit environments marked with straight lines, such as road networks or architectural layouts, because their locations are defined precisely by the geometry and can be identified consistently both in the environment model and in the images. Tying each keypoint to a specific geometric role (intersection, endpoint, or corner) also reduces ambiguity when image keypoints are matched to environment keypoints.
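If the environment's lines are available as pairs of points (a hypothetical representation; the claim does not specify one), intersection and corner keypoints can be computed with homogeneous coordinates: the line through two points is their cross product, and two lines meet at the cross product of the lines. Endpoints are simply the segment endpoints. The helper names below are illustrative.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_through(p, q):
    """Homogeneous line (a, b, c), with a*x + b*y + c = 0, through two 2D points."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def intersect(l1, l2, eps=1e-12):
    """Intersection point of two homogeneous lines, or None if they are parallel."""
    x, y, w = cross(l1, l2)
    if abs(w) < eps:
        return None  # point at infinity: the lines do not meet
    return (x / w, y / w)
```

For instance, the line through (0, 0) and (4, 0) and the line through (2, -1) and (2, 5) meet at the corner (2, 0).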

Claim 14

Original Legal Text

14. The non-transitory computer-readable medium of claim 9 , wherein determining the camera parameter based on the homography comprises: decomposing the homography into intrinsic parameter estimates and extrinsic parameter estimates; refining images in the image sequence by applying outlier rejection and particle filtering to the images in the image sequence to generate refined intrinsic parameter estimates and refined extrinsic parameter estimates; and determining the camera parameter based on the refined intrinsic parameter estimates and the refined extrinsic parameter estimates.

Plain English Translation

This claim specifies how the camera parameter of claim 9 is determined from the homography. The homography, derived from keypoint correspondences, is decomposed into initial intrinsic (e.g., focal length) and extrinsic (position and orientation) parameter estimates. These estimates are refined across the images of the sequence in two ways. Outlier rejection discards estimates or correspondences that are inconsistent with the rest of the data and would otherwise distort the result. Particle filtering then tracks the parameters over time by maintaining many weighted hypotheses of their values and updating those hypotheses as each new image is processed, which smooths frame-to-frame noise. The camera parameter is determined from the resulting refined intrinsic and extrinsic parameter estimates. Compared with using each frame's homography in isolation, this refinement is intended to give more accurate and stable estimates in noisy or dynamic scenes, which matters for applications such as robotics, autonomous vehicles, and augmented reality.
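The claim does not describe the outlier test or the particle filter's models. A minimal sketch, assuming a sequence of raw per-frame focal-length estimates: outliers are flagged with a median-absolute-deviation (MAD) test, and a one-dimensional bootstrap particle filter with a random-walk motion model and Gaussian measurement likelihood tracks the focal length, skipping the update for rejected frames. All names and parameter values are hypothetical; a real system would filter the full intrinsic and extrinsic parameter vector.

```python
import math
import random
import statistics

def inlier_mask(values, k=3.0):
    """True for values within k robust standard deviations of the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values]) or 1e-9
    # 1.4826 * MAD approximates the standard deviation for Gaussian data.
    return [abs(v - med) <= k * 1.4826 * mad for v in values]

def particle_filter(measurements, n=500, init_range=(500.0, 2000.0),
                    process_std=5.0, meas_std=20.0, seed=0):
    """Track a scalar across frames; None marks a rejected measurement."""
    rng = random.Random(seed)
    particles = [rng.uniform(*init_range) for _ in range(n)]
    estimates = []
    for z in measurements:
        # Predict: the parameter may drift slightly between frames.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        if z is not None:
            # Update: weight particles by the likelihood of the measurement.
            w = [math.exp(-0.5 * ((z - p) / meas_std) ** 2) for p in particles]
            if sum(w) > 0:
                # Resample in proportion to weight.
                particles = rng.choices(particles, weights=w, k=n)
        estimates.append(sum(particles) / n)
    return estimates

def refine_focal_sequence(raw, **pf_kwargs):
    """Outlier rejection followed by particle filtering, frame by frame."""
    mask = inlier_mask(raw)
    return particle_filter([z if ok else None for z, ok in zip(raw, mask)],
                           **pf_kwargs)
```

The MAD test is used instead of a mean and standard deviation because a single wild estimate would inflate the standard deviation enough to hide itself.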

Claim 15

Original Legal Text

15. The non-transitory computer-readable medium of claim 14 , wherein the refined extrinsic parameter estimates are represented as an extrinsic parameter matrix, wherein the refined intrinsic parameter estimates are represented as an intrinsic parameter matrix, and wherein determining the camera parameter based on the on the refined intrinsic parameter estimates and the refined extrinsic parameter estimates comprises: determining a rotation matrix based on the extrinsic parameter matrix; determining a translation matrix based on the extrinsic parameter matrix; orthogonalizing the rotation matrix using a singular value decomposition; and determining the camera parameter based on applying non-linear least-squares minimization to the orthogonalized rotation matrix and the translation matrix.

Plain English Translation

This claim applies to the computer-readable medium of claim 14 the same matrix-based refinement that claim 6 describes for the system. The refined extrinsic and intrinsic parameter estimates are represented as matrices. The extrinsic parameter matrix contains the camera's rotation and translation, while the intrinsic parameter matrix contains internal camera characteristics such as focal length and principal point. A rotation matrix and a translation matrix are extracted from the extrinsic parameter matrix. The rotation matrix is orthogonalized using singular value decomposition (SVD), which replaces it with the closest true rotation matrix and removes the skew and scaling introduced by noise. Finally, the camera parameter is determined by applying non-linear least-squares minimization to the orthogonalized rotation matrix and the translation matrix. The result is a precise set of camera parameters for computer vision applications such as 3D reconstruction, augmented reality, and robotics.
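The claim does not state the cost minimized in the non-linear least-squares step. One common choice, given here only as an assumption, is the reprojection error of the environment keypoints $\mathbf{X}_i$ against the image keypoints $\mathbf{x}_i$, with the rotation parameterized by an axis-angle vector $\boldsymbol{\omega}$ so it remains a valid rotation during optimization, and with $(\boldsymbol{\omega}, \mathbf{t})$ initialized from the SVD-orthogonalized rotation and the translation:

```latex
\min_{\boldsymbol{\omega},\,\mathbf{t}}
  \sum_{i=1}^{N}
  \left\| \mathbf{x}_i - \pi\!\left( K \left( R(\boldsymbol{\omega})\,\mathbf{X}_i + \mathbf{t} \right) \right) \right\|_2^2,
\qquad
\pi\!\left( [u,\ v,\ w]^\top \right) = \left[ u/w,\ v/w \right]^\top
```

Here $K$ is the intrinsic parameter matrix, $R(\boldsymbol{\omega})$ is the rotation given by Rodrigues' formula, and $\pi$ is the perspective division that converts homogeneous image coordinates to pixels.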

Claim 16

Original Legal Text

16. A computer-implemented data-processing method for camera pose estimation, the method comprising: determining semantic keypoints for an environment; determining semantic keypoints for an image; generating a plurality of random camera poses; for each random camera pose of the plurality of random camera poses: generating projected semantic keypoints for the image based on the semantic keypoints for the environment and the random camera pose, determining an error value for the random camera pose based on a comparison of the projected semantic keypoints for the image and corresponding semantic keypoints for the image; and assigning a weight to the random camera pose based on the error value.

Plain English Translation

This claim covers a computer-implemented method for estimating the pose (position and orientation) of a camera relative to a known environment using semantic keypoints. Semantic keypoints, meaning distinctive points tied to meaningful structures such as line intersections or corners, are determined both for the environment and for an image captured by the camera. The method then generates multiple random camera poses, each a hypothesis of where the camera might be and how it is oriented. For each hypothesis, the environment's semantic keypoints are projected into the image as they would appear from that pose. The projected keypoints are compared with the corresponding semantic keypoints actually found in the image, an error value is computed from that comparison, and the pose is assigned a weight based on the error, with lower errors giving higher weights. The weighted set of random poses forms a sampling-based estimate of the likely camera pose, of the kind used in particle filtering; claim 18 adds steps that transform each pose and decide whether to retain it based on its weight.
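A minimal sketch of the weighting loop, under assumptions the claim does not state: a pinhole camera with known intrinsic matrix K, poses represented as an axis-angle rotation vector plus a translation, random poses drawn as Gaussian perturbations of a nominal pose, the error taken as the mean Euclidean pixel distance, and a Gaussian weighting with scale `sigma`. All function names are hypothetical.

```python
import numpy as np

def rodrigues(rvec):
    """Rotation matrix from an axis-angle vector (Rodrigues' formula)."""
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    Kx = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * Kx + (1 - np.cos(theta)) * Kx @ Kx

def project_points(K, rvec, t, X):
    """Pinhole projection of (N, 3) environment points into (N, 2) pixels."""
    cam = X @ rodrigues(rvec).T + t  # environment -> camera coordinates
    uvw = cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def sample_poses(rng, rvec0, t0, n, rot_std=0.02, trans_std=0.5):
    """Random pose hypotheses around a nominal pose (rvec0, t0)."""
    return [(rvec0 + rng.normal(0.0, rot_std, 3), t0 + rng.normal(0.0, trans_std, 3))
            for _ in range(n)]

def weight_random_poses(K, env_pts, img_pts, poses, sigma=5.0):
    """Return (errors, normalized weights) for each pose hypothesis."""
    errors = np.array([
        np.linalg.norm(project_points(K, r, t, env_pts) - img_pts, axis=1).mean()
        for r, t in poses
    ])
    log_w = -0.5 * (errors / sigma) ** 2
    w = np.exp(log_w - log_w.max())  # subtract the max log-weight to avoid underflow
    return errors, w / w.sum()
```

Working in log-weights and subtracting the maximum before exponentiating leaves the normalized weights unchanged while keeping them representable when all errors are large.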

Claim 17

Original Legal Text

17. The computer-implemented data-processing method of claim 16 , wherein determining the error comprises determining a distance between the projected semantic keypoint for the image and the corresponding semantic keypoint of the semantic keypoints for the image.

Plain English Translation

This invention relates to computer-implemented data processing methods for camera pose estimation, specifically to how the error in semantic keypoint projection is determined. Semantic keypoints are distinctive features in images that represent meaningful points or regions, such as object boundaries or scene landmarks. In the method of claim 16, the environment's semantic keypoints are projected onto the image for each random camera pose, and misalignment between these projected keypoints and the semantic keypoints actually detected in the image indicates how poorly that pose explains the image. This claim specifies that the error is quantified by measuring the distance between a projected semantic keypoint and its corresponding semantic keypoint in the image, which can be the Euclidean distance in pixel space or another metric depending on the application. A small distance means the random camera pose aligns the environment well with the image, so the pose receives a higher weight; a large distance lowers its weight. Measuring projection error in this way makes the weighting of candidate poses reliable in tasks that depend on accurate keypoint correspondence, such as augmented reality or robotics.
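One way to realize the distance of claim 17 is sketched below, assuming (hypothetically) that each semantic keypoint carries a label, such as a named field landmark, so that the corresponding keypoint is simply the one with the same label. Euclidean pixel distance is used, and keypoints detected in only one of the two sets are ignored; both choices, and the function names, are illustrative assumptions.

```python
import math


def keypoint_distances(projected, observed):
    """Euclidean pixel distance per semantic label present in both dicts."""
    shared = sorted(projected.keys() & observed.keys())
    return {label: math.dist(projected[label], observed[label]) for label in shared}


def pose_error(projected, observed):
    """Mean distance over matched keypoints; inf if nothing matches."""
    d = keypoint_distances(projected, observed)
    return sum(d.values()) / len(d) if d else math.inf


# Hypothetical labels and pixel coordinates for illustration.
projected = {"center_spot": (640.0, 360.0), "corner_ne": (1200.0, 100.0)}
observed = {"center_spot": (643.0, 364.0), "corner_ne": (1200.0, 100.0),
            "penalty_spot": (300.0, 500.0)}
print(keypoint_distances(projected, observed))  # {'center_spot': 5.0, 'corner_ne': 0.0}
print(pose_error(projected, observed))          # 2.5
```

The unmatched "penalty_spot" detection does not contribute to the error here; a real system might instead penalize missing correspondences.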

Claim 18

Original Legal Text

18. The computer-implemented data-processing method of claim 16 , further comprising: for each random camera pose of the plurality of random camera poses: transforming the random camera pose based on the weight assigned to the random camera pose, and determining whether to retain the random camera pose based on the weight assigned to the random camera pose.

Plain English Translation

This invention relates to computer-implemented methods for refining candidate camera poses during camera pose estimation. The problem addressed is efficiently narrowing a set of candidate poses so that computation is spent on plausible viewpoints and the final estimate is accurate. Building on claim 16, the method generates a plurality of random camera poses, each representing a candidate position and orientation of the camera, and assigns each pose a weight based on its error value, that is, on how well the environment's projected semantic keypoints match those detected in the image. For each random camera pose, the method then transforms the pose according to its assigned weight, adjusting its position or orientation, and determines whether to retain the pose based on that weight, discarding low-weight, unlikely hypotheses. Transforming and filtering the poses in this way concentrates the remaining set around the most likely camera pose, in the manner of the particle filtering used for refinement elsewhere in the patent. This reduces wasted computation on implausible poses and improves the accuracy of the resulting estimate in applications such as augmented reality, 3D reconstruction, or autonomous navigation.
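The retain-and-transform step of claim 18 resembles one update of a particle filter. The sketch below is a hypothetical interpretation: the claim does not specify how a pose is transformed, so here retained poses are perturbed with Gaussian noise whose scale shrinks as the normalized weight grows, and a pose is kept only if its normalized weight reaches an assumed keep_threshold. The function name reweight_and_filter and its parameters are illustrative.

```python
import random


def reweight_and_filter(poses, weights, keep_threshold=0.05, base_sigma=0.1, rng=None):
    """Hypothetical particle-filter-style update for weighted camera poses.

    A pose is retained only if its normalized weight reaches keep_threshold;
    retained poses are jittered with Gaussian noise that shrinks as weight grows.
    Returns the retained poses and their renormalized weights.
    """
    rng = rng or random.Random()
    total = sum(weights)
    if total <= 0:
        return [], []
    kept_poses, kept_weights = [], []
    for pose, w in zip(poses, weights):
        w_norm = w / total
        if w_norm < keep_threshold:
            continue  # discard an unlikely pose hypothesis
        sigma = base_sigma * (1.0 - w_norm)  # confident poses are moved less
        kept_poses.append(tuple(p + rng.gauss(0.0, sigma) for p in pose))
        kept_weights.append(w_norm)
    s = sum(kept_weights)
    return kept_poses, [w / s for w in kept_weights]


poses = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]  # hypothetical 2-parameter poses
kept, w = reweight_and_filter(poses, [7.0, 2.0, 1.0], keep_threshold=0.15,
                              rng=random.Random(1))
print(len(kept), w)  # the third pose (normalized weight 0.1) is dropped
```

A common alternative is to resample poses in proportion to their weights (for example, systematic resampling) before perturbing them, which keeps the number of hypotheses constant; the threshold rule above is simply easier to read.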

Claim 19

Original Legal Text

19. A non-transitory computer-readable medium storing computer-executable instructions for: determining semantic keypoints for an environment; determining semantic keypoints for an image; determining person keypoints for the environment; determining person keypoints for the image; generating a plurality of random camera poses; for each random camera pose of the plurality of random camera poses: generating projected semantic keypoints for the environment based on the semantic keypoints for the image and the random camera pose, generating projected person keypoints for the environment based on the person keypoints for the image and the random camera pose, determining an error value for the random camera pose based on a comparison of the projected semantic keypoints for the environment and corresponding semantic keypoints for the environment and a comparison of the projected person keypoints for the environment and corresponding person keypoints for the environment, and assigning a weight to the random camera pose based on the error value.

Plain English Translation

This invention relates to computer vision and augmented reality, specifically to techniques for estimating camera pose by aligning semantic and person keypoints between an environment and an image. The problem addressed is accurately determining the position and orientation of a camera in a 3D environment using both semantic and human pose information to improve robustness in real-world applications. The method involves analyzing an environment and an image to extract semantic keypoints (e.g., object landmarks) and person keypoints (e.g., human joint positions). Multiple random camera poses are generated, and for each pose, the system projects the image keypoints onto the environment. The projected keypoints are compared to the environment keypoints, and an error value is calculated based on the alignment of both semantic and person keypoints. Each camera pose is then weighted according to its error value, allowing the system to identify the most accurate pose. This approach enhances pose estimation by leveraging both static environmental features and dynamic human poses, improving accuracy in scenarios where traditional methods may fail. The technique is particularly useful in augmented reality, robotics, and autonomous navigation where precise camera localization is critical.
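Claim 19 reverses the direction of projection: image keypoints are mapped into the environment. The sketch below assumes, for illustration only, that the environment is a ground plane Z = 0 (such as a playing field), that the camera looks straight down with pose (yaw, x, y, height) as a simplification of a real oblique camera, and that person keypoints are foot positions on that plane. The intrinsics, the equal weighting of the semantic and person errors, the Gaussian weight, the function names, and all coordinates are hypothetical; correspondences are assumed known and ordered.

```python
import math

# Hypothetical intrinsics in pixels.
F, CX, CY = 800.0, 640.0, 360.0


def ground_to_image(points, pose):
    """Project ground-plane points (X, Y, 0) into the image.

    Assumed pose (yaw, x, y, height): a camera at (x, y, height) looking
    straight down at Z = 0 and rotated by yaw about the vertical axis.
    """
    yaw, px, py, h = pose
    c, s = math.cos(yaw), math.sin(yaw)
    out = []
    for X, Y in points:
        ax, ay = X - px, Y - py
        xc, yc = c * ax + s * ay, s * ax - c * ay  # camera x, y; camera z is h
        out.append((F * xc / h + CX, F * yc / h + CY))
    return out


def image_to_ground(points, pose):
    """Map image keypoints into the environment by intersecting each pixel's
    viewing ray with the ground plane (the inverse of ground_to_image)."""
    yaw, px, py, h = pose
    c, s = math.cos(yaw), math.sin(yaw)
    out = []
    for u, v in points:
        dx, dy = (u - CX) / F, (v - CY) / F  # ray (dx, dy, 1) in camera frame
        gx, gy = c * dx + s * dy, s * dx - c * dy  # world-frame ray is (gx, gy, -1)
        out.append((px + h * gx, py + h * gy))  # ray reaches Z = 0 after t = h
    return out


def mean_dist(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(b)


def pose_error_and_weight(pose, sem_img, sem_env, per_img, per_env, sigma=0.5):
    """Combined semantic + person error in environment units, and a weight."""
    sem_err = mean_dist(image_to_ground(sem_img, pose), sem_env)
    per_err = mean_dist(image_to_ground(per_img, pose), per_env)
    err = sem_err + per_err  # equal weighting is an assumption
    return err, math.exp(-0.5 * (err / sigma) ** 2)


# Synthetic example: image keypoints generated from a known pose.
true_pose = (0.2, 5.0, 2.5, 10.0)
sem_env = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)]  # field corners
per_env = [(3.0, 2.0), (6.5, 1.0)]  # feet of two people
sem_img = ground_to_image(sem_env, true_pose)
per_img = ground_to_image(per_env, true_pose)
candidates = [true_pose, (0.3, 5.0, 2.5, 10.0), (0.2, 6.0, 2.5, 12.0)]
results = [(p, *pose_error_and_weight(p, sem_img, sem_env, per_img, per_env))
           for p in candidates]
```

Measuring the error in environment units rather than pixels is a natural consequence of projecting into the environment, which is why sigma here is on the order of metres rather than pixels.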

Claim 20

Original Legal Text

20. The non-transitory computer-readable medium of claim 19 , wherein determining the error comprises determining a distance between a projected person keypoint for the environment and a corresponding person keypoint for the environment.

Plain English Translation

The invention relates to computer vision systems for estimating camera pose using the positions of people in an environment. Person keypoints represent anatomical landmarks of people, such as joint positions, and are determined both for the environment and for the image. In the method of claim 19, the person keypoints detected in the image are projected into the environment for each random camera pose, and the projected person keypoints are compared with the corresponding person keypoints for the environment. This claim specifies that the error is determined by measuring the distance between a projected person keypoint and its corresponding environment person keypoint. A small distance indicates that the random camera pose places the people seen in the image where they actually are in the environment, so that pose receives a higher weight; a large distance indicates a poor pose. Because people move, person keypoints complement static semantic keypoints and help keep camera pose estimates precise and robust in dynamic environments, with applications such as augmented reality, robotics, and human-computer interaction.
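For claim 20, person keypoints usually lack unique labels, so some rule must decide which environment person keypoint corresponds to each projected one. The sketch below assumes a nearest-neighbour rule with an optional distance cap (max_dist) to limit the influence of outliers, such as a person visible in the image but absent from the environment data; the function name and the cap are hypothetical.

```python
import math


def person_keypoint_error(projected, environment, max_dist=math.inf):
    """Mean distance from each projected person keypoint to its nearest
    environment person keypoint, capped at max_dist to limit outliers."""
    if not projected or not environment:
        return math.inf
    total = 0.0
    for p in projected:
        nearest = min(math.dist(p, q) for q in environment)
        total += min(nearest, max_dist)
    return total / len(projected)


# Hypothetical ground-plane positions (e.g. metres) of people's feet.
projected = [(3.1, 2.0), (6.5, 1.4)]
environment = [(3.0, 2.0), (6.5, 1.0), (9.0, 4.0)]
print(person_keypoint_error(projected, environment))                # about 0.25
print(person_keypoint_error(projected, environment, max_dist=0.2))  # about 0.15
```

Greedy nearest-neighbour matching can map two projected keypoints to the same person, so an optimal one-to-one assignment (for example, the Hungarian algorithm) is a stricter alternative.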

Patent Metadata

Filing Date

Unknown

Publication Date

March 24, 2020

Inventors

Leonardo Citraro
Pablo Márquez Neila
Stefano Savarè
Vivek Jayaram
Charles Xavier Quentin Dubout
Felix Constant Marc Renaut
Andres Michael Levering Hasfura
Horesh Beny Ben Shitrit
Pascal Fua


Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “DATA PROCESSING SYSTEMS FOR REAL-TIME CAMERA PARAMETER ESTIMATION” (10600210). https://patentable.app/patents/10600210

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/10600210. See llms.txt for full attribution policy.
