US-11276190

Active image depth prediction

Published: March 15, 2022
Assignee: not available in USPTO data we have
Inventors: not available in USPTO data we have
Technical Abstract

An active depth detection system can generate a depth map from an image and user interaction data, such as a pair of clicks. The active depth detection system can be implemented as a recurrent neural network that can receive the user interaction data as runtime inputs after training. The active depth detection system can store the generated depth map for further processing, such as image manipulation or real-world object detection.

Patent Claims
20 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method comprising: storing, on a user device, a first neural network and a second neural network, the first neural network in use outputting into the second neural network; identifying, on the user device, an image depicting an environment; receiving, by the user device, an ordinal pair indicating a direction of depth in the environment depicted in the image; generating, on the user device, an initial depth map from the image using the first neural network; generating, on the user device, an updated depth map by inputting the received ordinal pair and the initial depth map into the second neural network; and storing the updated depth map.

Plain English Translation

This invention relates to depth estimation from a single image using neural networks that run on a user device. The problem addressed is that depth predicted from monocular cues alone is often wrong in ambiguous regions of a scene. The claimed method stores two neural networks on the user device, arranged so that the first network feeds its output into the second. The first network takes an image depicting an environment and generates an initial depth map, which estimates how far each part of the scene is from the camera. The user device also receives an ordinal pair indicating a direction of depth in the depicted environment; dependent claims 8 and 19 describe this pair as two points selected on the image, for example two clicks or taps where one point is nearer than the other. The second network takes the ordinal pair together with the initial depth map and generates an updated depth map that is consistent with the user's hint. The updated depth map is then stored for further use. Combining automated estimation with a small amount of human input can make the result more suitable for applications such as augmented reality, robotics, or 3D modeling. Because the storing, identifying, receiving, and generating steps are all performed on the user device, the method does not depend on external processing for those steps.
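The data flow recited in claim 1 can be sketched in a few lines of Python. This is only an illustration of how the pieces connect, assuming the ordinal pair is two image points as in claims 8 and 19. The names `OrdinalPair`, `predict_depth`, `toy_first_net`, and `toy_second_net` are hypothetical, and the two "networks" are trivial stand-in functions, not the neural networks of the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Grid = List[List[float]]   # row-major 2D array (image brightness or depth)
Point = Tuple[int, int]    # (row, col) pixel coordinate


@dataclass(frozen=True)
class OrdinalPair:
    """Hypothetical container for the user's depth hint."""
    nearer: Point   # point the user marks as closer to the camera
    farther: Point  # point the user marks as farther away


def predict_depth(image: Grid,
                  pair: OrdinalPair,
                  first_net: Callable[[Grid], Grid],
                  second_net: Callable[[Grid, OrdinalPair], Grid]) -> Grid:
    """First network makes an initial depth map; second refines it with the pair."""
    initial_depth = first_net(image)
    return second_net(initial_depth, pair)


def toy_first_net(image: Grid) -> Grid:
    # Stand-in only: pretend darker pixels are farther away (larger depth).
    return [[1.0 - value for value in row] for row in image]


def toy_second_net(depth: Grid, pair: OrdinalPair) -> Grid:
    # Stand-in only: if the hint is violated, swap the two depth values.
    out = [row[:] for row in depth]
    (r1, c1), (r2, c2) = pair.nearer, pair.farther
    if out[r1][c1] >= out[r2][c2]:
        out[r1][c1], out[r2][c2] = out[r2][c2], out[r1][c1]
    return out


image = [[0.9, 0.1]]  # one bright pixel, one dark pixel
hint = OrdinalPair(nearer=(0, 1), farther=(0, 0))
updated = predict_depth(image, hint, toy_first_net, toy_second_net)
```

In this toy run the initial estimate puts the dark pixel farther away, the hint says the opposite, and the stand-in second stage corrects the ordering. A real second network would adjust the whole depth map rather than two values.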

Claim 2

Original Legal Text

2. The method of claim 1, further comprising: generating a modified image by modifying the image using the updated depth map.

Plain English Translation

This invention relates to image processing techniques for enhancing or modifying images using depth information. The problem addressed is the need to accurately modify images based on depth data to achieve realistic visual effects, such as depth-based adjustments, object isolation, or scene reconstruction. The method involves generating a modified image by altering an original image using an updated depth map. The depth map provides spatial information about the image, indicating the relative distances of objects or surfaces from a reference point. By applying modifications to the image based on this depth data, the method ensures that changes are applied in a way that preserves or enhances the perceived depth and realism of the scene. The process may include steps such as generating an initial depth map from the image, refining or updating the depth map to improve accuracy, and then using this updated depth map to guide modifications. These modifications can include adjustments to color, lighting, focus, or other visual properties, applied selectively based on depth to create effects like depth-of-field blurring, depth-based color grading, or virtual object insertion. The method ensures that modifications align with the spatial structure of the scene, avoiding artifacts that could occur from uniform or depth-agnostic processing. This approach is useful in applications such as photography, film production, augmented reality, and computer vision, where depth-aware image manipulation is required for realistic and visually coherent results.

Claim 3

Original Legal Text

3. The method of claim 1 wherein the second neural network is a recurrent neural network.

Plain English Translation

Claim 3 narrows claim 1 by requiring that the second neural network, the one that takes the ordinal pair and the initial depth map and generates the updated depth map, is a recurrent neural network (RNN). An RNN applies the same learned computation repeatedly while carrying state from one step to the next. In this context that structure lends itself to refining a depth map over several steps rather than in a single pass, which is consistent with claim 4, where the RNN is trained to implement an iterative optimization scheme. The claim does not specify a particular recurrent architecture. The first neural network is unchanged from claim 1: it still generates the initial depth map from the image.

Claim 4

Original Legal Text

4. The method of claim 3, wherein the recurrent neural network is trained to implement an alternating direction method of multipliers scheme.

Plain English Translation

Claim 4 narrows claim 3 by requiring that the recurrent neural network is trained to implement an alternating direction method of multipliers (ADMM) scheme. ADMM is a widely used iterative algorithm for constrained optimization. It splits a problem into smaller subproblems that are solved in alternation, and it uses dual variables and a penalty term to pull the subproblem solutions into agreement. In the context of this patent, the quantity being optimized is the depth map, and the ordinal pair supplied by the user acts as a constraint on it (claim 5 describes the ordinal pair as constraints). The repeated update steps of ADMM map naturally onto the repeated steps of a recurrent network, so the network can be trained to carry out ADMM-style updates that move the initial depth map toward one that respects the user's depth ordering. The claim does not state the exact objective, the number of iterations, or which parts of the scheme are learned.
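To make the ADMM idea concrete, here is a minimal sketch of classical (hand-written, not learned) scaled-form ADMM on a toy problem. It assumes a flattened depth map `d`, one ordinal pair given as two indices, and the objective "stay as close as possible to the initial depth while making the nearer point at least `margin` closer than the farther point". The objective, the function names, and the parameters `margin`, `rho`, and `iters` are all assumptions for illustration; the patent claims a recurrent network trained to implement an ADMM scheme and does not disclose this formulation.

```python
from typing import List


def project_ordinal(v: List[float], near: int, far: int, margin: float) -> List[float]:
    """Project v onto the half-space {z : z[near] + margin <= z[far]}."""
    z = list(v)
    excess = z[near] - z[far] + margin
    if excess > 0:            # constraint violated: move both values equally
        z[near] -= excess / 2.0
        z[far] += excess / 2.0
    return z


def admm_refine(d: List[float], near: int, far: int,
                margin: float = 0.1, rho: float = 1.0, iters: int = 50) -> List[float]:
    """Scaled-form ADMM for: min 0.5*||x - d||^2  s.t.  x[near] + margin <= x[far]."""
    z = list(d)               # copy of x that must satisfy the constraint
    u = [0.0] * len(d)        # scaled dual variable
    for _ in range(iters):
        # x-update: closed-form minimizer of the quadratic subproblem.
        x = [(di + rho * (zi - ui)) / (1.0 + rho) for di, zi, ui in zip(d, z, u)]
        # z-update: projection of (x + u) onto the constraint set.
        z = project_ordinal([xi + ui for xi, ui in zip(x, u)], near, far, margin)
        # u-update: accumulate the disagreement between x and z.
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return z


initial_depth = [2.0, 1.0, 5.0]   # index 0 is wrongly estimated as farther than index 1
refined = admm_refine(initial_depth, near=0, far=1)
```

Each loop iteration has the same three-step form, which is the kind of repeated update that a recurrent network could be trained to carry out. Values not involved in the constraint, such as index 2 here, are left at their initial estimate.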

Claim 5

Original Legal Text

5. The method of claim 3, wherein the first neural network is a trained neural network and the second neural network is configured to receive the ordinal pair as constraints after training of the trained neural network.

Plain English Translation

Claim 5 narrows claim 3 by describing when and how the ordinal pair enters the system. The first neural network is a trained network, and the second (recurrent) neural network is configured to receive the ordinal pair as constraints after that training. In other words, the user's depth hint is a runtime input rather than training data: the networks do not need to be retrained each time a user supplies a new pair. Here "ordinal" refers to the relative depth order between two locations in the scene, that is, which one is nearer and which is farther, not to ordered categories such as rankings or ratings. The second network uses the pair to constrain the initial depth map produced by the first network, so that the updated depth map agrees with the ordering the user indicated.

Claim 6

Original Legal Text

6. The method of claim 1, further comprising: receiving, by the user device, at least one additional ordinal pair, each additional ordinal pair indicating an additional direction of depth in the environment depicted in the image.

Plain English Translation

This invention relates to a method for enhancing depth perception in images by processing ordinal pairs that indicate directional depth information. The method involves analyzing an image to determine depth relationships within the depicted environment. Specifically, it receives ordinal pairs, where each pair represents a directional depth relationship between objects or regions in the image. For example, an ordinal pair may indicate that one object is in front of or behind another object in the environment. The method processes these pairs to construct a depth map or depth-aware representation of the scene, improving spatial understanding for applications like augmented reality, 3D modeling, or computer vision tasks. Additionally, the method can receive further ordinal pairs to refine or expand the depth information, allowing for more accurate or detailed depth mapping. This approach enables systems to infer depth without requiring explicit depth sensors or complex stereo vision techniques, relying instead on interpretable directional cues. The technique is particularly useful in scenarios where depth information is ambiguous or partially available, providing a lightweight yet effective way to enhance depth perception in images.
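One way to picture what several ordinal pairs contribute is to check which of them a given depth map already satisfies. The sketch below is a hypothetical helper, not part of the patent: it assumes each pair is written as `(nearer_point, farther_point)` with `(row, col)` coordinates, and that larger depth values mean farther from the camera.

```python
from typing import List, Tuple

Point = Tuple[int, int]            # (row, col)
Pair = Tuple[Point, Point]         # (nearer point, farther point)


def violated_pairs(depth: List[List[float]], pairs: List[Pair]) -> List[Pair]:
    """Return the pairs whose 'nearer' point is not strictly closer in the depth map."""
    bad = []
    for near, far in pairs:
        if depth[near[0]][near[1]] >= depth[far[0]][far[1]]:
            bad.append((near, far))
    return bad


depth_map = [[1.0, 3.0],
             [2.0, 4.0]]
user_pairs = [((0, 0), (0, 1)),    # satisfied: 1.0 < 3.0
              ((1, 1), (1, 0)),    # violated: 4.0 is not closer than 2.0
              ((0, 0), (1, 1))]    # satisfied: 1.0 < 4.0
to_fix = violated_pairs(depth_map, user_pairs)
```

Each additional pair is another constraint of this kind, so more pairs can flag, and let a refinement stage correct, more regions of the depth map.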

Claim 7

Original Legal Text

7. The method of claim 1, further comprising: generating, on the user device, the image using an image sensor of the user device; and displaying the generated image on a display device of the user device.

Plain English Translation

This invention relates to image capture and display systems, specifically addressing the need for seamless integration of image generation and display on user devices. The method involves capturing an image using an integrated image sensor on a user device, such as a smartphone or tablet, and immediately displaying the captured image on the device's display screen. The image sensor, which may include a camera module, is used to generate the image data, which is then processed and rendered on the display device of the same user device. This approach eliminates the need for external devices or additional steps, providing a streamlined user experience. The method ensures real-time feedback by displaying the captured image without delay, enhancing usability for applications like photography, video calls, or augmented reality. The system may also include preprocessing steps to optimize image quality before display, such as adjusting brightness, contrast, or applying filters. The invention is particularly useful in portable devices where compactness and efficiency are critical, ensuring high-quality image capture and display within a single integrated system.

Claim 8

Original Legal Text

8. The method of claim 7, wherein receiving the ordinal pair comprises receiving a first point on the image and a second point on the image while the generated image is displayed on the display device.

Plain English Translation

Claim 8 narrows claim 7 by describing how the ordinal pair is received. While the image captured by the device's image sensor is shown on the display, the user device receives a first point on the image and a second point on the image. Together these two points form the ordinal pair of claim 1, which indicates a direction of depth in the depicted environment. In practice the points could come from two clicks or taps on the displayed image, as the abstract's mention of a pair of clicks suggests. The pair is then passed, with the initial depth map, to the second neural network to generate the updated depth map. Claim 8 itself does not say how the two points encode which location is nearer; claim 19, in a separate claim family, ties that to the order in which the points are received.

Claim 9

Original Legal Text

9. The method of claim 2, further comprising: identifying, using the updated depth map, a background area of the image.

Plain English Translation

Claim 9 narrows claim 2 by adding a step that uses the updated depth map to identify a background area of the image. The updated depth map is the one generated by the second neural network from the initial depth map and the user's ordinal pair, so no stereo camera, structured light, or other depth sensor is required by the claim. Because the depth map assigns a depth to each part of the scene, it can be used to separate regions that are far from the camera (background) from regions that are near (foreground). The claim does not specify how the background area is chosen from the depth values. Identifying the background allows later processing, such as the image effect of claim 10, to treat it differently from the foreground, which is useful in computational photography and augmented reality.
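As a rough illustration of identifying a background area from a depth map, the sketch below marks every pixel whose depth exceeds a threshold. The thresholding rule, the midpoint heuristic, and the function names are assumptions made for this example; the patent does not say how the background area is selected.

```python
from typing import List


def midpoint_threshold(depth: List[List[float]]) -> float:
    """Assumed heuristic: halfway between the nearest and farthest depth."""
    values = [d for row in depth for d in row]
    return (min(values) + max(values)) / 2.0


def background_mask(depth: List[List[float]], threshold: float) -> List[List[bool]]:
    """True where a pixel is farther than the threshold (treated as background)."""
    return [[d > threshold for d in row] for row in depth]


depth_map = [[1.0, 1.2, 6.0],
             [0.9, 1.1, 5.5]]
mask = background_mask(depth_map, midpoint_threshold(depth_map))
```

Here the right-hand column is much farther away than the rest, so only that column is marked as background.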

Claim 10

Original Legal Text

10. The method of claim 9, wherein the modified image is generated by applying an image effect to the background area of the image.

Plain English Translation

This invention relates to image processing techniques for enhancing or modifying images, particularly focusing on background areas. The method involves generating a modified image by applying an image effect to the background area of an original image. The background area is first identified, typically by segmenting the image to distinguish between foreground and background regions. Once the background is isolated, an image effect is applied to it, such as blurring, color adjustment, or stylization, while preserving the foreground content. This technique is useful in applications like portrait photography, where the background is often modified to emphasize the subject or achieve a desired aesthetic. The method ensures that the applied effect is confined to the background, avoiding unintended alterations to the foreground. The process may involve machine learning-based segmentation or traditional image processing algorithms to accurately detect and process the background. The result is a visually enhanced image where the background is transformed while maintaining the integrity of the foreground elements. This approach is particularly valuable in digital photography, video editing, and graphic design, where background manipulation is a common requirement. The invention provides a way to automate or streamline background modifications, reducing manual effort and improving consistency in image editing workflows.
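Confining an effect to the background can be sketched as a masked per-pixel operation. The example assumes a grayscale image stored as nested lists, a boolean background mask of the same shape, and a simple darkening effect; the helper names and the choice of effect are hypothetical, and a blur or color change could be substituted.

```python
from typing import Callable, List


def apply_to_background(image: List[List[float]],
                        mask: List[List[bool]],
                        effect: Callable[[float], float]) -> List[List[float]]:
    """Apply `effect` only where mask is True; foreground pixels are copied unchanged."""
    return [[effect(pixel) if is_background else pixel
             for pixel, is_background in zip(image_row, mask_row)]
            for image_row, mask_row in zip(image, mask)]


def darken(pixel: float) -> float:
    # Example effect: halve the brightness.
    return pixel * 0.5


image = [[0.8, 0.6, 0.4]]
mask = [[False, False, True]]      # only the last pixel is background
modified = apply_to_background(image, mask, darken)
```

The original image is left untouched and a new, modified image is returned, which matches the claim language of generating a modified image.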

Claim 11

Original Legal Text

11. A system comprising: one or more processors of a machine; and a memory storing instructions that, when executed by at least one processor among the one or more processors, cause the machine to perform operations comprising: storing, in the memory, a first neural network and a second neural network, the first neural network in use outputting into the second neural network; identifying, in the memory, an image depicting an environment; receiving an ordinal pair indicating a direction of depth in the environment depicted in the image; generating an initial depth map from the image using the first neural network; generating an updated depth map by inputting the received ordinal pair and the initial depth map into the second neural network; and storing the updated depth map in the memory.

Plain English Translation

The system relates to depth estimation in images using neural networks, addressing the challenge of accurately determining depth information from 2D images. Depth estimation is crucial for applications like autonomous navigation, 3D reconstruction, and augmented reality, where understanding spatial relationships in an environment is essential. Traditional methods often struggle with depth ambiguity, particularly in scenes with limited texture or repetitive patterns. The system employs two neural networks to improve depth estimation accuracy. The first neural network processes an input image depicting an environment and generates an initial depth map, which represents the estimated distances of objects within the scene. The second neural network refines this initial depth map by incorporating additional directional information provided as an ordinal pair, which indicates the depth direction in the environment. The ordinal pair helps resolve ambiguities in the initial depth map, such as distinguishing between foreground and background objects. The refined depth map is then stored for further use. By combining the outputs of two neural networks and integrating directional cues, the system enhances the accuracy and reliability of depth estimation, making it more robust for real-world applications. The approach leverages machine learning to improve spatial understanding in images, addressing limitations of single-network depth estimation methods.

Claim 12

Original Legal Text

12. The system of claim 11, wherein the operations further comprise: generating a modified image by modifying the image using the updated depth map.

Plain English Translation

This invention relates to image processing systems that enhance visual content by adjusting depth information. The problem addressed is the need to improve image realism or artistic effects by dynamically modifying depth data and applying those changes to the original image. The system includes a depth estimation module that analyzes an input image to generate an initial depth map, representing spatial relationships between objects in the scene. A depth adjustment module then updates this depth map based on user inputs, algorithmic corrections, or predefined parameters. The system further processes the modified depth map to generate a new image where visual elements are altered according to the updated depth information. This may include changes in perspective, scaling, or depth-based effects like blurring or lighting adjustments. The invention ensures that modifications are applied coherently across the image, maintaining natural or stylized visual consistency. The system is particularly useful in applications like virtual reality, 3D content creation, and computational photography, where accurate or creatively adjusted depth perception is critical. The key innovation lies in the dynamic generation and application of depth-based modifications to enhance image quality or artistic expression.

Claim 13

Original Legal Text

13. The system of claim 11 wherein the first neural network is a trained neural network and the second neural network is configured to receive the ordinal pair as constraints after training of the trained neural network.

Plain English Translation

Claim 13 is the system counterpart of claim 5. It narrows the system of claim 11 by specifying that the first neural network is a trained network and that the second neural network is configured to receive the ordinal pair as constraints after that training. The ordinal pair is therefore a runtime input: the system can accept a new depth hint from the user without retraining. As elsewhere in this patent, "ordinal" refers to the relative depth order of two locations in the depicted environment, not to rankings, ratings, or sequence data. The second network uses the pair to constrain the initial depth map from the first network, so that the updated depth map stored in memory agrees with the indicated depth ordering. Unlike claim 5, claim 13 does not require the second network to be recurrent.

Claim 14

Original Legal Text

14. The system of claim 11, wherein the system comprises a display device, and wherein receiving the ordinal pair comprises receiving a first point on the image and a second point on the image while the image is displayed on the display device.

Plain English Translation

Claim 14 narrows the system of claim 11 by adding a display device and describing how the ordinal pair is received. While the image is displayed on the display device, the system receives a first point on the image and a second point on the image, and these two points form the ordinal pair. The pair indicates a direction of depth in the depicted environment; it does not define a region, a line to be measured, or another geometric shape. The points could be entered with a touchscreen, mouse, or similar input device, although the claim does not name one. The system then inputs the ordinal pair and the initial depth map into the second neural network to generate the updated depth map, as recited in claim 11.

Claim 15

Original Legal Text

15. A method comprising: identifying, on a user device, an image depicting an environment; receiving, by the user device, an ordinal pair indicating a direction of depth in the environment depicted in the image; generating a depth map by inputting the received ordinal pair into a depth engine running on the user device; and storing the depth map.

Plain English Translation

This invention relates to computer vision and depth estimation in images, addressing the challenge of generating accurate depth maps from single images without requiring specialized hardware or extensive computational resources. The method operates on a user device, such as a smartphone or tablet, to estimate depth in an environment depicted in an image. The process begins by capturing or selecting an image of the environment. The user then provides an ordinal pair, which represents a direction of depth within the image, such as specifying a point in the image that is closer or farther than another point. This ordinal pair is input into a depth engine running locally on the user device, which processes the information to generate a depth map. The depth map is then stored for further use, such as in augmented reality applications, 3D modeling, or scene understanding. The depth engine may employ machine learning models or geometric algorithms to infer depth based on the provided ordinal pair and image features. This approach reduces reliance on stereo cameras or LiDAR sensors, making depth estimation accessible on standard devices. The method ensures real-time or near-real-time processing by leveraging on-device computation, enhancing privacy and reducing latency compared to cloud-based solutions.

Claim 16

Original Legal Text

16. The method of claim 15, wherein the depth engine comprises a neural network.

Plain English Translation

Claim 16 narrows claim 15 by requiring that the depth engine comprises a neural network. In claim 15 the depth engine runs on the user device and generates a depth map when the received ordinal pair is input into it; claim 16 specifies that this engine includes a neural network but does not say how many networks it has or how they are arranged. The two-network arrangement of claim 1, in which a first network produces an initial depth map and a second network refines it using the ordinal pair, is one configuration that fits this description. Claim 16 does not recite stereo images, additional sensors, preprocessing, or post-processing steps. The resulting depth map can support applications such as 3D reconstruction or augmented reality.

Claim 17

Original Legal Text

17. The method of claim 15, further comprising: generating a modified image by modifying the image using the depth map; and displaying the modified image on a display device of the user device.

Plain English Translation

Claim 17 narrows claim 15 by adding two steps: generating a modified image by modifying the image using the depth map, and displaying the modified image on a display device of the user device. The depth map here is the one produced by the depth engine from the user's ordinal pair, so the claim does not rely on stereo imaging, structured light, or dedicated depth sensors. The claim does not limit what kind of modification is made. Depth-dependent effects such as blurring distant regions, adjusting lighting by depth, or simulating parallax are plausible examples, and claim 20 gives one specific case in which an effect is applied to the background area. Displaying the result on the same device lets the user see the effect of the depth hint directly. The technique is applicable to photography, augmented reality, and other multimedia applications.

Claim 18

Original Legal Text

18. The method of claim 15, further comprising: receiving, by the user device, at least one additional ordinal pair, each additional ordinal pair indicating an additional direction of depth in the environment depicted in the image.

Plain English Translation

Claim 18 narrows claim 15 by having the user device receive at least one additional ordinal pair, with each additional pair indicating an additional direction of depth in the environment depicted in the image. The pairs are received as input, for example from the user; they are not detected automatically by analyzing the image. Each ordinal pair relates two locations in the scene by relative depth, so additional pairs give the depth engine more constraints and can correct more regions of the depth map than a single pair could. The claim does not say how many pairs are used or how they are combined. This is the counterpart, for the claim 15 method, of claim 6. Accurate relative depth is useful for applications such as augmented reality overlays and 3D modeling.

Claim 19

Original Legal Text

19. The method of claim 15, further comprising: generating, on the user device, the image using an image sensor of the user device; and displaying the generated image on a display device of the user device, wherein receiving the ordinal pair comprises receiving a first point on the image and a second point on the image while the image is displayed on the display device, and wherein the order of receipt of the first and second points indicates a relative depth order between the first and second points in the depth direction.

Plain English Translation

This invention relates to a method for determining depth information from a two-dimensional image using user input. The problem addressed is the lack of depth perception in standard 2D images, which limits applications in fields like augmented reality, 3D modeling, and user interface design. The solution involves a user interacting with an image displayed on a device to indicate depth relationships between points. The method begins by capturing an image using an image sensor on a user device, such as a smartphone or tablet. The image is then displayed on the device's screen. The user interacts with the image by selecting two points in sequence, where the order of selection indicates which point is closer in depth. For example, selecting a point first and then a second point implies the first point is closer to the viewer than the second. This ordinal pair of points is processed to infer depth relationships, allowing the system to reconstruct a 3D representation or apply depth-based effects to the image. The technique leverages user input to overcome limitations of passive depth estimation methods, such as those relying solely on image texture or lighting. By incorporating explicit user feedback, the method improves accuracy and adaptability across different scenes and lighting conditions. The approach is particularly useful in applications where depth information is needed but traditional depth sensors or stereo imaging are unavailable.
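Turning two successive clicks into an ordinal pair can be sketched with a small stateful helper. The class name `OrdinalPairCollector` and its `on_click` method are hypothetical. The sketch preserves the order in which the points are received and assumes, as in the example above, that the first point is the nearer one; the claim itself only says that the order of receipt indicates the relative depth order.

```python
from typing import Optional, Tuple

Point = Tuple[int, int]  # (x, y) position on the displayed image


class OrdinalPairCollector:
    """Collects clicks two at a time, preserving the order of receipt."""

    def __init__(self) -> None:
        self._pending: Optional[Point] = None

    def on_click(self, point: Point) -> Optional[Tuple[Point, Point]]:
        """Return (first, second) once two points have arrived, else None."""
        if self._pending is None:
            self._pending = point      # remember the first point and wait
            return None
        pair = (self._pending, point)  # assumed convention: first = nearer
        self._pending = None           # reset so a new pair can be collected
        return pair


collector = OrdinalPairCollector()
first_result = collector.on_click((10, 20))    # first click: no pair yet
second_result = collector.on_click((200, 40))  # second click completes the pair
```

Because the collector resets after every second click, the same object can keep producing additional ordinal pairs as the user continues to click.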

Claim 20

Original Legal Text

20. The method of claim 15, further comprising: identifying, using the depth map, a background area of the image; and generating a modified image by applying an image effect to the background area of the image.

Plain English Translation

This invention relates to image processing, specifically enhancing images by applying effects to background areas. The method involves analyzing an image to generate a depth map, which represents the spatial relationships between objects in the scene. The depth map is used to distinguish between foreground and background regions. Once the background area is identified, an image effect is applied to it, such as blurring, color adjustment, or other visual modifications, while preserving the foreground details. This technique improves image composition by directing focus to the subject while creatively altering the background. The method may also include additional steps like adjusting the depth map to refine background detection or applying multiple effects based on depth variations. The approach is useful in photography, video editing, and computer vision applications where background enhancement is desired.

Patent Metadata

Filing Date

April 27, 2020

Publication Date

March 15, 2022

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Active image depth prediction” (US-11276190). https://patentable.app/patents/US-11276190

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/US-11276190. See llms.txt for full attribution policy.