A method for generating an augmented reality image from first and second images, wherein at least a portion of at least one of the first and the second image is captured from a real scene, identifies a confidence region in which a confident determination as to which of the first and second image to render in that region of the augmented reality image can be made, and identifies an uncertainty region in which it is uncertain as to which of the first and second image to render in that region of the augmented reality image. At least one blending factor value in the uncertainty region is determined based upon a similarity between a first colour value in the uncertainty region and a second colour value in the confidence region, and an augmented reality image is generated by combining, in the uncertainty region, the first and second images using the at least one blending factor value.
Legal claims defining the scope of protection, as filed with the USPTO.
1. A method for generating a composite image from first and second images, the composite image, the first image and the second image each comprising a first region and a second region, the method comprising:
2. The method of, wherein at least a portion of at least one of the first image and second image is captured from a real scene.
3. The method of, wherein the first region of the composite image corresponds to the respective first regions of each of the first and second image.
4. The method of, wherein a determination as to which of the first and second images to render can be made for a second region of the composite image, wherein the second region of the composite image corresponds to the respective second regions of each of the first and second image.
5. The method of, wherein the second region of the composite image is a confidence region in which a confident determination as to which of the first and second image to render in that region of the composite image can be made.
6. The method of, wherein the first and second values are colour values.
7. The method of, further comprising:
8. The method of, wherein the first image and the second image each have associated therewith a plurality of colour values and a corresponding plurality of depth values, wherein the method further comprises making said determination as to which of the first image and the second image to render based upon a depth value of the first image and the corresponding depth value of the second image in the second region.
9. The method of, wherein the first image and the second image each have associated therewith a plurality of colour values and a corresponding plurality of depth values, and wherein the first region is identified based upon at least one depth value associated with at least one of the first and the second image, the at least one depth value being derived from a depth value captured from the real scene.
10. The method of, further comprising generating at least one initial blending factor value in a second region based upon said determination and wherein generating the composite image further comprises combining a corresponding colour value of the first image and a corresponding colour value of the second image in the second region using the at least one initial blending factor value.
11. The method of, further comprising identifying the confidence region, wherein identifying the confidence region comprises categorising portions of the confidence region as first confidence regions or second confidence regions, wherein:
12. The method of, wherein determining the at least one blending factor value is further based upon the distance between the position of the first value and the position of the second value.
13. The method of, wherein the first region, of the at least one of the first image and second image that is captured from a real scene, comprises a plurality of sample points and determining the at least one blending factor value further comprises processing, for each of a plurality of sample points in the said first region, that sample point based upon values at a plurality of sample points located in a second region, of the said at least one of the first image and second image, within a predetermined distance of that sample point.
14. The method of, wherein determining at least one blending factor value for the first region of the composite image is based upon a similarity between a colour value in the said first region and at least one corresponding colour value of each of the first image and the second image.
15. The method of, further comprising performing an erosion operation on the second region of the composite image, wherein the erosion operation is configured to re-categorise at least one portion of the second region of the composite image as forming a part of a first region of the composite image.
16. The method of, wherein the first image is a captured image of a real scene and the second image is an image of a virtual object.
17. An image processing system for generating a composite image from first and second images, the composite image, the first and the second image each comprising a first region and a second region, the image processing system comprising:
18. The image processing system of, wherein the second region of the composite image is a confidence region in which a confident determination as to which of the first and second image to render in that region of the composite image can be made, and wherein the image processing system further comprises:
19. A non-transitory computer readable storage medium having stored thereon computer readable instructions that, when executed at a computer system, cause the computer system to generate a composite image from first and second images, the composite image, the first image and the second image each comprise a first region and a second region, the instructions causing the computer system to generate the composite image by:
Complete technical specification and implementation details from the patent document.
This application is a continuation under 35 U.S.C. 120 of application Ser. No. 17/397,092 filed Aug. 9, 2021, now U.S. Pat. No. 11,830,153, which is a continuation of prior application Ser. No. 16/794,041 filed Feb. 18, 2020, now U.S. Pat. No. 11,087,554, which is a continuation of prior application Ser. No. 15/623,690 filed Jun. 15, 2017, now U.S. Pat. No. 10,600,247, which claims foreign priority under 35 U.S.C. 119 from United Kingdom Application No. 1610657.7 filed Jun. 17, 2016, the contents of which are incorporated by reference herein in their entirety.
In augmented reality (AR) systems, a pair of images may be combined so as to create an augmented reality image in which the content from one image appears to be included in the other image. In some arrangements, an image of a virtual object and an image of a real scene are combined so as to generate an augmented reality image in which it appears to the viewer that the virtual object has been included in the real scene. The augmented reality image may be generated by rendering the virtual object within a portion of the captured real scene. When rendering the virtual object in the scene, the relative depth of the virtual object with respect to the depth of the scene is considered to ensure that portions of the virtual object and/or the scene are correctly occluded with respect to one another. By occluding the images in this way, a realistic portrayal of the virtual object within the scene can be achieved.
Techniques for generating an augmented reality image of a scene typically require the generation of an accurate model of the real scene by accurately determining depth values for the objects within the real scene from a specified viewpoint. By generating an accurate model, it is possible to compare depth values and determine portions of the two images to be occluded. Determining the correct occlusion in an augmented reality image may be performed by comparing corresponding depth values for the image of the virtual object and the image of the real scene and rendering, for each pixel of the scene, a pixel using a colour selected from the colour at that pixel in the image of the virtual object or the real scene based upon which image has the smaller depth value with respect to the specified viewpoint, i.e. is closer to the specified viewpoint.
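By way of illustration only, the per-pixel depth comparison described above may be sketched as follows. The function name, the list-of-rows image layout, and the use of an infinite depth to mark pixels with no virtual object are assumptions made for this sketch and are not taken from the described system.

```python
def composite_by_depth(real_rgb, real_depth, virt_rgb, virt_depth):
    """Pick, per pixel, the colour of whichever image is closer to the viewpoint."""
    out = []
    for r_row, rd_row, v_row, vd_row in zip(real_rgb, real_depth, virt_rgb, virt_depth):
        out_row = []
        for r_col, r_d, v_col, v_d in zip(r_row, rd_row, v_row, vd_row):
            # A smaller depth value means the surface is closer to the viewpoint.
            out_row.append(v_col if v_d < r_d else r_col)
        out.append(out_row)
    return out


INF = float("inf")  # assumed marker: no virtual object at this pixel

real_rgb = [[(10, 10, 10), (20, 20, 20), (30, 30, 30)]]
real_depth = [[5.0, 1.0, 5.0]]
virt_rgb = [[(255, 0, 0), (255, 0, 0), (0, 0, 0)]]
virt_depth = [[2.0, 2.0, INF]]

# Pixel 0: virtual object in front; pixel 1: real object in front; pixel 2: no virtual object.
result = composite_by_depth(real_rgb, real_depth, virt_rgb, virt_depth)
```

A hard selection of this kind relies on every depth value being present and accurate, which is the limitation addressed in the remainder of this description.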
To avoid potential errors with depth measurements, a scene can be scanned from a number of positions to generate an accurate map of the scene. For example, camera tracking may be performed whilst moving a camera around a scene and capturing a number of different scans or images of the scene. However, such processing is time consuming and processor intensive and is not suited to real-time applications, where the position of objects in the scene may vary or where it may be necessary to update the model of the real scene regularly. For example, in video applications where a constant frame rate is required there may be insufficient time between frames to update a scene model.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
One approach for capturing depth information regarding a scene is to make use of a capture device that is configured to capture information relating to both colour and depth, such as an RGBD camera. An RGBD camera is configured to capture Red, Green, and Blue (RGB) colour information as well as depth information, D.
The inventors have recognised that depth information obtained from a single point, for example using such a capture device, may not be complete or the depth information may be imprecise for portions of the captured scene. For example, there may be portions of an image captured by an RGBD camera where a corresponding depth measurement could not have been obtained. This may occur where a surface of an object in the scene is absorptive of the signals used for depth measurement or is positioned at an angle relative to a capture device such that a depth signal is not directed back to a sensor of the capture device with sufficient signal strength for a precise depth measurement to be captured. Similarly, it may be that the depth information is detected but is inaccurate, for example due to signal reflections or interference, which can result in noise in the captured depth measurement.
For time-critical applications, the inventors have recognised that it is sometimes useful to make use of depth data captured at a single point rather than generate a complex model of a scene when generating an augmented reality image. However, the result of errors in the depth information or an absence of depth information for a particular portion of the scene is that, when generating an augmented reality image, erroneous depth comparison results may occur. These erroneous depth comparison results may result in portions of one image being incorrectly rendered or occluded leading to visual artefacts in a resultant rendered augmented reality image.
The present application seeks to address these above problems and to provide an improved approach to generating an augmented reality image.
There is provided a method for generating an augmented reality image from first and second images, wherein at least a portion of at least one of the first and the second image is captured from a real scene, the method comprising: identifying a confidence region in which a confident determination as to which of the first and second image to render in that region of the augmented reality image can be made; identifying an uncertainty region in which it is uncertain as to which of the first and second image to render in that region of the augmented reality image; determining at least one blending factor value in the uncertainty region based upon a similarity between a first colour value in the uncertainty region and a second colour value in the confidence region; and generating an augmented reality image by combining, in the uncertainty region, the first and second images using the at least one blending factor value.
There is provided an augmented reality processing system for generating an augmented reality image from first and second images, wherein at least a portion of at least one of the first and the second image is captured from a real scene, the augmented reality processing system comprising: a confidence identification module arranged to identify a confidence region in which a confident determination as to which of the first and second image to render in that region of the augmented reality image can be made; an uncertainty identification module arranged to identify an uncertainty region in which it is uncertain as to which of the first and second image to render in that region of the augmented reality image; a blend module arranged to determine at least one blending factor value in the uncertainty region based upon a similarity between a first colour value in the uncertainty region and a second colour value in the confidence region; and an image generation module arranged to generate an augmented reality image by combining, in the uncertainty region, the first and second images using the at least one blending factor value.
The first image and the second image may each have associated therewith a plurality of colour values and a corresponding plurality of depth values. The confident determination as to which of the first image and the second image to render based upon a depth value of the first image and the corresponding depth value of the second image in the confidence region may be made as part of the method or processing system. The uncertainty region may be identified based upon at least one depth value associated with at least one of the first and the second image, the at least one depth value being derived from a depth value captured from a real scene. The at least one depth value may be derived from an unreliable or incomplete depth value captured from the real scene. Identifying the uncertainty region may be based on the absolute depth value of the unreliable or incomplete depth value, where the absolute depth value is indicative of an erroneously captured depth value. Identifying the uncertainty region may comprise comparing at least one depth value in the region in the first image with a depth value in a corresponding region of the second image and determining that the difference in compared depth values is below a predetermined threshold.
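As a minimal sketch of how sample points might be sorted into confidence and uncertainty regions from depth values, the following example works on a single row of samples. The label strings, the threshold of 0.1, and the use of a depth of 0.0 to indicate a missing or erroneous measurement are all assumptions chosen for illustration.

```python
CONF_FIRST, CONF_SECOND, UNCERTAIN = "first", "second", "uncertain"


def classify(depth_first, depth_second, threshold=0.1, invalid=0.0):
    """Label each sample as a first confidence, second confidence or uncertainty sample."""
    labels = []
    for d1, d2 in zip(depth_first, depth_second):
        if d1 == invalid or abs(d1 - d2) < threshold:
            # Missing depth, or depths too close to compare reliably.
            labels.append(UNCERTAIN)
        elif d1 < d2:
            labels.append(CONF_FIRST)   # first image is clearly closer
        else:
            labels.append(CONF_SECOND)  # second image is clearly closer
    return labels


labels = classify([0.0, 1.0, 2.0, 3.0], [2.0, 2.0, 2.05, 2.0])
```

In this sketch only the captured (first) image is checked for invalid depth values, on the assumption that the depth values of a rendered virtual object are exact.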
At least one initial blending factor value in a confidence region may be generated based upon the confident determination and generating the augmented reality image may further comprise combining a corresponding colour value of the first image and a corresponding colour value of the second image in the confidence region using the at least one initial blending factor value. The at least one blending factor value and the at least one initial blending factor value may form part of an alpha matte for combining colour values of the first image and the second image to generate the augmented reality image.
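The combination of colour values using an alpha matte may be sketched as below. The convention that a blending factor of 1.0 selects the first image and 0.0 selects the second is an assumption of this sketch, as are the function names.

```python
def blend(first_rgb, second_rgb, alpha):
    """Linearly combine two colour values; alpha = 1.0 yields the first image's colour."""
    return tuple(alpha * a + (1.0 - alpha) * b for a, b in zip(first_rgb, second_rgb))


def apply_matte(first, second, matte):
    """Blend two rows of colour values using one blending factor per sample."""
    return [blend(f, s, a) for f, s, a in zip(first, second, matte)]


first = [(200, 0, 0), (200, 0, 0), (200, 0, 0)]
second = [(0, 0, 100), (0, 0, 100), (0, 0, 100)]
# Confidence samples carry initial factors of 1.0 or 0.0; the middle sample is blended.
row = apply_matte(first, second, [1.0, 0.5, 0.0])
```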
Making the confident determination may be based upon at least one depth value associated with the first image and at least one corresponding depth value associated with the second image. Making the confident determination may be based upon a comparison of at least one depth value associated with a region of the first image with at least one depth value associated with a corresponding region of the second image and wherein the result of the comparison exceeds a predetermined threshold.
Identifying a confidence region may further comprise categorising portions of the confidence region as first confidence regions or second confidence regions, wherein: first confidence regions are confidence regions in which a colour value of the first image is to be rendered in the corresponding region of the augmented reality image; and second confidence regions are confidence regions in which a colour value of the second image is to be rendered in the corresponding region of the augmented reality image. Re-categorising an uncertainty region as either a first confidence region or a second confidence region may be performed prior to determining at least one blending factor value. Re-categorising an uncertainty region as a first confidence region may be based on the uncertainty region being surrounded by a first confidence region. Re-categorising an uncertainty region as a first confidence region may be based upon a determination that confidence regions within a predetermined distance of the uncertainty region are first confidence regions. Re-categorising an uncertainty region as a second confidence region may be based on the uncertainty region being surrounded by a second confidence region. Re-categorising an uncertainty region as a second confidence region may be based upon a determination that confidence regions within a predetermined distance of the uncertainty region are second confidence regions.
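One possible reading of the re-categorisation step is sketched below for a single row of samples: an uncertainty sample is re-labelled when every confidence sample within a predetermined distance belongs to the same category. The label strings and the `radius` parameter are hypothetical names introduced for this example.

```python
def recategorise(labels, radius=1):
    """Re-label uncertainty samples whose nearby confidence samples all agree."""
    out = list(labels)
    for i, label in enumerate(labels):
        if label != "uncertain":
            continue
        lo, hi = max(0, i - radius), min(len(labels), i + radius + 1)
        # Categories of the confidence samples within the predetermined distance.
        nearby = {l for l in labels[lo:hi] if l != "uncertain"}
        if len(nearby) == 1:
            out[i] = nearby.pop()
    return out


row = ["first", "first", "uncertain", "first", "second", "uncertain", "second", "second"]
filled = recategorise(row, radius=1)
edge = recategorise(["first", "uncertain", "second"], radius=1)
```

Samples that sit between a first and a second confidence region, such as the middle sample of `edge`, stay uncertain and are left for the blending factor determination.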
Colour and depth values of at least one of the first and second images from the real scene may be captured using a capture device. Determining the at least one blending factor value may be further based upon the distance between the position of the first colour value and the position of the at least one second colour value. The first colour value and the second colour value may be colour values associated with a single image of the first image and the second image. The first colour value and the second colour values may be colour values captured from a real scene.
The uncertainty region may comprise a plurality of sample points and determining the at least one blending factor value may further comprise processing, for each of a plurality of sample points in the uncertainty region, that sample point based upon colour values at a plurality of sample points located in a confidence region within a predetermined distance of that sample point. When processing a sample point in the uncertainty region, a zero weight may be assigned to other sampling points within the predetermined distance of the sampling point that are in an uncertainty region.
Determining the at least one blending factor value for the uncertainty region may comprise applying a cross bilateral filter to each of a plurality of sample points in the uncertainty region based upon: the distance between the position of the first colour value and the position of the at least one second colour value; and the similarity in colour value between the first colour value and the at least one second colour value. The plurality of sample points used in the cross bilateral filter may be identified using a filter kernel and sample points within the filter kernel may be used to determine the at least one blending factor value for the uncertainty region. Comparing the similarity in colour values may comprise comparing the difference in colour for each of a red, a green, and a blue colour component at a sample point with the corresponding colour component at each sample point within the filter kernel that is in the confidence region. The distance between the position of the first colour value and the position of the at least one second colour value may be determined based upon the number of sample points between the first colour value and the at least one second colour value.
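A cross bilateral filter of the kind described may be sketched in one dimension as follows. The Gaussian form of the spatial and colour weights, the parameter names `sigma_s` and `sigma_c`, and their default values are assumptions of this sketch; the description above specifies only that the weights depend on distance and colour similarity, and that samples in an uncertainty region receive zero weight.

```python
import math


def cross_bilateral_alpha(colours, alpha, uncertain, radius=2, sigma_s=2.0, sigma_c=30.0):
    """Estimate blending factors at uncertainty samples from nearby confidence samples."""
    out = list(alpha)
    n = len(colours)
    for i in range(n):
        if not uncertain[i]:
            continue  # confidence samples keep their initial blending factor
        total_w = total_a = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            if uncertain[j]:
                continue  # zero weight for samples in an uncertainty region
            dist2 = (i - j) ** 2  # distance measured in sample points
            col2 = sum((a - b) ** 2 for a, b in zip(colours[i], colours[j]))  # R, G, B
            w = math.exp(-dist2 / (2 * sigma_s ** 2)) * math.exp(-col2 / (2 * sigma_c ** 2))
            total_w += w
            total_a += w * alpha[j]
        if total_w > 0.0:
            out[i] = total_a / total_w
    return out


colours = [(200, 0, 0), (200, 0, 0), (198, 0, 0), (0, 0, 200), (0, 0, 200)]
alpha = [1.0, 1.0, 0.5, 0.0, 0.0]
uncertain = [False, False, True, False, False]
matte = cross_bilateral_alpha(colours, alpha, uncertain)
```

In this example the uncertain sample is nearly the same colour as the samples to its left, so its blending factor is pulled towards their value of 1.0 even though samples on both sides are equally close.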
Determining at least one blending factor value in the uncertainty region may be based upon a similarity between a colour value in the uncertainty region and at least one corresponding colour value of each of the first image and the second image. Determining at least one blending factor value may be based upon generating at least two error metrics for the uncertainty region, and minimising the error metrics to determine the at least one blending factor value in the uncertainty region. A first error metric may be a gradient metric indicative of gradient changes in blending factor values and a second error metric may be a colour metric indicative of colour similarities between colour values in the uncertainty region and colour values in the confidence region. A plurality of initial blending factor values may be determined and the gradient metric may be determined based upon variations in the plurality of initial blending factor values across an alpha matte.
The colour metric may estimate the probability that a colour value in the uncertainty region forms part of an image of the real scene in front of a virtual object or forms part of the image of the real scene behind a virtual object based on neighbouring colour values. Colour values used in determining the colour metric may be selected by performing a dilation operation on the uncertainty region. The at least two error metrics may be minimised using an iterative method. The colour metric may be formed from fitted Mixture of Gaussian models for each of the part of the real scene in front of a virtual object and the part of the real scene behind a virtual object. The error metrics may be minimised using the Levenberg-Marquardt algorithm to determine the at least one blending factor in the uncertainty region.
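The minimisation of a gradient metric together with a colour metric may be illustrated with a much simplified sketch. It assumes a quadratic energy, namely the sum of squared differences between neighbouring blending factors plus `lam` times the squared difference between each uncertain blending factor and a per-sample colour-based estimate `prior`. It uses simple iterative Gauss-Seidel updates in place of the Levenberg-Marquardt algorithm, and takes `prior` as given rather than deriving it from fitted Mixture of Gaussian models. All names are hypothetical.

```python
def minimise_matte(alpha, prior, uncertain, lam=1.0, iterations=100):
    """Iteratively minimise a smoothness-plus-colour energy over uncertainty samples."""
    a = list(alpha)
    n = len(a)
    for _ in range(iterations):
        for i in range(n):
            if not uncertain[i]:
                continue  # confidence samples are held fixed
            neighbours = [a[j] for j in (i - 1, i + 1) if 0 <= j < n]
            # Closed-form minimiser of the energy with respect to a[i] alone.
            a[i] = (sum(neighbours) + lam * prior[i]) / (len(neighbours) + lam)
    return a


alpha = [1.0, 0.5, 0.5, 0.0]
prior = [1.0, 0.9, 0.1, 0.0]   # assumed colour-based estimates
uncertain = [False, True, True, False]
solved = minimise_matte(alpha, prior, uncertain)
```

Larger values of `lam` make the result follow the colour-based estimates more closely, while smaller values favour a smooth transition between the surrounding confidence regions.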
An erosion operation may be performed on the confidence region, wherein the erosion operation is configured to re-categorise at least one portion of the confidence region as forming a part of an uncertainty region.
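An erosion of the confidence region may be sketched on a single row of samples as below. The boolean mask representation and the `radius` parameter are assumptions; in this sketch the image border is not itself treated as uncertain.

```python
def erode_confidence(confident, radius=1):
    """Keep a sample confident only if every sample within `radius` is also confident."""
    n = len(confident)
    out = []
    for i in range(n):
        window = confident[max(0, i - radius):min(n, i + radius + 1)]
        out.append(all(window))
    return out


mask = [True, True, True, False, True, True, True]
eroded = erode_confidence(mask, radius=1)
```

Confidence samples adjacent to the uncertainty sample are re-categorised as uncertain, which widens the band over which blending factor values are subsequently determined.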
The first image may be a captured image of a real scene and the second image may be an image of a virtual object.
An augmented reality video sequence may be generated from a first video sequence and a further image, the method comprising performing, for a plurality of frames of the video sequence, the above-discussed methods, wherein the first image corresponds to the frame of the first video sequence and the second image corresponds to the further image.
The augmented reality processing system may be embodied in hardware on an integrated circuit. There may be provided a method of manufacturing, at an integrated circuit manufacturing system, an augmented reality processing system. There may be provided an integrated circuit definition dataset that, when processed in an integrated circuit manufacturing system, configures the system to manufacture an augmented reality processing system. There may be provided a non-transitory computer readable storage medium having stored thereon a computer readable description of an integrated circuit that, when processed, causes a layout processing system to generate a circuit layout description used in an integrated circuit manufacturing system to manufacture an augmented reality processing system.
There may be provided an integrated circuit manufacturing system comprising: a non-transitory computer readable storage medium having stored thereon a computer readable integrated circuit description that describes the augmented reality processing system; a layout processing system configured to process the integrated circuit description so as to generate a circuit layout description of an integrated circuit embodying the augmented reality processing system; and an integrated circuit generation system configured to manufacture the augmented reality processing system according to the circuit layout description.
There may be provided computer program code for performing a method as claimed in any preceding claim. There may be provided a non-transitory computer readable storage medium having stored thereon computer readable instructions that, when executed at a computer system, cause the computer system to perform the method as claimed in any preceding claim.
The above features may be combined as appropriate, as would be apparent to a skilled person, and may be combined with any of the aspects of the examples described herein.
The accompanying drawings illustrate various examples. The skilled person will appreciate that the illustrated element boundaries (e.g., boxes, groups of boxes, or other shapes) in the drawings represent one example of the boundaries. It may be that in some examples, one element may be designed as multiple elements or that multiple elements may be designed as one element. Common reference numerals are used throughout the figures, where appropriate, to indicate similar features.
The following description is presented by way of example to enable a person skilled in the art to make and use the invention. The present invention is not limited to the embodiments described herein and various modifications to the disclosed embodiments will be apparent to those skilled in the art.
Embodiments will now be described by way of example only.
The accompanying drawings include an isometric view of a real scene that is to be the subject of processing by an augmented reality processing system. The scene is a real three-dimensional space in which real objects may be positioned. The position and orientation of the objects within the scene may be determined in a number of different ways, such as by modelling the scene. For example, it is possible to map the scene with a laser scan to accurately determine the position of the objects within the scene. Alternatively, one or more images of the scene may be captured using a capture device (not shown) to obtain depth measurements.
In this view, a virtual object is not rendered and only a real scene is shown. An augmented reality processing system may be configured to select a viewpoint of the scene and to capture a first image of the scene with respect to that viewpoint. The augmented reality processing system may then generate a new image, i.e. an augmented reality image, which is a combination of the first image of the real scene and a second image, which may be an image of one or more virtual objects that are to be visually inserted within the real scene.
The second image may be an image of one or more virtual objects taken from the same viewpoint as the first image. As such, the virtual object or the real objects within the scene may be correctly occluded by the other depending on their relative depths with respect to the selected viewpoint.
A plan view of the same scene shows where a first object and a second object are located. A capture device may be positioned relative to the scene so as to capture a first image of the scene. Specifically, the capture device may be configured to capture depth values and colour values (such as RGB colour values) of the scene from the viewpoint of the capture device. The captured depth values are determined relative to the viewpoint. The position of the capture device may correspond to the viewpoint from which a second image of a virtual object is generated and in which a virtual object is rendered. The second image may therefore be considered to be a virtual image. The depth values of the second image may therefore correspond to those of the first image and are determined with respect to a common viewpoint.
Alternatively, a “virtual” viewpoint may be generated for the first image by interpolating between depth measurements taken from multiple real viewpoints. For example, the capture device may obtain two different depth measurements from two different viewpoints and the augmented reality processing system may interpolate between the two depth measurements to obtain depth measurements for the first image that correspond with the depth measurements for the second image. However, for the purposes of describing the following examples, it will be assumed that the viewpoint from which the augmented reality image is rendered is the same as the position of the capture device from which the colour values and depth values of the scene are captured.
When capturing the depth values of the scene, the capture device determines the distance of the scene from the capture device at a plurality of different sampling points across the scene to create an array of depth values. For example, the capture device may comprise a first sensor and a second sensor. The first sensor is configured to capture a first image of the scene comprising a plurality of first colour values. The captured colour values in the first image may be in the form of RGB colour values for a plurality of pixels which combine to represent the scene from the viewpoint of the capture device, for example in an array of pixels each having a red, green, and blue colour component value.
The second sensor is configured to capture depth values from the scene. For example, the second sensor may be an Infra-Red (IR) sensor configured to detect the presence of IR signals. The capture device may also include an IR transmitter (not shown) configured to transmit IR signals which are then captured by the second sensor. By measuring the received IR signals, it is possible to make a determination regarding depth information at each of a plurality of sampling points across the scene.
The sampling points at which a depth value is captured may correspond with the points at which colour information is captured. Put another way, portions of the scene at which depth measurements are captured may have a one-to-one correspondence with pixels of an image of the scene captured by the capture device. The depth information may be captured such that it directly corresponds in position to the colour information.
For example, depth information may be obtained for an area of the scene with the same resolution as colour information by the capture device. In some arrangements, depth information may be obtained at a lower resolution than the colour values and thus some degree of interpolation may be required to ensure a correspondence in values. Similarly, the depth information may be at a higher resolution than the colour information. It will be assumed for the purposes of describing the following examples that the resolution of the captured depth values and the captured colour values are the same.
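Where depth values are captured at a lower resolution than colour values, one simple way to obtain a one-to-one correspondence is nearest-neighbour upsampling, sketched below. The choice of nearest-neighbour sampling and the function name are assumptions of this example; other interpolation schemes could equally be used.

```python
def upsample_nearest(depth, out_w, out_h):
    """Resize a depth array (list of rows) to out_w x out_h by nearest-neighbour lookup."""
    in_h, in_w = len(depth), len(depth[0])
    return [
        [depth[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


low_res = [[1.0, 2.0]]
high_res = upsample_nearest(low_res, out_w=4, out_h=2)
```

Nearest-neighbour lookup copies existing depth values rather than averaging them, so it does not create intermediate depths at the boundary between a near object and a far background.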
The IR signals transmitted by the capture device may be transmitted in a grid and time-of-flight information may be used to determine the depth value at each sampling point captured by the second sensor. For example, the second sensor may be configured to detect the phase of the IR signal. In this way, it is the surface of the scene which is closest to the capture device at a particular sampling point which is used to determine the depth value at that sampling point. For example, the face of an object that is closest to the capture device defines the depth value for sampling points that fall upon that face.
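As a hedged illustration of how a detected phase may be converted to a depth value, the sketch below uses the standard relation for a continuous-wave time-of-flight sensor, in which depth equals the speed of light multiplied by the phase shift and divided by 4π times the modulation frequency. The assumption that the capture device uses a continuously modulated signal, and the 30 MHz modulation frequency, are made purely for this example.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def depth_from_phase(phase_rad, modulation_hz):
    """Convert a measured phase shift (radians) into a one-way distance in metres."""
    # The signal travels to the surface and back, hence 4*pi rather than 2*pi.
    return SPEED_OF_LIGHT * phase_rad / (4.0 * math.pi * modulation_hz)


depth = depth_from_phase(math.pi, 30e6)  # half a cycle of phase shift at 30 MHz
```

Because phase wraps every 2π, depths beyond the speed of light divided by twice the modulation frequency cannot be distinguished from nearer ones using a single frequency, which is one way in which erroneous depth values of the kind discussed above may arise.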
As can be seen from the plan view of the scene, the first object is located closer to the capture device than the second object in the z dimension. Accordingly, the depth measurements detected by the second sensor at sampling points that align with the first object will be less than corresponding depth measurements taken at sampling points aligned with the second object, i.e. the first object is closer than the second object. Similarly, for portions of the scene captured at sampling points where neither the first object nor the second object is present, the measured depth will be determined by the distance of the background of the scene from the capture device. In this example, the background is determined by the rear plane of the scene furthest from the capture device.
The relative positions of the first and second objects may be considered as seen from a viewpoint of the scene in an x-y plane defined by X-Y at the capture device. An example set of depth values may be taken along dimension y, reflecting the depth values captured along a line Y-Y at a number of sampling points. Three different values are identified by the capture device across these sampling points. A number of lines of depth values across plane X-Y may be obtained to generate an array of depth values of the scene.
It can be seen that the largest of the three depth values captured by the capture device along line Y-Y is captured where neither the first object nor the second object is located, for example in the area between the two objects. Accordingly, that captured depth measurement is based upon the measured depth of the background of the scene. Another measured depth corresponds with the depth values determined at sampling points which fall on the surface of the first object, i.e. the portion of line Y-Y that intersects the first object. Similarly, the remaining depth corresponds with sampling points that fall on the second object. The captured depth values are discrete values that represent the depth value determined at a sampling point. However, the depth values may correspond with regions of the image rather than individual points.
It will be noted that, in this example, the two real objects do not occlude one another. This is because the two objects do not overlap one another along dimension y. The example therefore illustrates an arrangement in which real depth values are captured.
Another example, of a different scene, is one in which real objects are occluded with respect to one another.
In this second example, third and fourth objects are located within a different three-dimensional scene. The objects are located within the scene such that they overlap one another in dimension y, when considered from the viewpoint of the capture device at plane X-Y. Since the third object is closer to the capture device with respect to dimension z than the fourth object, a portion of the fourth object is occluded from view in an image of the scene taken from the viewpoint of the capture device.
For example, the third object and the fourth object overlap in the y dimension across a region of the respective objects. Accordingly, depth values obtained by the capture device at sampling points in that region are determined based upon the distance of the third object from the capture device rather than the distance of the fourth object, since the third object is closer to the viewpoint at the capture device than the fourth object, with respect to dimension z. Similarly, the colour values captured by the capture device over that region will be the captured colour of the third object rather than the fourth object.
In this way, the portion of the fourth object that is located within the overlapping region is occluded from the viewpoint at the capture device by the portion of the third object that also falls within that region. The drawings also illustrate depth measurements for this scene. It will be appreciated that real objects may be occluded in traditional image capture systems by other objects.
In more detail, the viewpoint of the capture device with respect to the third and fourth real objects may be considered through plane X-Y, with the resultant depth value measurements taken across a line Y-Y. The depth values for portions of line Y-Y that are intersected by either the third object alone or both the third and fourth objects take the depth values of the third object, whilst portions of line Y-Y that are intersected only by the fourth object take the depth values of the fourth object. As with the first example, the portions of line Y-Y not intersected by either the third object or the fourth object have a depth value corresponding to the background of the scene.
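The depth values along such a line may be reproduced with a short sketch in which each sampling point takes the depth of the closest surface that it intersects. The object extents, depths, and sample count below are invented for illustration and do not correspond to values in the drawings.

```python
def scanline_depth(objects, background_depth, n_samples):
    """Return the depth seen at each sampling point along a line.

    Each object is (start, end, depth), covering samples start..end-1.
    """
    depths = [background_depth] * n_samples
    for start, end, depth in objects:
        for y in range(start, end):
            # The closest surface at a sampling point determines its depth value.
            depths[y] = min(depths[y], depth)
    return depths


third_object = (1, 4, 2.0)   # nearer object
fourth_object = (3, 6, 3.0)  # farther object, overlapping at sample 3
line = scanline_depth([third_object, fourth_object], background_depth=5.0, n_samples=8)
```

At the overlapping sample the nearer third object supplies the depth value, so the corresponding portion of the fourth object is occluded.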
Accordingly, in traditional image capture systems, only colour information relating to real objects in a scene that are not occluded by other real objects is captured by the image sensor. In augmented reality processing systems, it is desirable to re-create this behaviour for arrangements in which virtual objects are to be rendered in a manner that allows the virtual objects to appear to behave in the same manner as a real object to provide added realism to the augmented reality image.
October 14, 2025