A processor-implemented method includes obtaining a first motion matrix corresponding to an extended reality (XR) system and a second motion matrix based on a conversion coefficient from an XR system coordinate system into a rolling shutter (RS) camera coordinate system, and projecting an RS color image of a current frame onto a global shutter (GS) color image coordinate system based on the second motion matrix and generating a GS color image of the current frame, wherein the second motion matrix is a motion matrix of a timestamp of a depth image captured by a GS camera corresponding to a timestamp of a first scanline of an RS color image captured by an RS camera.
Legal claims defining the scope of protection, as filed with the USPTO.
. A processor-implemented method, the method comprising:
. The method of, further comprising:
. The method of, wherein the determining of whether to perform the obtaining of the second motion matrix based on the motion information related to the RS camera comprises
. The method of, wherein the depth image captured by the GS camera corresponding to the timestamp of the selected scanline is a GS depth image with a timestamp closest to the timestamp of the selected scanline.
. The method of, further comprising:
. The method of, further comprising:
. The method of, further comprising:
. The method of, further comprising:
. The method of, wherein the projecting of the GS depth image corresponding to the selected scanline of the RS color image of the current frame onto the RS camera coordinate system comprises:
. The method of, wherein the plurality of second scanlines comprises a second scanline corresponding to the selected scanline of the RS color image of the current frame and a second scanline located in a predetermined range of the corresponding second scanline.
. The method of, wherein the first neural network is a convolutional neural network (CNN) and/or the second neural network is a U-network (UNET) neural network.
. The method of, wherein the projecting of the RS color image of the current frame onto the GS color image coordinate system comprises:
. The method of, wherein the determining of the target area comprising the movable target in the GS color image of the current frame comprises:
. The method of, wherein the obtaining of the corrected GS color image of the current frame by replacing the pixel of the target area of the GS color image of the current frame using the pixel corresponding to the target area in the RS color image of the current frame comprises:
. An electronic device comprising:
. The electronic device of, wherein the one or more processors are configured to:
. The electronic device of, wherein, for the determining of whether to perform the obtaining of the second motion matrix based on the motion information related to the RS camera, the one or more processors are configured to:
. The electronic device of, wherein the depth image captured by the GS camera corresponding to the timestamp of the selected scanline is a GS depth image with a timestamp closest to the timestamp of the selected scanline.
. The electronic device of, wherein the one or more processors are configured to:
. The electronic device of, wherein the one or more processors are configured to:
Complete technical specification and implementation details from the patent document.
This application claims the benefit under 35 USC § 119(a) of Chinese Patent Application No. 202211542904.8 filed on Dec. 2, 2022, in the China National Intellectual Property Administration, and Korean Patent Application No. 10-2023-0142046 filed on Oct. 23, 2023, in the Korean Intellectual Property Office, the entire disclosures of which are incorporated herein by reference for all purposes.
The following description relates to an apparatus and method with image processing.
Augmented reality (AR) technology may provide a realistic information experience by adding virtual content to the actual scene in front of a user. In a three-dimensional (3D) space, an AR system may require high-precision real-time processing and understanding of the 3D status of surrounding objects and implement high-quality virtual reality fusion effects in front of the user.
Accurate and fast correction of a rolling shutter (RS) camera image is important for AR interaction and for the user's experience of an AR scene. The RS camera image correction method according to the related art includes motion inversion estimation, which uses images of multiple frames for depth estimation or correction. However, the computation speed of the method according to the related art is too slow for use in a fast AR scene. Additionally, the method according to the related art assumes that the RS camera image correction is effective, but this assumption may damage the robustness of a downstream task.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In one or more general aspects, a processor-implemented method includes: obtaining a first motion matrix corresponding to an extended reality (XR) system and a second motion matrix based on a conversion coefficient from an XR system coordinate system into a rolling shutter (RS) camera coordinate system; and projecting an RS color image of a current frame onto a global shutter (GS) color image coordinate system based on the second motion matrix and generating a GS color image of the current frame, wherein the second motion matrix is a motion matrix of a timestamp of a depth image captured by a GS camera corresponding to a timestamp of a first scanline of an RS color image captured by an RS camera.
The method may include determining whether to perform the obtaining of the second motion matrix based on motion information related to an RS camera.
The determining of whether to perform the obtaining of the second motion matrix based on the motion information related to the RS camera may include: determining angular velocity and positional velocity of the RS camera based on pose information of an XR system for the RS color image of the current frame and pose information of an XR system for an RS color image of a previous frame; and performing either one of: determining to perform the obtaining of the second motion matrix in response to either one or both of the angular velocity being greater than a first threshold and the positional velocity being greater than a second threshold; and determining a first optical flow matrix between the RS color image of the current frame and the RS color image of the previous frame in response to the angular velocity not being greater than the first threshold and the positional velocity not being greater than the second threshold, determining to perform the obtaining of the second motion matrix in response to a maximum value of the first optical flow matrix being greater than a third threshold, and determining not to perform the obtaining of the second motion matrix in response to the maximum value of the first optical flow matrix not being greater than the third threshold.
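The decision described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name, the parameter names, and the use of a callable to defer the optical flow computation are all assumptions, and the threshold values in the usage lines are arbitrary.

```python
from typing import Callable


def should_obtain_second_motion_matrix(
    angular_velocity: float,
    positional_velocity: float,
    max_optical_flow: Callable[[], float],
    first_threshold: float,
    second_threshold: float,
    third_threshold: float,
) -> bool:
    """Return True when the second motion matrix should be obtained.

    max_optical_flow is a hypothetical callable returning the maximum value
    of the first optical flow matrix. It is only called when the camera
    motion is below both velocity thresholds.
    """
    # Fast camera motion: either velocity above its threshold is enough.
    if angular_velocity > first_threshold or positional_velocity > second_threshold:
        return True
    # Slow camera motion: decide from the optical flow between the
    # current and previous RS color images.
    return max_optical_flow() > third_threshold


# Usage with arbitrary, illustrative numbers.
fast = should_obtain_second_motion_matrix(2.0, 0.0, lambda: 0.0, 1.0, 1.0, 3.0)
slow_static = should_obtain_second_motion_matrix(0.1, 0.1, lambda: 0.5, 1.0, 1.0, 3.0)
slow_moving = should_obtain_second_motion_matrix(0.1, 0.1, lambda: 4.2, 1.0, 1.0, 3.0)
```

Passing the optical flow as a callable reflects the order in the description, where the first optical flow matrix is determined only when neither velocity exceeds its threshold.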
The depth image captured by the GS camera corresponding to the timestamp of the first scanline may be a GS depth image with a timestamp closest to the timestamp of the first scanline.
The method may include: obtaining the first motion matrix and a third motion matrix based on the conversion coefficient from the XR system coordinate system into the RS camera coordinate system; projecting a GS depth image corresponding to a first scanline of the RS color image of the current frame onto the RS camera coordinate system using the third motion matrix based on the RS color image of the current frame and obtaining a GS depth image aligned with the RS color image of the current frame; obtaining a depth feature by inputting the aligned GS depth image to a first neural network; and inputting the depth feature and the GS color image of the current frame to a second neural network and obtaining a corrected GS color image of the current frame, wherein the third motion matrix is a motion matrix for the timestamp of the first scanline.
The method may include: determining a target area comprising a movable target in the GS color image of the current frame; and obtaining a corrected GS color image of the current frame by replacing a pixel of the target area of the GS color image of the current frame using a pixel corresponding to the target area in the RS color image of the current frame.
The method may include: obtaining the first motion matrix and a third motion matrix based on the conversion coefficient from the XR system coordinate system into the RS camera coordinate system; projecting a GS depth image corresponding to a first scanline of the RS color image of the current frame onto the RS camera coordinate system using the third motion matrix based on the RS color image of the current frame and obtaining a GS depth image aligned with the RS color image of the current frame; obtaining a depth feature by inputting the aligned GS depth image to a first neural network; and inputting the depth feature and the corrected GS color image of the current frame to a second neural network and obtaining an improved and corrected GS color image of the current frame, wherein the third motion matrix is a motion matrix for the timestamp of the first scanline.
The method may include: in response to determining not to perform the obtaining of the second motion matrix, obtaining the first motion matrix and a third motion matrix based on the conversion coefficient from the XR system coordinate system into the RS camera coordinate system; projecting a GS depth image corresponding to a first scanline of the RS color image of the current frame onto the RS camera coordinate system using the third motion matrix based on the RS color image of the current frame and obtaining a GS depth image aligned with the RS color image of the current frame; obtaining a depth feature by inputting the aligned GS depth image to a first neural network; and inputting the depth feature and an RS color image of the current frame to a second neural network and obtaining the GS color image of the current frame, wherein the third motion matrix is a motion matrix for the timestamp of the first scanline.
The projecting of the GS depth image corresponding to the first scanline of the RS color image of the current frame onto the RS camera coordinate system may include: projecting a plurality of second scanlines of the GS depth image corresponding to the first scanline onto a GS camera coordinate system for the first scanline of the RS color image of the current frame; and projecting the plurality of second scanlines onto the RS camera coordinate system based on the third motion matrix and obtaining a second scanline aligned with the first scanline.
The plurality of second scanlines may include a second scanline corresponding to the first scanline of the RS color image of the current frame and a second scanline located in a predetermined range of the corresponding second scanline.
The first neural network may be a convolutional neural network (CNN) and/or the second neural network may be a U-network (UNET) neural network.
The projecting of the RS color image of the current frame onto the GS color image coordinate system may include: projecting the first scanline onto a GS camera coordinate system for a first scanline of the RS color image of the current frame; and projecting the first scanline onto the GS camera coordinate system and projecting the first scanline onto the GS color image coordinate system based on the second motion matrix and obtaining the GS color image of the current frame.
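The text above does not give a projection formula. As a hedged illustration only, one common way to move a pixel between camera poses is to back-project it with its depth, apply a 4x4 rigid motion matrix, and project it again with a pinhole model. The pinhole intrinsics (fx, fy, cx, cy), the shared intrinsics for both views, and the function name are assumptions, not taken from the source.

```python
from typing import Sequence, Tuple


def reproject_pixel(
    u: float,
    v: float,
    depth: float,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    motion: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Move pixel (u, v) with known depth through a 4x4 motion matrix.

    Assumes a pinhole camera and that the transformed point stays in
    front of the camera (positive z).
    """
    # Back-project the pixel to a homogeneous 3D point in the source camera frame.
    point = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth, 1.0)
    # Apply the rigid motion (first three rows of the 4x4 matrix).
    xm, ym, zm = (
        sum(motion[row][k] * point[k] for k in range(4)) for row in range(3)
    )
    # Project back to pixel coordinates in the target frame.
    return fx * xm / zm + cx, fy * ym / zm + cy


identity = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
# Pure translation of 0.5 along x, standing in for a motion matrix.
shift_x = [[1.0, 0.0, 0.0, 0.5],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]

unchanged = reproject_pixel(320.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0, identity)
shifted = reproject_pixel(320.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0, shift_x)
```

In a per-scanline setting, each row of the RS color image would use the motion matrix associated with that row's timestamp. How that matrix is obtained is described in the surrounding text and is not modeled here.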
The determining of the target area comprising the movable target in the GS color image of the current frame may include: projecting a GS depth image corresponding to a first scanline of the RS color image of the current frame onto the RS camera coordinate system based on an RS color image of a previous frame using a third motion matrix and obtaining a GS depth image aligned with the RS color image of the previous frame; determining a second optical flow matrix based on a GS depth image aligned with the RS color image of the current frame and a GS depth image aligned with the RS color image of the previous frame; and determining the target area in the GS color image of the current frame based on a first optical flow matrix and the second optical flow matrix between the RS color image of the current frame and the RS color image of the previous frame, wherein the third motion matrix is obtained based on the first motion matrix and a conversion coefficient from the XR system coordinate system into the RS camera coordinate system, and wherein the GS depth image aligned with the RS color image of the current frame is obtained by projecting the GS depth image corresponding to the first scanline of the RS color image of the current frame onto the RS camera coordinate system based on the RS color image of the current frame using the third motion matrix.
The obtaining of the corrected GS color image of the current frame by replacing the pixel of the target area of the GS color image of the current frame using the pixel corresponding to the target area in the RS color image of the current frame may include: determining the pixel corresponding to the target area among the RS color image of the current frame and the GS color image of the current frame based on the first optical flow matrix; and replacing the pixel of the target area of the GS color image of the current frame with the corresponding pixel.
In one or more general aspects, an electronic device includes: one or more processors configured to: obtain a first motion matrix corresponding to an extended reality (XR) system and a second motion matrix based on a conversion coefficient from an XR system coordinate system into a rolling shutter (RS) camera coordinate system; and project an RS color image of a current frame onto a global shutter (GS) color image coordinate system based on the second motion matrix and generate a GS color image of the current frame, wherein the second motion matrix is a motion matrix of a timestamp of a depth image captured by a GS camera corresponding to a timestamp of a first scanline of an RS color image captured by an RS camera.
The one or more processors may be configured to determine whether to perform the obtaining of the second motion matrix based on motion information related to an RS camera.
For the determining of whether to perform the obtaining of the second motion matrix based on the motion information related to the RS camera, the one or more processors may be configured to: determine angular velocity and positional velocity of the RS camera based on pose information of an XR system for the RS color image of the current frame and pose information of an XR system for an RS color image of a previous frame; and perform either one of: determining to perform the obtaining of the second motion matrix in response to the angular velocity being greater than a first threshold or the positional velocity being greater than a second threshold; and determining a first optical flow matrix between the RS color image of the current frame and the RS color image of the previous frame in response to the angular velocity not being greater than the first threshold and the positional velocity not being greater than the second threshold, determine to perform the obtaining of the second motion matrix in response to a maximum value of the first optical flow matrix being greater than a third threshold, and determine not to perform the obtaining of the second motion matrix in response to the maximum value of the first optical flow matrix not being greater than the third threshold.
The depth image captured by the GS camera corresponding to the timestamp of the first scanline may be a GS depth image with a timestamp closest to the timestamp of the first scanline.
The one or more processors may be configured to: obtain the first motion matrix and a third motion matrix based on the conversion coefficient from the XR system coordinate system into the RS camera coordinate system; project a GS depth image corresponding to a first scanline of the RS color image of the current frame onto the RS camera coordinate system using the third motion matrix based on the RS color image of the current frame and obtain a GS depth image aligned with the RS color image of the current frame; input the aligned GS depth image to a first neural network and obtain a depth feature; and input the depth feature and the GS color image of the current frame to a second neural network and obtain a corrected GS color image of the current frame, wherein the third motion matrix is a motion matrix for the timestamp of the first scanline.
The one or more processors may be configured to: determine a target area comprising a movable target in the GS color image of the current frame; and obtain a corrected GS color image of the current frame by replacing a pixel of the target area of the GS color image of the current frame using a pixel corresponding to the target area in the RS color image of the current frame.
Other features and aspects will be apparent from the following detailed description, the drawings, and the claims.
Throughout the drawings and the detailed description, unless otherwise described or provided, the same drawing reference numerals will be understood to refer to the same elements, features, and structures. The drawings may not be to scale, and the relative size, proportions, and depiction of elements in the drawings may be exaggerated for clarity, illustration, and convenience.
The following detailed description is provided to assist the reader in gaining a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein will be apparent after an understanding of the disclosure of this application. For example, the sequences of operations described herein are merely examples, and are not limited to those set forth herein, but may be changed as will be apparent after an understanding of the disclosure of this application, with the exception of operations necessarily occurring in a certain order. Also, descriptions of features that are known after an understanding of the disclosure of this application may be omitted for increased clarity and conciseness.
The terminology used herein is for describing various examples only and is not to be used to limit the disclosure. The articles “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. As used herein, the term “and/or” includes any one and any combination of any two or more of the associated listed items. As non-limiting examples, terms “comprise” or “comprises,” “include” or “includes,” and “have” or “has” specify the presence of stated features, numbers, operations, members, elements, and/or combinations thereof, but do not preclude the presence or addition of one or more other features, numbers, operations, members, elements, and/or combinations thereof. The use of the term “may” herein with respect to an example or embodiment (for example, as to what an example or embodiment may include or implement) means that at least one example or embodiment exists where such a feature is included or implemented, while all examples are not limited thereto.
Unless otherwise defined, all terms including technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains and based on an understanding of the disclosure of the present application. It will be further understood that terms, such as those defined in commonly-used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the disclosure of the present application, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
When describing the examples with reference to the accompanying drawings, like reference numerals refer to like components and a repeated description related thereto will be omitted. In the description of examples, detailed description of well-known related structures or functions will be omitted when it is deemed that such description will cause ambiguous interpretation of the present disclosure.
Although terms such as “first,” “second,” and “third”, or A, B, (a), (b), and the like may be used herein to describe various members, components, regions, layers, or sections, these members, components, regions, layers, or sections are not to be limited by these terms. Each of these terminologies is not used to define an essence, order, or sequence of corresponding members, components, regions, layers, or sections, for example, but used merely to distinguish the corresponding members, components, regions, layers, or sections from other members, components, regions, layers, or sections. Thus, a first member, component, region, layer, or section referred to in the examples described herein may also be referred to as a second member, component, region, layer, or section without departing from the teachings of the examples.
Throughout the specification, when a component or element is described as “connected to,” “coupled to,” or “joined to” another component or element, it may be directly (e.g., in contact with the other component or element) “connected to,” “coupled to,” or “joined to” the other component or element, or there may reasonably be one or more other components or elements intervening therebetween. When a component or element is described as “directly connected to,” “directly coupled to,” or “directly joined to” another component or element, there can be no other elements intervening therebetween. Likewise, expressions, for example, “between” and “immediately between” and “adjacent to” and “immediately adjacent to” may also be construed as described in the foregoing.
The phrases “at least one of A, B, and C,” “at least one of A, B, or C,” and the like are intended to have disjunctive meanings, and these phrases “at least one of A, B, and C,” “at least one of A, B, or C,” and the like also include examples where there may be one or more of each of A, B, and/or C (e.g., any combination of one or more of each of A, B, and C), unless the corresponding description and embodiment necessitates such listings (e.g., “at least one of A, B, and C”) to be interpreted to have a conjunctive meaning.
The same name may be used to describe an element included in the examples described above and an element having a common function. Unless otherwise mentioned, the descriptions on the examples may be applicable to the following examples and thus, duplicated descriptions will be omitted for conciseness.
First, the terms and/or symbols of non-limiting examples of the present disclosure are described as follows.
1. A rolling shutter (RS) camera.
A color sensor of the RS camera may capture a color image using an RS method.
The color sensor may work with a time period t_cp (“cp” is short for color period).
A first color image of a video may be captured at timestamp t_c(0). In the RS color camera, t_c(0) is a timestamp of a first scanline of an image and the RS color image is represented as I_c(0).
An r-th scanline of I_c(0) may be captured at timestamp t_c(0)_r=t_c(0)+r×t_row, where r denotes a row index and t_row denotes an exposure time of the color sensor divided by the number of scanlines of the color image.
The r-th scanline of a j-th color image may be captured at time t_c(j)_r=t_c(0)+j×t_cp+r×t_row.
The color image whose first scanline is captured at timestamp t_c(j) may be represented as I_c(j).
The scanline of row r in the color image I_c(j) may be represented as I_c(j)_r.
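The scanline timing above can be worked through numerically. The sketch below evaluates t_c(j)_r=t_c(0)+j×t_cp+r×t_row using integer microseconds. The function names and all numeric values are illustrative assumptions, not values from the source.

```python
def row_time(exposure_time_us: int, num_scanlines: int) -> int:
    """t_row: exposure time of the color sensor divided by the number of scanlines."""
    return exposure_time_us // num_scanlines


def scanline_timestamp(t_c0_us: int, j: int, r: int, t_cp_us: int, t_row_us: int) -> int:
    """Timestamp of the r-th scanline of the j-th RS color image."""
    return t_c0_us + j * t_cp_us + r * t_row_us


# Illustrative values: 480 scanlines, 9.6 ms exposure, about 30 frames per second.
t_row = row_time(9_600, 480)  # 20 microseconds per scanline
t_first = scanline_timestamp(1_000_000, j=2, r=0, t_cp_us=33_333, t_row_us=t_row)
t_row_100 = scanline_timestamp(1_000_000, j=2, r=100, t_cp_us=33_333, t_row_us=t_row)
# t_first is 1_066_666 and t_row_100 is 1_068_666, so row 100 of this
# frame is captured 2 ms after the frame's first scanline.
```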
2. A global shutter (GS) camera.
A depth sensor of the GS camera may capture a depth image using a GS method.
The depth sensor may work with a time period t_dp (“dp” is short for depth period).
A starting depth image may be captured at timestamp t_d(0).
An i-th depth image may be captured at timestamp t_d(i)=t_d(0)+i×t_dp.
The depth image captured at timestamp t_d(i) may be represented as D_d(i).
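The summary describes the depth image corresponding to a scanline as the GS depth image with the timestamp closest to that scanline's timestamp. Because t_d(i)=t_d(0)+i×t_dp is evenly spaced, the closest index can be found by rounding. The following is a small sketch under that assumption: the function names, the clamping to an available frame range, and the numeric values are hypothetical.

```python
def depth_timestamp(t_d0_us: int, i: int, t_dp_us: int) -> int:
    """Timestamp of the i-th GS depth image."""
    return t_d0_us + i * t_dp_us


def closest_depth_index(t_scanline_us: int, t_d0_us: int, t_dp_us: int,
                        num_depth_frames: int) -> int:
    """Index i of the depth image D_d(i) whose timestamp is closest to a scanline."""
    # Round to the nearest period boundary, then clamp to the frames that exist.
    i = round((t_scanline_us - t_d0_us) / t_dp_us)
    return min(max(i, 0), num_depth_frames - 1)


# Illustrative values: depth frames every 50 ms starting at 1 s.
i = closest_depth_index(1_068_666, t_d0_us=1_000_000, t_dp_us=50_000,
                        num_depth_frames=10)
t_closest = depth_timestamp(1_000_000, i, 50_000)
# The scanline at 1_068_666 us is 18_666 us from frame 1 and 31_334 us
# from frame 2, so frame 1 (timestamp 1_050_000 us) is selected.
```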
March 24, 2026