US-12626396-B2

Electronic device and controlling method of electronic device

Published: May 12, 2026

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

An electronic device and a control method of an electronic device are provided. The method includes acquiring a plurality of images through at least one camera, inputting red green blue (RGB) data for each of the plurality of images into a first neural network model to obtain two-dimensional pose information on an object included in the plurality of images, inputting RGB data for at least one image of the plurality of images into a second neural network model to identify whether the object is transparent, if the object is a transparent object, performing stereo matching based on the two-dimensional pose information on each of the plurality of images to obtain three-dimensional pose information on the object, and if the object is an opaque object, acquiring three-dimensional pose information on the object based on one image of the plurality of images and depth information corresponding to the one image.

Patent Claims

. An electronic device comprising:

. The electronic device of, wherein the instructions, when executed by the at least one processor individually or collectively, further cause the electronic device to:

. The electronic device of, wherein, based on the object being an object having symmetry, the first feature points are identified based on a three-dimensional coordinate system in which x-axis or y-axis is perpendicular to the at least one camera.

. The electronic device of, wherein the plurality of images are two images acquired at two different points in time through a first camera among the at least one camera.

. The electronic device of, wherein the plurality of images are two images acquired at same points in time through each of the first camera and a second camera among the at least one camera.

. The electronic device of, wherein the instructions, when executed by the at least one processor individually or collectively, further cause the electronic device to:

. The electronic device of, further comprising:

. The electronic device of, wherein the first neural network model and the second neural network model are included in one integrated neural network model.

. The electronic device of, wherein the two-dimensional pose information includes a bounding box corresponding to each object included in an image through a first neural network model.

. A method performed by an electronic device, the method comprising:

. The method of, wherein identifying transparency of the object comprises:

. The method of, wherein, based on the object being an object having symmetry, the first feature points are identified based on a three-dimensional coordinate system in which x-axis or y-axis is perpendicular to the at least one camera.

. The method of, wherein the plurality of images are two images acquired at two different points in time through a first camera among the at least one camera.

. The method of, wherein the plurality of images are two images acquired at same points in time through each of the first camera and a second camera among the at least one camera.

Detailed Description

This application is a continuation application, claiming priority under § 365(c), of an International application No. PCT/KR2021/016060, filed on Nov. 5, 2021, which is based on and claims the benefit of a Korean patent application number 10-2020-0147389, filed on Nov. 6, 2020, in the Korean Intellectual Property Office, and of a Korean patent application number 10-2021-0026660, filed on Feb. 26, 2021, in the Korean Intellectual Property Office, the disclosure of each of which is incorporated by reference herein in its entirety.

The disclosure relates to an electronic device and a controlling method of the electronic device. More particularly, the disclosure relates to a device capable of acquiring three-dimensional (3D) pose information of an object included in an image.

A need for a technology for acquiring three-dimensional (3D) pose information on an object included in an image has recently been highlighted. More particularly, the development of technology for detecting an object included in an image and using 3D pose information for the detected object by using a neural network model, such as a convolutional neural network (CNN), has recently accelerated.

However, when pose information on an object is acquired based on one image according to the prior art, it is difficult to acquire pose information of an object for which a 3D model has not been established, and particularly, it is difficult to acquire accurate pose information for a transparent object.

In addition, when pose information on an object is acquired based on a stereo camera according to the related art, there are limitations: the range of distances that may be measured for acquisition of pose information is limited due to the narrow field of view difference between the two cameras, and a neural network model trained on the premise that the positional relationship between the two cameras is fixed may not be usable when that positional relationship is changed.

The above information is presented as background information only to assist with an understanding of the disclosure. No determination has been made, and no assertion is made, as to whether any of the above might be applicable as prior art with regard to the disclosure.

Aspects of the disclosure are to address at least the above-mentioned problems and/or disadvantages and to provide at least the advantages described below. Accordingly, an aspect of the disclosure is to provide an electronic device capable of acquiring 3D pose information for an object in an efficient manner according to the features of an object included in an image, and a method for controlling the electronic device.

Additional aspects will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the presented embodiments.

In accordance with an aspect of the disclosure, an electronic device is provided. The electronic device includes at least one camera, a memory, and a processor configured to acquire a plurality of images through the at least one camera, input red green blue (RGB) data for each of the plurality of images into a first neural network model to acquire two-dimensional pose information on an object included in the plurality of images, input RGB data for at least one image of the plurality of images into a second neural network model to identify whether the object is transparent, based on the object being a transparent object, perform stereo matching based on the two-dimensional pose information on each of the plurality of images to acquire three-dimensional pose information on the object, and based on the object being an opaque object, acquire three-dimensional pose information on the object based on one image of the plurality of images and depth information corresponding to the one image.
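The branching described above can be sketched as a small dispatch routine. This is only an illustrative outline: the function name `estimate_3d_pose` and all of its callable parameters are hypothetical stand-ins for the first neural network model, the second neural network model, the stereo matching step, and the depth-based step, none of which are specified in code form by the disclosure. Passing the 2D pose of the chosen image into the depth-based step is also an assumption.

```python
from typing import Any, Callable, List, Sequence


def estimate_3d_pose(
    images: Sequence[Any],
    depth_map: Any,
    detect_2d_pose: Callable[[Any], Any],        # stand-in for the first model
    is_transparent: Callable[[Any], bool],       # stand-in for the second model
    stereo_match: Callable[[List[Any]], Any],    # stand-in for stereo matching
    pose_from_depth: Callable[[Any, Any], Any],  # stand-in for the depth path
) -> Any:
    """Route to stereo matching or depth-based estimation (hypothetical helper)."""
    # First model: 2D pose information from the RGB data of every image.
    poses_2d = [detect_2d_pose(image) for image in images]

    # Second model: transparency check on at least one image (here, the first).
    if is_transparent(images[0]):
        # Transparent object: combine the 2D pose information across images.
        return stereo_match(poses_2d)

    # Opaque object: one image plus its corresponding depth information.
    # Assumption: the 2D pose of that image is reused alongside the depth map.
    return pose_from_depth(poses_2d[0], depth_map)
```

Keeping the two models and the two 3D paths as injected callables mirrors the way the description treats them as separable components, and makes it easy to swap in an integrated model later.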

The processor may acquire information about transparency of the object through the second neural network model, and identify transparency of the object based on the information about the transparency of the object.

The processor may acquire information on whether the object is symmetrical through the second neural network model, identify whether the object is symmetrical based on the information on whether the object is symmetrical, based on the object being an object having symmetry, convert first feature points included in the two-dimensional pose information into second feature points unrelated to symmetry, and acquire three-dimensional pose information for the object by performing the stereo matching based on the second feature points.

Based on the object being an object having symmetry, the first feature points are identified based on a three-dimensional coordinate system in which x-axis or y-axis is perpendicular to the at least one camera.

The plurality of images are two images acquired at two different points in time through a first camera among the at least one camera.

The plurality of images are two images acquired at same points in time through each of the first camera and the second camera among the at least one camera.

The processor may acquire first location information about a positional relationship between the first camera and the second camera, and perform the stereo matching based on the two-dimensional pose information for each of the plurality of images and the first location information.
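To illustrate how a known positional relationship between two cameras lets a matched pair of 2D feature points be lifted to 3D, the following sketch uses midpoint triangulation. The disclosure does not state which stereo matching algorithm is used, so this is a substituted, commonly used technique, not the patented method. It assumes a pinhole camera model, that the first camera defines the reference frame, and that the positional relationship is given as a rotation `R_1_from_2` and the second camera's center `cam2_center` expressed in the first camera's frame. All function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_to_ray(u: float, v: float, fx: float, fy: float,
                 cx: float, cy: float) -> Vec3:
    """Viewing ray of a pixel in its own camera frame (pinhole assumption)."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def rotate(R: Sequence[Sequence[float]], v: Vec3) -> Vec3:
    """Apply a 3x3 rotation matrix to a vector."""
    return (dot(tuple(R[0]), v), dot(tuple(R[1]), v), dot(tuple(R[2]), v))


def triangulate_midpoint(ray1: Vec3, ray2_in_cam2: Vec3,
                         R_1_from_2: Sequence[Sequence[float]],
                         cam2_center: Vec3) -> Vec3:
    """Midpoint of the shortest segment between the two viewing rays."""
    d1 = ray1
    d2 = rotate(R_1_from_2, ray2_in_cam2)  # express ray 2 in camera 1's frame
    # Camera 1 sits at the origin, so w0 = origin - cam2_center.
    w0 = (-cam2_center[0], -cam2_center[1], -cam2_center[2])
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; cannot triangulate")
    s = (b * e - c * d) / denom  # distance parameter along ray 1
    t = (a * e - b * d) / denom  # distance parameter along ray 2
    p1 = (s * d1[0], s * d1[1], s * d1[2])
    p2 = (cam2_center[0] + t * d2[0],
          cam2_center[1] + t * d2[1],
          cam2_center[2] + t * d2[2])
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2, (p1[2] + p2[2]) / 2)
```

For example, with both cameras sharing the intrinsics `fx = fy = 500`, `cx = 320`, `cy = 240`, an identity rotation, and the second camera 0.5 units to the right, the matched pixels (370, 215) and (245, 215) triangulate to approximately (0.2, -0.1, 2.0) in the first camera's frame.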

The electronic device may further include a driver, and the processor may control the driver to change a position of at least one of the first camera and the second camera, acquire second location information about a positional relationship between the first camera and the second camera based on the changed position of the at least one camera, and perform the stereo matching based on the two-dimensional pose information for each of the plurality of images and the second location information.
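One way the positional relationship could be refreshed after a camera is moved is to recompute the relative transform from each camera's pose in a shared base frame. This is a sketch under stated assumptions: the disclosure does not say how the location information is obtained, and the idea that each camera's pose is available as a 4x4 rigid transform in a common base frame (for instance from robot kinematics) is an assumption made here for illustration. The helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def translation(x: float, y: float, z: float) -> Matrix:
    """4x4 rigid transform with identity rotation and the given offset."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Product of two 4x4 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(T: Matrix) -> Matrix:
    """Inverse of a rigid transform: rotation R^T, translation -R^T t."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(R_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [R_t[0] + [t_inv[0]],
            R_t[1] + [t_inv[1]],
            R_t[2] + [t_inv[2]],
            [0.0, 0.0, 0.0, 1.0]]


def relative_pose(T_base_cam1: Matrix, T_base_cam2: Matrix) -> Matrix:
    """Pose of camera 2 expressed in camera 1's frame."""
    return matmul(invert_rigid(T_base_cam1), T_base_cam2)


# First location information: cameras side by side.
first = relative_pose(translation(0.0, 0.0, 1.0), translation(0.5, 0.0, 1.0))
# Second location information: recomputed after camera 2 has been moved.
second = relative_pose(translation(0.0, 0.0, 1.0), translation(0.5, 0.2, 1.3))
```

Because the relative pose is recomputed rather than baked into a trained model, the same stereo matching step can be reused with either the first or the second location information.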

The first neural network model and the second neural network model are included in one integrated neural network model.

In accordance with another aspect of the disclosure, a method of controlling the electronic device is provided. The method of controlling the electronic device includes acquiring a plurality of images through at least one camera, inputting RGB data for each of the plurality of images into a first neural network model to acquire two-dimensional pose information on an object included in the plurality of images, inputting RGB data for at least one image of the plurality of images into a second neural network model to identify whether the object is transparent, based on the object being a transparent object, performing stereo matching based on the two-dimensional pose information on each of the plurality of images to acquire three-dimensional pose information on the object, and based on the object being an opaque object, acquiring three-dimensional pose information on the object based on one image of the plurality of images and depth information corresponding to the one image.
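For the opaque-object case, one image and its depth information are enough to place 2D feature points in 3D. The sketch below shows the standard pinhole back-projection; it is an illustration, not the procedure recited in the disclosure. It assumes the depth value is the distance along the optical axis, that the depth map is indexed as `depth_map[row][column]`, and that non-positive depth marks an invalid reading. The function names and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Lift one pixel with known depth into the camera frame (pinhole model)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def lift_feature_points(points_2d: Sequence[Tuple[int, int]],
                        depth_map: Sequence[Sequence[float]],
                        fx: float, fy: float,
                        cx: float, cy: float) -> List[Point3]:
    """Back-project every 2D feature point that has a valid depth reading."""
    lifted: List[Point3] = []
    for u, v in points_2d:
        depth = depth_map[v][u]  # row index is v, column index is u
        if depth > 0.0:          # assumption: non-positive depth is invalid
            lifted.append(back_project(u, v, depth, fx, fy, cx, cy))
    return lifted
```

This path depends on the depth reading being reliable at each feature point, which is why the description reserves it for opaque objects and routes transparent objects to stereo matching instead.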

The identifying transparency of the object may include acquiring information about transparency of the object through the second neural network model, and identifying transparency of the object based on the information about the transparency of the object.

The identifying symmetry of the object may include acquiring information on whether the object is symmetrical through the second neural network model, and identifying whether the object is symmetrical based on the information on whether the object is symmetrical, and the control method of the electronic device further includes, based on the object being an object having symmetry as a result of identification, converting first feature points included in the two-dimensional pose information into second feature points unrelated to symmetry, and acquiring three-dimensional pose information for the object by performing the stereo matching based on the second feature points.

The plurality of images are two images acquired at two different points in time through a first camera among the at least one camera.

The plurality of images are two images acquired at same points in time through each of the first camera and the second camera among the at least one camera.

The method may further include acquiring first location information about a positional relationship between the first camera and the second camera, and performing the stereo matching based on the two-dimensional pose information for each of the plurality of images and the first location information.

The method may further include controlling the driver to change a position of at least one of the first camera and the second camera, acquiring second location information about a positional relationship between the first camera and the second camera based on the changed position of the at least one camera, and performing the stereo matching based on the two-dimensional pose information for each of the plurality of images and the second location information.

The first neural network model and the second neural network model are included in one integrated neural network model.

In accordance with another aspect of the disclosure, a non-transitory computer readable recordable medium including a program for executing a control method of an electronic device is provided. The method includes acquiring a plurality of images through at least one camera, inputting RGB data for each of the plurality of images into a first neural network model to acquire two-dimensional pose information on an object included in the plurality of images, inputting RGB data for at least one image of the plurality of images into a second neural network model to identify whether the object is transparent, based on the object being a transparent object, performing stereo matching based on the two-dimensional pose information on each of the plurality of images to acquire three-dimensional pose information on the object, and based on the object being an opaque object, acquiring three-dimensional pose information on the object based on one image of the plurality of images and depth information corresponding to the one image.

Other aspects, advantages, and salient features of the disclosure will become apparent to those skilled in the art from the following detailed description, which, taken in conjunction with the annexed drawings, discloses various embodiments of the disclosure.

Throughout the drawings, like reference numerals will be understood to refer to like parts, components, and structures.

The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of various embodiments of the disclosure as defined by the claims and their equivalents. It includes various specific details to assist in that understanding but these are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the various embodiments described herein can be made without departing from the scope and spirit of the disclosure. In addition, descriptions of well-known functions and constructions may be omitted for clarity and conciseness.

The terms and words used in the following description and claims are not limited to the bibliographical meanings, but, are merely used by the inventor to enable a clear and consistent understanding of the disclosure. Accordingly, it should be apparent to those skilled in the art that the following description of various embodiments of the disclosure is provided for illustration purpose only and not for the purpose of limiting the disclosure as defined by the appended claims and their equivalents.

It is to be understood that the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a component surface” includes reference to one or more of such surfaces.

In this specification, expressions such as “have,” “may have,” “include,” “may include” or the like represent the presence of a corresponding feature (for example, components, such as numbers, functions, operations, or parts) and do not exclude the presence of additional features.

In this disclosure, the expressions “A or B,” “at least one of A and/or B,” or “one or more of A and/or B,” and the like include all possible combinations of the listed items. For example, “A or B,” “at least one of A and B,” or “at least one of A or B” includes (1) at least one A, (2) at least one B, or (3) at least one A and at least one B together.

In this disclosure, the terms “first,” “second,” and so forth are used to describe diverse elements regardless of their order and/or importance, and to discriminate one element from other elements, but are not limited to the corresponding elements.

It is to be understood that an element (e.g., a first element) that is “operatively or communicatively coupled with/to” another element (e.g., a second element) may be directly connected to the other element or may be connected via another element (e.g., a third element).

Alternatively, when an element (e.g., a first element) is “directly connected” or “directly accessed” to another element (e.g., a second element), it may be understood that there is no other element (e.g., a third element) between the other elements.

Herein, the expression “configured to” may be used interchangeably with, for example, “suitable for,” “having the capacity to,” “designed to,” “adapted to,” “made to,” or “capable of.” The expression “configured to” does not necessarily mean “specifically designed to” in a hardware sense.

Instead, under some circumstances, “a device configured to” may indicate that such a device can perform an action along with another device or part. For example, the expression “a processor configured to perform A, B, and C” may indicate an exclusive processor (e.g., an embedded processor) to perform the corresponding action, or a generic-purpose processor (e.g., a central processing unit (CPU) or application processor (AP)) that can perform the corresponding actions by executing one or more software programs stored in the memory device.

Terms such as “module,” “unit,” “part,” and so on may refer, for example, to an element that performs at least one function or operation, and such an element may be implemented as hardware or software, or a combination of hardware and software. Further, except when each of a plurality of “modules,” “units,” “parts,” and the like needs to be realized in individual hardware, the components may be integrated in at least one module or chip and be realized in at least one processor.

It is understood that various elements and regions in the figures may be shown out of scale. Accordingly, the scope of the disclosure is not limited by the relative sizes or spacing drawn from the accompanying drawings.

Hereinafter, with reference to the attached drawings, various example embodiments will be described so that those skilled in the art can easily practice them.

is a flowchart illustrating a method for controlling an electronic device according to an embodiment of the disclosure.

are views illustrating a plurality of images and two-dimensional pose information according to various embodiments of the disclosure.

An electronic device according to the disclosure refers to a device capable of acquiring 3D pose information for an object included in an image. More particularly, the electronic device according to the disclosure may acquire 3D pose information in various ways according to features of an object included in an image. For example, the electronic device according to the disclosure may be implemented as a user terminal, such as a smartphone or a tablet personal computer (PC), and may also be implemented as a device, such as a robot. Hereinafter, an electronic device according to the disclosure is referred to as an “electronic device.”

Referring to, an electronic device may acquire a plurality of images through at least one camera in operation S.

The electronic device may include at least one camera, that is, one or more cameras. When the electronic device includes two or more cameras, a positional relationship between the two or more cameras may be fixed or may be changed. For example, the electronic device may include a camera disposed on the left side of the rear surface of the electronic device and a camera disposed on the right side of the rear surface of the electronic device. When the electronic device is implemented as a robot, at least one camera may include two cameras disposed on the head and the hand of the robot, that is, a head camera and a hand camera, and in this case, the positional relationship between the two cameras may be changed as the position of at least one of the head camera and the hand camera is changed.

The plurality of images may include the same object, and may be images acquired through one camera or two or more different cameras. Specifically, the plurality of images may be two images acquired at different time points through the first camera. In addition, the plurality of images may be two images acquired at the same time point through each of the first camera and the second camera. In other words, in the disclosure, a plurality of images may be different image frames included in a video sequence acquired through one camera, and may be different image frames according to a result of capturing the same scene through a camera having different views at the same time.

For example, when the electronic device according to the disclosure is implemented as a robot, the images of indicate a first image and a second image acquired through a head camera and a hand camera of the robot, respectively. Specifically, referring to the example of, each of the first image and the second image may include an object “wine glass”, and an object “wine glass” may be disposed at different positions with different poses in each of the first image and the second image.

When a plurality of images are acquired, the electronic device may input RGB data for each of a plurality of images into a first neural network model to acquire two-dimensional pose information for an object included in the plurality of images in operation S.
