US-20260099207-A1

Host, Object-Based Operation System and Method

Published: April 9, 2026

Assignee: not available in USPTO data we have

Technical Abstract

A host is described herein. The host includes a storage circuit and a processor. The storage circuit is configured to store a program code. The processor is coupled to the storage circuit and configured to access the program code to execute: obtaining an environment image of an environment around a user; identifying one or more objects in the environment based on the environment image; performing a hand tracking to determine a hand track of a hand of the user; determining one or more pointing periods of the one or more objects based on the hand track; determining one of the one or more objects as a target object based on the one or more pointing periods; and performing an object-based operation based on the target object.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A host, comprising: a storage circuit, configured to store a program code; and a processor, coupled to the storage circuit and configured to access the program code to execute: obtaining an environment image of an environment around a user; identifying one or more objects in the environment based on the environment image; performing a hand tracking to determine a hand track of a hand of the user; determining one or more pointing periods of the one or more objects based on the hand track; determining one of the one or more objects as a target object based on the one or more pointing periods; and performing an object-based operation based on the target object.

2. The host according to claim 1, wherein the processor is further configured to access the program code to execute: determining a longest pointing period out of the one or more pointing periods; determining an object of the one or more objects corresponding to the longest pointing period as the target object.

3. The host according to claim 1, the processor is further configured to access the program code to execute: determining that whether a pointing period of the one or more pointing periods is greater than a predetermined threshold period; and in response to the pointing period being greater than the predetermined threshold period, determining an object of the one or more objects corresponding to the pointing period as the target object.

4. The host according to claim 1, wherein the one or more objects comprises a first object, a second object, and a third object, the hand points to the first object, the second object, and the third object in order, and the processor is further configured to access the program code to execute: in response to a second pointing period corresponding to the second object being greater than a first pointing period corresponding to the first object and a third pointing period corresponding to the third object, determining the second object as the target object.

5. The host according to claim 1, wherein the object-based operation is an artificial intelligence query.

6. The host according to claim 5, wherein the processor is further configured to access the program code to execute: in response to receiving query content of the artificial intelligence query from the user, enabling the camera.

7. The host according to claim 1, wherein the processor is further configured to access the program code to execute: obtaining a hand tracking video of the hand tracking; obtaining a target frame of the hand tracking video as a target image based on the pointing period; and performing the object-based operation based on the target image.

8. The host according to claim 7, wherein the processor is further configured to access the program code to execute: determining a frame in the middle of the pointing period corresponding to the target object as the target frame; determining that whether the target object is at least partly blocked by the hand in the target frame; and in response to the target object being at least partly blocked by the hand, determining a frame right before or after the pointing period corresponding to the target object as the target frame.

9. The host according to claim 7, wherein the processor is further configured to access the program code to execute: cropping the target object of the target image as a region of interest; and performing the object-based operation based on the region of interest.

10. The host according to claim 7, wherein the processor is further configured to access the program code to execute: cropping a target area extending a specific distance from the target object of the target image as a region of interest; and performing the object-based operation based on the region of interest.

11. The host according to claim 1, wherein the processor is further configured to access the program code to execute: determining that whether the hand is in a tagging gesture or not based on the hand tracking; and in response to the hand being in the tagging gesture, assigning a tag to the target object based on the tagging gesture.

12. An object-based operation system, comprising: a camera, configured to obtain an environment image of an environment around a user; a display configured to display information about the environment to the user; a storage circuit, configured to store a program code; and a processor, coupled to the storage circuit and configured to access the program code to execute: obtaining the environment image from the camera; identifying one or more objects in the environment based on the environment image; performing a hand tracking to determine a hand track of a hand of the user; determining one or more pointing periods of the one or more objects based on the hand track; determining one of the one or more objects as a target object based on the one or more pointing periods; and performing an object-based operation based on the target object.

13. The object-based operation system according to claim 12, wherein the processor is further configured to access the program code to execute: determining a longest pointing period out of the one or more pointing periods; determining an object of the one or more objects corresponding to the longest pointing period as the target object.

14. The object-based operation system according to claim 12, the processor is further configured to access the program code to execute: determining that whether a pointing period of the one or more pointing periods is greater than a predetermined threshold period; and in response to the pointing period being greater than the predetermined threshold period, determining an object of the one or more objects corresponding to the pointing period as the target object.

15. The object-based operation system according to claim 12, wherein the one or more objects comprises a first object, a second object, and a third object, the hand points to the first object, the second object, and the third object in order, and the processor is further configured to access the program code to execute: in response to a second pointing period corresponding to the second object being greater than a first pointing period corresponding to the first object and a third pointing period corresponding to the third object, determining the second object as the target object.

16. The object-based operation system according to claim 12, wherein the object-based operation is an artificial intelligence query.

17. The object-based operation system according to claim 16, wherein the processor is further configured to access the program code to execute: in response to receiving query content of the artificial intelligence query from the user, enabling the camera.

18. The object-based operation system according to claim 12, wherein the processor is further configured to access the program code to execute: obtaining a hand tracking video of the hand tracking; obtaining a target frame of the hand tracking video as a target image based on the pointing period; and performing the object-based operation based on the target image.

19. The object-based operation system according to claim 12, wherein the processor is further configured to access the program code to execute: determining that whether the hand is in a tagging gesture or not based on the hand tracking; and in response to the hand being in the tagging gesture, assigning a tag to the target object based on the tagging gesture.

20. An object-based operation method, comprising: obtaining, through a camera, an environment image of an environment around a user; identifying, through a processor, one or more objects in the environment based on the environment image; performing, through the processor, a hand tracking to determine a hand track of a hand of the user; determining, through the processor, one or more pointing periods of the one or more objects based on the hand track; determining, through the processor, one of the one or more objects as a target object based on the one or more pointing periods; and performing, through the processor, an object-based operation based on the target object.

Detailed Description

Complete technical specification and implementation details from the patent document.

The disclosure relates to a host; particularly, the disclosure relates to a host, an object-based operation system, and an object-based operation method.

In order to bring an immersive experience to the user, technologies related to extended reality (XR), such as augmented reality (AR), virtual reality (VR), and mixed reality (MR), are constantly being developed. AR technology allows a user to bring virtual elements to the real world. VR technology allows a user to enter a whole new virtual world to experience a different life. MR technology merges the real world and the virtual world. Further, to bring a fully immersive experience to the user, visual content, audio content, or contents of other senses may be provided through one or more devices.

The disclosure is directed to a host, an object-based operation system, and an object-based operation method, so as to improve the user experience of an object-based operation.

The embodiments of the disclosure provide a host. The host includes a storage circuit and a processor. The storage circuit is configured to store a program code. The processor is coupled to the storage circuit and configured to access the program code to execute: obtaining an environment image of an environment around a user; identifying one or more objects in the environment based on the environment image; performing a hand tracking to determine a hand track of a hand of the user; determining one or more pointing periods of the one or more objects based on the hand track; determining one of the one or more objects as a target object based on the one or more pointing periods; and performing an object-based operation based on the target object.

The embodiments of the disclosure provide an object-based operation system. The object-based operation system includes a camera, a display, a storage circuit, and a processor. The camera is configured to obtain an environment image of an environment around a user. The display is configured to display information about the environment to the user. The storage circuit is configured to store a program code. The processor is coupled to the storage circuit and configured to access the program code to execute: obtaining the environment image from the camera; identifying one or more objects in the environment based on the environment image; performing a hand tracking to determine a hand track of a hand of the user; determining one or more pointing periods of the one or more objects based on the hand track; determining one of the one or more objects as a target object based on the one or more pointing periods; and performing an object-based operation based on the target object.

The embodiments of the disclosure provide an object-based operation method. The object-based operation method includes: obtaining, through a camera, an environment image of an environment around a user; identifying, through a processor, one or more objects in the environment based on the environment image; performing, through the processor, a hand tracking to determine a hand track of a hand of the user; determining, through the processor, one or more pointing periods of the one or more objects based on the hand track; determining, through the processor, one of the one or more objects as a target object based on the one or more pointing periods; and performing, through the processor, an object-based operation based on the target object.

Based on the above, according to the host, the object-based operation system, and the object-based operation method, the object-based operation may be performed easily and conveniently, thereby improving the user experience.

To make the aforementioned more comprehensible, several embodiments accompanied with drawings are described in detail as follows.

Reference will now be made in detail to the present preferred embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.

FIG. 1 is a schematic diagram of an object-based operation scenario according to an embodiment of the disclosure. In FIG. 1, an object-based operation scenario 100 may include an object O1, an object O2, an object O3, an object O4, a hand H, a hand track TR, a computer vision CV, and an artificial intelligence (AI) query AIQ. That is, in one embodiment, the object-based operation may be the AI query AIQ. However, this disclosure is not limited thereto. For example, the object-based operation may be storing an object, tagging an object, or other kinds of processing or reactions to the object. For the sake of convenience in explanation, in the following discussion, the AI query AIQ may be used as one exemplary embodiment of the object-based operation, but this disclosure is not limited thereto.

With reference to FIG. 1, a user may be in an environment with a plurality of objects O1 to O4 and the user would like to know information about a certain object in the environment. In one embodiment, the user may want to know information about the object O3. The user may point to the object O3 with a hand H of the user and request the information through the AI query AIQ. Further, the AI query AIQ may be performed with the help of the computer vision CV. For example, the computer vision CV may be configured to obtain the hand track TR of the hand H, which may be used by a processor (e.g., an AI) to determine an intention of the user.

In one embodiment, the computer vision CV may be implemented as a camera or a sensor. That is, the computer vision CV may be implemented as a complementary metal oxide semiconductor (CMOS) camera, a charge coupled device (CCD) camera, a light detection and ranging (LiDAR) device, a radar, an infrared sensor, an ultrasonic sensor, other similar devices, or a combination of these devices.

However, under some circumstances, the user may have performed a gesture before or after aiming at a region of interest (ROI) (i.e., object O3). That is, a non-ROI object on the way to the ROI (e.g., along the hand track TR) or after the ROI may be aimed at instead. In other words, the processor may not be able to determine (i.e., select) which object is the correct ROI.

On the other hand, the user may speak the content of the AI query AIQ to request information about the ROI. However, the time points of the gesture and the AI query AIQ may not be consistent. That is, assuming that the processor uses the image captured at the moment it has understood the content of the question to make judgments, the user must deliberately adjust the timing of speech and gestures, such as pointing at the object, in order to obtain an expected result.

In addition, although utilizing video data may be a solution, the large size of the video data may not only cost considerable computing power and energy, but also increase the processing time of the object-based operation.

Therefore, it is the pursuit of people skilled in the art to provide an intuitive and convenient way to perform an object-based operation (e.g., query) with the processor.

FIG. 2A is a schematic diagram of a host according to an embodiment of the disclosure. In various embodiments, a host 200 may be any smart device and/or computer device. In some embodiments, the host 200 may be any electronic device capable of providing reality services (e.g., AR/VR/MR services, or the like). In some embodiments, the host 200 may be implemented as an XR device, such as a pair of AR/VR glasses and/or a head-mounted device. In some embodiments, the host 200 may be a computer and/or a server, and the host 200 may provide the computed results (e.g., AR/VR/MR contents) to other external display device(s), such that the external display device(s) can show the computed results to the user. However, this disclosure is not limited thereto.

In FIG. 2A, the host 200 includes a storage circuit 202 and a processor 204. The storage circuit 202 is one or a combination of a stationary or mobile random access memory (RAM), read-only memory (ROM), flash memory, hard disk, or any other similar device, and records a plurality of modules and/or a program code that can be executed by the processor 204.

The processor 204 may be coupled with the storage circuit 202, and the processor 204 may be, for example, a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), a state machine, and the like.

In the embodiments of the disclosure, the processor 204 may access the modules and/or the program code stored in the storage circuit 202 to implement an object-based operation method provided in the disclosure, which is further discussed in the following.

FIG. 2B is a schematic diagram of an object-based operation system according to an embodiment of the disclosure. In FIG. 2B, an object-based operation system 290 may include the host 200, a camera 206, and a display 208. For details of the host 200, refer to the description of FIG. 2A; they are not repeated here.

In the embodiments of the disclosure, the camera 206 may be configured to capture an image of the user and the processor 204 may be configured to perform hand tracking of the hand H of the user based on the image. In some embodiments, the camera 206 may be, for example, a complementary metal oxide semiconductor (CMOS) camera, a charge coupled device (CCD) camera, a light detection and ranging (LiDAR) device, a radar, an infrared sensor, an ultrasonic sensor, other similar devices, or a combination of these devices. In some embodiments, the camera 206 may be disposed on a head-mounted device, wearable glasses (e.g., AR/VR goggles), an electronic device, other similar devices, or a combination of these devices. However, this disclosure is not limited thereto.

In the embodiments of the disclosure, the display 208 may be configured to display information to the user, such as information related to the environment. In some embodiments, the display 208 may be, for example, an organic light-emitting diode (OLED) display device, a mini LED display device, a micro LED display device, a quantum dot (QD) LED display device, a liquid-crystal display (LCD) device, a tiled display device, a foldable display device, an electronic paper display (EPD), other similar devices, or a combination of these devices. In some embodiments, the display 208 may be disposed on a head-mounted device, wearable glasses (e.g., AR/VR goggles), an electronic device, other similar devices, or a combination of these devices. However, this disclosure is not limited thereto.

In some embodiments, the host 200 may further include a communication circuit, and the communication circuit may include, for example, a wired network module, a wireless network module, a Bluetooth module, an infrared module, a radio frequency identification (RFID) module, a Zigbee network module, or a near field communication (NFC) network module, but the disclosure is not limited thereto. That is, the host 200 may communicate with external device(s) (such as the camera 206, the display 208, etc.) through either wired communication or wireless communication.

FIG. 3 is a schematic diagram of an object-based operation scenario according to an embodiment of the disclosure. In FIG. 3, an object-based operation scenario 300 includes an operation scenario 301 and a timing sequence 302.

In one embodiment, the operation scenario 301 includes the object O1, the object O2, the object O3, the hand H, and the hand track TR. First of all, an environment image of an environment around a user may be obtained through the camera 206. Further, the processor 204 may be configured to obtain the environment image from the camera 206 and identify one or more objects O1 to O3 in the environment based on the environment image. Furthermore, the processor 204 may be configured to perform a hand tracking to determine the hand track TR of the hand H of the user. Next, the processor 204 may be configured to determine one or more pointing periods of the one or more objects O1 to O3 based on the hand track TR. Moreover, the processor 204 may be configured to determine one of the one or more objects O1 to O3 as the target object based on the one or more pointing periods. In addition, the processor 204 may be configured to perform the object-based operation (e.g., the AI query AIQ) based on the target object. Details are explained below.

In one embodiment, the timing sequence 302 includes time, pose, and object. The pose and the object in the timing sequence 302 respectively represent timing periods of a gesture and an aiming target corresponding to the gesture (e.g., one of the objects O1 to O3).

Reference is made to the operation scenario 301 and the timing sequence 302 together. In one embodiment, the user would like to know information about the object O2. The user may reach out and point to the object O2 with the hand H. For example, the user may move the hand H along the hand track TR to point to the object O2.

It is noted that, when the user is moving the hand H along the hand track TR, as shown in the timing sequence 302, the hand H may first point to the object O1 (e.g., for 0.5 sec), then point to the object O2 (e.g., for 1 sec), and last point to the object O3 (e.g., for 0.5 sec). Further, when the user moves the hand H along the hand track TR, the hand H makes no special gesture until it moves close to the object O2. Furthermore, when the hand H is moving close to the object O2, the hand H may make a predefined gesture (e.g., a pointing gesture). Moreover, after the hand H passes the object O2 and moves away from the object O2, the hand H may again make no special gesture. In addition, the specific gesture (e.g., the pointing gesture) may be configured to trigger the object-based operation (e.g., the AI query AIQ). However, this disclosure is not limited thereto.

It is worth mentioning that, when the hand H is in the pointing gesture, the hand H may point to the object O2 for the longest period of time (e.g., a pointing direction of the pointing gesture overlaps the object O2 for the longest period of time). In other words, by comparing a pointing period corresponding to each of the objects O1 to O3, a target object may be determined. A pointing period may be defined as a length of time the pointing gesture is directed at a specific object. For example, when the hand H is moving along the hand track TR, the processor 204 may be configured to determine a start time and an end time of the pointing period corresponding to each of the objects O1 to O3. The time point at which a pointing direction of the hand H starts to overlap each of the objects O1 to O3 may be determined as the start time, and the time point at which the pointing direction of the hand H stops overlapping each of the objects O1 to O3 may be determined as the end time. That is to say, the processor 204 may be configured to determine one or more pointing periods of the one or more objects based on the hand track TR. Then, the processor 204 may be configured to determine the target object based on the one or more pointing periods.
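The start-time and end-time logic described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the disclosure: the hand tracker is assumed to emit one sample per frame as a (timestamp, object id or None) pair naming the object that the pointing direction currently overlaps. The function name `pointing_periods` and the sample format are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sample format: (timestamp in seconds, id of the object the
# pointing direction overlaps, or None when it overlaps no object).
Sample = Tuple[float, Optional[str]]


def pointing_periods(samples: List[Sample]) -> Dict[str, float]:
    """Return the total time the pointing direction overlapped each object."""
    periods: Dict[str, float] = {}
    current: Optional[str] = None  # object currently pointed at
    start = 0.0                    # start time of the current overlap
    for t, obj in samples:
        if obj != current:
            # The overlap with `current` ends at t: close its period.
            if current is not None:
                periods[current] = periods.get(current, 0.0) + (t - start)
            # A new overlap (or a gap) starts at t.
            current, start = obj, t
    # Close a period that is still open at the last sample.
    if current is not None and samples:
        periods[current] = periods.get(current, 0.0) + (samples[-1][0] - start)
    return periods


# Timing from the example: O1 for 0.5 s, then O2 for 1 s, then O3 for 0.5 s.
track = [(0.0, "O1"), (0.5, "O2"), (1.5, "O3"), (2.0, None)]
periods = pointing_periods(track)
```

Summing repeated overlaps of the same object is a design choice of this sketch; the disclosure only defines a pointing period as the length of time the pointing gesture is directed at a specific object.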

In this manner, the object-based operation (e.g., query with AI) may be performed easily and conveniently, thereby improving the user experience.

In one embodiment, the pointing periods corresponding to the objects O1 to O3 may be compared with each other to determine whether one of the objects O1 to O3 is the target object or not. That is to say, the processor 204 may be configured to determine a longest pointing period out of the one or more pointing periods. Further, the processor 204 may be configured to determine an object of the one or more objects O1 to O3 corresponding to the longest pointing period as the target object.
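As a minimal sketch of this comparison, assuming the pointing periods are already available as a mapping from object id to seconds (the mapping and the function name `target_by_longest_period` are hypothetical):

```python
from typing import Dict, Optional


def target_by_longest_period(periods: Dict[str, float]) -> Optional[str]:
    """Return the object with the longest pointing period, or None if empty."""
    if not periods:
        return None
    # max() compares the objects by their pointing period.
    return max(periods, key=lambda obj: periods[obj])


# Periods from the timing example: O2 is pointed at the longest.
target = target_by_longest_period({"O1": 0.5, "O2": 1.0, "O3": 0.5})
```

If two objects tie, `max` returns the first one encountered; the disclosure does not say how ties are resolved.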

In one embodiment, the pointing periods corresponding to the objects O1 to O3 may be compared with a predetermined threshold period to determine whether one of the objects O1 to O3 is the target object or not. That is to say, the processor 204 may be configured to determine whether a pointing period of the one or more pointing periods is greater than a predetermined threshold period. Further, in response to the pointing period being greater than the predetermined threshold period, the processor 204 may be configured to determine an object of the one or more objects O1 to O3 corresponding to the pointing period as the target object.
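The threshold variant might look like the sketch below. The threshold value of 0.8 seconds, the mapping of object ids to periods, and the function name `target_by_threshold` are assumptions made for illustration; the disclosure does not give a threshold value.

```python
from typing import Dict, Optional


def target_by_threshold(periods: Dict[str, float],
                        threshold_s: float) -> Optional[str]:
    """Return the first object whose pointing period exceeds the threshold."""
    for obj, period in periods.items():
        if period > threshold_s:
            return obj
    return None  # no object was pointed at long enough


periods = {"O1": 0.5, "O2": 1.0, "O3": 0.5}
target = target_by_threshold(periods, threshold_s=0.8)    # assumed threshold
no_target = target_by_threshold(periods, threshold_s=2.0)
```

Unlike the longest-period comparison, this check can decide as soon as one period crosses the threshold, without waiting for the hand to finish its track.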

In one embodiment, a pointing period corresponding to a second one of the objects O1 to O3 may be compared with the pointing periods corresponding to a first one and a third one of the objects O1 to O3. That is to say, the one or more objects O1 to O3 may include a first object (e.g., the object O1), a second object (e.g., the object O2), and a third object (e.g., the object O3). Further, the hand H points to the first object, the second object, and the third object in order. Furthermore, in response to a second pointing period corresponding to the second object being greater than a first pointing period corresponding to the first object and a third pointing period corresponding to the third object, the processor 204 may be configured to determine the second object as the target object.

FIG. 4 is a schematic diagram of an object-based operation scenario according to an embodiment of the disclosure. In FIG. 4, an object-based operation scenario 400 includes an operation scenario 401 and a timing sequence 402. Compared with FIG. 3, the difference between FIG. 3 and FIG. 4 is that FIG. 4 further includes the AI query AIQ. For the sake of brevity, similar details in FIG. 4 are not repeated herein; refer to FIG. 3 for further details.

Reference is made to the operation scenario 401 and the timing sequence 402 together. In one embodiment, the user would like to know information about the object O2. The user may reach out and point to the object O2 with the hand H. For example, the user may move the hand H along the hand track TR to point to the object O2. Further, the user may speak the query content of the AI query AIQ to trigger the AI query AIQ.

It is noted that, as shown in the timing sequence 402, when the user moves the hand H and speaks the query content of the AI query AIQ, the user may speak the content first, and then point to the target object (e.g., object O2). In other words, for the purpose of saving energy, the camera 206 may be disabled until the user speaks the query content. That is to say, in response to receiving query content of the AI query from the user (e.g., through a microphone, face tracking camera, or a physical/virtual button), the processor 204 may be configured to enable the camera 206. In this manner, the energy consumption may be decreased and the camera 206 is enabled only at the request of the user to protect the user's privacy, thereby improving the user experience.
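The camera gating described above can be illustrated with a small state holder. This is a hedged sketch: the class `CameraGate` and its method names are hypothetical, no real camera API is called, and switching the camera off again after the operation is an assumption of this sketch rather than something the disclosure states.

```python
class CameraGate:
    """Keeps a (hypothetical) camera disabled until query content arrives."""

    def __init__(self) -> None:
        self.camera_enabled = False  # camera starts disabled to save energy

    def on_query_content(self, content: str) -> None:
        # Query content may arrive from a microphone, a button press, etc.
        # Empty or whitespace-only content does not enable the camera.
        if content.strip():
            self.camera_enabled = True

    def on_operation_done(self) -> None:
        # Assumption: disable the camera once the operation has finished.
        self.camera_enabled = False


gate = CameraGate()
initially_enabled = gate.camera_enabled
gate.on_query_content("What is this object?")
enabled_after_query = gate.camera_enabled
gate.on_operation_done()
enabled_after_done = gate.camera_enabled
```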

It is worth mentioning that, instead of utilizing a whole live video for the object-based operation, utilizing only a single key frame for the object-based operation reduces the computing power, the energy consumption, and the processing time required.

In one embodiment, after the camera 206 is enabled, the camera 206 may be configured to obtain the environment image and the processor 204 may be configured to identify the objects O1 to O3 based on the environment image. Then, in order to perform the hand tracking and the object-based operation, the camera 206 may be configured to obtain a live video (which may also be referred to as a hand tracking video). It is noted that the live video may also be used to identify the objects O1 to O3. However, this disclosure is not limited thereto.

It is worth mentioning that the key frame for the object-based operation may be determined based on the pointing period. For example, the pointing period corresponding to the target object (e.g., object O2) may occupy one “span” in time. An image of a frame in the span or close to the span may be used to perform the object-based operation. In one embodiment, a frame in the center of the span may be used to perform the object-based operation. In another embodiment, a frame right before the span (e.g., right before a pointing direction of the hand H starts to overlap the target object) may be used to perform the object-based operation. In yet another embodiment, a frame right after the span (e.g., right after a pointing direction of the hand H stops overlapping the target object) may be used to perform the object-based operation. However, this disclosure is not limited thereto.

That is, the processor 204 may be configured to obtain a hand tracking video of the hand tracking. Further, the processor 204 may be configured to obtain a target frame of the hand tracking video as a target image based on the pointing period. For example, depending on a system setting or a user setting, the target frame may be any one specified frame in, before, or after the pointing period corresponding to the target object. Furthermore, the processor 204 may be configured to perform the object-based operation (e.g., the AI query AIQ) based on the target image. In this manner, only the target frame is utilized for the AI query AIQ, which reduces the computing power, the energy consumption, and the processing time required.
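One way to map a pointing period onto a frame of the hand tracking video is sketched below. It assumes a constant frame rate and frames indexed from zero; the function name `target_frame_index`, the `mode` parameter, and the 30 fps figure are hypothetical and not taken from the disclosure.

```python
def target_frame_index(start_s: float, end_s: float, fps: float,
                       num_frames: int, mode: str = "middle") -> int:
    """Pick a frame index in, right before, or right after the pointing span."""
    first = round(start_s * fps)  # first frame of the span
    last = round(end_s * fps)     # last frame of the span
    if mode == "middle":
        index = (first + last) // 2
    elif mode == "before":
        index = first - 1
    elif mode == "after":
        index = last + 1
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Clamp so the index always refers to an existing frame.
    return max(0, min(num_frames - 1, index))


# Pointing period of the target object: 0.5 s to 1.5 s in a 2 s, 30 fps video.
middle = target_frame_index(0.5, 1.5, fps=30, num_frames=60)
before = target_frame_index(0.5, 1.5, fps=30, num_frames=60, mode="before")
after = target_frame_index(0.5, 1.5, fps=30, num_frames=60, mode="after")
```

Only the selected frame would then be passed on to the object-based operation, which is the source of the savings described above.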

FIG. 5 is a schematic diagram of an object-based operation scenario according to an embodiment of the disclosure. In FIG. 5, an object-based operation scenario 500 includes a target frame F_T and an alternative frame F_A. Reference is made to FIG. 4 and FIG. 5 together. In one embodiment, the target frame F_T may be a frame in the center of the span and the alternative frame F_A may be a frame before the span. However, this disclosure is not limited thereto.

In one embodiment, after the target object O_T is determined, a frame including the target object O_T may be selected from frames of the hand tracking video. For example, a frame in which the hand H is pointing directly at the target object O_T may be selected as a target frame F_T. However, in some embodiments, when the hand H is too close to the target object O_T or due to a position of the user, part of the target object O_T may be blocked by the hand H in the target frame F_T. For example, as shown in the target frame F_T, the lower part of the target object O_T is blocked by the hand H.

In order to obtain full information of the target object O_T, the alternative frame F_A may be selected instead. That is, the processor 204 may be configured to determine a frame in the middle of the pointing period corresponding to the target object O_T as the target frame F_T. Further, the processor 204 may be configured to determine whether the target object O_T is at least partly blocked by the hand H in the target frame F_T. Furthermore, in response to the target object O_T being at least partly blocked by the hand H, the processor 204 may be configured to determine a frame right before or after the pointing period corresponding to the target object O_T (e.g., the alternative frame F_A) as the target frame F_T. In this manner, an optimal image may be utilized for the object-based operation, thereby improving the user experience.
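The fallback from the target frame F_T to the alternative frame F_A can be sketched as below. The disclosure does not say how blocking is detected; this sketch assumes, purely for illustration, that per-frame bounding boxes of the hand and the target object are available and treats any overlap of the two boxes as the hand blocking the object. All names and coordinates are hypothetical.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def boxes_overlap(a: Box, b: Box) -> bool:
    """True when two axis-aligned boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def choose_frame(middle: int, alternative: int,
                 hand_boxes: Dict[int, Box],
                 target_boxes: Dict[int, Box]) -> int:
    """Use the middle frame unless the hand covers part of the target there."""
    if boxes_overlap(hand_boxes[middle], target_boxes[middle]):
        return alternative  # e.g., a frame right before or after the span
    return middle


# Frame 30 is the middle of the span; frame 14 is right before the span.
hand_boxes = {30: (40, 60, 80, 120), 14: (0, 100, 30, 140)}
target_boxes = {30: (50, 20, 100, 80), 14: (50, 20, 100, 80)}
chosen = choose_frame(30, 14, hand_boxes, target_boxes)
```

A segmentation mask or depth ordering would give a more faithful blocking test; box overlap is used here only because it keeps the example short.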

FIG. 6 is a schematic diagram of an object-based operation scenario according to an embodiment of the disclosure. In FIG. 6, an object-based operation scenario 600 includes a cropping scenario 601 and a cropping scenario 602. The cropping scenario 601 and the cropping scenario 602 both include the object O1, the object O2, and the object O3. In one embodiment, the object O2 may be the target object O_T. Further, the cropping scenario 601 includes a ROI R1 and the cropping scenario 602 includes a ROI R2.

Reference is first made to the cropping scenario 601. After the target object O_T is determined, the target image of the target frame may be selected from the frames of the hand tracking video and utilized as the ROI for the object-based operation. It is noted that, in order to further save the computing power, decrease the energy consumption, and/or decrease the processing time, part of the target image, instead of the whole target image, may be utilized as the ROI.

In one embodiment, as shown in the cropping scenario 601, only the target object O_T (e.g., object O2) may be cropped as the ROI (e.g., the ROI R1). That is, the processor 204 may be configured to crop the target object O_T of the target image as the ROI. Further, the processor 204 may be configured to perform the object-based operation based on the ROI.

In another embodiment, as shown in the cropping scenario 602, not only the target object O_T but also an area near the target object O_T may be cropped together as the ROI. That is, the processor 204 may be configured to crop a target area extending a specific distance from the target object O_T of the query image as the ROI. The specific distance may be predetermined according to design needs or the user's preference. Further, the processor 204 may be configured to perform the object-based operation based on the ROI. In this manner, more computing power, energy, and/or processing time may be saved, thereby improving the user experience.
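The two cropping embodiments can be sketched with one small Python function. This is only an illustration under stated assumptions: the image is modeled as a list of pixel rows, the target object is given as a hypothetical bounding box, and the "specific distance" is modeled as a pixel `margin` clamped to the image borders.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop_roi(image: Sequence[Sequence[int]], box: Box, margin: int = 0) -> List[List[int]]:
    """Crop `box` out of `image`, expanded by `margin` pixels on every side.

    margin == 0 corresponds to cropping only the target object;
    margin > 0 corresponds to cropping a target area that extends a
    specific distance from the target object. The expanded box is
    clamped so it never leaves the image.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    left, top, right, bottom = box
    left = max(0, left - margin)
    top = max(0, top - margin)
    right = min(width, right + margin)
    bottom = min(height, bottom + margin)
    return [list(row[left:right]) for row in image[top:bottom]]
```

Calling `crop_roi(image, box)` yields a tight ROI such as R1, while `crop_roi(image, box, margin=1)` yields a wider ROI such as R2 that also keeps some surrounding context.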

FIG. 7 is a schematic diagram of an object-based operation scenario according to an embodiment of the disclosure. In FIG. 7, an object-based operation scenario 700 includes a tagging scenario 701, a gesture database 702, and a tagging method 703. That is, in one embodiment, the object-based operation may be tagging an object. However, this disclosure is not limited thereto.

Reference is first made to the tagging scenario 701. The environment may include the object O1, the object O2, and the object O3. In one embodiment, the user would like to assign a tag to the object O2. For example, the user may perform a tagging gesture G1, and the tagging gesture G1 points to the object O2. By comparing the pointing periods corresponding to the objects O1~O3, the object O2 may be determined as the target object O_T and the object O2 may be assigned the tag. That is, the processor 204 may be configured to determine whether or not the hand H is in a tagging gesture G1 based on the hand tracking. Further, in response to the hand H being in the tagging gesture, the processor 204 may be configured to assign a tag to the target object based on the tagging gesture G1. In this manner, the user may be able to assign a tag to an object. For example, the object-based operation may further include a save operation to store the target object O_T (e.g., the ROI R1) along with the tag(s) (e.g., the tagging gesture G1) in a database or album in the storage circuit 202, or to send them to another device for further processing.

Reference is then made to the gesture database 702. The gesture database 702 may include a plurality of tagging gestures G1~G4. That is, by performing different tagging gestures G1~G4, the user may assign different tags to the same object or to different objects, thereby improving the user experience.

Reference is now made to the tagging method 703. The tagging method 703 includes steps S710~S740. In the step S710, the processor 204 may be configured to determine whether or not the user is performing a gesture. In the step S720, the processor 204 may be configured to determine the ROI based on the pointing periods corresponding to the objects O1~O3. In the step S730, the processor 204 may be configured to identify whether the gesture is one of the tagging gestures G1~G4. In the step S740, the processor 204 may be configured to assign the tag to the ROI (e.g., the target object O_T).
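The four steps of the tagging method can be traced in a short Python sketch. It is illustrative only: the gesture labels are assumed to come from a separate hand tracking stage, the tag names in `GESTURE_DATABASE` are invented placeholders, and picking the object with the longest pointing period is one of the selection rules described in this disclosure, used here for brevity.

```python
from typing import Dict, Optional, Tuple

# Hypothetical gesture database: tagging gesture id -> tag (tag names are placeholders).
GESTURE_DATABASE: Dict[str, str] = {
    "G1": "tag_a",
    "G2": "tag_b",
    "G3": "tag_c",
    "G4": "tag_d",
}


def tag_object(
    gesture: Optional[str],
    pointing_periods: Dict[str, float],
    database: Dict[str, str] = GESTURE_DATABASE,
) -> Optional[Tuple[str, str]]:
    """Return (target object id, tag), or None when no tag is assigned."""
    # S710: is the user performing a gesture at all?
    if gesture is None:
        return None
    # S720: determine the ROI / target object from the pointing periods
    # (assumed rule: the object pointed at for the longest time).
    if not pointing_periods:
        return None
    target = max(pointing_periods, key=pointing_periods.get)
    # S730: is the gesture one of the known tagging gestures?
    tag = database.get(gesture)
    if tag is None:
        return None
    # S740: assign the tag to the target object.
    return target, tag
```

For example, `tag_object("G1", {"O1": 0.2, "O2": 1.5, "O3": 0.4})` pairs the object O2 with the tag registered for gesture G1, while an unknown gesture produces no tag.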

FIG. 8 is a schematic flowchart of an object-based operation method according to an embodiment of the disclosure. In FIG. 8, an object-based operation method 800 includes steps S810~S860.

In the step S810, the processor 204 may be configured to obtain an environment image of an environment around a user. In the step S820, the processor 204 may be configured to identify one or more objects O1~O3 in the environment based on the environment image. In the step S830, the processor 204 may be configured to perform a hand tracking to determine the hand track TR of the hand H of the user. In the step S840, the processor 204 may be configured to determine one or more pointing periods of the one or more objects O1~O3 based on the hand track TR. In the step S850, the processor 204 may be configured to determine one of the one or more objects O1~O3 as the target object O_T based on the one or more pointing periods. In the step S860, the processor 204 may be configured to perform the object-based operation based on the target object O_T. In this manner, the object-based operation may be performed easily and conveniently.
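Steps S840 and S850 can be sketched in Python as follows. This is a simplified illustration, not the disclosed algorithm: it assumes the hand track has already been reduced to one label per video frame (the object being pointed at, or `None`), measures each pointing period as a total frame count, and combines the longest-period rule with an optional threshold rule from the claims.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def pointing_periods(pointed_per_frame: Iterable[Optional[str]]) -> Dict[str, int]:
    """S840 (sketch): count, per object, the frames in which it is pointed at."""
    return dict(Counter(obj for obj in pointed_per_frame if obj is not None))


def choose_target(periods: Dict[str, int], threshold: Optional[int] = None) -> Optional[str]:
    """S850 (sketch): pick the object with the longest pointing period.

    When `threshold` is given, the object is accepted only if its
    pointing period is strictly greater than the threshold.
    """
    if not periods:
        return None
    target = max(periods, key=periods.get)
    if threshold is not None and periods[target] <= threshold:
        return None
    return target
```

Counting frames rather than seconds keeps the comparison exact; multiplying a count by the frame interval would give the same ordering in time units.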

In addition, for the implementation details of the object-based operation method 800, reference may be made to the descriptions of FIG. 1 to FIG. 7 to obtain sufficient teachings, suggestions, and implementation embodiments; the details are not repeated herein.

In summary, according to the host 200, the object-based operation system 290, and the object-based operation method 800, the target object O_T may be determined based on the hand track TR. Therefore, the target object O_T may be determined accurately and easily, thereby improving the user experience.

It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or spirit of the disclosure. In view of the foregoing, it is intended that the present disclosure cover modifications and variations of this invention provided they fall within the scope of the following claims and their equivalents.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F G06F3/17 G06T G06T7/11 G06V G06V10/25 G06V40/113 G06T2207/20132

Patent Metadata

Filing Date

October 9, 2024

Publication Date

April 9, 2026

Inventors

Chang-Hua Wei
