US-20250303272-A1

Interactive Entertainment System

Published: October 2, 2025

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

The present disclosure describes a method for providing an interactive experience. The method includes illuminating, by an enhancer, an area with light having a wavelength invisible to humans. A plurality of image detectors capture at least two images of a person in the area including a portion of the light reflected from the person. A processing element determines a first skeletal feature of the person based on the at least two images. The processing element determines a position characteristic of the first skeletal feature; constructs, from the first skeletal feature, a vector in three dimensions corresponding to the position characteristic; and outputs an interactive effect based on the position characteristic.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

. A method of calibrating a camera, the method comprising:

. The method of, further comprising detecting a reference point in the calibration target in the image.

. The method of, wherein determining the characteristic of the camera comprises determining position information of the camera based on position information of the reference point.

. The method of, further comprising adjusting the position information of the camera based on updated position information of the reference point.

. The method of, wherein the reference point is defined by a checkerboard pattern.

. The method of, wherein the reference point comprises a text element, a machine readable code, a geometric shape, or an area of varying contrast.

. The method of, wherein the actuator is a robot configured to articulate the calibration target relative to the camera.

. The method of, wherein determining the characteristic of the camera comprises determining position information of the camera relative to the calibration target.

. The method of, further comprising adjusting one or more images received from the camera based on the position information of the camera.

. The method of, further comprising:

. A method of calibrating a camera, the method comprising:

. The method of, wherein comparing the images comprises detecting reference points in the images and determining position information for the reference points.

. The method of, wherein determining the characteristic of the camera comprises determining a position of the camera relative to the reference points.

. The method of, further comprising constructing a virtual representation of a physical environment based on a first reference point of the reference points as an origin of a coordinate system.

. The method of, further comprising adjusting or calibrating the coordinate system based on a second reference point of the reference points.

. The method of, wherein positioning the calibration target comprises positioning, by the actuator, one of the calibration target or the camera relative to the other of the calibration target or the camera.

. The method of, wherein moving the calibration target or the camera comprises moving the calibration target or the camera automatically by the actuator based on a triggering event.

. The method of, wherein the triggering event comprises a start of a ride or a timed schedule.

. The method of, wherein moving the calibration target or the camera comprises moving the calibration target, and wherein the camera is stationary.

. The method of, wherein moving the calibration target or the camera comprises moving the camera, and wherein the calibration target is stationary.

Detailed Description

Complete technical specification and implementation details from the patent document.

This application is a divisional application of U.S. application Ser. No. 16/832,703, filed Mar. 27, 2020, which claims the benefit of priority pursuant to 35 U.S.C. § 119 (e) of U.S. Provisional Application No. 62/987,214, filed Mar. 9, 2020, the disclosures of which are hereby incorporated by reference herein in their entireties.

Motion capture systems that detect and translate the pose of a person, including the position and motion of the person's figure to a computer for use in rendering virtual characters or other visual effects, are widely used in gaming, entertainment, filmmaking, and other applications. Traditional motion capture systems that detect a person's pose may use markers placed on the person, for instance at joints of limbs, on the skin above facial muscles, clothing, or other locations on, or associated with, a person. The markers are then detected in images by cameras and used to re-create the pose in a virtual space, for example generating an animation of the pose via a computer. The virtual pose of the person may enable the person to interact with a virtual environment, or may be used as the basis for the pose of a virtual character.

However, because these traditional systems require a person to wear markers on his or her body, and require that such markers be accurately positioned, they demand intensive calibration and setup processes, which limits their application and use. As such, these types of systems may not allow seamless, immersive theatrical, gameplay, theme park ride, or other entertainment experiences.

A method for providing an interactive experience is disclosed. The method includes illuminating, by an enhancer, an area with light having a wavelength invisible to humans. A plurality of image detectors receive at least two images of a person in the area including a portion of the light emitted by the enhancer and reflected from the person. A processing element determines a first skeletal feature of the person based on the at least two images. The processing element determines a position characteristic of the first skeletal feature. The processing element constructs, from the first skeletal feature, a vector in three dimensions corresponding to the position characteristic. An interactive effect is outputted based on the position characteristic.

A method of calibrating a camera is disclosed. The method includes positioning a calibration target within a field of view of a camera and capturing, by the camera, an image of the calibration target. A difference between the image of the calibration target and a physical characteristic of the calibration target is determined. A characteristic of the camera is determined based on the difference. The calibration target is moved automatically by an actuator on a predefined path within the field of view of the camera.

A method of calibrating a camera is disclosed. The method includes positioning a calibration target relative to a camera and moving, by an actuator, the calibration target or the camera on a predefined path. Images of the calibration target are captured by the camera and compared. A characteristic of the camera is determined based on the comparison.

A method of calibrating a camera is disclosed. The method includes positioning a calibration target in a physical environment within a field of view of a plurality of cameras. An image of the calibration target is captured by each of the plurality of cameras. A reference point is determined in the image. A respective position of the reference point with respect to each of the plurality of cameras is determined. Position information of the camera is adjusted based on the respective position of the reference point. A virtual representation of the field of view in three dimensions is constructed based on the reference point in the physical environment.

A system for augmenting a user experience is disclosed. The system includes a wearable article adapted to be worn by a person. The wearable article includes a processing element that encodes data related to a virtual characteristic of the wearable article and a transmitter that transmits the encoded data to an entertainment environment. A detector in the entertainment environment detects the transmitted data. A processing element receives the data detected by the detector and changes a characteristic of the entertainment environment according to the virtual characteristic of the wearable article.

The present disclosure is related to a system and methods to utilize pose and physical movements to interact with an environment, render effects, or interactive media. The system is able to detect a pose or gesture of a user's figure and includes an image detector to capture images that can be analyzed to determine a pose. A pose may include a position of a figure of a user. For example, a physical pose may include the positions, locations, and/or movements of a figure, including for example, a head, neck, torso, arm, hand, finger, leg, foot, or toes. Some poses are gestures that include a movement or a position of a part of the body meant to express an idea, intent, opinion, or emotion. The system may also include an enhancer such as an illuminator that helps to increase the detectability of a pose by the camera. Utilizing the captured images, the system generates a virtual pose corresponding to the physical pose of a figure in a virtual environment. A virtual pose may be a virtual representation of an aspect of a physical pose of a figure. A virtual pose may be a representation of a physical pose such as a representation stored in a memory, or operated on by a processing element of a computing device. For example, the system may use artificial intelligence and/or machine learning algorithms to detect the locations of the bones and the joints therebetween of a physical pose. Based on the detected locations, the system may create virtual bone objects linked by virtual joint objects about which the bone objects may move. As a pose changes or initiates, such as due to movement of a person, the system captures successive images, updating the position in the virtual space, which can then be used to output various effects or interactions within an entertainment environment (e.g., amusement park ride, virtual reality, game, or the like). 
For example, the camera may have a sufficiently fast frame rate to capture changes in a pose over time and thus capture position and motion information of the physical pose and apply such position and motion information to the virtual pose.
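
The virtual bone and joint objects described above can be sketched as a small data structure. This is only an illustrative sketch, not the disclosed implementation: the class names `Joint`, `Bone`, and `VirtualPose`, the joint names, and the `update` method are all hypothetical, and joint positions are assumed to arrive already expressed in virtual-space coordinates.

```python
from dataclasses import dataclass, field

Vec3 = tuple[float, float, float]


@dataclass
class Joint:
    """A virtual joint with a 3D position in the virtual space."""
    name: str
    position: Vec3


@dataclass
class Bone:
    """A virtual bone linking two joints, referenced by name."""
    parent: str
    child: str


@dataclass
class VirtualPose:
    """Hypothetical container for the virtual pose of one figure."""
    joints: dict[str, Joint] = field(default_factory=dict)
    bones: list[Bone] = field(default_factory=list)

    def update(self, detections: dict[str, Vec3], dt: float) -> dict[str, Vec3]:
        """Store new joint positions; return per-joint velocity (units per second)."""
        velocities: dict[str, Vec3] = {}
        for name, new_pos in detections.items():
            # A joint seen for the first time has zero velocity.
            old = self.joints[name].position if name in self.joints else new_pos
            velocities[name] = tuple((n - o) / dt for n, o in zip(new_pos, old))
            self.joints[name] = Joint(name, new_pos)
        return velocities


# Two successive frames: the wrist rises 0.1 units over 0.5 seconds.
pose = VirtualPose(bones=[Bone("elbow", "wrist")])
pose.update({"elbow": (0.0, 1.0, 0.5), "wrist": (0.3, 1.0, 0.5)}, dt=0.5)
vel = pose.update({"elbow": (0.0, 1.0, 0.5), "wrist": (0.3, 1.1, 0.5)}, dt=0.5)
```

Keeping the previous position per joint is what lets successive images yield motion information as well as position information.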

As a specific example, the system may capture images of poses as a user travels on a moving vehicle through an amusement park attraction, and as the poses change, the system outputs different effects, such as displayed images, or physical effects such as a fountain, corresponding to the changes in poses.

In some instances, the system may include multiple cameras that detect a pose in three dimensional space so that the system can more accurately and quickly generate a virtual pose in a virtual three dimensional space.
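
One common way to locate a feature in three dimensions from two cameras is to cast a ray from each camera through the detected feature and take the midpoint of the shortest segment between the two rays. The sketch below shows that midpoint technique under stated assumptions: the source does not say which triangulation method is used, the function name `triangulate_midpoint` is hypothetical, and camera positions and ray directions are assumed to be known from calibration.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint of the closest points on rays o1 + s*d1 and o2 + t*d2."""
    w0 = tuple(p - q for p, q in zip(o1, o2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(o + s * k for o, k in zip(o1, d1))
    p2 = tuple(o + t * k for o, k in zip(o2, d2))
    return tuple((u + v) / 2 for u, v in zip(p1, p2))


# Two cameras one unit left and right of the origin, both sighting (0, 0, 5).
point = triangulate_midpoint((-1.0, 0.0, 0.0), (1.0, 0.0, 5.0),
                             (1.0, 0.0, 0.0), (-1.0, 0.0, 5.0))
```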

The system may include an imager such as a projector or display that displays images on a surface, such as a screen, wall, or other area of the environment. The detected poses may allow the user to interact with objects displayed or presented in the virtual environment. For example, the system may display images of a virtual environment including characters from comic books, cartoons, movies or video game characters, or other types of interactive content objects, elements, or situations that can be varied based on a user's input.

In some examples, the system may include an accessory, such as a wearable article or distinguishing element, that is worn by a user to enhance the interaction between the user and the system. The accessory can be more easily detectable by the system and/or transmit data about its physical or virtual properties to allow different effects, outputs, or the like, based on characteristics of the accessory. For example, the accessory may include light sources that illuminate or are reflective to be more easily detected in images captured by the camera. Additionally or alternatively, the accessory may transmit data to the system including accessory characteristics, e.g., accessory type (hat, gauntlet, wand, ring or the like), color, user information, character association, effect types or virtual abilities, etc., which can be used to generate different effects or outputs in the experience.

In some examples, the system may be self-calibrating. As one example, the system may include a calibration target that assists in calibrating the camera, for instance by assisting in calibrating a relationship between the virtual environment and the physical environment or between multiple cameras. In some examples, the calibration target can be a creative graphic or feature that may serve the dual purpose of enabling camera calibration and adding to the creative aesthetic of the physical environment. The calibration target may be a two dimensional graphic or a textured three dimensional feature or surface.

In some examples, the system may be calibrated by external devices. For example, the calibration target can be mounted on an actuator, such as a robot, mechanical rig, cart, motor, linkage, pneumatic or hydraulic piston, track, movable mount, or the like, that moves the target relative to a camera, such as on a predefined path, to calibrate the camera. In other examples, a camera may be mounted to an actuator that moves the camera relative to a calibration target to calibrate the camera.
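
As a rough illustration of calibrating with a target moved on a predefined path, the sketch below fits a single camera characteristic (a focal length in pixels) from a reference point observed at known actuator positions. It is a deliberately simplified assumption, not the disclosed procedure: it uses an ideal pinhole relation u = f * x / z in one image axis, and the names `predefined_path` and `estimate_focal_length` are hypothetical.

```python
def predefined_path(start, end, steps):
    """Evenly spaced (x, z) waypoints from start to end, inclusive (steps >= 2)."""
    return [
        tuple(s + (e - s) * i / (steps - 1) for s, e in zip(start, end))
        for i in range(steps)
    ]


def estimate_focal_length(waypoints, observed_u):
    """Least-squares fit of f in the pinhole relation u = f * x / z."""
    ratios = [x / z for x, z in waypoints]
    numerator = sum(u * r for u, r in zip(observed_u, ratios))
    denominator = sum(r * r for r in ratios)
    return numerator / denominator


# The actuator moves the target sideways at a fixed 2.0 m depth.
path = predefined_path((0.1, 2.0), (0.5, 2.0), steps=5)
# Simulated image coordinates of the reference point for a camera with f = 800 px.
observations = [800.0 * x / z for x, z in path]
focal = estimate_focal_length(path, observations)
```

Because the actuator positions are known in advance, each captured image contributes one constraint, and more waypoints average out detection noise.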

An accompanying figure illustrates a pose detection system. The pose detection system includes cameras and enhancers within a physical environment, as well as one or more computing devices in communication with the cameras and the enhancers. More or fewer cameras and/or enhancers may be used, as desired. One or more users may be located within the physical environment, for instance in an area; in the illustrated example, two users are located in the area. The pose detection system may be adapted to detect poses of one or more users in the area. The physical environment may be part of an entertainment attraction, such as an amusement park ride or experience, and in some instances, the area may include a movable vehicle that moves the users through the environment (e.g., on a track or other pathway), or otherwise may be a stationary element (e.g., seat or viewing space). In some examples, a pose detection system can be statically installed in the physical environment. For instance, elements of a pose detection system, such as cameras or enhancers, can be statically installed on a wall, stand, building, ceiling, floor, or other structure and can detect a pose as a user passes by the cameras. In some examples, a pose detection system can be mounted to a vehicle or other structure that moves with users. An advantage of in-vehicle pose detection is that it reduces influence of the moving vehicle so that accurate pose-detection can be performed with lower demand on computational resources. For example, an entertainment attraction may have a vehicle that moves users through the environment and that includes elements of the pose detection system such as cameras or enhancers on the vehicle to detect poses as the vehicle moves through the environment. In some examples, movable portions of a pose detection system may be worn or held by a user. In various examples, some elements of a pose detection system may be statically mounted in the environment while others may be movable through the environment.

During operation, the users interact with content, such as images or videos, displayed within the physical environment, such as by moving their bodies including head, torso, hands, arms, legs, or the like. The system captures images of poses and uses the images to generate outputs, either physically or virtually (e.g., in the displayed content), where the output is based on the movement or position of the pose.

The enhancer is generally any type of device that increases the detectability of a pose within the environment, either at the time of image capture or after the images have been captured and while the system is analyzing the images. In some examples, an enhancer may be an illuminator that lights a portion of the environment, such as the area. In other examples, an enhancer may be an ultrasonic emitter, a rangefinder such as a time of flight sensor, a proximity detector, or the like. Combinations of different types of enhancers may be used in conjunction with one another.

In examples where the enhancer is an illuminator, the illuminator may illuminate the area to allow a camera to capture information about a pose while being illuminated with a light. In some instances, such as where an increased sensitivity is desired, a field of view of a camera may be selected or adjusted to include a single pose, or even portions that include poses such that there may be narrow view cameras and/or enhancers, but in other instances, such as where less sensitivity and cost savings are desired, fewer wider view cameras/enhancers may be used.

In instances where the enhancer is an illuminator or light, it includes a light source, such as a light emitting diode (LED). In these instances, the enhancer may include a filter positioned between the light source and the area to filter, absorb, block, or redirect light emitted from a light source so that the light does not impact the content and the user experience. For example, some infrared LEDs emit a red glow that can be seen by some users, and the filter may block the glow to prevent users from seeing the light source. The filters may include an infrared transmissive material that allows desirable wavelengths of light to pass through, but blocks undesirable wavelengths, such as visible wavelengths. In this manner, the illuminator can illuminate the area with a desired wavelength, while still preventing the users from seeing light emitted from the illuminator.

The enhancers may emit light illuminating an area or areas of the background to increase the visibility of objects to the cameras. To this end, the light emitted may be in various wavelengths or ranges, including those that are visible or invisible to the human eye. In some instances, wavelengths that are not perceptible by humans may be selected to keep the light from interfering with the entertainment effects of the environment or being perceived by the users. In some embodiments, the enhancer is a light source that emits light in one or more wavelengths, which may be emitted in a beam or may be diffuse, spread, patterned, or non-coherent as desired. Moreover, light output from an enhancer may have uniform intensity, or may be pulse modulated, amplitude modulated, flashed, or implement a pattern useful for a particular application. In some instances, the light is emitted by an enhancer with a beam spread selected such that it overlaps with adjacent light from other enhancers and may overlap the fields of view of the cameras. Enhancers may be placed in any suitable location to illuminate an area such that images may be captured by a camera. Any suitable number of enhancers may be used, for instance to illuminate the area evenly, or to provide reflections from the users bright enough for the cameras to detect images. In one example, three enhancers may be used with an area accommodating two users. The enhancers and/or cameras may be mounted or fixed to the area, such as in a vehicle, or they may be stationary in the physical environment. In one embodiment, the enhancers emit light in the infrared band so as to be imperceptible by the human eye, e.g., wavelengths above 750 nanometers (“nm”), above 800 nm, above 840 nm, above 850 nm, above 900 nm, at or above 940 nm, or higher.

The camera may be substantially any type of image detector or sensor that detects light at the wavelength emitted by enhancers and generates an electrical signal representing the detected light. Examples of the camera include a charge-coupled device, or a complementary metal oxide semiconductor device that converts photons of light that fall on the image sensor into a digital image including pixels that represent the intensity and/or color of the light incident on the image sensor. The camera may also include optics to zoom or focus on a desired point or area within the physical environment.

In some embodiments, the cameras have a field of view that captures at least part of a pose (e.g., aimed at the area within the physical environment) and in instances where the system includes multiple cameras, the fields of view of the cameras may overlap one another, such as the cameras shown for example in the figures. Such overlapping fields of view may enable the pose detection system to generate virtual poses in three dimensions, provide more accurate or precise detection of the physical pose, provide redundancy in the case of a failure or obstruction for a particular camera, and enhance detectable areas (e.g., left and right hands). In some examples, the camera may have a field of view wide enough to capture images of all poses within the area. A camera may be provided at any suitable angle with respect to the area. For example, a camera may be above, in front of, to the side of, below, behind, or combinations of these with respect to the area. Conversely, the field of view and position of a camera may be selected to prevent certain features in or near the area from being visible to the camera. For example, it may be desirable to not capture images of faces or other images that do not convey pose information.

The area may be any area, location, or structure in the environment adapted to accept a user. For example, an area may be a seat, chair, a portion of a floor or wall, a portion of a vehicle (e.g., ride vehicle), or other supports to support a user, or the like. The area may secure a user such as with a shoulder harness, lap bar, seatbelt, or the like. The area may have a background, such as a back of a vehicle structure or the like, that optionally may define a contrast between the area and the users, e.g., the background may be colored, textured, or otherwise configured to contrast with the users in brightness or color of light reflected by the respective user. For example, as shown in the figures, the area has a background darker in color than the users, such as being a navy blue, black, or other darker monotone color different from user clothing colors. In other examples, a background may use a chroma key color not normally associated with a user, e.g., a green screen. In other examples, the background may have a pattern or print helping to increase the contrast between the users and the background. In some embodiments, the background may be made of a material or include a coating or other layer (e.g., paint) absorbing certain frequencies of light more readily reflected by the users. For example, some plastics, fabric, wood, brick, stone, or glass absorb infrared light, while a user's skin generally reflects infrared light. User supports may, similarly to the background, also be colored or treated to contrast relative to the users. Such contrast may assist the detection system to more easily differentiate between the users and the background, allowing more accurate or faster detection of a pose. The contrast may also assist the pose detection system to detect the poses of users regardless of the color of their complexion or the clothes they are wearing.
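
To show why a dark, light-absorbing background helps, the sketch below separates bright (reflective) pixels from dark ones with a single brightness threshold. It is a minimal illustration only: the source does not describe its segmentation method, the function names are hypothetical, and the threshold rule (midway between the darkest and brightest pixel) is an assumption that works only when contrast is strong.

```python
def midpoint_threshold(image):
    """Threshold halfway between the darkest and brightest pixel values."""
    flat = [px for row in image for px in row]
    return (min(flat) + max(flat)) / 2


def segment_foreground(image, threshold):
    """Return a mask with 1 where a pixel is brighter than the threshold."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


# Toy infrared intensities: a bright reflective figure on a dark background.
frame = [
    [10, 12, 200, 11],
    [9, 210, 220, 12],
    [10, 11, 190, 10],
]
mask = segment_foreground(frame, midpoint_threshold(frame))
```

With weaker contrast, the two brightness populations overlap and no single threshold separates them cleanly, which is the problem the contrasting background is meant to avoid.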

The cameras, enhancers, and/or imagers may be in communication with one another, as shown for example in the figures, via a computing device, which may be a server, distributed computer, cloud resource, laptop, tablet computer, smart phone, or the like. A further figure illustrates a simplified block diagram for the computing device. The computing device may include one or more processing elements, a video interface, an audio interface, one or more memory components, a network interface (optional), a power supply, and an input/output (“I/O”) interface. The various components may be in direct or indirect communication with one another, such as via one or more system buses, contact traces, wiring, or via wireless mechanisms.

The one or more processing elements may be substantially any electronic device capable of processing, receiving, and/or transmitting instructions. For example, the processing element may be a microprocessor, microcomputer, microcontroller, field programmable gate array, an application specific integrated circuit, graphics processing unit, or the like. The processing element may include one or more processing elements or modules that may or may not be in communication with one another. For example, a first processing element may control a first set of components of the computing device and a second processing element may control a second set of components of the computing device, where the first and second processing elements may or may not be in communication with each other. Relatedly, the processing elements may be configured to execute one or more instructions in parallel, locally, and/or across the network, such as through cloud computing resources.

The video interface provides an input/output mechanism for the computing device to transmit visual content to one or more of the imagers. The video interface may transmit visual content (e.g., images, graphical user interfaces, videos, notifications, virtual environments, and objects, and the like) to the imagers, and in certain instances may also act to receive user input in addition to a pose (e.g., via a touch screen or the like).

The audio interface provides an input/output mechanism for the computing device to transmit audio information to other components of the pose detection system. The audio interface may transmit audio content (e.g., a soundtrack or dialog that accompanies the visual content) and it may also transmit sounds or alerts or auditory effects.

The memory components may be a computer readable medium operable to store electronic data that may be utilized by the computing device, such as audio files, video files, document files, programming instructions, position information, motion information, camera calibration information, authentication information, configuration information, and the like. The memory components may be, for example, non-volatile storage, a magnetic storage medium, optical storage medium, magneto-optical storage medium, read only memory, random access memory, erasable programmable memory, flash memory, or a combination of one or more types of memory components.

The network interface is optional and can receive and transmit data to and from a network to the various devices (e.g., an imager, an enhancer, or a camera) in the pose detection system. The network interface may exchange data with the network directly or indirectly. For example, the network interface may transmit data to and from other computing devices through a network which may be a cellular, satellite, or other wireless network (Wi-Fi, WiMAX, Bluetooth) or a wired network (e.g., Ethernet, Ethernet Control Automation Technology (EtherCAT), Controller Area Network bus (CAN bus), Modbus, or the like), or a combination thereof. In some embodiments, the network interface may also include various modules, such as an API interfacing and translating requests across the network to other elements of the system.

The computing device may also include a power supply. The power supply provides power to various components of the computing device. The power supply may include one or more rechargeable, disposable, or hardwire sources, e.g., batteries, power cord, AC/DC rectifier, DC/DC converter, or the like. Additionally, the power supply may include one or more types of connectors or components that provide different types of power to the computing device. In some embodiments, the power supply may include a connector (such as a universal serial bus) that provides power to the computer or batteries within the computer and also transmits data to and from the device to other devices.

The input/output (“I/O”) interface allows the computing device to receive input from a user such as an operator of the pose detection system and provide output to the user. For example, the input/output (“I/O”) interface may include a capacitive touch screen, keyboard, mouse, pedal, stylus, hotkey, button, joystick, or the like. The type of devices that interact via the input/output (“I/O”) interface may be varied as desired.

The computing device may activate the enhancers. In examples where the enhancer is an illuminator, the computing device may adjust the intensity, modulation, or color of light emitted. The computing device receives images of the physical environment from a camera and may cause an imager to display images of the virtual environment. Any or all of the cameras and/or imagers may have a dedicated graphics processor, such as a GPU, to assist in processing input or output image data. In some examples, cameras and/or imagers may share a GPU, or may use no GPU.

Optionally, the pose detection system may include one or more imagers that may generally be substantially any type of device that displays a visual image. In some examples, an imager may include a projector that emits light toward a display surface, such as a screen, scrim, wall, mannequin, or other object that receives light from the projector to generate an image. Some examples of projectors include digital light processors, lasers, or LCD. In some examples, a display surface may reflect light emitted by the projector toward a user. In other examples, a display surface may transmit light emitted by the projector through the surface toward the user, as with a rear projection screen. In some examples, a display surface may reflect and transmit light. In some examples, an imager may be a light, such as a stage or theatrical light. In some examples, an imager may be a display or monitor such as a cathode ray tube, plasma, LCD, or LED display. Such displays may generate and display visual images without a separate display surface, or with an internal display surface that generates images. An imager may be mounted to the physical environment, such as a wall, floor, or similar structure. The imagers may be mounted to a vehicle or other mobile object. Some imagers may be included in a headset configured to be worn by a user, such as a virtual reality or 3D headset.

In some examples, the pose detection system may allow a user to interact with a virtual environment. For example, the pose detection system may display a virtual environment, with virtual objects, on a display surface. In response to detecting a physical pose, the pose detection system determines a virtual pose, models the position and movements of the physical pose, and uses the virtual pose to interact with virtual objects.

The pose detection system detects the physical pose of various parts of a user's figure. For example, as shown in the figures, the pose detection system may detect the pose of the head, torso, or major limbs (e.g., arms or legs). The pose detection system may also detect the pose in finer detail, such as the pose of a hand, foot, finger, and toe. The figures show examples of physical hand poses that may be detected by the pose detection system. For example, separate figures show a first gesture; a thumbs-up gesture; a pointing gesture; and a character gesture associated with a character or virtual characteristic in the virtual environment. Any other gesture that may be made by a human may be detected, as desired. After the pose detection system detects a gesture or predetermined set of movements or positions, the system may link the gesture with a virtual effect (e.g., an audio or other effect), and output the corresponding effect. For example, if the pose detection system detects a pointing gesture, such as shown for example in the figures, the system may display a visual effect of an energy beam emanating from the user's hand performing the gesture.
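
Linking a detected gesture to an effect can be pictured as a lookup, with the effect aimed along a three-dimensional vector built from two skeletal features. The sketch below is illustrative only: the gesture labels, the effect names in `EFFECTS`, and the choice of wrist and fingertip as the two features are all hypothetical and are not taken from the source.

```python
import math

# Hypothetical mapping from a recognized gesture label to an effect name.
EFFECTS = {"pointing": "energy_beam", "thumbs_up": "confetti"}


def unit_vector(origin, tip):
    """Unit-length 3D direction from one skeletal feature to another."""
    delta = [t - o for o, t in zip(origin, tip)]
    length = math.sqrt(sum(c * c for c in delta))
    if length == 0:
        raise ValueError("features coincide; direction is undefined")
    return tuple(c / length for c in delta)


def effect_for(gesture, wrist, fingertip):
    """Return an effect description for a gesture, or None if it is unmapped."""
    name = EFFECTS.get(gesture)
    if name is None:
        return None
    return {"effect": name, "origin": wrist,
            "direction": unit_vector(wrist, fingertip)}


beam = effect_for("pointing", (0.0, 0.0, 0.0), (0.0, 3.0, 4.0))
ignored = effect_for("wave", (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```

Returning `None` for unmapped gestures keeps unrecognized movements from triggering anything.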

In some examples, the pose detection system may segment a detected image and process just a portion of the image of interest. For example, if the pose detection system is configured to detect hand gestures, the system may prioritize analyzing portions of an image associated with hands, or may analyze only such portions. Such segmentation may have the advantage of saving processing throughput or computing time of a processing element, thereby enabling the use of a lower-powered processing element. Segmentation may allow a processing element to process images from more cameras or process images at a higher frame rate than without segmentation.
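
The saving from segmentation can be seen by cropping a window around a feature of interest and processing only that window. The sketch below assumes an approximate hand location is already known (for example, from a previous frame); the function name `crop_region` and the square window shape are hypothetical choices.

```python
def crop_region(image, center, half_size):
    """Return the square window around center (row, col), clipped to the image."""
    rows, cols = len(image), len(image[0])
    r, c = center
    top, bottom = max(0, r - half_size), min(rows, r + half_size + 1)
    left, right = max(0, c - half_size), min(cols, c + half_size + 1)
    return [row[left:right] for row in image[top:bottom]]


# A 6 x 8 test image whose pixel value encodes its own (row, col) position.
image = [[r * 10 + c for c in range(8)] for r in range(6)]
# Window around a hand near the top-right corner; it is clipped at the edges.
patch = crop_region(image, center=(1, 6), half_size=2)
fraction = (len(patch) * len(patch[0])) / (len(image) * len(image[0]))
```

Here the patch covers one third of the pixels, so later analysis touches correspondingly less data.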

The figures show a method of detecting a pose and outputting a virtual effect using the pose detection system. The operations of the method may be executed in an order other than as shown.

The method may begin in an operation in which the pose detection system actuates the one or more enhancers to increase the detectability of a physical pose within the environment. For example, one or more enhancers may illuminate the physical environment, e.g., activate a light so as to reflect light off of the area and the users. Individual enhancers may be turned on at different times, or all enhancers in the physical environment may be turned on at substantially the same time; the actuation of the enhancers may depend on the experience, user movement, or the like. The intensity of the light from the enhancers may be adjusted to provide desired illumination of a portion or all of the physical environment, such as the area. For example, illumination levels may be monitored by one or more cameras and adjusted based on illumination detected by the cameras. In one example, the computing device may control the on/off state or intensity of the enhancers, either directly or through a network, a controller, or other suitable device. In other examples, the enhancers may be turned on manually, such as with a switch, or may be powered any time power is applied to a portion of the physical environment. For example, when the area is a vehicle, such as for use in a theme park ride or attraction, the enhancers may be powered any time the ride is powered or as the vehicle approaches certain waypoints that correspond to select content presentation and user interaction areas.
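
As a non-limiting illustration, adjusting enhancer intensity based on illumination detected by a camera might be sketched as a single proportional correction step. The disclosure does not specify a control law; the proportional rule, the normalized 0.0 to 1.0 scale for both intensity and measured brightness, the `gain` parameter, and the name `adjust_intensity` are all assumptions of this sketch.

```python
def adjust_intensity(current: float, measured: float, target: float,
                     gain: float = 0.5) -> float:
    """Return an updated enhancer intensity after one correction step.

    current:  present enhancer drive level, normalized to [0.0, 1.0]
    measured: brightness observed by a camera, normalized to [0.0, 1.0]
    target:   desired brightness on the same scale
    gain:     hypothetical proportional gain (an assumption of this sketch)
    """
    updated = current + gain * (target - measured)
    # Clamp so the command always stays within the valid drive range.
    return min(1.0, max(0.0, updated))
```

Calling the function once per captured frame would nudge the illumination toward the target level while keeping the commanded intensity within range.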

In another operation, the computing device receives an image of the physical environment, including the area, captured by a camera. The captured image typically includes the user and captures the position or physical pose at the moment the image was captured. More than one camera may capture an image of the area from different locations or angles at substantially the same time, e.g., the cameras may be synchronized to capture images at the same points in time but at different locations, such that images from the cameras will correspond to different views of the area at the same point in time. For example, one camera may be positioned to the left of the user and a second camera positioned to the right of the user. Both such cameras may capture images of a pose of the head and torso at the same time, but from the left and right positions, respectively.

Depending on the number of users, desired sensitivity, detected poses, etc., additional cameras may be used to capture images of the same features of the physical pose at similar points in time or at alternating points in time from one another. For example, the physical environment may include two cameras that capture images of the area at substantially the same time, but from different positions. In some examples, the cameras may be synchronized, such as with a time signal, such that images are captured at substantially the same time by each camera. In other examples, the cameras may be operated in a sequential manner, capturing frames one after the other in sequence, which operation may be controlled by a time signal. Such time signals may be provided by the computing device or other processing element. Images captured by a camera may be transmitted for further processing to the computing device or another processing element by any suitable method, such as via a wired or wireless network, or dedicated wires, fiber optics, or cabling.
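
As a non-limiting illustration, the synchronized and sequential capture modes described above might be sketched as two trigger schedules derived from a shared time signal. The even staggering of cameras within a frame period, and the name `trigger_times`, are assumptions of this sketch rather than details of the disclosure.

```python
from typing import Dict, List


def trigger_times(num_cameras: int, frame_rate: float, num_frames: int,
                  sequential: bool = False) -> Dict[int, List[float]]:
    """Return trigger times in seconds for each camera index.

    Synchronized mode: every camera fires at the same instants.
    Sequential mode: cameras are staggered evenly within one frame
    period, so they capture one after another (an assumed scheme).
    """
    period = 1.0 / frame_rate
    offset = period / num_cameras if sequential else 0.0
    return {
        cam: [frame * period + cam * offset for frame in range(num_frames)]
        for cam in range(num_cameras)
    }
```

With two cameras at 50 frames per second, the synchronized schedule fires both cameras every 20 ms, while the sequential schedule delays the second camera by 10 ms relative to the first.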

In another operation, the computing device or other processing element analyzes the captured images to detect a skeletal feature of the physical pose. The computing device uses the captured images to detect the position of the bones and joints of the physical pose. In some examples, the computing device may detect these portions of the physical pose in the captured images using machine learning (“ML”) or artificial intelligence (“AI”) algorithms that have been trained to recognize such features, such as a convolutional neural network. In some examples, the images are individually analyzed to detect skeletal features using AI or ML algorithms. For example, an AI or ML algorithm may be trained to recognize and/or classify joints, bones, eyes, the head, shoulders, the neck, or other features, as well as poses of those features. In some examples, an AI may be trained by feeding it training images of physical poses where the locations of bone and/or joint objects have already been identified. Training images may be images in which points of significance (e.g., joints and/or bones) have been labeled. The AI may learn from the training images to detect portions or pixels of an image associated with bone or joint objects. The training images may be adapted or varied for detecting different skeletal features. For example, there may be separate training sets provided for whole body, upper body, lower body, arms, feet, hands, or the like. The training images may include images of physical poses enhanced by an enhancer. The training images may have backgrounds that contrast in shape, color, luminosity, pattern, or other characteristics from users. For example, if the AI is trained on images with dark backgrounds and relatively brightly lit users, the AI may ignore dark pixels of images and focus on brightly lit areas to detect the pose.

Upon identifying a feature of the physical pose associated with a bone, the pose detection system may generate a bone object in the virtual environment representing the physical bone in the physical environment. The pose detection system may detect joints between bones, such as a knuckle, knee, or the like. The pose detection system may generate a joint object connecting two or more bone objects. Bone objects connected to a common joint object may move, twist, pivot, or rotate about the joint object with respect to one another. The pose detection system may generate a terminal object at the end of a bone object, representing an end of a limb, such as a fingertip, foot, or hand. The various bone objects, joint objects, and terminal objects collectively represent the virtual pose. The system may assign bone objects and/or joint objects to a structure, e.g., a virtual skeleton.
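
As a non-limiting illustration, the bone objects, joint objects, and terminal objects described above might be organized into a virtual skeleton as sketched here. The class names, field names, and the `connect` and `neighbors` helpers are hypothetical and are not drawn from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Joint:
    name: str
    bones: List[str] = field(default_factory=list)  # bone objects meeting here


@dataclass
class Bone:
    name: str
    proximal_joint: Optional[str] = None
    distal_joint: Optional[str] = None
    terminal: Optional[str] = None  # e.g. "fingertip" at the end of a limb


@dataclass
class VirtualSkeleton:
    bones: Dict[str, Bone] = field(default_factory=dict)
    joints: Dict[str, Joint] = field(default_factory=dict)

    def add_bone(self, name: str, terminal: Optional[str] = None) -> Bone:
        bone = Bone(name, terminal=terminal)
        self.bones[name] = bone
        return bone

    def connect(self, joint_name: str, first: str, second: str) -> Joint:
        """Join the distal end of `first` to the proximal end of `second`."""
        joint = self.joints.setdefault(joint_name, Joint(joint_name))
        for bone_name in (first, second):
            if bone_name not in joint.bones:
                joint.bones.append(bone_name)
        self.bones[first].distal_joint = joint_name
        self.bones[second].proximal_joint = joint_name
        return joint

    def neighbors(self, bone_name: str) -> List[str]:
        """Names of bone objects that share a joint object with bone_name."""
        found: List[str] = []
        for joint in self.joints.values():
            if bone_name in joint.bones:
                found.extend(b for b in joint.bones
                             if b != bone_name and b not in found)
        return found
```

For example, adding a humerus and a forearm bone object and connecting them at an elbow joint object yields a two-bone chain in which each bone reports the other as its neighbor.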

The pose detection system may capture a first image from a camera, such as in the image capture operation described above, and determine a virtual pose of major joints and features of the figure of the user, such as in the detection operation described above. The pose detection system may then capture a second image from a camera. Additionally or alternately, the pose detection system may capture images of a physical pose from different positions and/or angles at substantially the same time using more than one camera. The pose detection system may use a second image (either captured sequentially with the same camera and/or captured at the same time with more than one camera) to determine a virtual pose with more accuracy and/or determine virtual poses of smaller parts of the body (e.g., hands, wrists, fingers, feet, or toes). Portions of the AI trained with training images corresponding to certain skeletal features may be selectively activated when a relevant feature is to be detected. For example, if the skeletal feature of interest is a hand, the pose detection system may activate only the portion of the AI trained to detect hands. Alternately or additionally, the pose detection system may determine a virtual pose in greater detail. For example, the pose detection system may detect a virtual pose of the head, arms, eyes, and torso in a first image and then, preferably before the user moves a significant amount, capture another image and determine the pose of the hand and fingers. In some examples, the pose detection system may detect bone and/or joint objects from a single frame, or from two or more frames captured at different times. In instances where multiple frames may be used, the method may return to the image capture operations to capture additional images as needed before proceeding to other operations.

In another operation, the computing device or other processing element determines position characteristics of a joint object and/or bone object. The pose detection system may define a coordinate system with respect to an origin in the physical environment. For example, the origin may be a corner or other suitable location of the area. The coordinate system may include one or more axes, two or more of which may be mutually orthogonal, such as a Cartesian coordinate system. Other coordinate systems, such as polar coordinate systems, may be defined as well, such as may be useful for specifying a distance and direction of a body feature relative to a camera. The computing device may translate coordinates between suitable coordinate systems as desired.
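
As a non-limiting illustration, translating between a camera-relative polar description and a Cartesian coordinate system anchored at an origin of the area might be sketched in two dimensions as follows. The planar simplification, the convention that angles are in radians measured counterclockwise from the x axis, and the function names are assumptions of this sketch.

```python
import math
from typing import Tuple


def camera_polar_to_area(camera_xy: Tuple[float, float],
                         camera_heading: float,
                         distance: float,
                         bearing: float) -> Tuple[float, float]:
    """Convert a (distance, bearing) reading relative to a camera into
    Cartesian x, y coordinates in the area's coordinate system.

    camera_heading: direction the camera faces, in radians from the x axis
    bearing:        angle of the feature relative to the camera heading
    """
    angle = camera_heading + bearing
    x = camera_xy[0] + distance * math.cos(angle)
    y = camera_xy[1] + distance * math.sin(angle)
    return (x, y)


def cartesian_to_polar(x: float, y: float) -> Tuple[float, float]:
    """Convert Cartesian x, y into (distance, angle) about the origin."""
    return (math.hypot(x, y), math.atan2(y, x))
```

A camera located at (2, 3) and facing along the positive y axis that observes a feature 2 units straight ahead would place that feature at approximately (2, 5) in the area's coordinates.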

In some examples, when a bone object or joint object is detected in images captured at substantially the same time by different cameras, the computing device may use triangulation techniques, given known coordinates and orientations of the cameras within the physical environment, to locate the joint objects and/or bone objects in three-dimensional (“3D”) space. In some examples, after skeletal features are detected in individual images, results from each camera are then combined to reconstruct the locations of physical poses. For example, a first camera may capture an image of a left hand from a point of view to the left of the hand. A second camera may capture an image of the same left hand, but from a point of view to the right of the hand. The two images from the cameras may be captured at substantially the same time, when the hand is in the same location in the physical environment in both images. The pose detection system may individually detect bone and/or joint objects in each of the two images from the cameras. The system may recognize that the bone and/or joint objects detected in the two images are images of the same bones or joints taken from different points of view. The pose detection system may then combine the results of the detection of the bone and/or joint objects in each of the images to reconstruct the location of the bone and/or joint objects in 3D space. Capturing two or more images of a physical pose at substantially the same time, from different cameras with different points of view, has the advantage over prior systems of enabling the pose detection system to detect physical poses in three dimensions. Likewise, when position information on a physical pose is detected in three dimensions, three-dimensional movement may be detected as well. Generally, the more cameras that detect a particular object, the more accurately the 3D position and/or motion of the object can be determined.
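
As a non-limiting illustration, one possible triangulation technique is the midpoint method: each camera contributes a ray from its known position toward the detected joint, and the joint is placed midway between the closest points of the two rays. The disclosure does not name a specific technique, so the choice of the midpoint method, the ray representation, and the name `triangulate_midpoint` are assumptions of this sketch.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(origin_a: Vec3, dir_a: Vec3,
                         origin_b: Vec3, dir_b: Vec3) -> Vec3:
    """Estimate a 3D point from two camera rays.

    Each ray starts at a camera position (origin) and points toward the
    detected feature (dir). The result is the midpoint of the shortest
    segment connecting the two rays.
    """
    w = tuple(p - q for p, q in zip(origin_a, origin_b))
    a = _dot(dir_a, dir_a)
    b = _dot(dir_a, dir_b)
    c = _dot(dir_b, dir_b)
    d = _dot(dir_a, w)
    e = _dot(dir_b, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; cannot triangulate")
    s = (b * e - c * d) / denom  # parameter along ray A
    t = (a * e - b * d) / denom  # parameter along ray B
    point_a = tuple(o + s * v for o, v in zip(origin_a, dir_a))
    point_b = tuple(o + t * v for o, v in zip(origin_b, dir_b))
    return tuple((p + q) / 2.0 for p, q in zip(point_a, point_b))
```

When the two rays intersect exactly, the midpoint coincides with the intersection; when detection noise leaves a small gap between the rays, the midpoint splits the difference between them.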

The positions of the bone objects and joint objects may be stored in memory in the form of a position vector that represents coordinates of an aspect of the bone object or joint object within the coordinate system. For example, the computing device may have determined a bone object corresponding to a humerus bone in the arm. The computing device may determine a position of the proximal aspect of the humerus (the end of the humerus connected to the shoulder) in the coordinate system and store that position in a vector with values representing the position of the proximal aspect with respect to an axis of the coordinate system. For example, the vector may contain x, y, and z values with respect to mutually orthogonal x, y, and z axes of the coordinate system. Likewise, the computing device may determine the position of the distal aspect of the humerus (the end at the elbow) and store that location in the same vector as the proximal aspect or in another vector. Similarly, the computing device may determine a joint object associated with an elbow and may store the position of the elbow joint object in a similar vector. The coordinates of the position of the elbow joint object may be substantially the same as those of the distal aspect of the humerus.

In another operation, the computing device determines motion characteristics of joint objects and/or bone objects, which can be used to determine motion characteristics for the virtual pose. In some examples, the pose detection system may detect movement between two frames captured at different times, separated by a known or predetermined amount of time. For example, a camera may operate at a frame rate, capturing a number of images per second. In one example, a camera captures images at a rate of 60 frames per second (“FPS”) or higher. In other examples, a camera captures images at a rate of 100 FPS or higher. The pose detection system may calculate motion based on the movement of bone objects or joint objects between such frames. For example, if the time between a first frame and a second frame is known or predetermined, and a change in position of a bone object or joint object is detected between the two frames, the computing device may determine a speed or velocity (velocity including a speed and a direction) of the bone object or joint object. Likewise, the pose detection system may determine changes in speed and/or velocity of a bone object or joint object. For example, if the computing device determines a velocity of a bone object, as described above, between two successive frames captured by a camera, and determines another velocity between another two successive frames (one of which may be a common frame with the first velocity determination), the computing device can use the change between those velocities and the time between the frames to determine an acceleration, including a change in direction, speed, and/or velocity, of the bone object and/or joint object. Similarly, the pose detection system may determine changes in acceleration. The computing device may represent such motion characteristics of the bone objects and joint objects of the virtual pose as vectors in 2D or 3D space to represent the motion of the physical pose.

The pose detection system may detect a direction of movement of the physical pose based on determining the location or motion of the shoulder, the bones of the arm, the wrist, and the fingers. In some examples of the method, this operation may be optional, such that motion characteristics of bone and/or joint objects are not determined or are ignored.
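
As a non-limiting illustration, the velocity and acceleration determinations described above might be sketched as finite differences over positions sampled at a known frame interval. The tuple representation of vectors and the function names are assumptions of this sketch.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, ...]


def velocity(p0: Sequence[float], p1: Sequence[float], dt: float) -> Vec:
    """Average velocity between two positions captured dt seconds apart."""
    return tuple((b - a) / dt for a, b in zip(p0, p1))


def speed(v: Sequence[float]) -> float:
    """Magnitude of a velocity vector (speed without direction)."""
    return math.sqrt(sum(c * c for c in v))


def acceleration(v0: Sequence[float], v1: Sequence[float], dt: float) -> Vec:
    """Average acceleration between two velocities dt seconds apart."""
    return tuple((b - a) / dt for a, b in zip(v0, v1))
```

Three successive positions of a joint object yield two velocities (the middle frame being common to both), and the difference between those velocities over the frame interval yields one acceleration. At 60 FPS the interval `dt` would be 1/60 of a second.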

In either or both of the position and motion determination operations, the computing device may use the motion characteristics and/or position characteristics to determine gestures or momentum of the physical pose. For example, a rapid acceleration of a finger, where the finger uncurls and is straight, may be determined to be a flick gesture. The pose detection system may detect any number of poses for which it has been trained, for example, the fist gesture, the thumbs-up gesture, the pointing gesture, the character gesture, or the like. The pose detection system may determine compound poses that combine two or more poses, such as a user with crossed arms and also using a pointing gesture, or a user extending an arm out in a direction and making a pose associated with a character, such as the character gesture.
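
As a non-limiting illustration, the flick gesture example above might be sketched as a rule that combines position characteristics (the finger going from curled to straight) with a motion characteristic (a rapid fingertip acceleration). The disclosure describes trained detection rather than fixed rules, so the rule-based approach, the curl angle measure, and every threshold value here are hypothetical.

```python
def is_flick(curl_before_deg: float, curl_after_deg: float,
             tip_acceleration: float,
             accel_threshold: float = 20.0,
             curled_above_deg: float = 60.0,
             straight_below_deg: float = 15.0) -> bool:
    """Return True if a finger uncurls rapidly enough to count as a flick.

    curl_before_deg / curl_after_deg: hypothetical finger curl angles,
        where 0 means fully straight.
    tip_acceleration: fingertip acceleration magnitude in m/s^2.
    All thresholds are illustrative placeholders, not disclosed values.
    """
    was_curled = curl_before_deg >= curled_above_deg
    now_straight = curl_after_deg <= straight_below_deg
    fast_enough = tip_acceleration >= accel_threshold
    return was_curled and now_straight and fast_enough
```

A slow uncurling of the same finger would satisfy the two position conditions but fail the acceleration condition, and so would not be reported as a flick.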

The pose detection system may predict a future motion and/or position of a physical pose based on a current or previous motion and/or position of the physical pose and generate a corresponding predicted motion or position characteristic. For example, if the system detects that a user begins to move an arm at a speed and/or direction consistent with the arm extending away from the user, the system may predict the continuation or completion of the movement (e.g., the arm being fully extended from the user). The system may use a detected motion or position characteristic of one part of a physical pose to predict a motion or position characteristic of another part of a physical pose. For example, the system may detect a torso movement indicating the user may be preparing to extend an arm out and may generate a predicted motion or position characteristic related to the arm. The system may compare a predicted motion and/or position characteristic to an actual motion and/or position characteristic and use the errors therebetween to learn a user's behavior. Such a learning function may allow the system to more accurately predict the motion and/or position of a physical pose over time. The system may store predicted motion and/or position characteristics using methods similar to those used for detected position and motion characteristics.
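
As a non-limiting illustration, predicting a future position and comparing it with the position actually observed might be sketched with a constant-velocity extrapolation. The disclosure does not specify a prediction model; the constant-velocity assumption, the Euclidean error measure, and the function names are assumptions of this sketch.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, ...]


def predict_position(position: Sequence[float], velocity: Sequence[float],
                     horizon: float) -> Vec:
    """Extrapolate a position `horizon` seconds ahead at constant velocity."""
    return tuple(p + v * horizon for p, v in zip(position, velocity))


def prediction_error(predicted: Sequence[float],
                     actual: Sequence[float]) -> float:
    """Euclidean distance between a predicted and an observed position."""
    return math.sqrt(sum((a - p) ** 2 for p, a in zip(predicted, actual)))
```

The error between predicted and observed positions could then serve as the feedback signal from which a learning function adapts its predictions to a particular user.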

Patent Metadata

Filing Date

Unknown

Publication Date

October 2, 2025

Inventors

Unknown
