A motion generation system includes a tracking model, executed by a processor, configured to track at least one kinematic reference motion of a robotic device; a reward surrogate model, executed by the processor, that evaluates a performance of the tracking model with respect to the at least one kinematic reference motion and estimates at least one reward for the tracking model based on the performance; and a generative model, executed by the processor, configured to generate a motion for the robotic device based on a contextual input and the estimated at least one reward, wherein the generative model is trained with a pre-training operation and a refinement operation separate from the pre-training operation.
Legal claims defining the scope of protection, as filed with the USPTO.
1. A motion generation system comprising:
2. The motion generation system of, further comprising the robotic device that executes the motion generated by the generative model.
3. The motion generation system of, wherein the generative model comprises a motion diffusion model.
4. The motion generation system of, wherein the pre-training operation comprises:
5. The motion generation system of, wherein the refinement operation comprises:
6. The motion generation system of, wherein the reinforcement signal comprises a negative sum of the estimated at least one reward.
7. The motion generation system of, wherein the tracking model comprises a trained machine learning model.
8. The motion generation system of, wherein the tracking model is trained separately from the reward surrogate model, and both of the tracking model and the reward surrogate model are frozen with respect to the generative model.
9. The motion generation system of, wherein the contextual input comprises one or more of a textual input or an auditory input.
10. The motion generation system of, wherein the tracking model is further configured to track the at least one kinematic reference motion based on a state of the robotic device.
11. A non-transitory computer-readable storage medium, the computer-readable storage medium including instructions that when executed by a computer, cause the computer to:
12. The non-transitory computer-readable storage medium of, wherein the instructions further cause the computer to instruct the robotic device to perform the motion.
13. The non-transitory computer-readable storage medium of, wherein the generative model comprises a motion diffusion model.
14. The non-transitory computer-readable storage medium of, wherein the instructions further cause the computer to execute a pre-training operation comprising:
15. The non-transitory computer-readable storage medium of, wherein the instructions further cause the computer to execute a refinement operation comprising:
16. The non-transitory computer-readable storage medium of, wherein the reinforcement signal comprises a negative sum of the estimated at least one reward.
17. The non-transitory computer-readable storage medium of, wherein the tracking model comprises a trained machine learning model.
18. The non-transitory computer-readable storage medium of, wherein the tracking model is trained separately from the reward surrogate model, and both of the tracking model and the reward surrogate model are fixed with respect to the generative model.
19. The non-transitory computer-readable storage medium of, wherein the contextual input comprises one or more of a textual input or an auditory input.
20. The non-transitory computer-readable storage medium of, wherein the tracking model is further configured to track the kinematic reference motion based on a state of the robotic device.
Complete technical specification and implementation details from the patent document.
This application claims the benefit of priority under 35 U.S.C. § 119(e) and 37 C.F.R. § 1.78 to provisional application No. 63/649,214, filed on May 17, 2024, titled “MOTION GENERATION FOR ROBOTIC CHARACTERS,” which is hereby incorporated by reference herein in its entirety.
Recent advancements in generative motion models have achieved remarkable results, enabling the synthesis of lifelike human motions from textual descriptions. These kinematic approaches, while visually appealing, often produce motions that fail to adhere to physical constraints, resulting in artifacts that impede real-world deployment.
The automated generation of realistic motions based on high-level user input is a crucial task in physics-based character animation and robotics. Traditionally, computer animation has emphasized kinematic-based approaches, which are well-suited for animated film and video games where visual storytelling takes precedence. Recent advances in generative models have demonstrated the ability to synthesize diverse and visually appealing motions when trained on large datasets. However, these kinematic-based generated motions do not strictly satisfy the constraints that come with a physics-based environment. As a result, the motions often contain artifacts such as floating, foot sliding, self-collisions, violations of joint limits, and dynamic imbalance, making it challenging to deploy these models in the real world. Although robust motion tracking controllers and tracking models exist, the resulting motion is inherently limited by the quality of the provided target motion.
Current systems that generate motion for simulated and real robotic devices from an input prompt suffer from many deficiencies. For example, while systems exist that can convert a string of text or a voice command into a set of robot poses, those poses may not be feasible within the capabilities of a real robotic figure. In some cases, such generated motions may result in instability of the robotic device during execution of a generated motion, which can cause the robotic device to fail to perform a desired motion. For example, existing systems may not respect the constraints of the real-world robotic device and its components, such as actuator velocity, acceleration, and torque limits. In addition, or alternately, existing systems may not be aware of mass, force, acceleration, and balance, which can lead to issues with a real robotic device.
This situation gives rise to a generation-to-real (or gen-to-real) gap. Such gaps can exist, for example, where a motion generator has been trained on human motion capture (or MoCap) data and then applied to a robotic device. Robotic devices seldom have the same joint flexibility, strength, fine motor control, range of motion, speed, weight, etc. as the humans whose captured motion was used to train the generator. Simply applying a model trained this way to a robotic device often results in unstable or undesirable motions, and may even cause the robotic device to fall, stumble, or collide with itself in unexpected ways. Similar problems can occur when applying a motion generation system between different robotic devices.
Improved systems and methods are desired that can close the gen-to-real gap for motion generators.
In one embodiment, a motion generation system includes: a tracking model, executed by a processor, configured to track at least one kinematic reference motion of a robotic device; a reward surrogate model, executed by the processor, configured to evaluate a performance of the tracking model with respect to at least one kinematic reference motion and estimate at least one reward for the tracking model based on the performance; and a generative model, executed by the processor, configured to generate a motion for the robotic device based on a contextual input and the estimated at least one reward, wherein the generative model is trained with a pre-training operation and a refinement operation separate from the pre-training operation.
In some embodiments, the motion generation system further includes the robotic device that executes the motion generated by the generative model.
In some embodiments, the generative model includes a motion diffusion model.
In some embodiments, the pre-training operation includes providing, via the processor, a motion sequence to the generative model; adding noise, via the processor, to the motion sequence to generate a noisy motion sequence; and gradually removing, via the processor, the noise from the noisy motion sequence to reconstruct the motion sequence.
In some embodiments, the refinement operation includes generating, via the processor, a second motion sequence with the generative model; and providing, via the processor, a reinforcement signal to the generative model based on the estimated at least one reward.
In some embodiments, the reinforcement signal includes a negative sum of the estimated at least one reward.
In some embodiments, the tracking model includes a trained machine learning model.
In some embodiments, the tracking model is trained separately from the reward surrogate model, and both of the tracking model and the reward surrogate model are frozen with respect to the generative model.
In some embodiments, the contextual input includes one or more of a textual input or an auditory input.
In some embodiments, the tracking model is further configured to track the at least one kinematic reference motion based on a state of the robotic device.
In one embodiment, a non-transitory computer-readable storage medium includes instructions that, when executed by a computer, cause the computer to: execute a tracking model configured to track at least one kinematic reference motion of a robotic device; execute a reward surrogate model that evaluates a performance of the tracking model with respect to the at least one kinematic reference motion and estimates at least one reward for the tracking model based on the performance; and execute a generative model configured to generate a motion for the robotic device based on a contextual input and the estimated at least one reward, wherein the generative model is trained with a pre-training operation and a refinement operation separate from the pre-training operation.
In some embodiments, the instructions further cause the computer to instruct the robotic device to perform the motion.
In some embodiments, the generative model includes a motion diffusion model.
In some embodiments, the instructions further cause the computer to execute a pre-training operation including providing a motion sequence to the generative model; adding noise to the motion sequence to generate a noisy motion sequence; and gradually removing the noise from the noisy motion sequence to reconstruct the motion sequence.
In some embodiments, the instructions further cause the computer to execute a refinement operation including generating a second motion sequence with the generative model; and providing a reinforcement signal to the generative model based on the estimated at least one reward.
In some embodiments, the reinforcement signal includes a negative sum of the estimated at least one reward.
In some embodiments, the tracking model includes a trained machine learning model.
In some embodiments, the tracking model is trained separately from the reward surrogate model, and both of the tracking model and the reward surrogate model are fixed with respect to the generative model.
In some embodiments, the contextual input includes one or more of a textual input or an auditory input.
In some embodiments, the tracking model is further configured to track the kinematic reference motion based on a state of the robotic device.
The systems and methods disclosed close the generation-to-real gap for motion generation systems, including textual input based motion generation systems. The systems and methods disclosed integrate kinematic generative models with physics-based character control.
The motion generation systems and methods disclosed include a tracking model, or actor, which tracks one or more kinematic reference motions of a robotic device. In many embodiments, the tracking model takes a desired motion (e.g., from an animation) and outputs an action into an environment (real or virtual) including the robotic device. The action creates a state change in the robotic device in the environment. This state change is fed back to the tracking model in a closed loop. In many embodiments, the tracking model is a trained machine learning algorithm. In some embodiments, the tracking model is another type of algorithm.
The motion generation systems and methods include a reward surrogate model, or critic, which estimates at least one reward for the tracking model based on the performance of the tracking model at executing the kinematic reference motion. In many embodiments, the reward surrogate model is trained using a reinforcement learning (RL) algorithm. The reward surrogate model estimates the performance of the downstream non-differentiable control task, offering an efficient and differentiable loss function. This reward model is then employed to fine-tune a baseline generative model, ensuring that the generated motions are not only diverse but also physically plausible for real-world scenarios.
The motion generation systems and methods include a generative model that generates a motion for the robotic device based on a contextual input such as a language input and the reward estimated by the reward surrogate model. This reward system accounts for the kinematic and dynamic aspects of the robotic device and is used to refine the training of the generative model to close the gen-to-real gap. In many embodiments, the generative model is a text-conditioned kinematic diffusion model that interfaces with the reinforcement learning-based tracking model.
The systems and methods disclosed align the output of kinematic generative models with the downstream task of tracking these motions with a physics-based or robotic character or robotic device. Evaluating the performance of a controlled character on generated motions requires long-horizon simulations, which are computationally expensive and non-differentiable. Even if a differentiable simulation is available, the highly non-linear nature of the articulated rigid-body system and the contact dynamics results in poorly behaved gradients.
The systems and methods estimate the expected performance of the downstream task. This estimation provides a differentiable and computationally efficient loss function to fine-tune the generative model. During deployment, the systems and methods interface the fine-tuned generative model with the existing tracking model. This approach contrasts with training a generative controller directly, which typically results in a controller with a latent space that can be sampled. However, since such controllers are trained with reinforcement learning (RL), the network is typically limited to a shallow multilayer perceptron (MLP). Decoupling the generative model from the tracking model allows the use of more advanced networks and specialized training strategies. The systems and methods include a text-conditioned diffusion-based approach, although the disclosed fine-tuning strategy is applicable to generative models in general.
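The use of the estimated reward as a fine-tuning loss can be illustrated with a minimal sketch. It assumes the reinforcement signal is the negative sum of the estimated rewards, as stated in the summary; `toy_surrogate` is a hypothetical stand-in for the frozen reward surrogate model and is not the disclosed network. In a real system the loss would be differentiated through the surrogate with an automatic differentiation library, which is omitted here.

```python
from typing import Callable, List, Sequence

Motion = List[List[float]]  # one inner list of features per frame


def refinement_loss(motions: Sequence[Motion],
                    reward_surrogate: Callable[[Motion], float]) -> float:
    """Negative sum of the rewards estimated by a frozen surrogate."""
    return -sum(reward_surrogate(m) for m in motions)


def toy_surrogate(motion: Motion) -> float:
    """Hypothetical surrogate: rewards small frame-to-frame changes.

    Assumption for illustration only; it mimics the idea that motions
    which are hard to track receive a lower estimated reward.
    """
    jumps = 0.0
    for prev, curr in zip(motion, motion[1:]):
        jumps += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return -jumps


smooth = [[0.0], [0.1], [0.2]]  # easy to track
jerky = [[0.0], [1.0], [0.0]]   # large jumps between frames

loss_smooth = refinement_loss([smooth], toy_surrogate)
loss_jerky = refinement_loss([jerky], toy_surrogate)
```

Minimizing this loss pushes the generator toward motions that the surrogate scores highly, which is why the smooth motion yields the lower loss in this toy setting.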
In summary, the present disclosure includes: a fine-tuning method for generative kinematic motion models using a reward surrogate model that offers an efficient, differentiable estimate of the downstream task. Examples of results of deploying the disclosed system and training methods to a real-world physical robotic device are included. As described further herein, the disclosed models, systems, and methods have been beneficially applied to the practical application of controlling real-world robotic systems. See, Tables 1 and 2, and related description.
Turning to the figures, a schematic of an embodiment of a motion generation system is shown. The motion generation system includes a user device, a robotic device, and may optionally include a network. The robotic device may be either a simulated robotic device or a real, physical robotic device. The user device may be any suitable computing device including a processing element (e.g., as described elsewhere herein) that can execute instructions to carry out the methods disclosed herein, such as training or deploying any machine learning algorithm disclosed herein. The motion generation system includes a tracking model, a reward surrogate model, and a generative model. As described herein in further detail, the tracking model and the reward surrogate model are used to train the generative model to close the gen-to-real gap.
The user device may be in electronic communication with the robotic device, either directly or via the network. The user device may receive input from a user to cause the robotic device to perform a certain motion or task. For example, the robotic device may receive a command such as “Wave to the crowd” and may execute one or more methods herein that cause the robotic device to wave its hand as though waving to a crowd.
Another figure is a simplified schematic showing portions of an embodiment of the motion generation system and inputs and outputs thereof. The motion generation system includes a tracking model (i.e., an actor) and a reward surrogate model (i.e., a critic).
As described further herein, to train the tracking model, a motion sequence for a robotic device is provided. In many embodiments, the tracking model includes a trained machine learning algorithm such as a neural network. In some embodiments, the tracking model is another type of algorithm, such as a kinematic model of the robotic device. In many embodiments, tracking models may be shallow MLPs (e.g., with only one or two layers) or advanced transformer-based models trained using RL methods. In many embodiments, the kinematic model is a generative model, which may use a fully connected, convolutional, or transformer-based architecture. The kinematic model may be trained through adversarial methods, as is done with generative adversarial networks (GANs), or by learning to invert the diffusion process as disclosed herein.
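The idea of adding noise to a motion and learning to invert that process can be sketched as follows. This is a minimal illustration that assumes a DDPM-style variance-preserving schedule with linearly spaced noise levels; the disclosure does not specify the schedule, and the names `alpha_bar`, `add_noise`, and `recover_x0` are hypothetical. A trained network would predict the noise, whereas this sketch reuses the true noise to show that the step is invertible.

```python
import math
import random


def alpha_bar(betas):
    """Cumulative products of (1 - beta) over the noise schedule."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def add_noise(x0, a_bar_t, rng):
    """Sample x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps, eps ~ N(0, 1)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar_t) * x + math.sqrt(1.0 - a_bar_t) * e
          for x, e in zip(x0, eps)]
    return xt, eps


def recover_x0(xt, eps_hat, a_bar_t):
    """Invert the noising step given an estimate of the added noise."""
    return [(x - math.sqrt(1.0 - a_bar_t) * e) / math.sqrt(a_bar_t)
            for x, e in zip(xt, eps_hat)]


steps = 100
# Assumed linear schedule from 1e-4 to 0.02; not taken from the source.
betas = [1e-4 + (0.02 - 1e-4) * i / (steps - 1) for i in range(steps)]
a_bars = alpha_bar(betas)

rng = random.Random(0)
x0 = [0.5, -0.2, 1.0]  # a few features of one pose
xt, eps = add_noise(x0, a_bars[50], rng)
x0_rec = recover_x0(xt, eps, a_bars[50])
```

With a perfect noise estimate the original pose is recovered exactly (up to floating-point error); pre-training aims to make the network's estimate good enough for this reconstruction to hold approximately.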
The motion sequence is typically a virtual or animated motion sequence, but may be a real motion sequence of a physical robotic device in some embodiments. From the motion sequence, at least one kinematic reference motion is extracted and provided to the tracking model. In some embodiments, the motion sequence is sampled (e.g., uniformly sampled over time) during training. At inference time, a user can decide which kinematic reference motion to track, such as by providing a text or other contextual input to select the kinematic reference motion to be extracted. The tracking model generates an action for the robotic device based on the kinematic reference motion. The action is provided to an environment including the robotic device. Again, the environment is virtual and the robotic device is a virtual model of the robotic device in many embodiments. However, where the robotic device is a real, physical device, the environment is any surroundings of the robotic device (either indoors or outdoors) and the action is provided to the robotic device rather than the environment. The robotic device executes the action, which creates a state that is fed back to the tracking model in a closed loop. The tracking model may be trained with many kinematic reference motions from the motion sequence and may also be trained with kinematic reference motions from different motion sequences. After the tracking model is trained, its parameters may be frozen, such that training of other portions of the motion generation system does not affect the parameters of the tracking model. As such, the tracking model can be a “black box” with respect to the rest of the motion generation system, in that the motion generation system does not have, or need, knowledge about the inner workings of the tracking model.
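The closed loop between the tracking model and the environment can be sketched with a toy example. Everything here is an assumption for illustration: `ToyEnvironment` is a hypothetical one-dimensional joint with a per-step movement limit, and `proportional_tracker` is a simple hand-written rule standing in for the trained, frozen tracking model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyEnvironment:
    """Hypothetical 1-D joint whose movement per step is limited."""
    position: float = 0.0
    max_step: float = 0.2  # assumed actuator limit per step

    def step(self, action: float) -> float:
        clipped = max(-self.max_step, min(self.max_step, action))
        self.position += clipped
        return self.position  # the new state, fed back to the tracker


def proportional_tracker(state: float, reference: float) -> float:
    """Stand-in tracking model: command the gap to the reference."""
    return reference - state


def rollout(tracker: Callable[[float, float], float],
            env: ToyEnvironment,
            references: List[float]) -> List[float]:
    """Run the loop: reference and state in, action out, new state back."""
    state = env.position
    states = []
    for ref in references:
        action = tracker(state, ref)
        state = env.step(action)
        states.append(state)
    return states


# The jump from 0.3 to 1.0 exceeds what the toy joint can do in one step.
references = [0.1, 0.2, 0.3, 1.0, 1.0]
states = rollout(proportional_tracker, ToyEnvironment(), references)
```

The first three references are tracked closely, while the abrupt jump cannot be followed within the movement limit. This is a small-scale picture of a reference motion that is kinematically valid but not feasible for the device.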
The reward surrogate model may receive the same kinematic reference motions as the tracking model. The tracking model evaluates the state of the environment in response to the action. The state can be any observable data that describes the environment and/or the robotic device. Examples of states include joint positions, joint velocities, root linear and angular velocity, and root orientation. Based on how well the tracking model performs the kinematic reference motion, the reward surrogate model generates an estimated reward. For example, the reward surrogate model may be a type of, or part of, a reinforcement learning algorithm.
The systems and methods assume the availability of a tracking controller conditioned on the kinematic reference motion, and a generative model that produces kinematic motions. The systems and methods train the tracking controller and the generative model on the motion sequence dataset. As described further herein, the development of the generative model includes three parts: (i) training the reward surrogate model for the motion tracking task, (ii) aligning the generative model with this reward surrogate model, and (iii) sequencing the generative model with the tracking controller during deployment.
Motions over a duration T are encoded with a T×(7+2J) matrix M, where J represents the number of joints. This matrix includes measurements for root height, root linear velocity in the XY-plane, root angular velocity about the z-axis, root pose (3-dimensional), and joint positions and velocities. This representation may be consistently applied across all stages of the method. Furthermore, motion data is normalized to the local pose of the character (e.g., by removing the heading direction from the pose), so that the x-axis aligns with the heading direction and the z-axis points upward. For example, a robotic device may have a predefined “forward” heading direction (e.g., looking straight). This normalization helps ensure that the poses performed are consistent with the direction the robotic device is facing. It also decouples each pose from its absolute position and orientation in global coordinates, thereby making more efficient use of the data. A sub-matrix of M, formed from a subset of its columns, corresponds to either a single pose or a window of poses. If a motion is shorter, the systems and methods may pad the matrix with “zero” columns, restricting evaluations of loss or reward functions to the non-zero columns.
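The encoding width and the zero-padding rule can be sketched as follows. The sketch assumes one list entry per time step (called a frame here) and tracks the number of valid frames explicitly instead of detecting zeros; the function names and the mean-squared-error loss are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Frame = List[float]


def feature_width(num_joints: int) -> int:
    """7 root features plus a position and a velocity per joint."""
    return 7 + 2 * num_joints


def pad_motion(motion: List[Frame], max_frames: int,
               num_joints: int) -> Tuple[List[Frame], int]:
    """Append all-zero frames; return the padded motion and valid length."""
    width = feature_width(num_joints)
    if any(len(frame) != width for frame in motion):
        raise ValueError("frame width must be 7 + 2 * num_joints")
    if len(motion) > max_frames:
        raise ValueError("motion is longer than max_frames")
    padding = [[0.0] * width for _ in range(max_frames - len(motion))]
    return [list(f) for f in motion] + padding, len(motion)


def masked_mse(pred: List[Frame], target: List[Frame],
               valid_frames: int) -> float:
    """Mean squared error over the valid (unpadded) frames only."""
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred[:valid_frames], target[:valid_frames]):
        for p, t in zip(p_frame, t_frame):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


J = 2  # a toy character with two joints
motion = [[1.0] * feature_width(J) for _ in range(2)]
padded, valid = pad_motion(motion, max_frames=4, num_joints=J)
prediction = [[1.5] * feature_width(J) for _ in range(4)]
loss = masked_mse(prediction, padded, valid)
```

Because the loss is restricted to the two valid frames, the predictions made for the padded frames do not affect it.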
The tracking controller is a probability function, π(a_t|s_t, m_t), where a_t is the action taken, s_t is the observed state at time t, and m_t represents the target kinematic motion provided to the tracking controller. The environment reacts to the action by transitioning to the next state, s_{t+1}, and providing a scalar reward r_t=r(s_t, a_t, s_{t+1}, m_t). The reward reflects how accurately the resulting physical motion tracks the kinematic input.
During training, an episode (e.g., a prescribed motion sequence or time) is initialized by randomly choosing a motion and starting frame from the dataset. Then the motion sequence is shifted by one frame within the same motion sequence to retrieve the next reference. This process continues until the end of the motion sequence, at which point a new motion sequence is chosen at random if the episode has not yet terminated. Additionally, the domain is randomized to increase the robustness of the tracking controller and avoid overfitting to a single set of simulation parameters of the environment. The randomization covers rigid body masses and friction coefficients, and introduces random disturbance forces (e.g., forces such as the robotic device might experience from a random gust of wind). To further reduce the gen-to-real gap, the tracking model may include actuator models.
After training, the parameters of the tracking model are frozen and the same environment is used to learn a function that estimates the performance of the tracking model given a motion reference. The systems and methods estimate the expected discounted cumulative reward given the current motion reference.
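The episode construction and domain randomization described above can be sketched as follows. The numeric ranges, the dictionary keys, and the choice to restart at a random frame after a jump are all assumptions made for illustration; the disclosure does not state them.

```python
import random
from typing import Dict, Iterator, List, Tuple


def reference_stream(motion_lengths: List[int], episode_frames: int,
                     rng: random.Random) -> Iterator[Tuple[int, int]]:
    """Yield (motion index, frame index) pairs for one training episode."""
    motion = rng.randrange(len(motion_lengths))
    frame = rng.randrange(motion_lengths[motion])
    for _ in range(episode_frames):
        yield motion, frame
        frame += 1  # advance by one frame within the same motion
        if frame >= motion_lengths[motion]:
            # End of this motion: jump to another randomly chosen one.
            # Restarting at a random frame is an assumption.
            motion = rng.randrange(len(motion_lengths))
            frame = rng.randrange(motion_lengths[motion])


def randomize_domain(rng: random.Random) -> Dict[str, float]:
    """Draw one set of simulation parameters (hypothetical ranges)."""
    return {
        "mass_scale": rng.uniform(0.8, 1.2),
        "friction": rng.uniform(0.5, 1.25),
        "push_force_newtons": rng.uniform(0.0, 50.0),
    }


lengths = [5, 3, 8]  # number of frames in each motion of a toy dataset
refs = list(reference_stream(lengths, 20, random.Random(0)))
params = randomize_domain(random.Random(1))
```

Each episode would draw a fresh parameter set from `randomize_domain`, so the tracking controller never sees exactly the same simulated dynamics twice.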
In equation (1), the expectation is taken over state-action trajectories and future motion references, γ∈[0, 1] is the discount factor, and r_t is the reward at time t, which is the same reward as used during RL. In principle, though, the reward function can be altered at this stage. The estimate in equation (1) is closely related to the value function used during RL.
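Equation (1) and the RL value function can be written out as follows. This is a reconstruction from the surrounding definitions rather than the original notation: the symbols \bar{V} and V^{\pi} are assumed names, and the infinite-horizon sum is an assumption.

```latex
\begin{align}
  \bar{V}(m_t) &= \mathbb{E}\!\left[\, \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k}
      \;\middle|\; m_t \right], \\
  V^{\pi}(s_t, m_t) &= \mathbb{E}\!\left[\, \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k}
      \;\middle|\; s_t, m_t \right].
\end{align}
```

The two expressions differ only in what is conditioned on: the first sees the motion reference alone, while the second also sees the current state.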
However, note that the RL value function has access to the current state of the character. The estimated reward can therefore be understood as the value function averaged over the distribution of states, thus establishing a differentiable link between kinematic motion and expected reward. With the observation that the proposed critic is an RL critic with partial observations, the systems and methods apply standard value function estimation algorithms to train a network, v(m), that approximates equation (1). The systems and methods disclosed may use the approach from proximal policy optimization (PPO), which estimates a value function target using truncated Generalized Advantage Estimation (GAE), corresponding to a truncated temporal difference (TD(λ)) estimate. Given a finite roll-out of the current tracking controller of length T, and given a current set of parameters θ, an updated value function estimate is computed from the TD errors δ_{t′} at each time t′ of the roll-out.
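A minimal sketch of the value target computation follows. It assumes the standard truncated GAE form used with PPO, in which the target at time t is v(m_t) plus the sum over t′ ≥ t of (γλ)^(t′−t) δ_{t′}, with δ_{t′} = r_{t′} + γ v(m_{t′+1}) − v(m_{t′}). The exact expressions of the disclosure are not reproduced here, and the function names are hypothetical.

```python
from typing import List


def td_errors(rewards: List[float], values: List[float],
              gamma: float) -> List[float]:
    """delta_t = r_t + gamma * v(m_{t+1}) - v(m_t) for each roll-out step."""
    return [r + gamma * values[t + 1] - values[t]
            for t, r in enumerate(rewards)]


def value_targets(rewards: List[float], values: List[float],
                  gamma: float, lam: float) -> List[float]:
    """Truncated GAE targets for a roll-out of length len(rewards).

    `values` holds the current critic outputs v(m_0) ... v(m_T), so it
    has one more entry than `rewards` (the last one bootstraps the tail).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    deltas = td_errors(rewards, values, gamma)
    targets = [0.0] * len(rewards)
    advantage = 0.0
    # Accumulate the discounted TD errors backward through the roll-out.
    for t in reversed(range(len(rewards))):
        advantage = deltas[t] + gamma * lam * advantage
        targets[t] = values[t] + advantage
    return targets


rewards = [1.0, 1.0, 1.0]
values = [0.0, 0.0, 0.0, 0.0]  # an untrained critic that predicts zero
targets_mc = value_targets(rewards, values, gamma=0.5, lam=1.0)
targets_td0 = value_targets(rewards, values, gamma=0.5, lam=0.0)
```

With λ = 1 the targets reduce to discounted returns over the roll-out, and with λ = 0 they reduce to one-step TD targets; intermediate values trade bias against variance. The critic network would then be regressed toward these targets.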
November 20, 2025