10776542

Method and Device for Calibrating Physics Engine of Virtual World Simulator to Be Used for Learning of Deep Learning-Based Device, and a Learning Method and Learning Device for Real State Network Used Therefor

Published September 15, 2020
Assignee: not available in USPTO data we have
Technical Abstract

Patent Claims
22 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method for calibrating a physics engine of a virtual world simulator to be used for learning of a deep learning-based device, comprising: (a) if virtual current frame information corresponding to a virtual current state in a virtual environment is acquired from the virtual world simulator: (i) transmitting, by a calibrating device, the virtual current frame information to the deep learning-based device, wherein the transmitting of the virtual current frame information to the deep learning-based device instructs the deep learning-based device to apply its operation using previous learned parameters to the virtual current frame information and output virtual action information corresponding to the virtual current frame information, and wherein the deep learning-based device, in response to receiving the virtual current frame information from the calibrating device, applies the deep learning-based device operation using previous learned parameters to the virtual current frame information and outputs virtual action information corresponding to the virtual current frame information; (ii) transmitting, by the calibrating device, the virtual current frame information and the virtual action information to the physics engine of the virtual world simulator, wherein the transmitting of the virtual current frame information to the physics engine instructs the physics engine to apply its operation using previous calibrated parameters to the virtual current frame information and the virtual action information and to output virtual next frame information corresponding to the virtual current frame information and the virtual action information, wherein the physics engine, in response to receiving the virtual current frame information and the virtual action information, applies the physics engine operation using previous calibrated parameters to the virtual current frame information and the virtual information and outputs virtual next frame information corresponding to the virtual current frame information and the virtual action information; and (iii) transmitting, by the calibrating device, the virtual current frame information and the virtual action information to a real state network that has been trained to output multiple pieces of predicted next frame information in response to real action information on a real action performed in multiple pieces of real recent frame information by the deep learning-based device in a real environment, wherein the transmitting of the virtual current frame information and the virtual action information to the real state network instructs the real state network to apply its operation using learned prediction parameters to the virtual action information and multiple pieces of virtual recent frame information corresponding to the virtual current frame information and to output predicted real next frame information, wherein the real state network, in response to receiving the virtual current frame information and the virtual action information, applies the real state network operation using learned prediction parameters to the virtual action information and multiple pieces of the virtual recent frame information corresponding to the virtual current frame information and outputs predicted real next frame information; and (b) calibrating and optimizing, by the calibrating device, the previous calibrated parameters of the physics engine, such that at least one loss that is created by referring to the virtual next frame information and the predicted real next frame information is minimized, wherein the calibrating and optimizing generates current calibrated parameters as optimized parameters.

Plain English Translation

The method involves calibrating a physics engine in a virtual world simulator to improve the accuracy of deep learning-based devices trained in simulated environments. The physics engine simulates interactions in a virtual environment, but discrepancies between virtual and real-world physics can degrade the performance of deep learning models when deployed in real-world scenarios. The method addresses this by aligning the physics engine's behavior with real-world outcomes. The process begins by acquiring virtual current frame information representing the state of the virtual environment. A calibrating device sends this information to a deep learning-based device, which processes it using previously learned parameters to generate virtual action information. The calibrating device then transmits the virtual current frame information and the virtual action information to the physics engine, which applies its previously calibrated parameters to produce virtual next frame information. The calibrating device also sends the same data to a real state network, a model trained on real-world trajectories to predict the next frame that follows an action taken after a sequence of recent frames. The real state network outputs predicted real next frame information, representing how the real world would respond to the virtual action. The calibrating device computes at least one loss from the virtual next frame information produced by the physics engine and the predicted real next frame information produced by the real state network. It then adjusts the physics engine's parameters to minimize this loss, so that the virtual simulator more accurately reflects real-world physics. This calibration improves the transferability of deep learning models trained in simulation to real-world applications.
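The loop of claim 1 can be illustrated with a deliberately tiny toy. Everything below is a hypothetical stand-in: the "frame" is a (position, velocity) pair instead of an image, the deep learning-based device is a fixed linear policy, the physics engine has a single calibratable parameter (`friction`), and the real state network is faked by fixed dynamics with a "true" friction of 0.3, whereas in the patent it is a network trained on real trajectories. The claim does not say how the parameters are optimized; central-difference gradient descent is used here only as one plausible choice for an engine that may not be differentiable.

```python
"""Toy sketch of the claim-1 calibration loop; every name here is hypothetical."""

DT = 0.1             # simulation time step (assumed)
TRUE_FRICTION = 0.3  # friction baked into the stand-in real state network


def policy(state):
    """Stand-in for the deep learning-based device: state -> action (a force)."""
    position, velocity = state
    return -0.5 * position - 0.1 * velocity


def physics_engine(state, action, friction):
    """Stand-in physics engine with one calibratable parameter, friction."""
    position, velocity = state
    velocity = velocity + DT * (action - friction * velocity)
    return (position + DT * velocity, velocity)


def real_state_network(state, action):
    """Stand-in for the trained real state network (here: fixed 'real' dynamics)."""
    return physics_engine(state, action, TRUE_FRICTION)


def mean_loss(states, friction):
    """Mean squared gap between virtual next frames and predicted real next frames."""
    total = 0.0
    for state in states:
        action = policy(state)                                   # step (a)(i)
        virtual_next = physics_engine(state, action, friction)   # step (a)(ii)
        predicted_real_next = real_state_network(state, action)  # step (a)(iii)
        total += sum((v - r) ** 2 for v, r in zip(virtual_next, predicted_real_next))
    return total / len(states)


def calibrate(states, friction, lr=5.0, steps=100, eps=1e-4):
    """Step (b): minimize the loss with central-difference gradient descent."""
    for _ in range(steps):
        grad = (mean_loss(states, friction + eps) - mean_loss(states, friction - eps)) / (2 * eps)
        friction -= lr * grad
    return friction


states = [(1.0, 2.0), (-0.5, 1.0), (0.2, -3.0)]  # hypothetical virtual current frames
print(round(calibrate(states, friction=1.0), 4))  # 0.3
```

Starting from a miscalibrated friction of 1.0, the calibrated value converges to the friction implied by the stand-in real state network.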

Claim 2

Original Legal Text

2. The method of claim 1 , further comprising: (c) transmitting, by the calibrating device, the virtual next frame information and reward information corresponding to the virtual action information to the deep learning-based device, wherein the transmitting of the virtual next frame information and reward information instructs the deep learning-based device to update the previous learned parameters via on-policy reinforcement learning, and wherein the on-policy reinforcement learning uses the virtual next frame information and the reward information, and wherein the deep learning-based device, in response to receiving the virtual next frame information and reward information, updates the previous learned parameters via the on-policy reinforcement learning.

Plain English Translation

This claim extends the calibration method of claim 1 so that the deep learning-based device keeps learning inside the simulator while the simulator is being calibrated. In an additional step (c), the calibrating device transmits two pieces of data to the deep learning-based device: the virtual next frame information produced by the physics engine, and reward information corresponding to the virtual action the device took. The deep learning-based device then updates its previously learned parameters via on-policy reinforcement learning, meaning the update uses experience generated by the device's own current policy, here the virtual next frame information and the reward information. Because this training happens in the simulator rather than in the real world, the approach is particularly useful in applications where real-world data is scarce or expensive to obtain, such as robotics, autonomous systems, or gaming AI.
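The claim does not name an on-policy algorithm. As a hedged illustration, the sketch below uses a REINFORCE-style policy gradient with a softmax policy over two actions. `simulator_step` is a hypothetical stand-in for the calibrated simulator: each episode is one step long, so the returned next frame is terminal and only the reward enters the update. The key on-policy property is that every action is sampled from the same policy whose parameters are being updated.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def simulator_step(action):
    """Hypothetical calibrated simulator: returns (virtual_next_frame, reward)."""
    reward = 1.0 if action == 1 else 0.0
    return None, reward  # one-step episodes: the next frame is terminal


def reinforce(episodes=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    prefs = [0.0, 0.0]  # the "previous learned parameters" of the policy
    for _ in range(episodes):
        probs = softmax(prefs)
        # On-policy: sample the action from the policy that is being updated.
        action = 0 if rng.random() < probs[0] else 1
        _next_frame, reward = simulator_step(action)
        # REINFORCE: d log pi(action) / d prefs[i] = 1[i == action] - probs[i].
        for i in range(len(prefs)):
            indicator = 1.0 if i == action else 0.0
            prefs[i] += lr * reward * (indicator - probs[i])
    return softmax(prefs)


print(reinforce())  # probability mass shifts toward the rewarded action 1
```

In a real setup the reward would come from the task definition and the next frame would feed the following step of the episode; both are simplified away here.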

Claim 3

Original Legal Text

3. The method of claim 1 , wherein the multiple pieces of the virtual recent frame information are generated by referring to the virtual current frame information and k pieces of virtual previous frame information received beforehand.

Plain English Translation

This claim specifies how the real state network of claim 1 receives a short history of frames rather than a single frame. The multiple pieces of virtual recent frame information that are fed to the real state network are generated from the virtual current frame information together with k pieces of virtual previous frame information that were received beforehand, that is, the frames from the k preceding time steps of the virtual environment. Supplying the current frame plus its k predecessors gives the real state network temporal context, such as the direction and speed of motion, that a single frame cannot convey, which helps it predict how the real environment would respond to the virtual action.
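One way to maintain "the current frame plus k previous frames" is a fixed-length buffer. The sketch below is a hypothetical helper built on `collections.deque(maxlen=k + 1)`, which discards the oldest frame automatically. The claim does not say what happens before k frames have been seen; padding by repeating the first frame is an assumption made here.

```python
from collections import deque


class RecentFrames:
    """Keeps the virtual current frame plus up to k previous frames (hypothetical helper)."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k + 1)

    def push(self, current_frame):
        """Add the newest frame and return k + 1 frames, oldest first."""
        if not self.frames:
            # Assumption: before k frames exist, pad by repeating the first frame.
            self.frames.extend([current_frame] * self.k)
        self.frames.append(current_frame)
        return list(self.frames)


buf = RecentFrames(k=2)
print(buf.push("f0"))  # ['f0', 'f0', 'f0']
print(buf.push("f1"))  # ['f0', 'f0', 'f1']
print(buf.push("f2"))  # ['f0', 'f1', 'f2']
print(buf.push("f3"))  # ['f1', 'f2', 'f3']
```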

Claim 4

Original Legal Text

4. The method of claim 3 , wherein, (iii), the real state network: generates a 1-st dimension vector by applying convolution operation to a virtual current frame state sum created by concatenating the virtual current frame information and the k pieces of the virtual previous frame information; generates a 2-nd dimension vector by applying fully-connected operation to the virtual action information; and generates the predicted real next frame information by applying deconvolution operation to a concatenation of the 1-st dimension vector and the 2-nd dimension vector.

Plain English Translation

This claim specifies the architecture of the real state network used in step (iii) of the calibration method of claim 1. The method addresses the challenge of accurately forecasting the next frame by leveraging both current and previous frame information along with action data. The real state network processes virtual current frame information and virtual previous frame information to generate predicted real next frame information. Specifically, the real state network first creates a virtual current frame state sum by concatenating the virtual current frame information with k pieces of virtual previous frame information. A convolution operation is then applied to this concatenated data to produce a 1-st dimension vector. Separately, a fully-connected operation is applied to virtual action information to generate a 2-nd dimension vector. These two vectors are concatenated and passed through a deconvolution operation to produce the predicted real next frame information. This approach integrates temporal context and action context to enhance frame prediction accuracy. Within the calibration method, this prediction serves as the reference against which the physics engine's virtual next frame information is compared.
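A minimal NumPy forward pass can make the three branches concrete. This is a sketch under stated assumptions, not the patent's network: each stage is a single linear layer with random weights and no activations; the convolution uses a kernel that spans the whole H×W frame with no padding, so it outputs a 1×1×HWC tensor (an HWC-dimension vector); and the deconvolution is a stride-1 transposed convolution with an H×W kernel applied to the 1×1×(HWC+L) input, so it outputs H×W×C. All sizes are hypothetical. Note that concatenating K+1 frames of shape H×W×C along the channel axis gives C·(K+1) channels, which agrees with the H×W×(K+1) shape stated in claim 5 when C = 1.

```python
import numpy as np

# Hypothetical sizes: frame height/width/channels, K previous frames,
# action dimension A, and action-feature length L.
H, W, C, K, A, L = 4, 4, 3, 2, 3, 8


def init_params(seed=0):
    """Random weights for a single-layer version of each stage (an assumption)."""
    rng = np.random.default_rng(seed)
    return {
        # Convolution whose kernel spans the whole H x W input with no padding,
        # so its output is 1 x 1 x HWC, i.e. an HWC-dimension vector.
        "conv": rng.normal(0.0, 0.1, (H, W, C * (K + 1), H * W * C)),
        "fc_w": rng.normal(0.0, 0.1, (L, A)),
        "fc_b": np.zeros(L),
        # Transposed convolution with an H x W kernel, stride 1, applied to a
        # 1 x 1 input, so its output is H x W x C.
        "deconv": rng.normal(0.0, 0.1, (H, W, H * W * C + L, C)),
    }


def real_state_network(params, recent_frames, action):
    """recent_frames: K + 1 arrays of shape (H, W, C), oldest first; action: shape (A,)."""
    state_sum = np.concatenate(recent_frames, axis=-1)        # (H, W, C * (K + 1))
    v1 = np.einsum("hwi,hwio->o", state_sum, params["conv"])  # 1-st vector, (H*W*C,)
    v2 = params["fc_w"] @ action + params["fc_b"]             # 2-nd vector, (L,)
    z = np.concatenate([v1, v2])                              # (H*W*C + L,)
    return np.einsum("d,hwdc->hwc", z, params["deconv"])      # (H, W, C)


params = init_params()
frames = [np.full((H, W, C), float(t)) for t in range(K + 1)]
print(real_state_network(params, frames, np.array([1.0, 0.0, -1.0])).shape)  # (4, 4, 3)
```

A practical model would stack several convolution and deconvolution layers with nonlinearities; the single-layer version only shows how the shapes line up.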

Claim 5

Original Legal Text

5. The method of claim 4 , wherein the virtual current frame state sum is an H×W×(K+1) tensor created by concatenating (i) the k pieces of the virtual previous frame information and (ii) the virtual current frame information, and wherein the virtual current frame information is an H×W×C tensor, and wherein the 1-st dimension vector is an HWC-dimension vector, and wherein the 2-nd dimension vector is an L-dimension vector, and wherein the predicted real next frame information is an H×W×C tensor created by applying deconvolution operation to a 1×1×(HWC+L) tensor generated by concatenating the 1-st dimension vector and the 2-nd dimension vector.

Plain English Translation

This claim fixes the tensor dimensions of the real state network described in claim 4. The virtual current frame information is an H×W×C tensor, where H, W, and C represent height, width, and channel dimensions, respectively. The virtual current frame state sum, formed by concatenating the k pieces of virtual previous frame information with the virtual current frame information, is an H×W×(K+1) tensor. The convolution operation turns this state sum into the 1-st dimension vector, an HWC-dimension vector, while the fully-connected operation turns the virtual action information into the 2-nd dimension vector, an L-dimension vector. Concatenating the two vectors yields a 1×1×(HWC+L) tensor, and a deconvolution operation expands it into the predicted real next frame information, an H×W×C tensor with the same shape as the input frames. The deconvolution thus reconstructs a full-resolution frame from the compact intermediate representation.
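The dimension chain of claim 5 can be written compactly. The symbols F_t (frame at time t), S_t (state sum), a_t (action), v_1, v_2, and z are notation introduced here for illustration; they do not appear in the claim.

```latex
\begin{aligned}
F_t &\in \mathbb{R}^{H \times W \times C} && \text{virtual current frame} \\
S_t &= \operatorname{concat}(F_{t-K}, \dots, F_{t-1}, F_t) \in \mathbb{R}^{H \times W \times (K+1)} && \text{state sum, as stated in the claim} \\
v_1 &= \operatorname{conv}(S_t) \in \mathbb{R}^{HWC} && \text{1-st dimension vector} \\
v_2 &= \operatorname{fc}(a_t) \in \mathbb{R}^{L} && \text{2-nd dimension vector} \\
z &= [\, v_1 ; v_2 \,] \in \mathbb{R}^{1 \times 1 \times (HWC + L)} && \text{concatenation} \\
\hat{F}_{t+1} &= \operatorname{deconv}(z) \in \mathbb{R}^{H \times W \times C} && \text{predicted real next frame}
\end{aligned}
```

With hypothetical sizes H = W = 64, C = 1 (grayscale), K = 3, and L = 128: S_t is 64×64×4, v_1 has 64·64·1 = 4096 entries, z is 1×1×4224, and the predicted frame is 64×64×1.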

Claim 6

Original Legal Text

6. The method of claim 1 , wherein, (b), the calibrating device repeats, until the loss decreases: (b-1) a first process of selecting one previous calibrated parameter among the previous calibrated parameters; (b-2) a second process of calibrating the selected one previous calibrated parameter with a preset learning rate by using the loss, wherein the second process generates one current calibrated parameter as an optimized parameter; (b-3) a third process of instructing the physics engine to apply its operation using the one current calibrated parameter and a rest of the previous calibrated parameters excluding the one previous calibrated parameter to the virtual current frame information and the virtual action information, wherein the third process generates new virtual next frame information; and (b-4) a fourth process of determining whether the loss decreases by using at least one new loss created by referring to the new virtual next frame information and the predicted real next frame information.

Plain English Translation

This invention relates to a method for optimizing parameters in a physics engine used for simulating virtual environments, particularly in applications like robotics, gaming, or virtual training. The problem addressed is the need for efficient calibration of physics engine parameters to minimize discrepancies between simulated and real-world outcomes, improving accuracy and performance. The method involves an iterative calibration process where a device repeatedly adjusts parameters to reduce a defined loss metric. First, a previous calibrated parameter is selected from a set of existing parameters. This parameter is then recalibrated using a preset learning rate, generating an optimized current parameter. The physics engine applies this current parameter along with the remaining unmodified parameters to virtual frame and action data, producing new virtual next-frame information. The system then evaluates whether the loss decreases by comparing the new virtual next-frame data with predicted real-world next-frame data. If the loss does not decrease, the process repeats with another parameter selection. This iterative approach ensures continuous refinement of parameters to minimize simulation errors, enhancing the accuracy of virtual physics simulations. The method is particularly useful in scenarios requiring high-fidelity simulations, such as robotics training or virtual prototyping.

Claim 7

Original Legal Text

7. The method of claim 6 , wherein, if the loss is determined as not decreased for any of the previous calibrated parameters, the calibrating device decreases the preset learning rate and performs the first process, the second process, the third process, and the fourth process.

Plain English Translation

This claim adds a fallback to the iterative physics engine calibration of claim 6. In claim 6, the calibrating device tries the previous calibrated parameters one at a time: it selects a parameter (first process), adjusts it with the preset learning rate using the loss (second process), re-runs the physics engine with the adjusted parameter and the remaining unchanged parameters to obtain new virtual next frame information (third process), and checks whether the resulting new loss is lower (fourth process). If the loss is determined as not decreased for any of the previous calibrated parameters, a likely cause is that the step size is too large, so the calibrating device decreases the preset learning rate and performs the first, second, third, and fourth processes again. This adaptive learning rate adjustment lets the calibration keep making progress without manual intervention and prevents it from stalling when every step at the current learning rate overshoots.
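Claims 6 and 7 together describe a coordinate-wise search with learning-rate decay. The sketch below is one hedged reading of them. Assumptions not in the claims: the per-parameter adjustment is a central-difference gradient step, the learning rate is halved on failure, the search gives up below a minimum learning rate, and `toy_loss` is a hypothetical stand-in for the loss between the physics engine's virtual next frame and the real state network's prediction (evaluating it is where the engine would be re-run in the third process).

```python
def numeric_partial(loss_fn, params, i, eps=1e-6):
    """Central-difference estimate of d(loss)/d(params[i])."""
    up = list(params)
    up[i] += eps
    down = list(params)
    down[i] -= eps
    return (loss_fn(up) - loss_fn(down)) / (2 * eps)


def calibration_step(loss_fn, params, lr, decay=0.5, min_lr=1e-8):
    """One accepted update in the style of claims 6 and 7; returns (params, lr)."""
    current_loss = loss_fn(params)
    while lr >= min_lr:
        for i in range(len(params)):                                  # (b-1) select one parameter
            candidate = list(params)
            candidate[i] -= lr * numeric_partial(loss_fn, params, i)  # (b-2) adjust it
            new_loss = loss_fn(candidate)                             # (b-3) re-run the engine
            if new_loss < current_loss:                               # (b-4) did the loss decrease?
                return candidate, lr
        lr *= decay  # claim 7: no parameter helped, so shrink the learning rate and retry
    return params, lr


def calibrate(loss_fn, params, lr=0.1, rounds=500):
    """Repeat accepted updates; the (possibly decayed) learning rate carries over."""
    for _ in range(rounds):
        params, lr = calibration_step(loss_fn, params, lr)
    return params


def toy_loss(p):
    # Hypothetical stand-in for the loss between virtual and predicted real next frames.
    return (p[0] - 0.3) ** 2 + 2.0 * (p[1] - 1.5) ** 2


print(calibrate(toy_loss, [1.0, 0.0]))  # approaches [0.3, 1.5]
```

Because each accepted update must strictly lower the loss, the procedure never makes the calibration worse; the cost is several engine runs per accepted step.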

Claim 8

Original Legal Text

8. A method for learning a real state network capable of generating predicted next frame information corresponding to real action information on a real action performed in multiple pieces of real recent frame information by a deep learning-based device in a real environment, comprising steps of: (a) if multiple pieces of trajectory information corresponding to multiple pieces of the real action information on the real action performed by the deep learning-based device in the real environment are acquired as training data, generating, by a learning device, multiple pieces of recent frame information for training by referring to k pieces of previous real frame information and real current frame information at a point of time in specific trajectory information; (b) inputting into the real state network, by the learning device, the multiple pieces of the recent frame information for training and action information for training, wherein the action information for training is acquired by referring to real current action information of the specific trajectory information at the point of the time, and wherein the inputting allows the real state network to apply its operation using prediction parameters to the multiple pieces of the recent frame information for training and the action information for training and to output the predicted next frame information, and wherein the real state network, in response to the inputting, applies the real state network operation using prediction parameters to the multiple pieces of the recent frame information for training and the action information for training and outputs the predicted next frame information; and (c) updating, by the learning device, the prediction parameters such that at least one loss is minimized, wherein the at least one loss is created by referring to the predicted next frame information and real next frame information, and wherein the real next frame information is next to the real current frame information in the specific trajectory information.

Plain English Translation

This invention relates to a deep learning-based method for training a real state network to predict future frames in a real environment. The method addresses the challenge of accurately forecasting the next frame in a sequence based on recent frames and action information, which is critical for applications like robotics, autonomous systems, and real-time simulations. The method involves acquiring multiple trajectories of real action information performed by a deep learning-based device in a real environment as training data. A learning device generates training data by selecting k previous real frames and a current real frame from a specific trajectory. The training data includes recent frame information and action information derived from the current action in the trajectory. The real state network processes this training data, applying its prediction parameters to generate predicted next frame information. The network's output is compared to the actual next frame in the trajectory to compute a loss. The prediction parameters are then updated to minimize this loss, improving the network's accuracy over time. This approach enables the network to learn from real-world interactions, enhancing its ability to predict future states based on past and current observations and actions. The method is particularly useful for systems requiring real-time decision-making in dynamic environments.
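Steps (a) through (c) of claim 8 can be sketched on a toy trajectory. Hypothetical simplifications: each real frame is a scalar, the real dynamics are simply "next frame = current frame + action", and the real state network is replaced by a linear predictor trained with batch gradient descent on the mean squared error. Only the data flow (k previous frames plus the current frame and the current action in, the next frame of the same trajectory as the target) mirrors the claim.

```python
def make_samples(frames, actions, k):
    """Step (a): (k previous + current frames, current action, real next frame) tuples."""
    samples = []
    for t in range(k, len(frames) - 1):
        recent = frames[t - k : t + 1]  # k previous frames plus the current one
        samples.append((recent, actions[t], frames[t + 1]))
    return samples


def predict(weights, action_weight, recent, action):
    """Step (b): hypothetical linear stand-in for the real state network."""
    return sum(w * f for w, f in zip(weights, recent)) + action_weight * action


def mse(samples, weights, action_weight):
    """The loss: mean squared gap between predicted and real next frames."""
    errors = [(predict(weights, action_weight, r, a) - y) ** 2 for r, a, y in samples]
    return sum(errors) / len(errors)


def train(samples, k, lr=0.05, epochs=500):
    """Step (c): update the prediction parameters by gradient descent on the loss."""
    weights = [0.0] * (k + 1)
    action_weight = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * (k + 1)
        grad_a = 0.0
        for recent, action, target in samples:
            err = predict(weights, action_weight, recent, action) - target
            for j, f in enumerate(recent):
                grad_w[j] += 2 * err * f / n
            grad_a += 2 * err * action / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        action_weight -= lr * grad_a
    return weights, action_weight


# Hypothetical toy trajectory: scalar frames where next = current + action.
actions = [1.0, -1.0, 0.5, 0.5, -1.0, 1.0, -0.5, 0.0, 1.0, -1.0]
frames = [0.0]
for a in actions:
    frames.append(frames[-1] + a)

samples = make_samples(frames, actions, k=2)
weights, action_weight = train(samples, k=2)
print(mse(samples, [0.0] * 3, 0.0), "->", mse(samples, weights, action_weight))
```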

Claim 9

Original Legal Text

9. The method of claim 8 , wherein, at the step of (b), the learning device inputs a current frame state sum for training created by concatenating the multiple pieces of the recent frame information for training into a convolutional neural network of the real state network; and wherein the inputting of the current frame state sum allows the convolutional neural network to output a 1-st feature by applying a convolution operation to the current frame state sum for training; and wherein the convolutional neural network in response to the inputting of the current frame state sum for training, outputs the 1-st feature by applying the convolution operation to the current frame state sum for training; and wherein the learning device inputs the action information for training into at least one fully connected layer of the real state network; and wherein the inputting of the action information allows the at least one fully connected layer to output a 2-nd feature by applying a fully-connected operation to the action information for training; and wherein the at least one fully connected layer, in response to the inputting of the action information for training, outputs the 2-nd feature by applying the fully-connected operation to the action information for training; and wherein the learning device inputs a concatenated feature created by concatenating the 1-st feature and the 2-nd feature into a deconvolutional layer; and wherein the inputting of the concatenated feature allows the deconvolutional layer to output the predicted next frame information by applying a deconvolution operation to the concatenated feature; and wherein the deconvolutional layer, in response to the inputting of the concatenated feature, outputs the predicted next frame information by applying the deconvolution operation to the concatenated feature.

Plain English Translation

This invention relates to a method for predicting the next frame in a sequence using a neural network architecture. The method addresses the challenge of accurately forecasting future states in dynamic systems, such as video frames or time-series data, by leveraging convolutional and deconvolutional neural networks to process both spatial and temporal information. During training, the learning device concatenates multiple pieces of recent frame information for training into a current frame state sum and inputs it into a convolutional neural network (CNN) within the real state network. The CNN applies a convolution operation to the current frame state sum to generate a 1-st feature. The learning device also inputs action information for training into at least one fully connected layer of the real state network, which applies a fully-connected operation to produce a 2-nd feature. These two features are then concatenated and fed into a deconvolutional layer, which applies a deconvolution operation to generate the predicted next frame information. This approach combines spatial feature extraction from the CNN with action-dependent transformations from the fully connected layers to improve prediction accuracy. The method is particularly useful in applications requiring frame prediction, such as video synthesis, autonomous systems, or reinforcement learning environments.

Claim 10

Original Legal Text

10. The method of claim 9 , wherein the learning device instructs the convolutional neural network to output the current frame state sum for training, which is an H×W×(K+1) tensor created by concatenating the multiple pieces of the recent frame information for training which are H×W×C tensors, as the 1-st feature which is an HWC-dimension vector, and wherein the learning device instructs the at least one fully connected layer to output the action information for training, which is a 3-dimension vector, as the 2-nd feature which is an L-dimension vector, and wherein the learning device instructs the deconvolutional layer to output a 1×1×(HWC+L) tensor, created by concatenating the 1-st feature and the 2-nd feature, as the predicted next frame information which is an H×W×C tensor.

Plain English Translation

This claim fixes the tensor dimensions used when training the real state network of claim 9. Each piece of recent frame information for training is an H×W×C tensor, where H is height, W is width, and C is the number of channels. These pieces are concatenated into the current frame state sum for training, an H×W×(K+1) tensor, which the convolutional neural network (CNN) converts into the 1-st feature, an HWC-dimension vector. The action information for training is a 3-dimension vector, which the at least one fully connected layer converts into the 2-nd feature, an L-dimension vector. The two features are concatenated into a 1×1×(HWC+L) tensor, and the deconvolutional layer converts that tensor into the predicted next frame information, an H×W×C tensor. The method lets the network learn both spatial-temporal dependencies across frames and action-dependent transitions, combining spatial features with action-related data to generate realistic future frames.

Claim 11

Original Legal Text

11. The method of claim 9 , wherein the learning device updates one or more parameters of at least one of: the convolutional neural network, the at least one fully connected layer, or the deconvolutional layer, and wherein the updating is according to a gradient descent using the loss.

Plain English Translation

This claim specifies how the real state network of claim 9 is trained. The learning device updates one or more parameters of at least one of its components: the convolutional neural network that extracts a feature from the concatenated recent frames, the at least one fully connected layer that encodes the action information, or the deconvolutional layer that reconstructs the predicted next frame. The updates follow gradient descent on the loss, which is computed by comparing the predicted next frame information with the real next frame information from the training trajectory. Each update moves the parameters in the direction that reduces this loss, so over many iterations the network's predictions come closer to what actually happened in the real environment.

Claim 12

Original Legal Text

12. A calibrating device for calibrating a physics engine of a virtual world simulator to be used for learning of a deep learning-based device, comprising: at least one memory that stores instructions; and at least one processor configured to execute the instructions to perform: (I) if virtual current frame information corresponding to a virtual current state in a virtual environment is acquired from the virtual world simulator, (i) a process of transmitting the virtual current frame information to the deep learning-based device, wherein the transmitting of the virtual current frame information instructs the deep learning-based device to apply its operation using previous learned parameters to the virtual current frame information and output virtual action information corresponding to the virtual current frame information, and wherein the deep learning-based device, in response to receiving the virtual current frame information from the calibrating device, applies the deep learning-based device operation using previous learned parameters to the virtual current frame information and outputs virtual action information corresponding to the virtual current frame information; (ii) a process of transmitting the virtual current frame information and the virtual action information to the physics engine of the virtual world simulator, wherein the transmitting of the virtual current frame information and the virtual action information to the physics engine instructs the physics engine to apply its operation using previous calibrated parameters to the virtual current frame information and the virtual action information and output virtual next frame information corresponding to the virtual current frame information and the virtual action information, and wherein the physics engine, in response to receiving the virtual current frame information and the virtual action information, applies the physics engine operation using previous calibrated parameters to the virtual current frame information and the virtual information and outputs virtual next frame information corresponding to the virtual current frame information and the virtual action information; and (iii) a process of transmitting the virtual current frame information and the virtual action information to a real state network that has been trained to output multiple pieces of predicted next frame information in response to real action information on a real action performed in multiple pieces of real recent frame information by the deep learning-based device in a real environment, wherein the transmitting of the virtual current frame information and the virtual action information to the real state network instructs the real state network to apply its operation using learned prediction parameters to the virtual action information and multiple pieces of virtual recent frame information corresponding to the virtual current frame information and output predicted real next frame information, wherein the real state network, in response to receiving the virtual current frame information and the virtual action information, applies the real state network operation using learned prediction parameters to the virtual action information and multiple pieces of the virtual recent frame information corresponding to the virtual current frame information and outputs predicted real next frame information; and (II) a process of calibrating and optimizing the previous calibrated parameters of the physics engine, such that at least one loss that is created by referring to the virtual next frame information and the predicted real next frame information is minimized, wherein the calibrating and optimizing generates current calibrated parameters as optimized parameters.

Plain English Translation

A calibrating device is designed to adjust the parameters of a physics engine used in a virtual world simulator for training deep learning-based devices. The system addresses the challenge of ensuring the simulator accurately reflects real-world physics, which is critical for effective training of deep learning models. The device includes a processor and memory that execute a calibration process. When the simulator provides virtual current frame information representing a state in the virtual environment, the device transmits this data to the deep learning-based device, which processes it using its learned parameters to generate virtual action information. The device then sends both the virtual current frame and the action information to the physics engine, which applies its calibrated parameters to produce virtual next frame information. The device also forwards the virtual current frame and action information to a real state network, a model trained on real-environment data that predicts the next real frame from multiple recent frames and an action. The real state network outputs predicted real next frame information. The calibrating device compares the virtual next frame information from the physics engine with the predicted real next frame information from the real state network. It then optimizes the physics engine's parameters to minimize the discrepancy between these outputs, generating updated calibrated parameters. This ensures the virtual simulator's physics engine closely mimics real-world behavior, improving the accuracy of deep learning training.
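One calibration step of claim 12 can be sketched as follows. This is a minimal illustration under loud assumptions: the physics engine is a hypothetical two-parameter toy (`friction`, `gain`), a fixed hand-written `policy` stands in for the deep learning-based device, the real state network is any callable `(frame, action) -> next_frame` fed only the current frame rather than a window of recent frames, and the loss is squared error. Because a physics engine may not expose gradients, the sketch estimates them by re-running the engine with perturbed parameters (central finite differences); the claim does not specify how the optimization is performed.

```python
import numpy as np

# Hypothetical toy physics engine: frame = [position, velocity],
# params = [friction, gain]. Stands in for the simulator's physics engine.
def physics_engine(frame, action, params):
    pos, vel = frame
    friction, gain = params
    new_vel = (1.0 - friction) * vel + gain * action
    return np.array([pos + new_vel, new_vel])

# Hypothetical stand-in for the deep learning-based device: a fixed policy
# that pushes the position back toward zero.
def policy(frame):
    return -0.5 * frame[0]

def frame_loss(params, frame, action, predicted_real_next):
    """Squared error between the virtual next frame and the predicted real next frame."""
    virtual_next = physics_engine(frame, action, params)
    return float(np.sum((virtual_next - predicted_real_next) ** 2))

def calibrate_step(params, frame, real_state_net, lr=0.1, eps=1e-5):
    """One calibration step: (i) action, (ii)/(iii) next frames, (II) parameter update."""
    action = policy(frame)                                # (i) virtual action information
    predicted_real_next = real_state_net(frame, action)   # (iii) predicted real next frame
    grad = np.zeros_like(params)
    for i in range(len(params)):
        # Central finite difference: the physics engine in (ii) is re-run with a
        # perturbed parameter instead of being differentiated analytically.
        up, down = params.copy(), params.copy()
        up[i] += eps
        down[i] -= eps
        grad[i] = (frame_loss(up, frame, action, predicted_real_next)
                   - frame_loss(down, frame, action, predicted_real_next)) / (2 * eps)
    return params - lr * grad                             # current calibrated parameters
```

For a quick check, a copy of the toy engine with different parameters can play the role of the real world; repeating `calibrate_step` over a few varied frames pulls `params` toward those parameters.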

Claim 13

Original Legal Text

13. The calibrating device of claim 12 , wherein the processor further performs: (III) a process of transmitting the virtual next frame information and reward information corresponding to the virtual action information to the deep learning-based device, wherein the transmitting instructs the deep learning-based device to update the previous learned parameters via on-policy reinforcement learning, and wherein the on-policy reinforcement learning uses the virtual next frame information and the reward information, and wherein the deep learning-based device, in response to receiving the virtual next frame information and reward information, updates the previous learned parameters via the on-policy reinforcement learning.

Plain English Translation

This invention relates to the calibrating device of claim 12, extended so that the simulator is also used to train the deep learning-based device itself. After the physics engine produces virtual next frame information, the processor transmits that information, together with reward information corresponding to the virtual action information, to the deep learning-based device and instructs it to update its previously learned parameters. The deep learning-based device, upon receiving the virtual next frame information and the reward information, updates its parameters via on-policy reinforcement learning, meaning the update is computed from transitions generated by the device's current policy rather than by an older or different policy. The simulator whose physics engine is being calibrated toward real-world behavior therefore also serves as the source of training experience for the deep learning-based device.
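Claim 13 does not name a particular on-policy algorithm. As one illustration, the sketch below performs a one-step actor-critic update, a standard on-policy method, assuming frames have already been reduced to feature vectors, the deep learning-based device has a linear softmax policy over a few discrete actions, and a linear critic estimates state values. The names `actor_critic_update`, `theta`, and `w` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def actor_critic_update(theta, w, s, a, r, s_next, gamma=0.99,
                        lr_actor=0.1, lr_critic=0.1):
    """One-step actor-critic update from a single on-policy transition.

    theta: (n_actions, n_features) policy weights; w: (n_features,) critic weights;
    s, s_next: feature vectors of the current and virtual next frame; a: action index;
    r: reward for that action.
    """
    # TD error: how much better the outcome was than the critic expected.
    td_error = r + gamma * (w @ s_next) - (w @ s)
    w = w + lr_critic * td_error * s
    # Gradient of log pi(a|s) for a linear softmax policy: (onehot(a) - pi) outer s.
    pi = softmax(theta @ s)
    grad_log_pi = -np.outer(pi, s)
    grad_log_pi[a] += s
    theta = theta + lr_actor * td_error * grad_log_pi
    return theta, w
```

With a positive reward and an untrained critic, the update raises the probability of the action that was taken, which is the behavior an on-policy update should show.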

Claim 14

Original Legal Text

14. The calibrating device of claim 12 , wherein the multiple pieces of the virtual recent frame information are generated by referring to the virtual current frame information and k pieces of virtual previous frame information received beforehand.

Plain English Translation

This invention relates to the calibrating device of claim 12 and specifies how the input history for the real state network is formed. The multiple pieces of virtual recent frame information are generated by referring to the virtual current frame information together with k pieces of virtual previous frame information that the device received beforehand. In other words, the real state network is given a window of the k+1 most recent virtual frames rather than the current frame alone, which allows it to take into account dynamics, such as motion, that cannot be inferred from a single frame. This mirrors the multiple pieces of real recent frame information on which the real state network was trained.
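Keeping the current frame plus the k frames received before it is a fixed-size sliding window. A minimal sketch using `collections.deque` with `maxlen`, assuming frames are H×W×C NumPy arrays, the window is ordered oldest first, and the earliest frame is repeated as padding until k previous frames exist (the claim does not say how the start of an episode is handled). `RecentFrameBuffer` is a hypothetical name.

```python
from collections import deque

import numpy as np

class RecentFrameBuffer:
    """Keeps the virtual current frame plus the k frames received before it."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k + 1)   # the oldest frame drops out automatically

    def push(self, frame):
        self.frames.append(np.asarray(frame, dtype=np.float32))

    def recent_frames(self):
        """Return k+1 frames, oldest first, padding with the earliest frame if needed."""
        if not self.frames:
            raise ValueError("no frames received yet")
        frames = list(self.frames)
        padding = [frames[0]] * (self.k + 1 - len(frames))
        return padding + frames

    def state_sum(self):
        """Concatenate the window along the channel axis: H x W x ((k+1)*C)."""
        return np.concatenate(self.recent_frames(), axis=-1)
```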

Claim 15

Original Legal Text

15. The calibrating device of claim 14 , wherein, at the process of (iii), the real state network: generates a 1-st dimension vector by applying convolution operation to a virtual current frame state sum created by concatenating the virtual current frame information and the k pieces of the virtual previous frame information; generates a 2-nd dimension vector by applying fully-connected operation to the virtual action information; generates the predicted real next frame information by applying deconvolution operation to a concatenation of the 1-st dimension vector and the 2-nd dimension vector.

Plain English Translation

This invention relates to the calibrating device of claim 14 and specifies how the real state network computes its prediction in process (iii). The network combines the virtual current and previous frame information with the virtual action information in three steps. First, it concatenates the virtual current frame information with the k pieces of virtual previous frame information to form a virtual current frame state sum and applies a convolution operation to it, producing a 1-st dimension vector that captures how the scene evolves across the recent frames. Second, it applies a fully-connected operation to the virtual action information, producing a 2-nd dimension vector that encodes the action. Third, it concatenates the two vectors and applies a deconvolution operation to the result, producing the predicted real next frame information. The prediction therefore depends both on the recent history of the scene and on the action being taken.
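The three-step forward pass can be sketched in plain NumPy. Assumptions, none of which come from the claim: frames are concatenated along the channel axis, the convolution is a single 3×3 stride-1 layer with zero "same" padding and C output channels (so its flattened output has HWC entries), the fully-connected branch uses a ReLU, and the weights are random. The deconvolution is a transposed convolution of the 1×1 input with an H×W kernel, which reduces to one `einsum`. All function names are hypothetical.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Stride-1 convolution with zero 'same' padding.

    x: H x W x Cin, kernel: kh x kw x Cin x Cout -> H x W x Cout.
    """
    kh, kw, _, cout = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, kh - 1 - ph), (pw, kw - 1 - pw), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros((H, W, cout))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + kh, j:j + kw, :]
            out[i, j] = np.tensordot(patch, kernel, axes=([0, 1, 2], [0, 1, 2]))
    return out

def init_params(H, W, C, k, action_dim, L, rng):
    """Random weights for the three branches (shapes are assumptions)."""
    return {
        "conv": rng.normal(0.0, 0.1, (3, 3, (k + 1) * C, C)),
        "fc_w": rng.normal(0.0, 0.1, (L, action_dim)),
        "fc_b": np.zeros(L),
        "deconv": rng.normal(0.0, 0.1, (H, W, H * W * C + L, C)),
    }

def real_state_network(frames, action, params):
    """frames: k+1 arrays of shape H x W x C (oldest first); action: vector."""
    state_sum = np.concatenate(frames, axis=-1)                      # H x W x (k+1)C
    v1 = conv2d_same(state_sum, params["conv"]).reshape(-1)          # 1-st vector, HWC dims
    v2 = np.maximum(0.0, params["fc_w"] @ action + params["fc_b"])   # 2-nd vector, L dims
    z = np.concatenate([v1, v2])                                     # the 1 x 1 x (HWC+L) input
    # A transposed convolution of a 1x1 input with an H x W kernel (stride 1,
    # no padding) reduces to this weighted sum, giving an H x W x C output.
    return np.einsum("d,hwdc->hwc", z, params["deconv"])
```

In a deep learning framework the same structure would be a convolutional layer, a linear layer, and a transposed-convolution layer; the NumPy version only makes the data flow and shapes explicit.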

Claim 16

Original Legal Text

16. The calibrating device of claim 15 , wherein the virtual current frame state sum is an H×W×(K+1) tensor created by concatenating (i) the k pieces of the virtual previous frame information and (ii) the virtual current frame information, and wherein the virtual current frame information is an H×W×C tensor, and wherein the 1-st dimension vector is an HWC-dimension vector, and wherein the 2-nd dimension vector is an L-dimension vector, and wherein the predicted real next frame information is an H×W×C tensor created by applying deconvolution operation to a 1×1×(HWC+L) tensor generated by concatenating the 1-st dimension vector and the 2-nd dimension vector.

Plain English Translation

This invention relates to the calibrating device of claim 15 and fixes the tensor shapes used by the real state network. The virtual current frame information is an H×W×C tensor, and the virtual current frame state sum is an H×W×(K+1) tensor formed by concatenating the K pieces of virtual previous frame information with the virtual current frame information. The convolution operation turns the state sum into the 1-st dimension vector, which has HWC dimensions, while the fully-connected operation turns the virtual action information into the 2-nd dimension vector, which has L dimensions. The two vectors are concatenated into a 1×1×(HWC+L) tensor, and a deconvolution operation expands this tensor into the predicted real next frame information, an H×W×C tensor with the same shape as the current frame. Because the prediction has the shape of a frame, it can be compared directly with the virtual next frame information produced by the physics engine.
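A small shape calculator makes the bookkeeping in claim 16 concrete. It assumes frames are concatenated along the channel axis, so the state sum has (k+1)·C channels; this matches the claim's H×W×(K+1) when C = 1, as with single-channel frames. The function name and the example sizes below are illustrative only.

```python
def real_state_network_shapes(H, W, C, k, L):
    """Tensor shapes along the claim-16 pipeline (channel concatenation assumed)."""
    hwc = H * W * C
    return {
        "current_frame": (H, W, C),
        "state_sum": (H, W, (k + 1) * C),   # equals H x W x (k+1) when C == 1
        "first_vector": (hwc,),             # output of the convolution branch
        "second_vector": (L,),              # output of the fully-connected branch
        "deconv_input": (1, 1, hwc + L),    # concatenation of the two vectors
        "predicted_next_frame": (H, W, C),  # same shape as the current frame
    }
```

For example, 84×84 single-channel frames with k = 3 and L = 64 give an 84×84×4 state sum, a 7,056-dimension first vector, and a 1×1×7,120 deconvolution input.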

Claim 17

Original Legal Text

17. The calibrating device of claim 12 , wherein, at the process of (II), the processor repeats, until the loss decreases: (II-1) a first process of selecting one previous calibrated parameter among the previous calibrated parameters; (II-2) a second process of calibrating the selected one previous calibrated parameter with a preset learning rate by using the loss, wherein the second process generates one current calibrated parameter as an optimized parameter; (II-3) a third process of instructing the physics engine to apply its operation using the one current calibrated parameter and a rest of the previous calibrated parameters excluding the one previous calibrated parameter to the virtual current frame information and the virtual action information, wherein the third process generates new virtual next frame information; and (II-4) a fourth process of determining whether the loss decreases by using at least one new loss created by referring to the new virtual next frame information and the predicted real next frame information.

Plain English Translation

This invention relates to a calibrating device for optimizing the parameters of a physics engine used to simulate virtual environments. The problem addressed is calibrating physics engine parameters efficiently and accurately so that simulated outcomes match real-world outcomes. The processor adjusts the calibrated parameters one at a time to reduce a loss that measures the difference between the virtual next frame information produced by the physics engine and the predicted real next frame information produced by the real state network. In each iteration, it selects one previous calibrated parameter and adjusts it using the loss and a preset learning rate. It then runs the physics engine on the same virtual current frame information and virtual action information, using the updated parameter together with the remaining unchanged parameters, to produce new virtual next frame information. A new loss is computed between this new virtual next frame information and the predicted real next frame information, and the loop repeats until the loss decreases. Because only one parameter changes at a time, the effect of each adjustment on the loss can be isolated.
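Processes (II-1) to (II-4) can be sketched as a loop over parameters that stops at the first update that lowers the loss. This assumes the loss can be re-evaluated as a function of the parameter list: the hypothetical `loss_fn` hides re-running the physics engine on the same virtual current frame and virtual action and comparing the result with the fixed predicted real next frame. The claim does not say how the per-parameter step is computed; a central finite difference is used here as an assumption.

```python
def calibrate_until_decrease(params, loss_fn, lr, eps=1e-6):
    """Try one parameter at a time; return the first update that lowers the loss.

    Returns (new_params, improved). If no single-parameter update helps,
    the original parameters are returned with improved=False.
    """
    base_loss = loss_fn(params)
    for i in range(len(params)):                   # (II-1) select one parameter
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grad_i = (loss_fn(up) - loss_fn(down)) / (2 * eps)
        candidate = list(params)
        candidate[i] -= lr * grad_i                # (II-2) step with the preset learning rate
        new_loss = loss_fn(candidate)              # (II-3) re-run the engine with the candidate
        if new_loss < base_loss:                   # (II-4) keep it only if the loss decreased
            return candidate, True
    return list(params), False
```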

Claim 18

Original Legal Text

18. The calibrating device of claim 17 , wherein, if the loss is determined as not decreased for any of the previous calibrated parameters, the processor decreases the preset learning rate and performs the first process, the second process, the third process, and the fourth process.

Plain English Translation

This invention relates to the calibrating device of claim 17 and covers the case in which the calibration attempts fail to reduce the loss. If the loss has not decreased for any of the previous calibrated parameters, the processor reduces the preset learning rate and repeats the four processes of claim 17: selecting a parameter to calibrate, adjusting the selected parameter with the learning rate, running the physics engine with the adjusted parameter to produce new virtual next frame information, and determining whether the resulting loss has decreased. A step that is too large can overshoot the minimum and raise the loss for every parameter; shrinking the learning rate permits smaller steps that can still make progress, which keeps the parameter optimization controlled and stable.
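The learning-rate schedule of claim 18 can be sketched as an outer loop around the one-parameter-at-a-time search. The decay factor, the `min_lr` stopping threshold, and the `max_rounds` cap are assumptions (the claim only says the rate is decreased), and `loss_fn` is the same hypothetical wrapper around the physics engine and the predicted real next frame.

```python
def calibrate_with_decay(params, loss_fn, lr=0.5, decay=0.5, min_lr=1e-6,
                         eps=1e-6, max_rounds=10_000):
    """Coordinate-wise calibration that shrinks the learning rate whenever no
    single-parameter update lowers the loss; stops once lr falls below min_lr."""
    params = list(params)
    for _ in range(max_rounds):
        if lr < min_lr:
            break
        base = loss_fn(params)
        improved = False
        for i in range(len(params)):
            up, down = list(params), list(params)
            up[i] += eps
            down[i] -= eps
            grad_i = (loss_fn(up) - loss_fn(down)) / (2 * eps)
            candidate = list(params)
            candidate[i] -= lr * grad_i
            if loss_fn(candidate) < base:
                params, improved = candidate, True
                break
        if not improved:
            lr *= decay   # no parameter helped: shrink the step and retry
    return params, lr
```

On an objective with one steep direction, such as `10 * (p[0] - 1) ** 2 + (p[1] - 3) ** 2`, the initial rate overshoots along `p[0]`, so the loop halves the rate until steps along that parameter start reducing the loss again.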

Claim 19

Original Legal Text

19. A learning device for learning a real state network capable of generating predicted next frame information corresponding to real action information on a real action performed in multiple pieces of real recent frame information by a deep learning-based device in a real environment, comprising: at least one memory that stores instructions; and at least one processor configured to execute the instructions to perform: (I) if multiple pieces of trajectory information corresponding to multiple pieces of the real action information on the real action performed by the deep learning-based device in the real environment are acquired as training data, a process of generating multiple pieces of recent frame information for training by referring to k pieces of previous real frame information and real current frame information at a point of time in specific trajectory information; (II) a process of inputting into the real state network the multiple pieces of the recent frame information for training and action information for training, wherein the action information for training is acquired by referring to real current action information of the specific trajectory information at the point of the time, and wherein the inputting allows the real state network to apply its operation using prediction parameters to the multiple pieces of the recent frame information for training and the action information for training and to output the predicted next frame information, and wherein the real state network, in response to the performing of the second process, (1) applies the real state network operation using prediction parameters to the multiple pieces of the recent frame information for training and the action information for training, and (2) outputs the predicted next frame information; and (III) a third process of updating the prediction parameters such that at least one loss is minimized, wherein the at least one loss is created by referring to the predicted next frame information and real next frame information, and wherein the real next frame information is next to the real current frame information in the specific trajectory information.

Plain English Translation

A learning device trains a real state network to predict the next frame that results when a deep learning-based device performs an action in a real environment, given the recent frames. The system addresses the need for deep learning-based devices to anticipate the outcomes of their actions in dynamic environments. The device includes memory and processors that execute the training processes. First, from trajectory information recorded while the device acted in the real environment, it generates recent frame information for training from the real current frame and the k previous real frames at a point in time within a specific trajectory. Second, it inputs this recent frame information, together with action information derived from the real action taken at that point in time, into the real state network, which applies its prediction parameters and outputs predicted next frame information. Third, it updates the prediction parameters to minimize a loss computed between the predicted next frame information and the real next frame that follows the current frame in the same trajectory. Repeating this process refines the network's ability to forecast future states from observed frames and actions, using recorded real-world trajectories as training data.
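The data preparation in process (I) and the loss in process (III) can be sketched as follows, assuming each trajectory is stored as parallel lists of frames (H×W×C arrays) and action vectors, and that the loss is mean squared error (the claim requires only "at least one loss"). The function names are hypothetical.

```python
import numpy as np

def make_training_samples(frames, actions, k):
    """Build (recent frames, action, next frame) triples from one trajectory.

    frames[t] is the real frame at time t and actions[t] the real action taken
    at that frame. Every t that has k previous frames and a following frame
    yields one sample: input window frames[t-k..t], target frames[t+1].
    """
    samples = []
    for t in range(k, len(frames) - 1):
        # Current frame state sum for training: k previous frames plus the current one.
        recent = np.concatenate(frames[t - k:t + 1], axis=-1)
        samples.append((recent, actions[t], frames[t + 1]))
    return samples

def mse_loss(predicted_next, real_next):
    """Mean squared error between predicted and real next frames."""
    return float(np.mean((predicted_next - real_next) ** 2))
```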

Claim 20

Original Legal Text

20. The learning device of claim 19 , wherein, at the process of (II), the processor performs: a process of inputting a current frame state sum for training created by concatenating the multiple pieces of the recent frame information for training into a convolutional neural network of the real state network, wherein the inputting of the current frame state sum allows the convolutional neural network to output a 1-st feature by applying a convolution operation to the current frame state sum for training, and wherein the convolutional neural network, in response to the inputting of the current frame state sum, outputs the 1-st feature by applying the convolution operation to the current frame state sum for training; a process of inputting the action information for training into at least one fully connected layer of the real state network, wherein the inputting of the action information allows the at least one fully connected layer to output a 2-nd feature by applying fully-connected operation to the action information for training, and wherein the at least one fully connected layer, in response to the inputting of the action information, outputs the 2-nd feature by applying the fully-connected operation to the action information for training; and a process of inputting a concatenated feature created by concatenating the 1-st feature and the 2-nd feature into a deconvolutional layer, wherein the inputting of the concatenated feature allows the deconvolutional layer to output the predicted next frame information by applying a deconvolution operation to the concatenated feature, and wherein the deconvolutional layer, in response to the inputting of the concatenated feature, outputs the predicted next frame information by applying the deconvolution operation to the concatenated feature.

Plain English Translation

This invention relates to a learning device for predicting future frames in a sequence, such as video frames, using a neural network architecture. The device addresses the challenge of accurately forecasting subsequent frames based on recent frame information and action inputs, which is useful in applications like video prediction, robotics, and autonomous systems. The learning device processes multiple pieces of recent frame information for training by concatenating them into a current frame state sum. This sum is input into a convolutional neural network (CNN) within a real state network, where a convolution operation generates a first feature. Simultaneously, action information for training is fed into at least one fully connected layer of the real state network, producing a second feature through a fully connected operation. The first and second features are then concatenated and passed into a deconvolutional layer, which applies a deconvolution operation to output predicted next frame information. This architecture enables the device to learn and predict future frames by combining spatial-temporal features from the input frames with action-related features, improving prediction accuracy. The system is designed to enhance frame prediction in dynamic environments where actions influence future states.

Claim 21

Original Legal Text

21. The learning device of claim 20 , wherein the processor performs: a process of instructing the convolutional neural network to output the current frame state sum for training, which is an H×W×(K+1) tensor created by concatenating the multiple pieces of the recent frame information for training which are H×W×C tensors, as the 1-st feature which is an HWC-dimension vector; a process of instructing the at least one fully connected layer to output the action information for training, which is a 3-dimension vector, as the 2-nd feature which is an L-dimension vector; and a process of instructing the deconvolutional layer to output a 1×1×(HWC+L) tensor, created by concatenating the 1-st feature and the 2-nd feature, as the predicted next frame information which is an H×W×C tensor.

Plain English Translation

This invention relates to the learning device of claim 20 and fixes the tensor shapes used during training. Each piece of recent frame information for training is an H×W×C tensor, where H is height, W is width, and C is the number of channels. These pieces are concatenated into a current frame state sum for training, an H×W×(K+1) tensor, which the convolutional neural network converts into the 1-st feature, an HWC-dimension vector. The at least one fully connected layer converts the action information for training, a 3-dimension vector, into the 2-nd feature, an L-dimension vector. The two features are concatenated into a 1×1×(HWC+L) tensor, which the deconvolutional layer converts into the predicted next frame information, an H×W×C tensor with the same shape as an input frame.

Claim 22

Original Legal Text

22. The learning device of claim 20 , wherein the processor performs a process of updating one or more parameters of at least one of: the convolutional neural network, the at least one fully connected layer, or the deconvolutional layer; and wherein the updating is according to a gradient descent using the loss.

Plain English Translation

This invention relates to the learning device of claim 20 and specifies how the real state network is trained. The network contains a convolutional neural network that extracts a feature from the concatenated recent frames, at least one fully connected layer that encodes the action information, and a deconvolutional layer that produces the predicted next frame information. The processor updates the parameters of at least one of these components by gradient descent on the loss computed between the predicted next frame information and the real next frame information, adjusting the parameters iteratively in the direction that reduces the loss. Repeating these updates over the training data improves the network's next-frame predictions.
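A minimal gradient-descent sketch for one of the three components, the deconvolutional layer, assuming a mean-squared-error loss and the 1×1-input transposed convolution written as an `einsum`. The gradient is derived by hand here; a framework with automatic differentiation would normally compute it for all three components. The input vector `z` stands for the concatenated 1-st and 2-nd features, and all names are hypothetical.

```python
import numpy as np

def deconv_1x1_to_hw(z, kernel):
    """Transposed convolution of a 1x1xD input with an H x W kernel -> H x W x C."""
    return np.einsum("d,hwdc->hwc", z, kernel)

def gradient_descent_step(z, kernel, target, lr):
    """One gradient-descent step on the deconvolution kernel under MSE loss.

    Returns (updated_kernel, loss_before_update).
    """
    pred = deconv_1x1_to_hw(z, kernel)
    residual = pred - target
    loss = float(np.mean(residual ** 2))
    # d(loss)/d(kernel[h, w, d, c]) = 2 / N * z[d] * residual[h, w, c], N = H*W*C.
    grad = 2.0 / residual.size * np.einsum("d,hwc->hwdc", z, residual)
    return kernel - lr * grad, loss
```

Because the layer is linear in its kernel, each step shrinks the residual by a constant factor as long as the learning rate is small enough, so the recorded losses decrease monotonically.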

Patent Metadata

Filing Date

Unknown

Publication Date

September 15, 2020

Inventors

Kye-Hyeon Kim
Yongjoong Kim
Hak-Kyoung Kim
Woonhyun Nam
SukHoon Boo
Myungchul Sung
Dongsoo Shin
Donghun Yeo
Wooju Ryu
Myeong-Chun Lee
Hyungsoo Lee
Taewoong Jang
Kyungjoong Jeong
Hongmo Je
Hojin Cho


Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “METHOD AND DEVICE FOR CALIBRATING PHYSICS ENGINE OF VIRTUAL WORLD SIMULATOR TO BE USED FOR LEARNING OF DEEP LEARNING-BASED DEVICE, AND A LEARNING METHOD AND LEARNING DEVICE FOR REAL STATE NETWORK USED THEREFOR” (10776542). https://patentable.app/patents/10776542

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/10776542. See llms.txt for full attribution policy.