Method and Device for Calibrating Physics Engine of Virtual World Simulator to Be Used for Learning of Deep Learning-Based Device, and a Learning Method and Learning Device for Real State Network Used Therefor

Published: September 15, 2020

Assignee: not available in USPTO data we have

Inventors: Kye-Hyeon Kim, Yongjoong Kim, Hak-Kyoung Kim, Woonhyun Nam, SukHoon Boo, +10 more

Patent Claims

22 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for calibrating a physics engine of a virtual world simulator to be used for learning of a deep learning-based device, comprising: (a) if virtual current frame information corresponding to a virtual current state in a virtual environment is acquired from the virtual world simulator: (i) transmitting, by a calibrating device, the virtual current frame information to the deep learning-based device, wherein the transmitting of the virtual current frame information to the deep learning-based device instructs the deep learning-based device to apply its operation using previous learned parameters to the virtual current frame information and output virtual action information corresponding to the virtual current frame information, and wherein the deep learning-based device, in response to receiving the virtual current frame information from the calibrating device, applies the deep learning-based device operation using previous learned parameters to the virtual current frame information and outputs virtual action information corresponding to the virtual current frame information; (ii) transmitting, by the calibrating device, the virtual current frame information and the virtual action information to the physics engine of the virtual world simulator, wherein the transmitting of the virtual current frame information to the physics engine instructs the physics engine to apply its operation using previous calibrated parameters to the virtual current frame information and the virtual action information and to output virtual next frame information corresponding to the virtual current frame information and the virtual action information, wherein the physics engine, in response to receiving the virtual current frame information and the virtual action information, applies the physics engine operation using previous calibrated parameters to the virtual current frame information and the virtual action information and outputs virtual next frame information corresponding to the virtual current frame information and the virtual action information; and (iii) transmitting, by the calibrating device, the virtual current frame information and the virtual action information to a real state network that has been trained to output multiple pieces of predicted next frame information in response to real action information on a real action performed in multiple pieces of real recent frame information by the deep learning-based device in a real environment, wherein the transmitting of the virtual current frame information and the virtual action information to the real state network instructs the real state network to apply its operation using learned prediction parameters to the virtual action information and multiple pieces of virtual recent frame information corresponding to the virtual current frame information and to output predicted real next frame information, wherein the real state network, in response to receiving the virtual current frame information and the virtual action information, applies the real state network operation using learned prediction parameters to the virtual action information and multiple pieces of the virtual recent frame information corresponding to the virtual current frame information and outputs predicted real next frame information; and (b) calibrating and optimizing, by the calibrating device, the previous calibrated parameters of the physics engine, such that at least one loss that is created by referring to the virtual next frame information and the predicted real next frame information is minimized, wherein the calibrating and optimizing generates current calibrated parameters as optimized parameters.
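
The data flow of claim 1 can be illustrated with a minimal sketch. Everything named here is hypothetical: `toy_policy`, `toy_engine`, and `toy_real_state_network` are stand-ins for the deep learning-based device, the physics engine, and the real state network, the `gain` parameter is invented for illustration, and mean squared error is an assumed choice of loss (the claim only requires "at least one loss").

```python
from typing import Callable, Dict, List, Sequence

Frame = List[float]
Params = Dict[str, float]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized frames (assumed loss)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def calibration_loss(
    recent_frames: List[Frame],
    policy: Callable[[Frame], float],
    physics_engine: Callable[[Frame, float, Params], Frame],
    real_state_network: Callable[[List[Frame], float], Frame],
    params: Params,
) -> float:
    """Loss that step (b) of claim 1 would try to minimize."""
    current = recent_frames[-1]
    action = policy(current)                                         # step (a)(i)
    virtual_next = physics_engine(current, action, params)           # step (a)(ii)
    predicted_real_next = real_state_network(recent_frames, action)  # step (a)(iii)
    return mse(virtual_next, predicted_real_next)


# Hypothetical toy components, only to make the sketch executable.
def toy_policy(frame: Frame) -> float:
    return 1.0


def toy_engine(frame: Frame, action: float, params: Params) -> Frame:
    return [x + params["gain"] * action for x in frame]


def toy_real_state_network(recent_frames: List[Frame], action: float) -> Frame:
    return [x + 0.5 * action for x in recent_frames[-1]]


recent = [[0.0, 1.0], [0.0, 1.0]]
loss_before = calibration_loss(
    recent, toy_policy, toy_engine, toy_real_state_network, {"gain": 0.8}
)
loss_after = calibration_loss(
    recent, toy_policy, toy_engine, toy_real_state_network, {"gain": 0.5}
)
```

In this toy setup the loss falls to zero once the engine's `gain` matches the behavior encoded in the stand-in real state network, which is the sense in which the physics engine is "calibrated" against the predicted real next frame.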

2. The method of claim 1, further comprising: (c) transmitting, by the calibrating device, the virtual next frame information and reward information corresponding to the virtual action information to the deep learning-based device, wherein the transmitting of the virtual next frame information and reward information instructs the deep learning-based device to update the previous learned parameters via on-policy reinforcement learning, and wherein the on-policy reinforcement learning uses the virtual next frame information and the reward information, and wherein the deep learning-based device, in response to receiving the virtual next frame information and reward information, updates the previous learned parameters via the on-policy reinforcement learning.

3. The method of claim 1, wherein the multiple pieces of the virtual recent frame information are generated by referring to the virtual current frame information and k pieces of virtual previous frame information received beforehand.

4. The method of claim 3, wherein, at the step of (iii), the real state network: generates a 1-st dimension vector by applying convolution operation to a virtual current frame state sum created by concatenating the virtual current frame information and the k pieces of the virtual previous frame information; generates a 2-nd dimension vector by applying fully-connected operation to the virtual action information; and generates the predicted real next frame information by applying deconvolution operation to a concatenation of the 1-st dimension vector and the 2-nd dimension vector.

5. The method of claim 4, wherein the virtual current frame state sum is an H×W×(K+1) tensor created by concatenating (i) the k pieces of the virtual previous frame information and (ii) the virtual current frame information, and wherein the virtual current frame information is an H×W×C tensor, and wherein the 1-st dimension vector is an HWC-dimension vector, and wherein the 2-nd dimension vector is an L-dimension vector, and wherein the predicted real next frame information is an H×W×C tensor created by applying deconvolution operation to a 1×1×(HWC+L) tensor generated by concatenating the 1-st dimension vector and the 2-nd dimension vector.
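
The tensor sizes recited in claims 4 and 5 can be tabulated with a small shape-bookkeeping sketch. It performs no convolution; it only records the shapes exactly as the claim states them. The function name and the example values of H, W, C, K, and L are assumptions chosen for illustration.

```python
from typing import Dict, Tuple


def real_state_network_shapes(
    H: int, W: int, C: int, K: int, L: int
) -> Dict[str, Tuple[int, ...]]:
    """Shapes at each stage of the real state network, as stated in claim 5."""
    return {
        # K previous frames plus the current frame, concatenated.
        "state_sum": (H, W, K + 1),
        # Output of the convolution operation on the state sum.
        "first_vector": (H * W * C,),
        # Output of the fully-connected operation on the action information.
        "second_vector": (L,),
        # Concatenation of the two vectors, viewed as a 1x1 spatial tensor.
        "concatenated": (1, 1, H * W * C + L),
        # Output of the deconvolution operation.
        "predicted_next_frame": (H, W, C),
    }


# Hypothetical sizes: 64x64 RGB frames, 4 previous frames, 16-dim action feature.
shapes = real_state_network_shapes(H=64, W=64, C=3, K=4, L=16)
```

With these example values the concatenated tensor is 1×1×12304, since 64·64·3 + 16 = 12304.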

6. The method of claim 1, wherein, at the step of (b), the calibrating device repeats, until the loss decreases: (b-1) a first process of selecting one previous calibrated parameter among the previous calibrated parameters; (b-2) a second process of calibrating the selected one previous calibrated parameter with a preset learning rate by using the loss, wherein the second process generates one current calibrated parameter as an optimized parameter; (b-3) a third process of instructing the physics engine to apply its operation using the one current calibrated parameter and a rest of the previous calibrated parameters excluding the one previous calibrated parameter to the virtual current frame information and the virtual action information, wherein the third process generates new virtual next frame information; and (b-4) a fourth process of determining whether the loss decreases by using at least one new loss created by referring to the new virtual next frame information and the predicted real next frame information.

7. The method of claim 6, wherein, if the loss is determined as not decreased for any of the previous calibrated parameters, the calibrating device decreases the preset learning rate and performs the first process, the second process, the third process, and the fourth process.
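
The per-parameter search of claims 6 and 7 can be sketched as follows. The claims do not say how a parameter is "calibrated ... by using the loss", so this sketch assumes a simple trial step of plus or minus the learning rate; `loss_fn` is a hypothetical callable that re-runs the physics engine with the trial parameters and returns the new loss, and `decay` and `min_lr` are invented knobs.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def calibrate_once(
    params: Params,
    loss_fn: Callable[[Params], float],
    lr: float,
    decay: float = 0.5,
    min_lr: float = 1e-6,
) -> Tuple[Params, float, float]:
    """Change one parameter so the loss decreases, shrinking lr when stuck."""
    base_loss = loss_fn(params)
    while lr >= min_lr:
        for name in params:                       # first process: select one parameter
            for direction in (1.0, -1.0):         # assumed update rule: step by +/- lr
                trial = dict(params)
                trial[name] = params[name] + direction * lr   # second process
                new_loss = loss_fn(trial)         # third process: re-run the engine
                if new_loss < base_loss:          # fourth process: did the loss decrease?
                    return trial, new_loss, lr
        lr *= decay                               # claim 7: no parameter helped
    return dict(params), base_loss, lr


# Hypothetical loss whose minimum is at gain = 0.5.
def toy_loss(p: Params) -> float:
    return (p["gain"] - 0.5) ** 2


new_params, new_loss, used_lr = calibrate_once({"gain": 0.8}, toy_loss, lr=1.0)
```

Starting from a learning rate of 1.0, neither trial step lowers the toy loss, so the rate is halved to 0.5, after which moving `gain` from 0.8 to 0.3 is accepted.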

8. A method for learning a real state network capable of generating predicted next frame information corresponding to real action information on a real action performed in multiple pieces of real recent frame information by a deep learning-based device in a real environment, comprising steps of: (a) if multiple pieces of trajectory information corresponding to multiple pieces of the real action information on the real action performed by the deep learning-based device in the real environment are acquired as training data, generating, by a learning device, multiple pieces of recent frame information for training by referring to k pieces of previous real frame information and real current frame information at a point of time in specific trajectory information; (b) inputting into the real state network, by the learning device, the multiple pieces of the recent frame information for training and action information for training, wherein the action information for training is acquired by referring to real current action information of the specific trajectory information at the point of the time, and wherein the inputting allows the real state network to apply its operation using prediction parameters to the multiple pieces of the recent frame information for training and the action information for training and to output the predicted next frame information, and wherein the real state network, in response to the inputting, applies the real state network operation using prediction parameters to the multiple pieces of the recent frame information for training and the action information for training and outputs the predicted next frame information; and (c) updating, by the learning device, the prediction parameters such that at least one loss is minimized, wherein the at least one loss is created by referring to the predicted next frame information and real next frame information, and wherein the real next frame information is next to the real current frame information in the specific trajectory information.
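
Step (a) of claim 8 amounts to sliding a window of k+1 frames over a recorded trajectory. A minimal sketch, assuming a trajectory is stored as two parallel lists (`frames` and `actions`, both hypothetical names) and that frames can be any Python objects:

```python
from typing import Any, List, Tuple

Sample = Tuple[List[Any], Any, Any]


def make_training_samples(frames: List[Any], actions: List[Any], k: int) -> List[Sample]:
    """Build (recent frames, current action, real next frame) triples."""
    samples: List[Sample] = []
    # t needs k earlier frames behind it and one real next frame after it.
    for t in range(k, len(frames) - 1):
        recent = frames[t - k : t + 1]   # k previous frames plus the current frame
        samples.append((recent, actions[t], frames[t + 1]))
    return samples


# Hypothetical trajectory with placeholder frame and action labels.
frames = ["f0", "f1", "f2", "f3", "f4"]
actions = ["a0", "a1", "a2", "a3", "a4"]
samples = make_training_samples(frames, actions, k=2)
```

Each triple supplies the inputs for step (b) and the real next frame against which the loss in step (c) is computed.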

9. The method of claim 8, wherein, at the step of (b), the learning device inputs a current frame state sum for training created by concatenating the multiple pieces of the recent frame information for training into a convolutional neural network of the real state network; and wherein the inputting of the current frame state sum allows the convolutional neural network to output a 1-st feature by applying a convolution operation to the current frame state sum for training; and wherein the convolutional neural network, in response to the inputting of the current frame state sum for training, outputs the 1-st feature by applying the convolution operation to the current frame state sum for training; and wherein the learning device inputs the action information for training into at least one fully connected layer of the real state network; and wherein the inputting of the action information allows the at least one fully connected layer to output a 2-nd feature by applying a fully-connected operation to the action information for training; and wherein the at least one fully connected layer, in response to the inputting of the action information for training, outputs the 2-nd feature by applying the fully-connected operation to the action information for training; and wherein the learning device inputs a concatenated feature created by concatenating the 1-st feature and the 2-nd feature into a deconvolutional layer; and wherein the inputting of the concatenated feature allows the deconvolutional layer to output the predicted next frame information by applying a deconvolution operation to the concatenated feature; and wherein the deconvolutional layer, in response to the inputting of the concatenated feature, outputs the predicted next frame information by applying the deconvolution operation to the concatenated feature.

10. The method of claim 9, wherein the learning device instructs the convolutional neural network to output the current frame state sum for training, which is an H×W×(K+1) tensor created by concatenating the multiple pieces of the recent frame information for training which are H×W×C tensors, as the 1-st feature which is an HWC-dimension vector, and wherein the learning device instructs the at least one fully connected layer to output the action information for training, which is a 3-dimension vector, as the 2-nd feature which is an L-dimension vector, and wherein the learning device instructs the deconvolutional layer to output a 1×1×(HWC+L) tensor, created by concatenating the 1-st feature and the 2-nd feature, as the predicted next frame information which is an H×W×C tensor.

11. The method of claim 9, wherein the learning device updates one or more parameters of at least one of: the convolutional neural network, the at least one fully connected layer, or the deconvolutional layer, and wherein the updating is according to a gradient descent using the loss.
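
The gradient descent update named in claim 11 can be illustrated on a deliberately tiny model. This is not the real state network: it assumes a single scalar weight `w`, a prediction `w * x`, and a squared-error loss, all chosen only so the update rule is visible in a few lines.

```python
def gradient_step(w: float, x: float, y: float, lr: float) -> float:
    """One gradient descent step on the loss (w * x - y) ** 2."""
    prediction = w * x
    grad = 2.0 * (prediction - y) * x   # derivative of the loss with respect to w
    return w - lr * grad


# Hypothetical training pair: input 1.0 should map to target 2.0.
first = gradient_step(0.0, x=1.0, y=2.0, lr=0.1)

w = 0.0
for _ in range(100):
    w = gradient_step(w, x=1.0, y=2.0, lr=0.1)
```

In the claimed network the same rule would be applied to the parameters of the convolutional, fully connected, and deconvolutional layers, with gradients obtained by backpropagation rather than by a hand-written derivative.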

12. A calibrating device for calibrating a physics engine of a virtual world simulator to be used for learning of a deep learning-based device, comprising: at least one memory that stores instructions; and at least one processor configured to execute the instructions to perform: (I) if virtual current frame information corresponding to a virtual current state in a virtual environment is acquired from the virtual world simulator, (i) a process of transmitting the virtual current frame information to the deep learning-based device, wherein the transmitting of the virtual current frame information instructs the deep learning-based device to apply its operation using previous learned parameters to the virtual current frame information and output virtual action information corresponding to the virtual current frame information, and wherein the deep learning-based device, in response to receiving the virtual current frame information from the calibrating device, applies the deep learning-based device operation using previous learned parameters to the virtual current frame information and outputs virtual action information corresponding to the virtual current frame information; (ii) a process of transmitting the virtual current frame information and the virtual action information to the physics engine of the virtual world simulator, wherein the transmitting of the virtual current frame information and the virtual action information to the physics engine instructs the physics engine to apply its operation using previous calibrated parameters to the virtual current frame information and the virtual action information and output virtual next frame information corresponding to the virtual current frame information and the virtual action information, and wherein the physics engine, in response to receiving the virtual current frame information and the virtual action information, applies the physics engine operation using previous calibrated parameters to the virtual current frame information and the virtual action information and outputs virtual next frame information corresponding to the virtual current frame information and the virtual action information; and (iii) a process of transmitting the virtual current frame information and the virtual action information to a real state network that has been trained to output multiple pieces of predicted next frame information in response to real action information on a real action performed in multiple pieces of real recent frame information by the deep learning-based device in a real environment, wherein the transmitting of the virtual current frame information and the virtual action information to the real state network instructs the real state network to apply its operation using learned prediction parameters to the virtual action information and multiple pieces of virtual recent frame information corresponding to the virtual current frame information and output predicted real next frame information, wherein the real state network, in response to receiving the virtual current frame information and the virtual action information, applies the real state network operation using learned prediction parameters to the virtual action information and multiple pieces of the virtual recent frame information corresponding to the virtual current frame information and outputs predicted real next frame information; and (II) a process of calibrating and optimizing the previous calibrated parameters of the physics engine, such that at least one loss that is created by referring to the virtual next frame information and the predicted real next frame information is minimized, wherein the calibrating and optimizing generates current calibrated parameters as optimized parameters.

13. The calibrating device of claim 12, wherein the processor further performs: (III) a process of transmitting the virtual next frame information and reward information corresponding to the virtual action information to the deep learning-based device, wherein the transmitting instructs the deep learning-based device to update the previous learned parameters via on-policy reinforcement learning, and wherein the on-policy reinforcement learning uses the virtual next frame information and the reward information, and wherein the deep learning-based device, in response to receiving the virtual next frame information and reward information, updates the previous learned parameters via the on-policy reinforcement learning.

14. The calibrating device of claim 12, wherein the multiple pieces of the virtual recent frame information are generated by referring to the virtual current frame information and k pieces of virtual previous frame information received beforehand.

15. The calibrating device of claim 14, wherein, at the process of (iii), the real state network: generates a 1-st dimension vector by applying convolution operation to a virtual current frame state sum created by concatenating the virtual current frame information and the k pieces of the virtual previous frame information; generates a 2-nd dimension vector by applying fully-connected operation to the virtual action information; and generates the predicted real next frame information by applying deconvolution operation to a concatenation of the 1-st dimension vector and the 2-nd dimension vector.

16. The calibrating device of claim 15, wherein the virtual current frame state sum is an H×W×(K+1) tensor created by concatenating (i) the k pieces of the virtual previous frame information and (ii) the virtual current frame information, and wherein the virtual current frame information is an H×W×C tensor, and wherein the 1-st dimension vector is an HWC-dimension vector, and wherein the 2-nd dimension vector is an L-dimension vector, and wherein the predicted real next frame information is an H×W×C tensor created by applying deconvolution operation to a 1×1×(HWC+L) tensor generated by concatenating the 1-st dimension vector and the 2-nd dimension vector.

17. The calibrating device of claim 12, wherein, at the process of (II), the processor repeats, until the loss decreases: (II-1) a first process of selecting one previous calibrated parameter among the previous calibrated parameters; (II-2) a second process of calibrating the selected one previous calibrated parameter with a preset learning rate by using the loss, wherein the second process generates one current calibrated parameter as an optimized parameter; (II-3) a third process of instructing the physics engine to apply its operation using the one current calibrated parameter and a rest of the previous calibrated parameters excluding the one previous calibrated parameter to the virtual current frame information and the virtual action information, wherein the third process generates new virtual next frame information; and (II-4) a fourth process of determining whether the loss decreases by using at least one new loss created by referring to the new virtual next frame information and the predicted real next frame information.

18. The calibrating device of claim 17, wherein, if the loss is determined as not decreased for any of the previous calibrated parameters, the processor decreases the preset learning rate and performs the first process, the second process, the third process, and the fourth process.

19. A learning device for learning a real state network capable of generating predicted next frame information corresponding to real action information on a real action performed in multiple pieces of real recent frame information by a deep learning-based device in a real environment, comprising: at least one memory that stores instructions; and at least one processor configured to execute the instructions to perform: (I) if multiple pieces of trajectory information corresponding to multiple pieces of the real action information on the real action performed by the deep learning-based device in the real environment are acquired as training data, a process of generating multiple pieces of recent frame information for training by referring to k pieces of previous real frame information and real current frame information at a point of time in specific trajectory information; (II) a process of inputting into the real state network the multiple pieces of the recent frame information for training and action information for training, wherein the action information for training is acquired by referring to real current action information of the specific trajectory information at the point of the time, and wherein the inputting allows the real state network to apply its operation using prediction parameters to the multiple pieces of the recent frame information for training and the action information for training and to output the predicted next frame information, and wherein the real state network, in response to the performing of the second process, (1) applies the real state network operation using prediction parameters to the multiple pieces of the recent frame information for training and the action information for training, and (2) outputs the predicted next frame information; and (III) a third process of updating the prediction parameters such that at least one loss is minimized, wherein the at least one loss is created by referring to the predicted next frame information and real next frame information, and wherein the real next frame information is next to the real current frame information in the specific trajectory information.

20. The learning device of claim 19, wherein, at the process of (II), the processor performs: a process of inputting a current frame state sum for training created by concatenating the multiple pieces of the recent frame information for training into a convolutional neural network of the real state network, wherein the inputting of the current frame state sum allows the convolutional neural network to output a 1-st feature by applying a convolution operation to the current frame state sum for training, and wherein the convolutional neural network, in response to the inputting of the current frame state sum, outputs the 1-st feature by applying the convolution operation to the current frame state sum for training; a process of inputting the action information for training into at least one fully connected layer of the real state network, wherein the inputting of the action information allows the at least one fully connected layer to output a 2-nd feature by applying fully-connected operation to the action information for training, and wherein the at least one fully connected layer, in response to the inputting of the action information, outputs the 2-nd feature by applying the fully-connected operation to the action information for training; and a process of inputting a concatenated feature created by concatenating the 1-st feature and the 2-nd feature into a deconvolutional layer, wherein the inputting of the concatenated feature allows the deconvolutional layer to output the predicted next frame information by applying a deconvolution operation to the concatenated feature, and wherein the deconvolutional layer, in response to the inputting of the concatenated feature, outputs the predicted next frame information by applying the deconvolution operation to the concatenated feature.

21. The learning device of claim 20, wherein the processor performs: a process of instructing the convolutional neural network to output the current frame state sum for training, which is an H×W×(K+1) tensor created by concatenating the multiple pieces of the recent frame information for training which are H×W×C tensors, as the 1-st feature which is an HWC-dimension vector; a process of instructing the at least one fully connected layer to output the action information for training, which is a 3-dimension vector, as the 2-nd feature which is an L-dimension vector; and a process of instructing the deconvolutional layer to output a 1×1×(HWC+L) tensor, created by concatenating the 1-st feature and the 2-nd feature, as the predicted next frame information which is an H×W×C tensor.

22. The learning device of claim 20, wherein the processor performs a process of updating one or more parameters of at least one of: the convolutional neural network, the at least one fully connected layer, or the deconvolutional layer; and wherein the updating is according to a gradient descent using the loss.

Patent Metadata

Filing Date

Unknown

Publication Date

September 15, 2020

Inventors

Kye-Hyeon Kim

Yongjoong Kim

Hak-Kyoung Kim

Woonhyun Nam

SukHoon Boo

Myungchul Sung

Dongsoo Shin

Donghun Yeo

Wooju Ryu

Myeong-Chun Lee

Hyungsoo Lee

Taewoong Jang

Kyungjoong Jeong

Hongmo Je

Hojin Cho
