Legal claims defining the scope of protection, as filed with the USPTO.
2. The system of claim 1, wherein the recorded observations include video recordings of the robotic system performing the task.
3. The system of claim 1, wherein the computing system, in selecting the first and second trials for pairwise evaluation, is programmed to compare vector representations of the first and second trials.
4. The system of claim 1, wherein the computing system, in selecting the first and second trials for pairwise evaluation, is programmed to use a k-nearest neighbors algorithm to compare the first and second trials.
5. The system of claim 1, wherein the computing system is programmed with executable instructions to generate a difference vector representing said differences between the first trial and the second trial, and to weight a policy update from a reinforcement learning system by the difference vector.
6. The system of claim 1, wherein the computing system is configured to use a trained machine learning classifier to determine that the first trial achieved a higher degree of success than the second trial in performing the task.
7. The system of claim 1, wherein the computing system is configured to use feedback from a human to determine that the first trial achieved a higher degree of success than the second trial in performing the task.
9. The method of claim 8, wherein the recorded observations include video recordings of the robotic system performing the task.
10. The method of claim 8, wherein the computing system, in performing the pairwise comparison, compares vector representations of the recorded observations.
11. The method of claim 8, wherein the computing system, in selecting the first and second trials for pairwise comparison, uses a k-nearest neighbors algorithm to measure trial similarity.
12. The method of claim 8, further comprising, by the computing system, generating a difference vector representing said differences between the first trial and the second trial, and weighting a policy update from a reinforcement learning system based on the difference vector.
13. The method of claim 8, wherein determining that the first trial achieved a higher degree of success comprises using a trained machine learning classifier to evaluate the recorded observations.
14. The method of claim 8, wherein determining that the first trial achieved a higher degree of success comprises receiving feedback from a human.
17. The non-transitory computer-readable medium of claim 16, wherein updating the control policy comprises weighting an update from a reinforcement learning system based on differences between the first and second vectors.
18. The non-transitory computer-readable medium of claim 17, the operations further comprising generating the update using the reinforcement learning system based at least partly on the pairwise comparison.
19. The non-transitory computer-readable medium of claim 15, wherein selecting the first and second trials comprises using a k-nearest neighbors algorithm to select the first and second trials.
20. The non-transitory computer-readable medium of claim 15, wherein the recorded observations comprise video recordings of the trials.
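Claims 4, 11 and 19 recite selecting the first and second trials with a k-nearest neighbors algorithm, and claims 3 and 10 recite comparing vector representations of the trials. The sketch below shows one way to do this. It is not the claimed implementation. It assumes each trial has already been embedded as a fixed-length vector, that similarity is measured by Euclidean distance, and that brute-force search is acceptable. The function names `euclidean` and `select_similar_pairs` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two trial vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_similar_pairs(
    trial_vectors: List[Sequence[float]], k: int = 1
) -> List[Tuple[int, int]]:
    """Hypothetical helper: pair each trial with its k nearest neighbors.

    Uses brute-force k-nearest-neighbor search over all other trials and
    returns unique index pairs (i, j) with i < j, sorted for determinism.
    """
    pairs = set()
    for i, v in enumerate(trial_vectors):
        # Distance from trial i to every other trial, nearest first.
        dists = sorted(
            (euclidean(v, w), j) for j, w in enumerate(trial_vectors) if j != i
        )
        for _, j in dists[:k]:
            pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


# Two clusters of similar trials yield two candidate pairs for evaluation.
print(select_similar_pairs([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.2, 5.0]]))
```

Pairing near-identical trials means that whatever the pairwise evaluation detects (by a classifier, per claims 6 and 13, or by human feedback, per claims 7 and 14) is more likely to reflect the small differences that separate success from failure. For large trial sets, a spatial index would replace the brute-force loop.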
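Claims 5, 12 and 17 recite generating a difference vector between the first (more successful) and second trials and weighting a reinforcement-learning policy update by it. The claims do not say how the weighting is done, so the weighting rule below is a hypothetical scheme chosen for illustration. It assumes the update and the difference vector share the same dimensionality. Each update component is scaled by the normalized magnitude of the corresponding difference component, so dimensions in which the two trials differ most keep more of the update. The names `difference_vector` and `weight_update` are hypothetical.

```python
from typing import List, Sequence


def difference_vector(better: Sequence[float], worse: Sequence[float]) -> List[float]:
    """Component-wise difference between the better and worse trial vectors."""
    return [b - w for b, w in zip(better, worse)]


def weight_update(update: Sequence[float], diff: Sequence[float]) -> List[float]:
    """Hypothetical weighting: scale each RL update component by |diff_i| / max|diff|.

    Assumes `update` and `diff` have the same length. If the trials are
    identical (all-zero difference), there is no preference signal, so a
    zero update is returned.
    """
    if len(update) != len(diff):
        raise ValueError("update and difference vector must have the same length")
    max_mag = max((abs(d) for d in diff), default=0.0)
    if max_mag == 0.0:
        return [0.0 for _ in update]
    return [u * abs(d) / max_mag for u, d in zip(update, diff)]


diff = difference_vector([1.0, 0.0, 2.0], [0.0, 0.0, 1.0])
print(weight_update([0.5, 0.5, -0.2], diff))
```

Returning a zero update for identical trials is a design choice. An alternative would be to pass the unweighted update through unchanged when the comparison carries no information.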
Unknown
July 25, 2023