11501157

Action Shaping from Demonstration for Fast Reinforcement Learning

Published: November 15, 2022
Assignee: not available in the USPTO data we have
Technical Abstract

Patent Claims
17 claims

Legal claims defining the scope of protection, as filed with the USPTO.

2. The computer-implemented method of claim 1, wherein the neural network is trained such that the first set satisfies each of the plurality of action constraints and the second set violates at least one of the plurality of action constraints, evaluated with each of the plurality of real-valued constraint functions.
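
A minimal sketch of the training condition in claim 2. It assumes that a real-valued constraint function g counts as satisfied when g(x) <= 0, where x concatenates a state and an action. That sign convention and the names `satisfies_all` and `training_condition_holds` are hypothetical illustrations, not taken from the patent.

```python
from typing import Callable, Sequence

# A constraint function maps a state-action feature vector to a real value.
# Assumed convention: g(x) <= 0 means "constraint satisfied".
Constraint = Callable[[Sequence[float]], float]


def satisfies_all(constraints: Sequence[Constraint], x: Sequence[float]) -> bool:
    """True if every real-valued constraint function is satisfied at x."""
    return all(g(x) <= 0.0 for g in constraints)


def training_condition_holds(
    constraints: Sequence[Constraint],
    first_set: Sequence[Sequence[float]],
    second_set: Sequence[Sequence[float]],
) -> bool:
    """First-set tuples must satisfy every constraint; each second-set
    tuple must violate at least one constraint."""
    ok_first = all(satisfies_all(constraints, x) for x in first_set)
    ok_second = all(not satisfies_all(constraints, x) for x in second_set)
    return ok_first and ok_second
```

For example, with x = [state, action], the pair g1(x) = action - 1 and g2(x) = -action - 1 encodes the action range [-1, 1]: a tuple with action 0.5 satisfies both, while one with action 1.5 violates g1.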

3. The computer-implemented method of claim 1, wherein training the policy comprises calculating, by using each of the plurality of real-valued constraint functions, an action closest to the action predicted by the policy among actions which satisfy each of the plurality of action constraints and executing the calculated action on an environment to obtain a reward for the reinforcement learning.
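
Claim 3 describes replacing the policy's proposed action with the closest action that satisfies every constraint. The sketch below assumes that, for a fixed state, each constraint is linear in the action (w . a <= b). Under that assumption the feasible set is an intersection of half-spaces and the "closest action" is a convex Euclidean projection. It uses Dykstra's alternating-projection algorithm as a swapped-in solver. The patent's constraints come from a neural network and need not be linear, and all function names here are hypothetical.

```python
from typing import List, Sequence, Tuple

# One linear inequality on the action for a fixed state: w . a <= b.
HalfSpace = Tuple[Sequence[float], float]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def project_halfspace(y: Sequence[float], w: Sequence[float], b: float) -> List[float]:
    """Euclidean projection of y onto the half-space {a : w . a <= b}."""
    excess = _dot(w, y) - b
    if excess <= 0.0:
        return list(y)  # already feasible: unchanged
    scale = excess / _dot(w, w)
    return [yi - scale * wi for yi, wi in zip(y, w)]


def closest_feasible_action(
    action: Sequence[float], constraints: Sequence[HalfSpace], iters: int = 200
) -> List[float]:
    """Dykstra's algorithm: cycles through the half-spaces while carrying a
    correction term per set, and converges to the point of their
    intersection that is closest to `action` in Euclidean distance."""
    x = list(action)
    corrections = [[0.0] * len(x) for _ in constraints]
    for _ in range(iters):
        for i, (w, b) in enumerate(constraints):
            y = [xi + ci for xi, ci in zip(x, corrections[i])]
            x = project_halfspace(y, w, b)
            corrections[i] = [yi - xi for yi, xi in zip(y, x)]
    return x
```

With the box constraints a0 <= 1 and a1 <= 1, a proposed action [2.0, 2.0] is mapped to [1.0, 1.0], and an action that is already feasible is returned unchanged. The projected action, rather than the raw policy output, is what would then be executed in the environment to obtain the reward.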

4. The computer-implemented method of claim 1, wherein each of the plurality of action constraints is an inequality constraint.
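
One way to write the feasible action set implied by claims 3 and 4 follows. The notation is assumed and does not appear in the claims: g_k is the k-th of K real-valued constraint functions, π(s) is the policy's action in state s, and "satisfied" is read as g_k ≤ 0.

```latex
\mathcal{A}_{\mathrm{feas}}(s) = \{\, a \in \mathcal{A} \;:\; g_k(s, a) \le 0,\ k = 1, \dots, K \,\}

a^{\ast}(s) = \operatorname*{arg\,min}_{a \in \mathcal{A}_{\mathrm{feas}}(s)} \lVert a - \pi(s) \rVert_2^{2}
```

An inequality constraint carves out a region of the action space in which the agent can still explore. An equality constraint g_k(s, a) = 0 would typically confine actions to a lower-dimensional surface.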

5. The computer-implemented method of claim 1, wherein the first set is relaxed to allow non-optimal demonstrations that are directed closer towards succeeding than failing.
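
Claim 5 does not say how "closer towards succeeding than failing" is measured. The sketch below illustrates one possible reading. Each demonstration is assumed to record the state reached after the action, and a tuple is kept in the relaxed first set when that resulting state lies nearer a hypothetical success state than a hypothetical failure state. The `goal` and `failure` reference states and the function name are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# (state, action, next_state); recording next_state is an assumption.
Demo = Tuple[List[float], List[float], List[float]]


def relaxed_first_set(
    demos: Sequence[Demo], goal: Sequence[float], failure: Sequence[float]
) -> List[Tuple[List[float], List[float]]]:
    """Keep (state, action) tuples whose resulting state is closer to the
    hypothetical success state than to the failure state, so imperfect but
    mostly helpful demonstrations still count as 'good'."""
    kept = []
    for state, action, next_state in demos:
        if math.dist(next_state, goal) < math.dist(next_state, failure):
            kept.append((state, action))
    return kept
```

With goal [1.0] and failure [-1.0], a non-optimal step that reaches [0.3] is kept (distance 0.7 versus 1.3), while a step that reaches [-0.2] is dropped (1.2 versus 0.8).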

6. The computer-implemented method of claim 1, wherein the evaluation of each of the plurality of action constraints is performed relative to a violation margin and a satisfaction margin, wherein for a given one of the restricted actions, the violation margin represents a margin of violation between the action and the plurality of action constraints, and the satisfaction margin represents a margin of satisfaction between the action and the plurality of action constraints.
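
A minimal sketch of training with a satisfaction margin and a violation margin, as in claim 6. Affine functions g_k(x) = w_k . x + b_k stand in for the neural network, which is a simplifying assumption. Two hinge losses are minimized by full-batch subgradient descent. First-set tuples are pushed to g_k(x) <= -sat_margin for every k. Each second-set tuple is pushed so its largest constraint value reaches at least viol_margin. The class, function, and parameter names and the default values are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


class LinearConstraints:
    """K affine constraint functions g_k(x) = w_k . x + b_k; by the assumed
    convention, g_k(x) <= 0 means constraint k is satisfied."""

    def __init__(self, weights: List[List[float]], biases: List[float]):
        self.w = [list(row) for row in weights]
        self.b = list(biases)

    def values(self, x: Vector) -> List[float]:
        return [sum(wj * xj for wj, xj in zip(wk, x)) + bk for wk, bk in zip(self.w, self.b)]


def train_with_margins(model, first_set, second_set, sat_margin=0.1,
                       viol_margin=0.1, lr=0.05, epochs=200):
    """Subgradient descent on two hinge losses:
    first set:  sum_k max(0, g_k(x) + sat_margin)
    second set: max(0, viol_margin - max_k g_k(x))"""
    num_k = len(model.b)
    for _ in range(epochs):
        grad_w = [[0.0] * len(wk) for wk in model.w]
        grad_b = [0.0] * num_k
        for x in first_set:
            for k, g in enumerate(model.values(x)):
                if g + sat_margin > 0.0:  # not satisfied by the margin: push g_k down
                    for j, xj in enumerate(x):
                        grad_w[k][j] += xj
                    grad_b[k] += 1.0
        for x in second_set:
            g = model.values(x)
            k = max(range(num_k), key=lambda i: g[i])  # most-violated constraint
            if viol_margin - g[k] > 0.0:  # not violated by the margin: push g_k up
                for j, xj in enumerate(x):
                    grad_w[k][j] -= xj
                grad_b[k] -= 1.0
        for k in range(num_k):
            for j in range(len(model.w[k])):
                model.w[k][j] -= lr * grad_w[k][j]
            model.b[k] -= lr * grad_b[k]
    return model
```

Only the most-violated constraint is updated for a second-set tuple, because that tuple needs to break just one constraint. Every constraint is updated for a first-set tuple, because it must satisfy them all. With one-dimensional inputs, good samples {-0.5, 0, 0.5}, bad samples {-2, 2}, and initial weights [1] and [-1], the two constraints settle on an interval that contains the good samples with margin and excludes the bad ones.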

7. The computer-implemented method of claim 1, wherein the first set and the second set of state-action tuples are used as action ranges during the exploration in the reinforcement learning.
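
Claim 7 does not specify how the two sets become action ranges. The sketch below illustrates one hypothetical reading that works directly on the demonstration tuples instead of a trained network. Exploration samples uniformly inside the per-dimension box spanned by the first-set actions. It accepts a sample only if the query (state, action) lies nearer a first-set tuple than any second-set tuple. The nearest-neighbor rule, the `max_tries` cap, and all names are assumptions.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

StateAction = Tuple[List[float], List[float]]  # (state, action)


def action_box(first_set: Sequence[StateAction]) -> Tuple[List[float], List[float]]:
    """Per-dimension [min, max] of the first-set actions."""
    actions = [a for _, a in first_set]
    dims = len(actions[0])
    lows = [min(a[d] for a in actions) for d in range(dims)]
    highs = [max(a[d] for a in actions) for d in range(dims)]
    return lows, highs


def nearest_dist(state, action, tuples: Sequence[StateAction]) -> float:
    """Distance from (state, action) to the closest tuple in `tuples`."""
    query = list(state) + list(action)
    return min(math.dist(query, list(s) + list(a)) for s, a in tuples)


def explore_action(rng: random.Random, state, first_set, second_set,
                   max_tries: int = 1000) -> Optional[List[float]]:
    """Rejection-sample an exploratory action inside the first-set box that is
    closer to the good demonstrations than to the bad ones."""
    lows, highs = action_box(first_set)
    for _ in range(max_tries):
        a = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        if nearest_dist(state, a, first_set) < nearest_dist(state, a, second_set):
            return a
    return None  # no acceptable sample found
```

For example, with good actions 0 and 4 and a bad action 2 (all in state 0), samples are drawn from [0, 4], and only those below 1 or above 3 are accepted.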

9. The computer program product of claim 8, wherein the neural network is trained such that the first set satisfies each of the plurality of action constraints and the second set violates at least one of the plurality of action constraints, evaluated with each of the plurality of real-valued constraint functions.

10. The computer program product of claim 8, wherein training the policy comprises calculating, by using each of the plurality of real-valued constraint functions, an action closest to the action predicted by the policy among actions which satisfy each of the plurality of action constraints and executing the calculated action on an environment to obtain a reward for the reinforcement learning.

11. The computer program product of claim 8, wherein each of the plurality of action constraints is an inequality constraint.

12. The computer program product of claim 8, wherein the first set is relaxed to allow non-optimal demonstrations that are directed closer towards succeeding than failing.

13. The computer program product of claim 8, wherein the evaluation of each of the plurality of action constraints is performed relative to a violation margin and a satisfaction margin, wherein for a given one of the restricted actions, the violation margin represents a margin of violation between the action and the plurality of action constraints, and the satisfaction margin represents a margin of satisfaction between the action and the plurality of action constraints.

14. The computer program product of claim 8, wherein the first set and the second set of state-action tuples are used as action ranges during the exploration in the reinforcement learning.

16. The computer processing system of claim 15, wherein the processor device trains the neural network such that the first set satisfies each of the plurality of action constraints and the second set violates at least one of the plurality of action constraints, evaluated with each of the plurality of real-valued constraint functions.

17. The computer processing system of claim 15, wherein the processor device trains the policy by calculating, by using each of the plurality of real-valued constraint functions, an action closest to the action predicted by the policy among actions which satisfy each of the plurality of action constraints and executing the calculated action on an environment to obtain a reward for the reinforcement learning.

18. The computer processing system of claim 15, wherein each of the plurality of action constraints is an inequality constraint.

19. The computer processing system of claim 15, wherein the first set is relaxed to allow non-optimal demonstrations that are directed closer towards succeeding than failing.

20. The computer processing system of claim 15, wherein the evaluation of each of the plurality of action constraints is performed relative to a violation margin and a satisfaction margin, wherein for a given one of the restricted actions, the violation margin represents a margin of violation between the action and the plurality of action constraints, and the satisfaction margin represents a margin of satisfaction between the action and the plurality of action constraints.

Patent Metadata

Filing Date

Unknown

Publication Date

November 15, 2022

Inventors

Tu-Hoa Pham
Don Joven Ravoy Agravante
Giovanni De Magistris
Ryuki Tachibana

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “ACTION SHAPING FROM DEMONSTRATION FOR FAST REINFORCEMENT LEARNING” (11501157). https://patentable.app/patents/11501157