Unsupervised Method to Generate Annotations for Natural Language Understanding Tasks

Published: June 3, 2025

Assignee: Not available in the USPTO data

Inventors: Hany Mohamed Hassan AWADALLA, Subhabrata Mukherjee, Ahmed Awadallah

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer implemented method for training a machine learning model with generated annotations of source instances and while facilitating security of the source instances, the method being implemented by a computing system that includes at least one hardware processor and the method comprising: the computing system receiving electronic content comprising (i) a set of target templates comprising a plurality of keys and (ii) a set of vocabulary words comprising a plurality of values, the plurality of values corresponding to the plurality of keys; the computing system automatically populating the set of target templates with the set of vocabulary words to generate training data comprising synthetically populated target templates of key-value pairings formatted as annotated machine-readable text; the computing system training a machine learning model with the training data, the machine learning model being configured to understand an association between the plurality of keys and the plurality of values of the key-value pairings included in the populated target templates; and the computing system combining the machine learning model with a different machine learning model that was trained to understand a semantic structure of unannotated natural language, the machine learning model and the different machine learning model being combined into a coupled machine learning model by aligning word embeddings output from the machine learning model and word embeddings output from the different machine learning model, the coupled machine learning model being configured to transform new unannotated natural language into annotated machine-readable text.

2. The method of claim 1, wherein the coupled machine learning model is further configured to transform machine-readable code into natural language.

3. The method of claim 1, further comprising: the computing system using the coupled machine learning model to transform unannotated natural language into annotated machine-readable text.

4. The method of claim 1, wherein the unannotated natural language comprises an unstructured query and the annotated machine-readable text comprises a query structured according to a particular target schema or particular target programming language.

5. The method of claim 1, further comprising: the computing system performing a natural language understanding task by executing the annotated machine-readable text.

6. The method of claim 1, wherein the different machine learning model is trained with unsupervised training.

7. The method of claim 1, wherein the word embeddings are aligned by mapping tokens included in the unannotated natural language with machine-readable text.

8. The method of claim 7, wherein the word embeddings are aligned by aligning an entire context of a sequence of tokens included in the unannotated natural language and annotated machine-readable text.

9. The method of claim 1, wherein the machine learning model and the different machine learning model are combined by formulating at least a shared encoder.

10. The method of claim 1, wherein the method further includes training the coupled machine learning model to learn a source decoder configured to decode unannotated natural language and a target decoder configured to decode machine-readable code.

11. The method of claim 10, further comprising the computing system refining the source decoder and the target decoder by employing a feedback loop between the source decoder and the target decoder to facilitate an improvement in accuracy of a natural language understanding transformation performed between the source decoder and the target decoder.

12. The method of claim 1, wherein the second set of training data comprises syntax corresponding to a particular programming language.

13. The method of claim 12, wherein the particular programming language comprises one of the following: REST API, custom XML, SQL or JSON.

14. The method of claim 1, wherein the plurality of values included in the set of vocabulary words are received from one or more vocabulary databases.

15. The method of claim 14, wherein a particular target template included in the set of target templates comprises a mapping to a particular vocabulary database of the one or more vocabulary databases.

16. A computing system comprising: one or more processors; one or more computer-readable hardware storage devices having stored thereon computer-executable instructions that are executable by the one or more processors to cause the computer system to at least: receive electronic content comprising (i) a set of target templates comprising a plurality of keys and (ii) a set of vocabulary words comprising a plurality of values, the plurality of values corresponding to the plurality of keys; automatically populate the set of target templates with the set of vocabulary words to generate training data comprising synthetically populated target templates of key-value pairings formatted as annotated machine-readable text; train a machine learning model with the training data, the machine learning model being configured to understand an association between the plurality of keys and the plurality of values of the key-value pairings included in the populated target templates; combine the machine learning model and the different machine learning model into a coupled machine learning model, the coupled machine learning model being configured to transform unannotated natural language into annotated machine-readable text; and operating the coupled machine learning model to generate annotated machine-readable text directly from unannotated natural language and without requiring a use of any intermediate annotated representation when generating the annotated machine-readable text directly from the unannotated natural language.

17. The computing system of claim 16, wherein the computer-executable instructions are executable by the one or more processors to further cause the computer system to perform a natural language understanding task by executing the annotated machine-readable text.

18. The computing system of claim 16 further comprising one or more of the following: a data retrieval engine, a template population engine, a training engine, a stacking engine, an encoding engine, a decoding engine, a refinement engine or an implementation engine.

19. A computer implemented method for training a machine learning model with generated annotations of source instances and while facilitating security of the source instances, the method being implemented by a computing system that includes at least one hardware processor and the method comprising: the computing system receiving a set of training data comprising a plurality of source instances corresponding to a particular language; the computing system training a machine learning model with the training data, the machine learning model being configured to understand a semantic structure of the training data in the particular language; the computing system combining the machine learning model with a different machine learning model into a coupled machine learning model by aligning word embeddings output from the first machine learning model and word embeddings output from the different machine learning model, the coupled machine learning model being configured to transform source instances in a first language that the different machine learning model is trained to understand the semantic structure of into source instances in the particular language that the machine learning model is trained to understand the semantic structure of.

20. One or more computer-readable hardware storage devices having stored thereon computer-executable instructions that are executable by the one or more processors to cause a computer system to at least: operate a coupled machine learning model configured to transform unannotated natural language into machine-readable text; and transform unannotated natural language into machine readable text; wherein the coupled machine learning model comprises: a first machine learning model trained on a first set of training data comprising unannotated natural language, the first machine learning model configured to understand a semantic structure of the first set of data; and a second machine learning model trained on a second set of training data comprising a plurality of target templates populated with a plurality of values, the plurality of target templates comprising a plurality of keys corresponding to the plurality of values; wherein the second machine learning model is configured to understand an association between the plurality of keys and the plurality of values of one or more key-value pairings included in the populated target templates; and wherein the first machine learning model and second machine learning model are combined to form the coupled machine learning model which further comprises a source decoder trained to decode unannotated natural language and a target decoder trained to decode target templates, and an encoder shared between the source decoder and the target decoder.
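
Illustrative note (not part of the claims): the template-population step recited in claims 1, 14 and 15 can be pictured with a short Python sketch. Everything concrete here is an assumption made for illustration: the template names, the keys, the vocabulary values, the `populate_templates` helper, and the choice of JSON as the annotated machine-readable format (JSON is one of the formats listed in claim 13). The claims do not specify how values are selected, so the sketch simply enumerates every combination.

```python
import itertools
import json

# Hypothetical target templates: each names the keys (slots) it expects.
TEMPLATES = {
    "search_email": ["sender", "folder"],
    "create_event": ["title", "day"],
}

# Hypothetical vocabulary databases: candidate values for each key.
VOCABULARY = {
    "sender": ["alice", "bob"],
    "folder": ["inbox", "archive"],
    "title": ["standup"],
    "day": ["monday", "friday"],
}


def populate_templates(templates, vocabulary):
    """Fill every template with every combination of vocabulary values.

    Returns a list of JSON strings, each one a synthetically populated
    template of key-value pairings.
    """
    samples = []
    for intent, keys in templates.items():
        # Look up the candidate values for each key of this template.
        value_lists = [vocabulary[key] for key in keys]
        # Enumerate the Cartesian product of the value lists.
        for combo in itertools.product(*value_lists):
            record = {"intent": intent, "slots": dict(zip(keys, combo))}
            samples.append(json.dumps(record, sort_keys=True))
    return samples


training_data = populate_templates(TEMPLATES, VOCABULARY)
for sample in training_data:
    print(sample)
```

No source instances are needed to produce these strings, which is consistent with the claims' framing of generating annotations while keeping the source instances secure. Training a model on such data and coupling it with a natural language model are separate steps that this sketch does not attempt.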

Patent Metadata

Filing Date: Unknown

Publication Date: June 3, 2025

Inventors: Hany Mohamed Hassan AWADALLA, Subhabrata Mukherjee, Ahmed Awadallah
