Metadata Based Mapping Assist

Published: September 9, 2025

Assignee: Not available in USPTO data

Inventors: Ramkumar Ramalingam, Subhojeet Pramanik, Jothiponsundar Radhakrishnan, Saptarshi Misra, Nagarjuna Surabathina, Matu Agarwal


Patent Claims

18 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer-implemented method for data or schema mapping, the computer-implemented method comprising: obtaining, by a server computer, source schema metadata, wherein the source schema metadata is associated with fields of a source schema; obtaining, by the server computer, target schema metadata with a target schema, wherein the target schema metadata is associated with fields of a target schema; determining, by the server computer, for each field of the source schema and each field of the target schema, a representation for each field based, at least in part, on the source schema metadata or the target schema metadata associated with each field, and generating schema field representations for a dynamic object schema, wherein the dynamic object schema enable extending schemas of an existing object, and wherein the determining of the representation for each field of the source schema and each field of the target schema comprises: training a machine learning model, wherein training the machine learning model comprises: obtaining a first training dataset; generating a first stage machine learning model based on the first training dataset, wherein the first stage machine learning model is trained to encode a sentence included in a dataset into a vector representation; obtaining a second training dataset, wherein the second training dataset includes schema metadata from various schema objects; generating a trained machine learning model based on the second training dataset and the first stage machine learning model, wherein the trained machine learning model is trained to encode sentences associated with metadata columns of a schema field into a single vector representation, wherein an input layer of the machine learning model utilizes a same encoder to encode a premise and a hypothesis; and utilizing triplets of related data in an unsupervised fashion to learn sentence representations directly from the metadata columns; generating, through a sequence neural network, sequence aware embeddings based on the representation for each field by combining the representations from the source schema metadata and the target schema metadata; and providing, by the server computer, the representation for each field of the source schema and the representation for each field of the target schema for use in generating data mappings between the source schema and the target schema.

2. The computer-implemented method of claim 1, wherein the representation for each field of the source schema and each field of the target schema comprises a fixed size vector embedding that describes a combination of metadata associated with a particular field of the source schema or the target schema.

3. The computer-implemented method of claim 1, further comprising: calculating field-to-field matches between the source schema and the target schema using the representations for the fields of the source schema and the target schema, wherein the representations are generated based on multiple metadata columns associated with a schema field; and generating data mapping suggestions between fields of the source schema and the target schema based on the field-to-field matches.

4. The computer-implemented method of claim 1, wherein the representations are determined for fields of a dynamic object schema.

5. The computer-implemented method of claim 1, wherein the determining the representation for each field of the source schema and each field of the target schema comprises: providing the source schema metadata or the target schema metadata associated with each field individually as input to a machine learning model, the machine learning model having been trained to generate the representation for each field based on one or more metadata columns associated with each field; and receiving, for each field as output from the machine learning model, a vector representation that describes a combination of the one or more metadata columns associated with each field as the representation for each field and be used as the representation for that field in determining matches between fields of the source and target schemas for use in data mapping.

6. The computer-implemented method of claim 5, wherein the machine learning model is trained to identify relevant information from multiple metadata columns associated with a schema field to generate a single vector representation for the schema field.

7. The computer-implemented method of claim 1, further comprising: generating a confidence score for each field of the target schema as compared to each field of the source schema, wherein the confidence score for each field of the target schema is based on comparing the representations for each field of the source schema to the representation for the field of the target schema, and an overall calculated score between two fields using cosine similarity; and generating data mapping suggestions between the source schema and the target schema for the fields of the target schema based, at least in part, on the confidence scores.

8. The computer-implemented method of claim 7, wherein the generating the confidence score for each field of the target schema comprises calculating an overall confidence score between two fields using cosine similarity.

9. A computer-implemented method comprising: determining, by a server computer, for each field of a source schema and each field of a target schema, a representation for each field based, at least in part, on source schema metadata or target schema metadata associated with each field, and generating schema field representations for a dynamic object schema, wherein the dynamic object schema enable extending schemas of an existing object, and wherein the determining of the representation for each field of the source schema and each field of the target schema comprises: training a machine learning model, wherein training the machine learning model comprises: obtaining a first training dataset, wherein the first training dataset is a sentence related dataset; generating a first stage machine learning model based on the first training dataset, wherein the first stage machine learning model is trained to encode a sentence included in a dataset into a vector representation, wherein the machine learning model generates fixed length sentence representations by utilizing a natural language inference dataset; obtaining a second training dataset, wherein the second training dataset includes schema metadata from various schema objects; generating a trained machine learning model based on the second training dataset and the first stage machine learning model, wherein the trained machine learning model is trained to encode metadata columns associated with a schema field into a single vector representation and to learn domain specific semantic representations relating to a schema domain, wherein an input layer of the machine learning model utilizes a same encoder to encode a premise and a hypothesis; and utilizing triplets of related data in an unsupervised fashion to learn sentence representations directly from the metadata columns providing the trained machine learning model for use in generating data mapping suggestions between the source schema and the target schema.

10. The computer-implemented method of claim 9, wherein the generating the trained machine learning model for use in generating data mapping does not require any prior schema mapping data.

11. The computer-implemented method of claim 9, wherein the second training dataset includes schema metadata that describes fields included in a schema.

12. A computer program product for data or schema mapping comprising a computer readable storage medium having stored thereon: program instructions programmed to obtain source schema metadata, wherein the source schema metadata is associated with fields of a source schema; program instructions programmed to obtain, by a server computer, target schema metadata with a target schema, wherein the target schema metadata is associated with fields of a target schema; program instructions programmed to determine, by the server computer, for each field of the source schema and each field of the target schema, a representation for each field based, at least in part, on the source schema metadata or the target schema metadata associated with each field, and generating schema field representations for a dynamic object schema, wherein the dynamic object schema enable extending schemas of an existing object, and wherein the determining of the representation for each field of the source schema and each field of the target schema comprises: program instructions programmed to train a machine learning model to learn domain specific semantic representations relating to a schema domain, wherein training the machine learning model comprises: program instructions programmed to obtain a first training dataset; program instructions programmed to generate a first stage machine learning model based on the first training dataset, wherein the first stage machine learning model is trained to encode a sentence included in a dataset into a vector representation; program instructions programmed to obtain a second training dataset, wherein the second training dataset includes schema metadata from various schema objects; program instructions programmed to generate a trained machine learning model based on the second training dataset and the first stage machine learning model, wherein the trained machine learning model is trained to encode sentences associated with metadata columns of a schema field into a single vector representation, wherein an input layer of the machine learning model utilizes a same encoder to encode a premise and a hypothesis, and wherein the trained machine learning model does not require use of prior schema mapping data; and utilizing triplets of related data in an unsupervised fashion to learn sentence representations directly from the metadata columns; program instructions programmed to generating, through a sequence neural network, sequence aware embeddings based on the representation for each field by combining the representations from the source schema metadata and the target schema metadata; and program instructions programmed to provide, by the server computer, the representation for each field of the source schema and the representation for each field of the target schema for use in generating data mappings between the source schema and the target schema.

13. The computer program product of claim 12, wherein the representation for each field of the source schema and each field of the target schema comprises a fixed size vector embedding that describes a combination of metadata associated with a particular field of the source schema or the target schema.

14. The computer program product of claim 12, the computer readable storage medium having further stored thereon: program instructions programmed to calculate field-to-field matches between the source schema and the target schema using the representations for the fields of the source schema and the target schema, wherein the representations are generated based on multiple metadata columns associated with a schema field; and program instructions programmed to generate data mapping suggestions between fields of the source schema and the target schema based on the field-to-field matches.

15. The computer program product of claim 12, wherein the program instructions programmed to determine, for each field of the source schema and each field of the target schema, the representation for each field, further comprise: program instructions programmed to provide the source schema metadata or the target schema metadata associated with each field individually as input to a machine learning model, the machine learning model having been trained to generate the representation for each field based on one or more metadata columns associated with each field; and program instructions programmed to receive, for each field as output from the machine learning model, a vector representation that describes a combination of the one or more metadata columns associated with each field as the representation for each field.

16. The computer program product of claim 15, wherein the machine learning model is trained to identify relevant information from multiple metadata columns associated with a schema field to generate a single vector representation for the schema field.

17. The computer program product of claim 12, the computer readable storage medium having further stored thereon: program instructions programmed to generate a confidence score for each field of the target schema as compared to each field of the source schema, wherein the confidence score for each field of the target schema is based on comparing the representations for each field of the source schema to the representation for the field of the target schema; and program instructions programmed to generate data mapping suggestions between the source schema and the target schema for the fields of the target schema based, at least in part, on the confidence scores.

18. The computer program product of claim 17, wherein the generating the confidence score for each field of the target schema comprises calculating an overall confidence score between two fields using cosine similarity.
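Claims 2, 3, 5 to 8 and 13 to 18 describe turning the metadata columns of each field into one fixed-size vector, scoring every target field against every source field with cosine similarity, and deriving mapping suggestions from those confidence scores. The sketch below illustrates that flow under stated assumptions; it is not the patented implementation. A plain bag-of-words vector over a shared vocabulary stands in for the trained encoder, and the function names (`encode_field`, `suggest_mappings`), the metadata column names (`name`, `label`, `description`), the sample fields and the 0.5 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens, so 'hire_dt' becomes ['hire', 'dt']."""
    return re.findall(r"[a-z0-9]+", text.lower())


def field_sentence(metadata):
    """Join every metadata column of one field into a single sentence."""
    return " ".join(str(value) for value in metadata.values())


def build_vocab(*schemas):
    """Shared vocabulary so every field vector has the same fixed size."""
    tokens = set()
    for schema in schemas:
        for metadata in schema.values():
            tokens.update(tokenize(field_sentence(metadata)))
    return sorted(tokens)


def encode_field(metadata, vocab):
    """Hypothetical stand-in for the trained encoder: one count per
    vocabulary token, combining all metadata columns into one vector."""
    counts = Counter(tokenize(field_sentence(metadata)))
    return [float(counts[token]) for token in vocab]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def suggest_mappings(source, target, threshold=0.5):
    """Score each target field against every source field and suggest the
    best-scoring source field when its score clears the threshold."""
    vocab = build_vocab(source, target)
    source_vecs = {name: encode_field(meta, vocab) for name, meta in source.items()}
    suggestions = {}
    for target_name, target_meta in target.items():
        target_vec = encode_field(target_meta, vocab)
        scores = {name: cosine(target_vec, vec) for name, vec in source_vecs.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            suggestions[target_name] = (best, round(scores[best], 3))
    return suggestions


# Hypothetical example schemas (field name -> metadata columns).
source_fields = {
    "emp_id": {"name": "emp_id", "label": "Employee ID",
               "description": "unique employee identifier"},
    "hire_dt": {"name": "hire_dt", "label": "Hire Date",
                "description": "date the employee was hired"},
}
target_fields = {
    "EmployeeNumber": {"name": "EmployeeNumber", "label": "Employee ID",
                       "description": "identifier of the employee"},
    "StartDate": {"name": "StartDate", "label": "Start Date",
                  "description": "date the worker was hired"},
}

if __name__ == "__main__":
    print(suggest_mappings(source_fields, target_fields))
```

Keeping only the best-scoring source field per target field is one simple policy; returning the full score table instead would let a reviewer inspect lower-ranked candidates before accepting a mapping.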
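Claims 1, 9 and 12 recite an input layer that uses the same encoder for a premise and a hypothesis, and the use of triplets of related data, in an unsupervised fashion, to learn representations directly from metadata columns. The sketch below shows one common way such an objective is written: a triplet margin loss computed with a single shared encoder. This is an assumption about how the claim language could be realised, not the inventors' model. The toy averaging encoder, the fixed 2-d token vectors, the `metadata_triplets` rule (a field's label as anchor, its own description as positive, another field's description as negative) and the margin of 1.0 are all hypothetical.

```python
import math
import re


def tokenize(text):
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SharedEncoder:
    """Hypothetical toy sentence encoder that averages fixed token vectors.

    One instance (one set of weights) encodes every input: premise and
    hypothesis, or anchor, positive and negative alike.
    """

    def __init__(self, token_vectors, dim):
        self.token_vectors = token_vectors  # token -> list of floats
        self.dim = dim

    def encode(self, sentence):
        vectors = [self.token_vectors[t] for t in tokenize(sentence)
                   if t in self.token_vectors]
        if not vectors:
            return [0.0] * self.dim
        # Column-wise mean over known tokens gives a fixed-length vector.
        return [sum(column) / len(vectors) for column in zip(*vectors)]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(encoder, anchor, positive, negative, margin=1.0):
    """Triplet margin loss: zero once the anchor is closer to the positive
    than to the negative by at least `margin`."""
    a = encoder.encode(anchor)
    p = encoder.encode(positive)
    n = encoder.encode(negative)
    return max(0.0, euclidean(a, p) - euclidean(a, n) + margin)


def metadata_triplets(schema):
    """Build (anchor, positive, negative) triplets from metadata alone, with
    no prior mapping labels: a field's label is paired with its own
    description (positive) and another field's description (negative)."""
    names = sorted(schema)
    triplets = []
    for i, name in enumerate(names):
        other = names[(i + 1) % len(names)]
        triplets.append((schema[name]["label"],
                         schema[name]["description"],
                         schema[other]["description"]))
    return triplets


# Hypothetical 2-d token vectors and a two-field schema for illustration.
TOY_VECTORS = {
    "employee": [1.0, 0.0], "id": [1.0, 0.0], "identifier": [1.0, 0.0],
    "hire": [0.0, 1.0], "date": [0.0, 1.0], "hired": [0.0, 1.0],
}
TOY_SCHEMA = {
    "emp_id": {"label": "Employee ID", "description": "employee identifier"},
    "hire_dt": {"label": "Hire Date", "description": "date hired"},
}

if __name__ == "__main__":
    encoder = SharedEncoder(TOY_VECTORS, dim=2)
    for triplet in metadata_triplets(TOY_SCHEMA):
        print(triplet, triplet_loss(encoder, *triplet))
```

In actual training, this loss would be averaged over many triplets and minimised with respect to the encoder's parameters; the toy encoder here has no trainable parameters and only illustrates the shape of the objective.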

Patent Metadata

Filing Date

Unknown

Publication Date

September 9, 2025

Inventors

Ramkumar Ramalingam

Subhojeet Pramanik

Jothiponsundar Radhakrishnan

Saptarshi Misra

Nagarjuna Surabathina

Matu Agarwal
