Computer-implemented method and apparatus for automatically annotating columns of a table with semantic types

Published: May 7, 2024

Assignee: not available in USPTO data

Inventors: not available in USPTO data

Technical Abstract

Provided is a computer-implemented method for automatically generating annotations for tabular cell data of a table having columns and rows, wherein the method includes: supplying raw cell data of the cells of a row of the table as input to an embedding layer of a semantic type annotation neural network, which transforms the received raw cell data into cell embedding vectors; processing the cell embedding vectors by a self-attention layer of the network to calculate attentions among the cells of the respective row, encoding a context within the row that is output as cell context vectors; and processing the cell context vectors generated by the self-attention layer by a classification layer of the semantic type annotation neural network to predict semantic column type annotations and/or relations between semantic column type annotations for the columns of the table.
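
The attention step of the abstract can be illustrated with a minimal sketch. It assumes plain scaled dot-product self-attention in which each cell embedding serves as its own query, key and value; the patent text does not specify learned projections or dimensions, so the function names and the toy 2-dimensional embeddings below are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(cell_vectors):
    # Scaled dot-product self-attention over the cells of ONE row.
    # Assumption: no learned query/key/value projections are applied.
    d = len(cell_vectors[0])
    context = []
    for q in cell_vectors:
        # Attention of this cell towards every cell in the same row.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in cell_vectors]
        weights = softmax(scores)
        # Cell context vector: attention-weighted sum of the row's embeddings.
        context.append([sum(w * v[i] for w, v in zip(weights, cell_vectors))
                        for i in range(d)])
    return context

# Hypothetical embeddings for a row with three cells.
row_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cell_context_vectors = self_attention(row_embeddings)
```

Each cell context vector mixes information from all cells of the row, which is how the row context enters the later classification step.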

Patent Claims

13 claims

Legal claims defining the scope of protection, as filed with the USPTO.

2. The computer-implemented method according to claim 1 wherein a bidirectional recurrent neural network, RNN, trained as an encoder of an autoencoder on cell embeddings provided by a byte-pair encoding model, BPE, is used as an encoder of the embedding layer of the semantic type annotation neural network.
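
A rough sketch of the encoder in claim 2, assuming a simple tanh recurrent cell and assuming that the sub-word token vectors have already been produced by a byte-pair encoding model. The weights, dimensions and function names are hypothetical, and the autoencoder training mentioned in the claim is not shown.

```python
import math

def rnn_pass(token_vectors, w_in, w_rec):
    # Run a simple tanh RNN over the tokens and return the final hidden state.
    hidden = [0.0] * len(w_rec)
    for x in token_vectors:
        hidden = [
            math.tanh(
                sum(wi * xi for wi, xi in zip(w_in[j], x))
                + sum(wr * hk for wr, hk in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
    return hidden

def encode_cell(token_vectors, fwd_params, bwd_params):
    # Bidirectional encoding: one pass left-to-right, one right-to-left,
    # with the two final hidden states concatenated into the cell embedding.
    forward = rnn_pass(token_vectors, *fwd_params)
    backward = rnn_pass(list(reversed(token_vectors)), *bwd_params)
    return forward + backward

# Hypothetical 2-dimensional token vectors for one cell (stand-ins for BPE embeddings).
tokens = [[0.5, -0.2], [0.1, 0.9], [-0.3, 0.4]]
# Hypothetical untrained weights: (input weights, recurrent weights) per direction.
fwd_params = ([[0.4, -0.1], [0.2, 0.3]], [[0.1, 0.0], [0.0, 0.1]])
bwd_params = ([[-0.2, 0.5], [0.3, 0.1]], [[0.2, 0.0], [0.0, 0.2]])
cell_embedding = encode_cell(tokens, fwd_params, bwd_params)
```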

3. The computer-implemented method according to claim 1 wherein the generated annotations and the tabular cell data of the table, T, are supplied to an extract, transform, load (ETL) process used to generate a knowledge graph instance stored in a memory.
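
One way to picture the ETL step of claim 3 is a transform that turns annotated rows into subject-predicate-object triples. This is only a sketch under stated assumptions: the triple layout, the `subject_col` convention, the `relations` mapping and all example names are hypothetical and do not come from the patent text.

```python
def table_to_triples(rows, column_types, relations, subject_col=0):
    # Transform annotated table rows into knowledge-graph triples.
    # column_types: predicted semantic type per column.
    # relations: maps (subject type, object type) to a predicate name.
    triples = set()
    subject_type = column_types[subject_col]
    for row in rows:
        subject = row[subject_col]
        triples.add((subject, "type", subject_type))
        for col, value in enumerate(row):
            if col == subject_col:
                continue
            predicate = relations.get((subject_type, column_types[col]))
            if predicate is not None:
                triples.add((subject, predicate, value))
    return triples

# Hypothetical table, column annotations and predicted relation.
rows = [["Berlin", "Germany"], ["Paris", "France"]]
column_types = ["City", "Country"]
relations = {("City", "Country"): "locatedIn"}
knowledge_graph = table_to_triples(rows, column_types, relations)
```

Columns whose type pair has no predicted relation are skipped rather than guessed.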

4. The computer-implemented method according to claim 1 wherein the classification layer calculates column type vectors, y, comprising for the cell data of each cell, C, of the respective supplied row, R, predicted semantic column type probabilities.
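
A minimal sketch of how a column type vector y could be computed for each cell, assuming the classification layer is a single linear map followed by softmax. The claim does not fix the layer's form; the weights, the type names and the context vectors below are hypothetical.

```python
import math

SEMANTIC_TYPES = ["city", "country", "population"]  # hypothetical label set

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def classify_cell(context_vector, weights, bias):
    # Linear layer plus softmax: one probability per semantic column type.
    logits = [sum(w * x for w, x in zip(w_row, context_vector)) + b
              for w_row, b in zip(weights, bias)]
    return softmax(logits)

# Hypothetical parameters: one weight row per semantic type.
weights = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
bias = [0.0, 0.0, 0.0]
# Hypothetical cell context vectors for a row with two cells.
row_context = [[1.0, 0.0], [0.0, 1.0]]
column_type_vectors = [classify_cell(c, weights, bias) for c in row_context]
```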

5. The computer-implemented method according to claim 4 wherein a mean pooling of the column type vectors, y, of all rows, R, of the table, T, is performed to predict a semantic column type for each column, C, of the table, T.
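
The mean pooling of claim 5 can be sketched as follows, assuming the per-row column type vectors are already available and that the predicted type is the one with the highest pooled probability (the argmax step is an assumption; the claim only names the pooling). The type names and probabilities are made up for illustration.

```python
def mean_pool_column_types(y_per_row):
    # y_per_row[r][c] is the column type vector for cell c of row r.
    n_rows = len(y_per_row)
    n_cols = len(y_per_row[0])
    n_types = len(y_per_row[0][0])
    return [
        [sum(y_per_row[r][c][t] for r in range(n_rows)) / n_rows
         for t in range(n_types)]
        for c in range(n_cols)
    ]

def predict_types(pooled, type_names):
    # Pick the semantic type with the highest pooled probability per column.
    return [type_names[max(range(len(p)), key=p.__getitem__)] for p in pooled]

type_names = ["city", "country"]  # hypothetical label set
y_per_row = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.6, 0.4], [0.2, 0.8]],
    [[0.3, 0.7], [0.3, 0.7]],
]
pooled = mean_pool_column_types(y_per_row)
predicted = predict_types(pooled, type_names)
```

In this toy table the third row alone would label the first column "country", but averaging over all rows outweighs that single noisy row.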

6. The computer-implemented method according to claim 1 wherein the self-attention layer of the semantic type annotation neural network comprises a stack of transformers to calculate attentions among the cells, C, of the respective row, R, of the table, T.

7. The computer-implemented method according to claim 1 wherein the semantic type annotation neural network is trained in a supervised learning process using labeled rows, R, as samples.
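
Supervised training on labeled rows can be illustrated with a deliberately reduced sketch: gradient descent on a cross-entropy loss for a linear softmax classifier, where each cell of a labeled row contributes one (vector, label) pair. The real network would also train the embedding and self-attention layers; the loss choice, learning rate and data here are assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def train_step(weights, samples, lr=0.5):
    # One gradient-descent step; returns updated weights and the mean
    # cross-entropy loss measured BEFORE the update.
    n_types, dim = len(weights), len(weights[0])
    grads = [[0.0] * dim for _ in range(n_types)]
    loss = 0.0
    for x, label in samples:
        probs = softmax([sum(w * xi for w, xi in zip(w_row, x))
                         for w_row in weights])
        loss -= math.log(probs[label])
        for t in range(n_types):
            err = probs[t] - (1.0 if t == label else 0.0)
            for i in range(dim):
                grads[t][i] += err * x[i]
    n = len(samples)
    new_weights = [[w - lr * g / n for w, g in zip(w_row, g_row)]
                   for w_row, g_row in zip(weights, grads)]
    return new_weights, loss / n

# Hypothetical labeled row: (cell context vector, semantic type index) per cell.
labeled_row = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
weights = [[0.0, 0.0], [0.0, 0.0]]
losses = []
for _ in range(20):
    weights, loss = train_step(weights, labeled_row)
    losses.append(loss)
```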

8. A computer program product comprising a computer readable storage device having computer readable program code stored therein, the program code executable by a processor of a computer system to perform the computer-implemented method according to claim 1 to automatically generate annotations for tabular cell data received from a data source.

10. The apparatus according to claim 9 wherein a bidirectional recurrent neural network, RNN, trained as an encoder of an autoencoder on cell embeddings provided by a byte-pair encoding model, BPE, is implemented as an encoder of the embedding layer of the semantic type annotation neural network of the apparatus.

11. The apparatus according to claim 9 wherein the generated annotations and the tabular cell data of the table, T, are supplied to an extract, transform, load (ETL) process used to generate a knowledge graph instance of the knowledge base.

12. The apparatus according to claim 9 wherein the classification layer of the semantic type annotation neural network is adapted to calculate column type vectors, y, comprising for the cell data of each cell, C, of the respective supplied row, R, predicted semantic column type probabilities.

13. The apparatus according to claim 9, wherein a mean pooling of column type vectors, y, of all rows, R, of the table, T, is performed to predict the semantic type annotation of each column, C, of the table, T.

14. The apparatus according to claim 9 wherein the self-attention layer of the semantic type annotation neural network comprises a stack of transformers adapted to calculate attentions among the cells, C, of the respective row, R, of the table, T.

15. The apparatus according to claim 9 wherein the semantic type annotation neural network of the apparatus is trained in a supervised learning process using labeled rows, R, as samples.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F G06N

Patent Metadata

Filing Date: June 17, 2021

Publication Date: May 7, 2024
