10983967

Creation of a Cumulative Schema Based on an Inferred Schema and Statistics

Published: April 20, 2021
Assignee: not available in the USPTO data we have

Patent Claims
20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A system, comprising:
  one or more computing devices comprising one or more hardware processors and a memory, the one or more computing devices configured to implement:
    a schema inference module configured to:
      infer a schema from one or more objects received from a data source, wherein each of the one or more objects includes data and metadata;
      create a cumulative schema based on the inferred schema from the one or more objects, wherein the cumulative schema identifies different combinations of individual attributes and corresponding type of data that exist for the one or more objects received from the data source;
      collect statistics on occurrences of the types of the data of the one or more objects received from the data source; and
      modify the cumulative schema based on the statistics, wherein to modify the cumulative schema, the schema inference module is configured to incorporate into the cumulative schema a number of occurrences of the different combinations of individual attributes and corresponding type of data.
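
As a rough illustration of the steps recited in claim 1, the sketch below infers a per-object schema, folds it into a cumulative schema, and records how often each attribute/type combination occurs. It is not the patented implementation: the names `infer_schema` and `CumulativeSchema`, the dotted attribute paths, and the use of Python type names as the "type of data" are all assumptions made for this example, and metadata handling is omitted.

```python
from collections import Counter
from typing import Any, Dict


def infer_schema(obj: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Map each attribute path in one object to the name of its value's type."""
    schema: Dict[str, str] = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects contribute dotted attribute paths (an assumption).
            schema.update(infer_schema(value, prefix=f"{path}."))
        else:
            schema[path] = type(value).__name__
    return schema


class CumulativeSchema:
    """Hypothetical cumulative schema: (attribute, type) -> occurrence count."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def add(self, obj: Dict[str, Any]) -> None:
        # Merge one object's inferred schema and update the statistics.
        for attr, type_name in infer_schema(obj).items():
            self.counts[(attr, type_name)] += 1

    def types_for(self, attr: str) -> Dict[str, int]:
        """Return the observed types of one attribute with their counts."""
        return {t: n for (a, t), n in self.counts.items() if a == attr}


# Usage: "id" appears as both an integer and a string across objects.
objects = [
    {"id": 1, "user": {"name": "ann"}},
    {"id": "2", "user": {"name": "bob"}},
    {"id": 3},
]
schema = CumulativeSchema()
for o in objects:
    schema.add(o)
```

Here `schema.types_for("id")` reports two integer occurrences and one string occurrence, which is the kind of count the claim describes incorporating into the cumulative schema.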

2. The system of claim 1, wherein the schema inference module is further configured to determine whether the types of the data are typed correctly based on the statistics.
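
One plausible reading of claim 2 is that a type seen only rarely for an attribute is a candidate for being mistyped. The sketch below flags such types; the function name `suspicious_types` and the 5% threshold are assumptions for illustration and do not come from the patent.

```python
from typing import Dict, List


def suspicious_types(type_counts: Dict[str, int], threshold: float = 0.05) -> List[str]:
    """Return types whose share of one attribute's occurrences is below threshold."""
    total = sum(type_counts.values())
    if total == 0:
        return []
    return sorted(t for t, n in type_counts.items() if n / total < threshold)


# Usage: 1% of the "id" values were strings, so "str" is flagged.
counts_for_id = {"int": 990, "str": 10}
flagged = suspicious_types(counts_for_id)
```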

3. The system of claim 1, wherein the one or more computing devices are further configured to implement: an export module configured to output the data of the one or more objects to a data destination according to the cumulative schema.

4. The system of claim 3, wherein the export module is further configured to: convert the cumulative schema to a relational schema; and output the data of the one or more objects to the data destination according to the relational schema.
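
The claim does not say how the conversion to a relational schema is performed. As one hedged possibility, the sketch below turns each attribute/type combination into a column and emits a `CREATE TABLE` statement. The function `to_relational`, the `SQL_TYPES` mapping, and the convention of suffixing the type name when an attribute has several types are all assumptions for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from inferred type names to SQL column types.
SQL_TYPES = {"int": "BIGINT", "float": "DOUBLE PRECISION", "str": "TEXT", "bool": "BOOLEAN"}


def to_relational(table: str, counts: Dict[Tuple[str, str], int]) -> str:
    """Build a CREATE TABLE statement from (attribute, type) -> count entries."""
    # Count how many distinct types were observed for each attribute.
    types_per_attr: Dict[str, int] = {}
    for attr, _type_name in counts:
        types_per_attr[attr] = types_per_attr.get(attr, 0) + 1

    columns: List[str] = []
    for attr, type_name in sorted(counts):
        name = attr.replace(".", "_")
        if types_per_attr[attr] > 1:
            # Polymorphic attribute: one column per observed type.
            name = f"{name}_{type_name}"
        columns.append(f"{name} {SQL_TYPES.get(type_name, 'TEXT')}")
    return f"CREATE TABLE {table} ({', '.join(columns)});"


ddl = to_relational(
    "events",
    {("id", "int"): 2, ("id", "str"): 1, ("user.name", "str"): 2},
)
```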

5. The system of claim 3, wherein the schema inference module is configured to infer the schema in a first pass through the one or more objects, and wherein the export module is configured to output the data of the one or more objects in a second pass through the one or more objects.

6. The system of claim 3, wherein the one or more computing devices are further configured to implement an index store, wherein the schema inference module is configured to store the data of the one or more objects to the index storage service, and wherein the export module is further configured to retrieve the data of the one or more objects from the index store prior to outputting the data to the data destination.

7. A method, comprising:
  performing with one or more computing devices:
    inferring a schema from one or more objects received from a data source, wherein each of the one or more objects includes data and metadata;
    creating a cumulative schema based on the inferred schema from the one or more objects, wherein the cumulative schema identifies different combinations of individual attributes and corresponding type of data that exist for the one or more objects received from the data source;
    collecting statistics on occurrences of types of the data of the one or more objects received from the data source; and
    modifying the cumulative schema based on the statistics, wherein the modifying of the cumulative schema comprises incorporating into the cumulative schema a number of occurrences of the different combinations of individual attributes and corresponding type of data.

8. The method of claim 7, further comprising collecting additional statistics on the metadata of the one or more objects.

9. The method of claim 7, further comprising, in response to the statistics, recasting a type of the data of at least one object of the one or more objects.

10. The method of claim 7, further comprising collecting additional statistics on values of the data having a particular type.

11. The method of claim 10, wherein the additional statistics comprise a minimum value of the values, a maximum value of the values, an average of the values, a standard deviation of the values, or any combination thereof.
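
The statistics named in claim 11 can be maintained incrementally, one value at a time. The sketch below uses Welford's online algorithm for the mean and variance; that choice, the class name `RunningStats`, and the use of the population (rather than sample) standard deviation are assumptions of this example, not details taken from the patent.

```python
import math


class RunningStats:
    """Track min, max, mean, and standard deviation without storing values."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.minimum = math.inf
        self.maximum = -math.inf

    def update(self, x: float) -> None:
        # Welford's update: numerically stable single-pass mean and variance.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def stddev(self) -> float:
        """Population standard deviation; 0.0 when no values have been seen."""
        return math.sqrt(self._m2 / self.n) if self.n else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
```

Because each update needs only the previous aggregates, the same structure works whether statistics are collected after all objects have been received or as each object arrives.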

12. The method of claim 10, further comprising updating the cumulative schema based on the additional statistics.

13. The method of claim 12, wherein the cumulative schema comprises a metadata store, wherein the method further comprises storing the additional statistics to the metadata store.

14. The method of claim 7, wherein collecting the statistics occurs subsequent to receiving all of the one or more objects.

15. The method of claim 7, wherein collecting the statistics occurs in response to receiving each of the one or more objects.

16. A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to:
  infer a schema from one or more objects received from a data source, wherein each of the one or more objects includes data and metadata;
  create a cumulative schema based on the inferred schema from the one or more objects, wherein the cumulative schema identifies different combinations of individual attributes and corresponding type of data that exist for the one or more objects received from the data source;
  collect statistics on occurrences of types of the data of the one or more objects received from the data source; and
  modify the cumulative schema based on the statistics, wherein to modify the cumulative schema, the instructions further cause the one or more processors to incorporate into the cumulative schema a number of occurrences of the different combinations of individual attributes and corresponding type of data.

17. The non-transitory computer-readable storage medium of claim 16, wherein the instructions further cause the one or more processors to: determine a casting directive configured to recast a particular object from a first type to a second type based on the statistics; determine that the particular object is of the first type; and recast the particular object to the second type based on the casting directive.
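
The three steps of claim 17 (derive a directive from the statistics, check the first type, recast to the second type) might look roughly like the sketch below. The `CastingDirective` dataclass, the 90% dominance rule in `choose_directive`, and the decision to leave a value unchanged when conversion fails are assumptions introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class CastingDirective:
    attribute: str
    from_type: type                  # the "first type"
    to_type: Callable[[Any], Any]    # the "second type", used as a converter


def choose_directive(
    attribute: str, type_counts: Dict[type, int], min_share: float = 0.9
) -> Optional[CastingDirective]:
    """Propose casting the minority type to the dominant one, if one dominates."""
    if len(type_counts) != 2:
        return None
    total = sum(type_counts.values())
    dominant, minority = sorted(type_counts, key=type_counts.get, reverse=True)
    if type_counts[dominant] / total < min_share:
        return None
    return CastingDirective(attribute, minority, dominant)


def recast(obj: Dict[str, Any], directive: CastingDirective) -> Dict[str, Any]:
    """Return a copy of obj with the attribute recast when it has the first type."""
    value = obj.get(directive.attribute)
    # An exact type check avoids treating bool as int.
    if type(value) is directive.from_type:
        try:
            return {**obj, directive.attribute: directive.to_type(value)}
        except (TypeError, ValueError):
            return obj  # conversion failed; leave the value as it was
    return obj


# Usage: strings are rare for "id", so they are recast to integers.
directive = choose_directive("id", {int: 98, str: 2})
recast_obj = recast({"id": "42"}, directive)
```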

18. The non-transitory computer-readable storage medium of claim 16, wherein the instructions further cause the one or more processors to: transform the one or more objects based on the cumulative schema; and store the transformed one or more objects to an index store.

19. The non-transitory computer-readable storage medium of claim 16, wherein the instructions further cause the one or more processors to: determine that an error has occurred with the one or more objects; and collect statistics on the error.

20. The non-transitory computer-readable storage medium of claim 19, wherein the error comprises a decompression error, a formatting error, a parsing error, a string encoding error, a locked file error, a network access error, or any combination thereof.
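
To make the error statistics of claims 19 and 20 concrete, the sketch below counts three of the listed error kinds while loading objects. The input format (gzip-compressed, UTF-8 encoded JSON), the function name `load_objects`, and the category labels are assumptions chosen for this example; locked file and network access errors are not modeled.

```python
import gzip
import json
import zlib
from collections import Counter
from typing import Any, Iterable, List, Tuple


def load_objects(blobs: Iterable[bytes]) -> Tuple[List[Any], Counter]:
    """Decode each blob and count failures by error category."""
    objects: List[Any] = []
    errors: Counter = Counter()
    for blob in blobs:
        try:
            raw = gzip.decompress(blob)
        except (OSError, EOFError, zlib.error):
            errors["decompression"] += 1
            continue
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            errors["string_encoding"] += 1
            continue
        try:
            objects.append(json.loads(text))
        except json.JSONDecodeError:
            errors["parsing"] += 1
    return objects, errors


# Usage: one valid blob and one blob for each failure category.
blobs = [
    gzip.compress(b'{"id": 1}'),
    b"not gzip",
    gzip.compress(b"\xff\xfe"),
    gzip.compress(b"{not json"),
]
loaded, error_stats = load_objects(blobs)
```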

Patent Metadata

Filing Date

Unknown

Publication Date

April 20, 2021

Inventors

Dimitris Tsirogiannis
Nathan A. Binkert
Stavros Harizopoulos
Mehul A. Shah
Benjamin A. Sowell
Bryan D. Kaplan
Kevin R. Meyer

Cite as: Patentable. “Creation of a Cumulative Schema Based on an Inferred Schema and Statistics” (10983967). https://patentable.app/patents/10983967
