10896172

Batch Data Ingestion in Database Systems

Published: January 19, 2021
Assignee: Not available in USPTO data

Patent Claims
21 claims

Legal claims defining the scope of protection. Each claim is shown in its original legal language, followed by a plain English translation where one is available.

Claim 1

Original Legal Text

1. A method comprising: obtaining, at a database system, an ingest request to ingest one or more files into a table of a database; after obtaining the ingest request and prior to the ingesting of the one or more files, persisting the one or more files in a first file queue that corresponds to the table, the first file queue further corresponding to a client account, and the database system further comprising a second file queue that corresponds to both a second client account and a second table; assigning the one or more files to one or more execution nodes to be ingested into the table; ingesting, by the one or more execution nodes, the one or more files into one or more micro-partitions of the table, each of the one or more micro-partitions comprising contiguous units of storage of a storage device; and registering metadata after the one or more files are ingested into the one or more micro-partitions of the table, the metadata identifying the one or more files and the one or more micro-partitions.
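
As a rough illustration of the steps in claim 1, the following Python sketch models them in a single process. It assumes in-memory queues keyed by (client account, table) in place of persisted file queues, treats a file as a name plus a list of rows, stands in for micro-partitions with fixed-size runs of rows, and collapses the assignment to execution nodes into one local call. Every class, function, and constant name here (`IngestService`, `DataFile`, `MicroPartition`, `ROWS_PER_PARTITION`) is hypothetical, not taken from the patent.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple

# Hypothetical partition size; this sketch counts rows, not bytes.
ROWS_PER_PARTITION = 2


@dataclass
class DataFile:
    name: str
    rows: List[tuple]


@dataclass
class MicroPartition:
    partition_id: int
    rows: List[tuple]  # rows kept together, standing in for contiguous storage


@dataclass
class IngestService:
    # One file queue per (client account, table) pair.
    queues: Dict[Tuple[str, str], Deque[DataFile]] = field(default_factory=dict)
    # Table name -> micro-partitions appended to that table.
    tables: Dict[str, List[MicroPartition]] = field(default_factory=dict)
    # Registered metadata: file name -> ids of the micro-partitions built from it.
    metadata: Dict[str, List[int]] = field(default_factory=dict)
    next_partition_id: int = 0

    def obtain_ingest_request(self, account: str, table: str, files: List[DataFile]) -> None:
        # Queue the files before any ingestion starts (a real system would persist them).
        self.queues.setdefault((account, table), deque()).extend(files)

    def _ingest_file(self, table: str, data_file: DataFile) -> List[int]:
        # Split the file's rows into fixed-size micro-partitions and append them.
        ids = []
        for start in range(0, len(data_file.rows), ROWS_PER_PARTITION):
            part = MicroPartition(self.next_partition_id,
                                  data_file.rows[start:start + ROWS_PER_PARTITION])
            self.next_partition_id += 1
            self.tables.setdefault(table, []).append(part)
            ids.append(part.partition_id)
        return ids

    def run(self, account: str, table: str) -> None:
        # Drain only this (account, table) queue; other accounts' queues are untouched.
        queue = self.queues.get((account, table), deque())
        while queue:
            data_file = queue.popleft()
            ids = self._ingest_file(table, data_file)
            # Register metadata only after the file has been ingested.
            self.metadata[data_file.name] = ids


if __name__ == "__main__":
    svc = IngestService()
    svc.obtain_ingest_request("acct_a", "events", [DataFile("f1.csv", [(1,), (2,), (3,)])])
    svc.obtain_ingest_request("acct_b", "orders", [DataFile("g1.csv", [(9,)])])
    svc.run("acct_a", "events")
    print(svc.metadata)  # {'f1.csv': [0, 1]}
```

Keeping a separate queue per (account, table) pair means that ingesting one account's files never drains another account's queue, which mirrors the first and second file queues recited in the claim.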

Claim 2

Original Legal Text

2. The method of claim 1 , wherein the ingest request comprises a notification that includes a list of the one or more files.

Claim 3

Original Legal Text

3. The method of claim 2 , wherein obtaining the ingest request comprises receiving the notification on behalf of a client account that is associated with the one or more files.

Claim 4

Original Legal Text

4. The method of claim 1 , wherein obtaining the ingest request comprises polling a data lake for added files, the data lake being associated with a client account that is associated with the one or more files, the data lake comprising data storage containing a plurality of files, the plurality of files comprising the one or more files.

Claim 5

Original Legal Text

5. The method of claim 1 , wherein ingesting, by the one or more execution nodes, the one or more files into one or more micro-partitions of the table comprises: operating an ingest poller to poll the first file queue; and ingesting the one or more files into one or more micro-partitions of the table via one or more pipes.

Claim 6

Original Legal Text

6. The method of claim 1 , wherein assigning the one or more files to the one or more execution nodes to be ingested into the table comprises: generating an ingest task for each of the one or more execution nodes, each generated ingest task identifying the table and one or more of the one or more files; and assigning each generated ingest task to an execution node in the one or more execution nodes.

Claim 7

Original Legal Text

7. The method of claim 6 , wherein assigning each generated ingest task to an execution node in the one or more execution nodes comprises assigning each generated ingest task to a different core of an execution node in the one or more execution nodes.

Claim 8

Original Legal Text

8. A database system comprising: at least one processor; and one or more non-transitory computer readable storage media containing instructions executable by the at least one processor for causing the at least one processor to perform operations comprising: obtaining, at the database system, an ingest request to ingest one or more files into a table of a database; after obtaining the ingest request and prior to the ingesting of the one or more files, persisting the one or more files in a first file queue that corresponds to the table, the first file queue further corresponds to a client account, and the database system further comprising a second file queue that corresponds to both a second client account and a second table; assigning the one or more files to one or more execution nodes to be ingested into the table; ingesting, by the one or more execution nodes, the one or more files into one or more micro-partitions of the table, each of the one or more micro-partitions comprising contiguous units of storage of a storage device; and registering metadata after the one or more files are ingested into the one or more micro-partitions of the table, the metadata identifying the one or more files and the one or more micro-partitions.

Claim 9

Original Legal Text

9. The database system of claim 8 , wherein the ingest request comprises a notification that includes a list of the one or more files.

Claim 10

Original Legal Text

10. The database system of claim 9 , wherein obtaining the ingest request comprises receiving the notification on behalf of a client account that is associated with the one or more files.

Plain English Translation

A database system ingests files that belong to a client account. Building on claims 8 and 9, the ingest request arrives as a notification that lists the files to be ingested, and the system receives that notification on behalf of the client account associated with those files. Tying the notification to a specific account lets the system place the listed files in the file queue that corresponds to that account and the target table, before assigning them to execution nodes, ingesting them into micro-partitions, and registering metadata that records which files went into which micro-partitions. The goal is to automate the loading of a customer's files into a database table while keeping each account's ingestion separate from other accounts'.
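
A minimal sketch of receiving such a notification on behalf of a client account, assuming a notification carries an account identifier, a target table, and a file list, and assuming the receiver rejects accounts it does not serve. `IngestNotification` and `NotificationReceiver` are hypothetical names; the claim does not specify the notification format or any account check.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Set, Tuple


@dataclass(frozen=True)
class IngestNotification:
    account_id: str            # client account the files belong to
    table: str                 # target table
    files: Tuple[str, ...]     # list of files to ingest


class NotificationReceiver:
    """Hypothetical endpoint that accepts notifications on behalf of client accounts."""

    def __init__(self, known_accounts: Set[str]) -> None:
        self.known_accounts = known_accounts
        self.queues: Dict[Tuple[str, str], Deque[str]] = {}

    def receive(self, note: IngestNotification) -> int:
        # Assumed check: only act for accounts this receiver serves.
        if note.account_id not in self.known_accounts:
            raise PermissionError(f"unknown account {note.account_id!r}")
        # Route the listed files to the queue for this account and table.
        queue = self.queues.setdefault((note.account_id, note.table), deque())
        queue.extend(note.files)
        return len(note.files)
```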

Claim 11

Original Legal Text

11. The database system of claim 8 , wherein obtaining the ingest request comprises polling a data lake for added files, the data lake being associated with a client account that is associated with the one or more files, the data lake comprising data storage containing a plurality of files, the plurality of files comprising the one or more files.

Plain English Translation

A database system can obtain its ingest requests by polling a data lake for newly added files instead of waiting for a notification. The data lake is data storage associated with the client account that owns the files, and it holds many files, including the ones to be ingested. On each poll, the system checks the data lake for files added since the previous check; any new files become the subject of an ingest request and are then queued, assigned to execution nodes, ingested into micro-partitions of the table, and recorded in metadata as described in claim 8. Polling automates the discovery of new data, so the client does not need to send notifications or transfer files manually.
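
The polling described above can be sketched as a set difference between successive listings of the data lake. This assumes a hypothetical `list_files` callable that returns the current file names (in practice it would wrap an object store's listing API); `DataLakePoller` is likewise an illustrative name, not one from the patent.

```python
from typing import Callable, Iterable, List, Set


class DataLakePoller:
    """Reports files added to a client account's data lake since the previous poll."""

    def __init__(self, list_files: Callable[[], Iterable[str]]) -> None:
        self._list_files = list_files  # hypothetical listing function
        self._seen: Set[str] = set()

    def poll(self) -> List[str]:
        current = set(self._list_files())
        added = sorted(current - self._seen)  # files not seen on any earlier poll
        self._seen |= current
        return added  # these files form the next ingest request


if __name__ == "__main__":
    lake = {"2021/01/a.csv"}
    poller = DataLakePoller(lambda: lake)
    print(poller.poll())  # ['2021/01/a.csv']
    lake.add("2021/01/b.csv")
    print(poller.poll())  # ['2021/01/b.csv']
```

Because the poller only remembers names, a file deleted and later re-uploaded under the same name would not be reported again; comparing modification times or version identifiers would be one way to handle that case.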

Claim 12

Original Legal Text

12. The database system of claim 8 , wherein ingesting, by the one or more execution nodes, the one or more files into one or more micro-partitions of the table comprises: operating an ingest poller to poll the first file queue; and ingesting the one or more files into one or more micro-partitions of the table via one or more pipes.

Plain English Translation

This claim specifies how the execution nodes carry out the ingestion step of claim 8. An ingest poller repeatedly polls the first file queue, which holds the files waiting to be loaded for a particular client account and table, and picks up the queued files. Those files are then ingested into one or more micro-partitions of the table through one or more pipes, the channels through which queued files are loaded into the table. Because files wait in the queue until the poller collects them, new files can be ingested continuously as they arrive, and storing the loaded data in micro-partitions keeps it in contiguous units of storage.
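
One way to picture an ingest poller feeding a pipe, as a sketch only: the claim does not define what a pipe contains, so here a `Pipe` is assumed to be a named route into one table with a per-file load function, and the table is a plain list. `Pipe`, `IngestPoller`, and `batch_size` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Pipe:
    """Hypothetical pipe: a named route from queued files into one target table."""
    name: str
    table: List[str]             # stands in for the table's micro-partitions
    load: Callable[[str], str]   # turns one file into one stored unit

    def ingest(self, files: List[str]) -> None:
        for f in files:
            self.table.append(self.load(f))


@dataclass
class IngestPoller:
    queue: Deque[str]            # the first file queue
    pipe: Pipe
    batch_size: int = 2

    def poll_once(self) -> int:
        # Take up to batch_size files from the head of the queue.
        count = min(self.batch_size, len(self.queue))
        batch = [self.queue.popleft() for _ in range(count)]
        if batch:
            self.pipe.ingest(batch)
        return len(batch)
```

A long-running poller would call `poll_once()` in a loop, sleeping briefly whenever it returns 0.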

Claim 13

Original Legal Text

13. The database system of claim 8 , wherein assigning the one or more files to the one or more execution nodes to be ingested into the table comprises: generating an ingest task for each of the one or more execution nodes, each generated ingest task identifying the table and one or more of the one or more files; and assigning each generated ingest task to an execution node in the one or more execution nodes.

Plain English Translation

This invention relates to distributing file ingestion work across execution nodes in a database system. For each execution node that takes part in loading a set of files, the system generates an ingest task that identifies the target table and the particular files that node should ingest, and then assigns each task to an execution node. Dividing the files into per-node tasks lets several nodes load different files into the same table in parallel; when the file lists in the tasks do not overlap, each file is ingested exactly once and no node duplicates another's work. This is useful in large-scale databases where many files must be loaded quickly.
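
The task-generation step can be sketched as follows. The round-robin split is an assumed policy chosen for illustration (the claim does not say how files are divided among nodes), and `IngestTask` and `generate_ingest_tasks` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class IngestTask:
    table: str               # target table named by the task
    files: Tuple[str, ...]   # the subset of files this node should ingest


def generate_ingest_tasks(table: str, files: List[str], nodes: List[str]) -> Dict[str, IngestTask]:
    """Build one ingest task per execution node by dealing files round-robin."""
    if not nodes:
        raise ValueError("at least one execution node is required")
    buckets: Dict[str, List[str]] = {node: [] for node in nodes}
    for i, name in enumerate(files):
        buckets[nodes[i % len(nodes)]].append(name)
    # Nodes that received no files get no task.
    return {node: IngestTask(table, tuple(names)) for node, names in buckets.items() if names}
```

Round-robin ignores file sizes; a size-aware split (for example, always giving the next file to the node with the fewest bytes assigned so far) would balance uneven files better.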

Claim 14

Original Legal Text

14. The database system of claim 13 , wherein assigning each generated ingest task to an execution node in the one or more execution nodes comprises assigning each generated ingest task to a different core of an execution node in the one or more execution nodes.

Plain English Translation

This claim refines claim 13: when the generated ingest tasks are assigned, each task is placed on a different processor core of an execution node. A multi-core execution node can therefore run several ingest tasks at the same time, one per core, rather than having them wait for a single core. Using every core of a node in parallel increases ingestion throughput and makes fuller use of each node's hardware, which matters most when many files arrive at once.
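
A sketch of giving each ingest task its own core, under the assumption that a core is identified by an integer index and that a node cannot accept more tasks than it has cores. The function names are hypothetical. Python threads do not pin work to particular cores; on Linux, a worker process could call `os.sched_setaffinity(0, {core})` to bind itself to its assigned core, which this sketch leaves out.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional, Sequence


def assign_to_cores(tasks: Sequence[str], num_cores: Optional[int] = None) -> Dict[int, str]:
    """Give each ingest task its own core index on a single execution node."""
    cores = num_cores or os.cpu_count() or 1
    if len(tasks) > cores:
        raise ValueError(f"{len(tasks)} tasks but only {cores} cores")
    # Task i goes to core i, so no two tasks share a core.
    return {core: task for core, task in enumerate(tasks)}


def run_assigned(assignment: Dict[int, str], work: Callable[[str], str]) -> Dict[int, str]:
    # One worker per assigned core so every task can run at the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(assignment))) as pool:
        futures = {core: pool.submit(work, task) for core, task in assignment.items()}
        return {core: fut.result() for core, fut in futures.items()}
```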

Claim 15

Original Legal Text

15. One or more non-transitory computer readable storage media containing instructions executable by at least one processor for causing the at least one processor to perform operations comprising: obtaining, at a database system, an ingest request to ingest one or more files into a table of a database; after obtaining the ingest request and prior to the ingesting of the one or more files, persisting the one or more files in a first file queue that corresponds to the table, the first file queue further corresponds to a client account, and the database system further comprising a second file queue that corresponds to both a second client account and a second table; assigning the one or more files to one or more execution nodes to be ingested into the table; ingesting, by the one or more execution nodes, the one or more files into one or more micro-partitions of the table, each of the one or more micro-partitions comprising contiguous units of storage of a storage device; and registering metadata after the one or more files are ingested into the one or more micro-partitions of the table, the metadata identifying the one or more files and the one or more micro-partitions.

Claim 16

Original Legal Text

16. The non-transitory computer readable storage media of claim 15 , wherein the ingest request comprises a notification that includes a list of the one or more files.

Plain English Translation

The ingest request described in claim 15 takes the form of a notification that contains a list of the files to be ingested. Because the files are named explicitly, the database system knows exactly which files to persist in the file queue and assign to execution nodes, without having to scan storage to discover them. This notification-based approach is an alternative to finding new files by polling a data lake, as described in claim 18.
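
For a notification that carries its file list in a JSON body, extracting the list might look like this. The payload shape (an object with a `files` array of strings) and the name `files_from_notification` are assumptions for illustration; the claim does not specify a format.

```python
import json
from typing import List


def files_from_notification(payload: str) -> List[str]:
    """Extract the list of files from a hypothetical JSON ingest notification."""
    body = json.loads(payload)
    files = body.get("files") if isinstance(body, dict) else None
    if not isinstance(files, list) or not all(isinstance(f, str) for f in files):
        raise ValueError("notification must contain a 'files' list of strings")
    # Drop repeated entries while keeping the order in which files were listed.
    return list(dict.fromkeys(files))
```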

Claim 17

Original Legal Text

17. The non-transitory computer readable storage media of claim 16 , wherein obtaining the ingest request comprises receiving the notification on behalf of a client account that is associated with the one or more files.

Plain English Translation

This claim adds to claim 16 that the notification is received on behalf of the client account associated with the files. The ingest request is thus tied to a specific customer account, which lets the system place the listed files in the file queue that corresponds to that account and the target table. Separate queues for separate accounts and tables, such as the second file queue for the second client account and second table recited in claim 15, keep one customer's ingestion work apart from another's.

Claim 18

Original Legal Text

18. The non-transitory computer readable storage media of claim 15 , wherein obtaining the ingest request comprises polling a data lake for added files, the data lake being associated with a client account that is associated with the one or more files, the data lake comprising data storage containing a plurality of files, the plurality of files comprising the one or more files.

Plain English Translation

This claim describes obtaining the ingest request by polling a data lake instead of receiving a notification. The data lake is data storage associated with the client account that owns the files, and it contains many files, including the ones to be ingested. The system periodically checks the data lake for newly added files; the files it finds are then persisted in the corresponding file queue, assigned to execution nodes, ingested into micro-partitions of the table, and recorded in metadata, as set out in claim 15. Automating the discovery of new files lets data be loaded without manual intervention or an explicit request from the client.

Claim 19

Original Legal Text

19. The non-transitory computer readable storage media of claim 15 , wherein ingesting, by the one or more execution nodes, the one or more files into one or more micro-partitions of the table comprises: operating an ingest puller to poll the first file queue; and ingesting the one or more files into one or more micro-partitions of the table via one or more pipes.

Claim 20

Original Legal Text

20. The non-transitory computer readable storage media of claim 15 , wherein assigning the one or more files to the one or more execution nodes to be ingested into the table comprises: generating an ingest task for each of the one or more execution nodes, each generated ingest task identifying the table and one or more of the one or more files; and assigning each generated ingest task to an execution node in the one or more execution nodes.

Claim 21

Original Legal Text

21. The non-transitory computer readable storage media of claim 20 , Wherein assigning each generated ingest task to an execution node in the one or more execution nodes comprises assigning each generated ingest task to a different core of an execution node in the one or more execution nodes.

Plain English Translation

This invention relates to parallel processing of file ingestion in a database system. Building on claim 20, each generated ingest task is assigned to a different core of an execution node, so a node with several cores runs several ingest tasks simultaneously, one per core. This avoids leaving cores idle while tasks wait their turn, spreads the ingestion workload across the node's processing resources, and increases throughput when large numbers of files must be loaded.

Patent Metadata

Filing Date

Unknown

Publication Date

January 19, 2021

Inventors

Benoit Dageville
Varun Ganesh
Jiansheng Huang
Jiaxing Liang
Haowei Yu
Scott Ziegler


Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “BATCH DATA INGESTION IN DATABASE SYSTEMS” (10896172). https://patentable.app/patents/10896172

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/10896172. See llms.txt for full attribution policy.