Client-side partition-aware batch insert operations are presented. For example, a server generates partition metadata, which is provided to a client. The client uses the partition metadata to determine the database nodes to which to send batch insert requests. For example, the client divides batch insert data, such as records for a partitioned table, among multiple database nodes having partitions of the table. The client issues batch insert requests to the respective database nodes for execution. When executed by the database nodes, batch insert operations can be performed in parallel.
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. One or more tangible computer-readable media storing computer-executable instructions for causing a client programmed thereby to perform a method comprising: with the client, receiving a batch insert request comprising a plurality of insert operations for a partitioned table; with the client, splitting the plurality of insert operations into a plurality of operation batches according to partition metadata, wherein the partition metadata indicates how to partition the plurality of insert operations between a plurality of database nodes; and from the client, issuing the plurality of operation batches to the plurality of database nodes for execution according to the partition metadata.
A client-side system efficiently inserts data into a partitioned database. The client receives a batch insert request containing multiple insert operations targeted for a partitioned table. The client divides these insert operations into multiple batches based on partition metadata (rules for distributing data across database nodes). This metadata indicates which database node is responsible for which data partition. Finally, the client sends each batch of insert operations directly to the appropriate database node for execution, allowing for parallel insertion across the nodes. This avoids central server bottlenecks.
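The split-then-issue flow of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the metadata layout, the `id` partitioning column, the modulo rule, and the `send` callback are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical partition metadata: maps a partition number to the node that owns it.
PARTITION_METADATA = {
    "num_partitions": 3,
    "partition_to_node": {0: "node-a", 1: "node-b", 2: "node-c"},
}

def node_for(record, metadata):
    """Pick the owning node from the record's partitioning column (assumed: 'id')."""
    partition = record["id"] % metadata["num_partitions"]
    return metadata["partition_to_node"][partition]

def split_batch(records, metadata):
    """Split one batch insert request into one operation batch per node."""
    batches = defaultdict(list)
    for record in records:
        batches[node_for(record, metadata)].append(record)
    return dict(batches)

def issue_batches(batches, send):
    """Issue each operation batch to its node through a caller-supplied send function."""
    for node, batch in batches.items():
        send(node, batch)

records = [{"id": i, "name": f"row{i}"} for i in range(7)]
batches = split_batch(records, PARTITION_METADATA)

sent = []  # stands in for network calls: records (node, batch length)
issue_batches(batches, lambda node, batch: sent.append((node, len(batch))))
```

In a real client, `send` would be a network call to the node, and the calls for different nodes could run in parallel.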
2. The one or more computer-readable media of claim 1 wherein the splitting includes, for an insert operation of the plurality of insert operations: determining a database node of the database nodes; and adding the insert operation to an operation batch of the plurality of operation batches that is for the determined database node.
In the client-side batch insert system described above, the process of splitting the insert operations involves determining the correct database node for each individual insert operation and adding that insert operation to the batch designated for that specific database node. This ensures each insert operation ends up at the correct destination for processing.
3. The one or more computer-readable media of claim 1 wherein the splitting includes, for an insert operation of the plurality of insert operations: determining a location; and adding the insert operation to an operation batch of the plurality of operation batches that is for the location.
In the client-side batch insert system described above, splitting the insert operations involves determining a location for each individual insert operation and adding that operation to the batch designated for that location. The claim does not specify what kind of location this is; it parallels claim 2, with a location in place of a database node.
4. The one or more computer-readable media of claim 1 wherein the splitting includes, for an insert operation of the plurality of insert operations: based on partitioning criteria for one or more data values of the insert operation, selecting one of the database nodes or a location.
In the client-side batch insert system described above, the selection of a database node or location for each insert operation during splitting is based on partitioning criteria applied to the data values within that insert operation. These criteria could involve the type of data, range of values, or other attributes defined in the partition metadata to determine where the insert operation should be executed.
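As one illustration of partitioning criteria applied to a data value, the sketch below selects a node by value range. The claim does not prescribe range partitioning; the boundaries, node names, and function name are hypothetical.

```python
from bisect import bisect_right

# Hypothetical criteria: values below 1000 go to node-a, below 2000 to node-b,
# and everything else to node-c.
RANGE_UPPER_BOUNDS = [1000, 2000]
NODES = ["node-a", "node-b", "node-c"]

def select_node_by_range(value):
    """Return the node whose value range contains `value`."""
    return NODES[bisect_right(RANGE_UPPER_BOUNDS, value)]
```

`bisect_right` counts how many upper bounds are less than or equal to the value, which is the index of the range the value falls into.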
5. The one or more computer-readable media of claim 4 wherein the selecting uses a hash key calculated from the one or more data values using the partitioning criteria.
In the client-side batch insert system where database node/location selection is based on partitioning criteria, the selection process uses a hash key. This key is calculated from the data values within the insert operation, using a hashing algorithm defined in the partitioning criteria. The resulting hash is then used to determine the appropriate database node or location for the insert operation.
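A hash-based selection might look like the sketch below. The choice of CRC32, the column names, and the modulo mapping onto a node list are assumptions for illustration; the patent does not name a hash function.

```python
import zlib

def hash_key(values):
    """Compute a stable hash key from the partitioning column values."""
    encoded = "|".join(str(v) for v in values).encode("utf-8")
    return zlib.crc32(encoded)

def select_node(record, partition_columns, nodes):
    """Map the record's hash key onto one of the nodes."""
    key = hash_key(record[c] for c in partition_columns)
    return nodes[key % len(nodes)]

nodes = ["node-a", "node-b", "node-c"]
record = {"customer_id": 42, "region": "EU", "amount": 10}
chosen = select_node(record, ["customer_id", "region"], nodes)
```

A stable hash is used instead of Python's built-in `hash()`, whose string hashes vary between processes; the client and the database nodes must agree on where each record belongs.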
6. The one or more computer-readable media of claim 1 wherein the splitting includes, for an insert operation of the plurality of insert operations: selecting one of the operation batches according to a round-robin pattern.
In the client-side batch insert system described above, the splitting process can select an operation batch for an insert operation using a round-robin approach. This means the insert operations are distributed sequentially across the available operation batches in a circular fashion, ensuring relatively even distribution across the database nodes without sophisticated partitioning logic.
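Round-robin selection needs no knowledge of the data values, as this short sketch shows. The node names and function name are hypothetical.

```python
from itertools import cycle

def split_round_robin(operations, nodes):
    """Assign operations to per-node batches in a repeating circular order."""
    batches = {node: [] for node in nodes}
    next_node = cycle(nodes)
    for op in operations:
        batches[next(next_node)].append(op)
    return batches

batches = split_round_robin(list(range(7)), ["node-a", "node-b", "node-c"])
```

With seven operations and three nodes, the batch sizes differ by at most one.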
7. The one or more computer-readable media of claim 6 wherein the method further comprises: from the client, providing, along with the given operation batch, a flag that indicates the given operation batch has already been partitioned and should not be partitioned by the database node.
In the client-side batch insert system that uses round-robin selection, the client sends a flag along with an operation batch indicating that the batch has already been partitioned on the client side. The flag tells the database node not to partition the batch again, which avoids redundant work at the node.
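The effect of such a flag can be sketched from both sides of the connection. The request shape, the field name `already_partitioned`, and the node-side handler are assumptions for illustration, not the patent's wire format.

```python
from dataclasses import dataclass

@dataclass
class BatchRequest:
    node: str
    operations: list
    already_partitioned: bool = False  # hypothetical flag name

def handle_on_node(request, local_node, repartition):
    """Node-side sketch: keep a flagged batch local, otherwise partition it first."""
    if request.already_partitioned:
        return {local_node: request.operations}
    return repartition(request.operations)

calls = []  # records every time the node has to partition a batch itself

def repartition(ops):
    calls.append(ops)
    return {"node-b": ops}

request = BatchRequest("node-a", [1, 2, 3], already_partitioned=True)
result = handle_on_node(request, "node-a", repartition)
```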
8. The one or more computer-readable media of claim 1 wherein the issuing depends at least in part on whether the given operation batch has been filled to a batch size.
In the client-side batch insert system, whether the client issues an operation batch depends at least in part on whether that batch has been filled to a batch size. Accumulating operations up to this threshold reduces the number of individual requests sent to the database nodes.
9. The one or more computer-readable media of claim 1 wherein the issuing depends at least in part on whether a last insert operation of the plurality of insert operations has been processed.
In the client-side batch insert system, whether the client issues an operation batch depends at least in part on whether the last insert operation in the batch insert request has been processed. Once it has, partially filled batches can be flushed, so the remaining data reaches the database nodes even if those batches never fill.
10. The one or more computer-readable media of claim 1 wherein the issuing depends at least in part on whether a timer for the given operation batch has expired.
In the client-side batch insert system, the client issues an operation batch to a database node if a timer associated with that batch has expired. This prevents data from being held indefinitely, even if the batch hasn't reached its full size or all operations haven't been processed. It enforces timely processing of the data.
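The three issuing conditions in claims 8 to 10 (batch size reached, last operation processed, timer expired) can be combined in one check. The sketch below is an assumption-laden illustration: the class name, parameters, and injectable clock are hypothetical.

```python
import time

class OperationBatch:
    """Buffer of insert operations for one node, with three issue triggers."""

    def __init__(self, node, batch_size, max_age_seconds, clock=time.monotonic):
        self.node = node
        self.batch_size = batch_size
        self.max_age_seconds = max_age_seconds
        self.clock = clock  # injectable so the timer can be tested
        self.operations = []
        self.started_at = None

    def add(self, op):
        if not self.operations:
            self.started_at = self.clock()  # timer starts with the first operation
        self.operations.append(op)

    def should_issue(self, is_last_operation=False):
        if not self.operations:
            return False
        full = len(self.operations) >= self.batch_size
        expired = self.clock() - self.started_at >= self.max_age_seconds
        return full or is_last_operation or expired

    def drain(self):
        """Return the buffered operations and reset the batch."""
        ops, self.operations, self.started_at = self.operations, [], None
        return ops
```

A client loop would call `should_issue` after each `add` and send the result of `drain()` to the batch's node whenever the check succeeds.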
11. The one or more computer-readable media of claim 1 wherein the method further comprises: with the client, updating the partition metadata to account for changes to one or more of partitioning criteria, the database nodes, or locations for the database nodes.
In the client-side batch insert system, the client updates its stored partition metadata to reflect changes in the database configuration. This includes changes to the partitioning criteria, the available database nodes, or the locations associated with those nodes. This dynamic adaptation keeps the client routing inserts correctly even as the database infrastructure evolves.
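One way a client could keep its metadata current is to cache it with a version number and replace it when a newer copy arrives. The version field and class name are assumptions; the claim does not say how updates are detected.

```python
class PartitionMetadataCache:
    """Client-side copy of the partition-to-node mapping (hypothetical layout)."""

    def __init__(self, version, partition_to_node):
        self.version = version
        self.partition_to_node = dict(partition_to_node)

    def apply_update(self, version, partition_to_node):
        """Replace the cached mapping only when the update is newer."""
        if version <= self.version:
            return False
        self.version = version
        self.partition_to_node = dict(partition_to_node)
        return True

    def node_for(self, partition):
        return self.partition_to_node[partition]

cache = PartitionMetadataCache(1, {0: "node-a", 1: "node-b"})
# Partition 1 moves to a new node; later inserts are routed there.
updated = cache.apply_update(2, {0: "node-a", 1: "node-c"})
```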
12. The one or more computer-readable media of claim 1 wherein, after the issuing, the client continues the splitting without waiting for a reply from a server for the operation batch.
In the client-side batch insert system, the client continues splitting and preparing operation batches after issuing a batch to a database node, without waiting for a response from the server. This asynchronous operation improves performance by allowing the client to prepare the next batch while the previous one is being processed, maximizing throughput.
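The non-blocking behavior can be sketched with a thread pool: one thread does all the splitting and hands each full batch to a worker without waiting for the reply. The modulo routing on `id`, the `send` callback, and the function name are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def split_and_issue(records, nodes, batch_size, send):
    """Split on the calling thread; issue batches without waiting for replies."""
    pending = {node: [] for node in nodes}
    futures = []
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        for record in records:
            node = nodes[record["id"] % len(nodes)]
            pending[node].append(record)
            if len(pending[node]) >= batch_size:
                # submit() returns immediately, so splitting continues
                # while the batch is in flight.
                futures.append(pool.submit(send, node, pending[node]))
                pending[node] = []
        # Flush partially filled batches after the last operation.
        for node, batch in pending.items():
            if batch:
                futures.append(pool.submit(send, node, batch))
        # Replies are collected only after all operations have been split.
        return [f.result() for f in futures]

records = [{"id": i} for i in range(10)]
replies = split_and_issue(
    records, ["node-a", "node-b"], batch_size=2,
    send=lambda node, batch: (node, len(batch)),
)
```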
13. The one or more computer-readable media of claim 1 wherein a single thread at the client performs the splitting.
In the client-side batch insert system, a single thread handles the splitting of the insert operations into batches. This simplifies the client-side logic, avoiding complex synchronization issues that could arise from multi-threaded processing. While limiting parallelism on the client, it avoids resource contention.
14. In a database system that includes a plurality of servers, each server operating a database node, to which a client device can issue requests for insert operations, a server of the plurality of servers, the server comprising a processing unit and memory, wherein the server is adapted to perform a method comprising: with the server, generating partition metadata that indicates how to partition insert operations between the plurality of database nodes, wherein the partition metadata includes partitioning criteria for insert operations, the partition criteria indicating, for the plurality of database nodes, which node is responsible for executing insert operations for particular partitions of a partitioned table; from the server, transferring the partition metadata to a client for use in client-side partition-aware routing, whereby the client issues batched insert operations directly to the plurality of database nodes according to the partition criteria of the partition metadata; and with the server, receiving a batch of insert operations from the client device, the batch of insert operations associated with a batch insert request executable at the plurality of servers, the batch of insert operations being executable at the server.
A database server generates partition metadata to guide client-side routing of batch insert operations. This metadata defines the partitioning criteria for distributing insert operations across multiple database nodes within the system. The server sends this metadata to the client, which then uses it to send batched insert operations directly to the appropriate nodes. The server then receives these batches and executes them. This offloads routing from the server to the client, improving scalability and reducing server load.
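On the server side, generating and transferring the metadata might look like the sketch below. The field names, the hash criteria type, and the use of JSON as the transfer format are all assumptions; the claim specifies only that the metadata carries partitioning criteria and identifies the responsible node for each partition.

```python
import json

def generate_partition_metadata(table, partition_columns, nodes):
    """Describe how inserts for `table` are divided among `nodes` (hypothetical layout)."""
    return {
        "table": table,
        "partitioning_criteria": {"type": "hash", "columns": list(partition_columns)},
        # JSON object keys are strings, so partition numbers are stored as strings.
        "partition_to_node": {str(i): node for i, node in enumerate(nodes)},
    }

# Server: serialize the metadata for transfer to the client.
payload = json.dumps(
    generate_partition_metadata("orders", ["customer_id"], ["node-a", "node-b"])
)

# Client: decode it and use it for routing.
received = json.loads(payload)
```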
15. The server of claim 14 wherein the partition metadata is transferred as part of a reply to a request from the client to compile a query.
The database server sends partition metadata to the client as part of the response to a query compilation request. This allows the client to understand how to partition data related to the compiled query for subsequent batch insert operations, enabling efficient data loading into the partitioned table.
16. The server of claim 14 wherein the partition metadata is transferred as part of an update to previous partition metadata transferred to the client.
The database server sends partition metadata to the client as an update to previously provided metadata. This allows the server to dynamically adjust the partitioning scheme, and allows the client to adapt its routing behavior as the database configuration changes.
17. The server of claim 14 wherein a query optimizer of the server performs the generating.
A query optimizer on the database server generates the partition metadata. This may help keep the partitioning information consistent with how the server plans queries, although the claim itself states only which component performs the generating.
18. A method comprising: with a client, receiving a batch insert request comprising a plurality of insert operations for a partitioned table; with the client, splitting the plurality of insert operations into a plurality of operation batches according to partition metadata received from a database server, including, for an insert operation of the plurality of insert operations: based at least in part on the partition metadata, determining, from a plurality of database nodes, a database node indicated for processing the insert operation; and adding the insert operation to the operation batch of the plurality of operation batches that is associated with the indicated database node; and from the client, after a given operation batch of the plurality of operation batches has been filled to a batch size, issuing the operation batch to the indicated node for the given operation batch for execution.
A client receives a batch insert request for a partitioned table and splits it into operation batches based on partition metadata received from a database server. For each insert operation, the client uses the metadata to determine the correct database node to handle the insert and adds it to the appropriate batch. Once a batch reaches a predefined size, the client sends it to the designated node for execution. This enables efficient client-side routing and parallel data loading.
19. The method of claim 18 wherein the determining uses round-robin partitioning or hash-based partitioning.
In the client-side batch insert system, the process of determining the appropriate database node for an insert operation utilizes either round-robin partitioning (distributing operations sequentially) or hash-based partitioning (using a hash of the data to determine the node). These are two common partitioning schemes for distributing data across nodes.
20. The method of claim 18 wherein a single thread at the client performs the splitting, and wherein, after the issuing, the client continues the splitting without waiting for a reply from a server for the operation batch.
The client-side batch insert system performs splitting operations using a single thread and continues processing even after a batch is issued, without waiting for server confirmation. Using a single thread simplifies client-side logic, while the non-blocking approach enhances performance by allowing continuous processing of insert requests.
May 6, 2014
August 1, 2017