Techniques are described for efficient transfer of large data type data (large data) among database servers for executing a distributed query. In an implementation, a database server receives object reference data, including an indication of whether the object reference data is a quasi-locator. The receipt of the quasi-locator indicates that the large data is to be streamed from the large database repository without any further request from the receiving database server. Accordingly, the receiving database server receives and stores the requested large data without generating and sending any further requests to the large data repository database server.
Legal claims defining the scope of protection, as filed with the USPTO.
. A computer-implemented method comprising:
. The method of, further comprising:
. The method of, further comprising:
. The method of, further comprising:
. The method of, wherein the particular large data is a large object type (LOB) column, which is stored on the large data repository database server, and the object reference data is an LOB locator to a location of storage of the LOB column.
. The method of, further comprising:
. The method of, wherein the object reference data includes a size of the particular large data, the method further comprising:
. The method of, wherein the object reference data includes character set information of the particular large data, the method further comprising:
. The method of, further comprising:
. The method of, wherein the particular large data includes extensible markup language (XML) data or JavaScript Object Notation (JSON) data.
. The method of, wherein the large data repository database server is a coordinator database server and the receiving database server is a shard database server, the method further comprising:
. One or more non-transitory computer-readable media storing a set of instructions, wherein the set of instructions includes instructions, which when executed by one or more hardware processors, cause:
. The one or more non-transitory computer-readable media of, wherein the set of instructions further includes instructions, which when executed by said one or more hardware processors, cause:
. The one or more non-transitory computer-readable media of, wherein the set of instructions further includes instructions, which when executed by said one or more hardware processors, cause:
. The one or more non-transitory computer-readable media of, wherein the particular large data is a large object type (LOB) column, which is stored on the large data repository database server, and the object reference data is an LOB locator to a location of storage of the LOB column.
. The one or more non-transitory computer-readable media of, wherein the set of instructions further includes instructions, which when executed by said one or more hardware processors, cause:
. The one or more non-transitory computer-readable media of, wherein the object reference data includes a size of the particular large data, and wherein the set of instructions further includes instructions, which when executed by said one or more hardware processors, cause:
. The one or more non-transitory computer-readable media of, wherein the set of instructions further includes instructions, which when executed by said one or more hardware processors, cause:
. The one or more non-transitory computer-readable media of, wherein the particular large data includes extensible markup language (XML) data or JavaScript Object Notation (JSON) data.
. The one or more non-transitory computer-readable media of, wherein the large data repository database server is a coordinator database server and the receiving database server is a shard database server, and wherein the set of instructions further includes instructions, which when executed by said one or more hardware processors, cause:
Complete technical specification and implementation details from the patent document.
This application claims the benefit under 35 U.S.C. § 119(c) of provisional application 63/573,136, filed Apr. 2, 2024, the entire contents of which is hereby incorporated by reference for all purposes as if fully set forth herein.
The present invention relates to the field of electronic database management, in particular to large data transfer among database servers.
The approaches described in this section are approaches that could be pursued but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
A database management system (DBMS) may include multiple database servers that may provide redundancy, load balancing, and/or distribute data logically across the database servers. Such a DBMS is referred to as a distributed database management system (DDBMS), and each database server may be referred to as a “shard”.
When a query is received by one of the database servers of DDBMS, the query may reference multiple database instances that are managed by different database servers. Accordingly, when the one database server executes the received query, the database server may request remote data from another database server or an execution of at least a portion of the query on the remote data of the different database server.
Such remote data may be a column of large object data (LOB). An LOB column may store a semi-structured data document in each row, such as an extensible Markup Language (XML) document or a Javascript Object Notation (JSON) document.
Because LOB columns and the tables that contain them are voluminous and, thus, resource-heavy for query operations, the retrieval of LOB column data occurs in two roundtrips. In the first roundtrip, the requesting entity, such as the client system or a coordinator database server, receives the object reference data that references the location of the LOB data, the LOB locator. The LOB locators may be database-specific physical addresses for accessing LOB data and may additionally include the generation timestamp on the originating database or other metadata. For example, an LOB locator may contain the LOB Identifier, LOB version, and the SCN (database logical timestamp) of the LOB generation.
In the first roundtrip, the requester of the LOB data may cache the LOB locator for the requested LOB data. In the second roundtrip, the client system may request at least a portion of the LOB data to be fetched using the LOB locator. The two roundtrips may repeated until the complete LOB data is fetched.
The advantage of the two roundtrips approach is that the client/requester may maintain the LOB locator and be able to fetch or update the LOB data on subsequent requests. Thus, if the data of the same LOB is repeatedly requested or updated, then this two-round trip approach saves significant processing time by proceeding directly to fetching LOB data in subsequent requests.
However, this approach introduces significant inefficiency when different LOBs are requested across many database servers. For example, the DDBMS may have's of shard database servers that partition logs stored in JSON, each database server storing a particular time period of logs in an LOB column. In such a scenario, executing a query that aggregates the logs incurs the cost of hundreds of additional roundtrips and introduces significant computational costs. Especially when database servers are located in different regions, the network latency may be significant, and each extra roundtrip may be very costly.
The problem is further exacerbated by “CREATE TABLE AS SELECT” (CTAS) or “INSERT TABLE AS SELECT” (ITAS) query constructs, in which temporary LOB data may be referenced. These constructs create local temporary tables that may include LOB columns from remote database servers. Even though CTAS and ITAS may specify criteria that may significantly reduce the amount of LOB data that needs to be fetched into the temporary tables, the querying database server has no choice but to perform the two roundtrip technique and, in the second roundtrip to transfer the full contents into the LOB column for applying the criteria on the querying database server. Such an approach further burdens the network and introduces significant latency in executing distributed queries with CTAS and ITAS constructs.
In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
Techniques for transferring large data type data (large data) among database servers are described herein. Large data refers to unstructured or semi-structured data, such as text, images, videos, and documents, which may be too large to be stored in the arranged manner in memory (e.g., columnar/column-major or row store/row-major) and, thus, is accessed through the respective object reference data indicating the memory location of the storage. In an implementation, the object reference data (also referred to herein as “locator”) is extended to include an indication of whether the requested large data, such as LOB, is going to be streamed over a network without receiving additional request(s) for the large data. Rather than requiring the two-roundtrip approach, the large data may be streamed from the large data repository database server (LDBS) in response to the initial request for transfer of the large data. The requesting database server may not need to send the received object reference data to fetch at least a portion of the requested large data from the LDBS.
The extended object reference data is referred to herein as “quasi-locator.” The term “large data repository database server” (LDBS) refers to a database server in a DDBMS that stores the requested large data. The database server that requests the large data from the LDBS or is sent the large data from the LDBS in the DDBMS is referred to as the “receiving database server”. Non-limiting examples of receiving database servers may be a local database server or a coordinator database server that receives the query execution request from a client system and requests the large data from another database server, an LDBS, for the execution of the query.
A receiving database server of a DDBMS may generate and select an execution plan for a received query that specifies data operation on large data stored on a remote LDBS of the DDBMS. Such an operation may be a temporary table definition statement with CTAS or IAS construct referencing an LOB column of the LDBS. In this example, the execution of the query on the receiving database server includes a request to the LDBS to transmit the large data to the receiving database server for execution.
To provide the requested large data, in response to receiving the request from the receiving database server, the LDBS may generate a quasi-locator. The quasi-locator includes an indication of whether the quasi-locator indeed references the requested large data (and has to be resent to fetch at least a portion of the large data) or whether the requested data is to be streamed without any additional request. For example, the received LOB quasi-locator may include a bit, which, when set, would indicate that LOB data is being automatically streamed from the LDBS without additional receipt of the LOB locator to fetch the LOB data or a portion thereof.
In an implementation, after or in parallel with sending the object data reference, but without receiving any additional request for the large data, the LDBS automatically initiates the transfer (sending) of the large data to the receiving database server. The LDBS may stream the large data to the receiving database server, which may store the large data in temporary storage. Based on comparing the size of the large data to the buffer threshold, the receiving database server determines whether to store the large data in the fast-accessible buffer memory or slow-accessible disk storage memory.
is a block diagram that depicts a distributed database management system (DDBMS), in an implementation. Although the techniques described herein are applicable to any type of DBMS, the techniques are particularly beneficial to distributed database management systems. Distributed database management systems, such as DDBMS, have multiple nodes interconnected by a network, such as Network. Accordingly, to execute operations, data is constantly shuffled around the nodes through Network, which makes the size of data and number of interactions critical to reducing network latency. The techniques described herein improve the efficiency of the query execution in a distributed data environment by reducing the latency of large data transfer over the network.
In an implementation, DDBMSincludes one or more database servers, such as Coordinator Database Server Node, Database Server NodeA, and Database Server NodeB. Each of these database server nodes may collectively or individually be a separate database management system described in more detail in the “Database Management Systems” section. The cluster of nodes of DDBMSis further described in the “Computing Nodes and Clusters” section. Although only three nodes,,A andB, are depicted, other implementations may include more or less than two shards.
A client system (not depicted in) may issue a database query execution request to DDBMS. In response to the request, DDBMSexecutes the query and returns the result to the client system. In an implementation, Coordinator Database Server Nodeof DDBMSreceives the query request stores and serves up the response to the client system from DDBMS. To execute the query, Coordinatormay first convert the query into an execution plan. As used herein, an “execution plan” is an ordered set of operators used to access data in a database management system. Each operator may contain one or more data operations on the output data of another operator or on database objects. Query Optimizerreceives a query as an input and generates an execution plan for the query, in an implementation. Query Optimizermay include a plan generator that generates one or more alternative execution plans for the query. The plan generator may generate those plans based on an analysis of access paths for database objects, operations described in the query, and/or pre-execution statistics thereof.
Based on the analysis, the plan generator may produce a permutation of execution plans, one of which may be selected based on the execution cost (e.g., computing cost). After selecting an optimal execution plan for a query, Coordinator Nodeobtains and executes the execution plan, in an implementation.
The execution plan for a distributed query may reference CTAS/IAS/merge of LOBs operations and/or the operations involving temporary LOBs, JSON/XML datatype LOB columns, and value-based LOBs that are stored on DatabasesA and/orB. Those database instances are serviced by different database server nodes (shards), such as Database Server NodeA and Database Server NodeB, respectively. Each of their respective DatabasesA/B may store different logical portions of large data, or different large data all together. Accordingly, query optimizermay generate an execution plan that may include querying different (portions of) large data from databasesA/B and Coordinator's database.
To execute such an execution plan by Coordinator Database Server Node, Coordinatormay request Database Server NodeA and/or Database Server NodeB to transfer the requisite large data from respective DatabaseA and DatabaseB to Coordinator Database Server Node. Only after the transfer of such large data from the remote database server(s) can the optimal execution plan be performed on Coordinator Database Server Node.
Techniques described herein include the efficient transfer of large data between database servers of DDBMSfor query execution.is a sequence diagram depicting the process for performing large data transfer from an LDBS to a receiving database server to execute a client query, in one or more implementations.
Coordinator, as a receiving database server, receives a query execution request from Client Systemat step, in an implementation. At step, Coordinatordetermines that for the query execution, an operation is to be performed on large data stored on Remote LDBS NodeA. For example, the received query may include a DEFINE with a CTAS/ITAS operator referencing an LOB column stored on Remote LDBS NodeA. Thus, the generated optimal execution plan by Coordinatorincludes an operation on the LOB column stored on Remote LDBS NodeA.
At step, Coordinatormay send a request to Remote LDBS NodeA to retrieve the LOB column requested for the query execution. This request for the large data from Coordinatormay indicate to Remote LDBSA the support for the automatic streaming of large data by including the indication of the capability for receiving a stream of the large data. Upon the receipt of the request for large data, such as an LOB column, from Coordinatorat step, Remote LDBS NodeA, at step, retrieves the LOB column from DatabaseA to send to Coordinator.
In an implementation, the request from Coordinatoradditionally contains the operation to be performed on the large data (e.g., scan with a predicate) based on the execution plan. If Coordinator, rather than Remote LDBSA, performs the operation, then Coordinatorhas to wait until the large data is completely transferred from Remote LDBS NodeA. Instead, since the operation is part of the request to Remote LDBS NodeA, Remote LDBS NodeA may locally perform the operation on the large data. Since the operation may reduce the size of the resulting large data, such a technique decreases the amount of large data transferred and, thus, further improves the efficiency and the latency of the query execution on Coordinator.
For example, at step, Remote LDBS NodeA may receive a request for an LOB column of DatabaseA that further includes a request for a scan operation with a predicate operation to be performed thereon. Remote Nodemay execute the predicate on the LOB column and then send only the resulting sub-set of large data of the LOB column, resulting in quicker data transfer to Coordinatorand, thus, reduced latency in the query execution.
In an implementation, the received request at the LDBS, such as the request received at step, may also indicate the type of large data transfer supported by the receiving database server. The LDBS may use the more efficient automatic streaming type data transfer or the two-round trip request-based data transfer. In an implementation, if the LDBS and the receiving database server support the automatic streaming type data transfer, then the LDBS may automatically perform the efficient streaming of large data to the receiving database server without any additional request (object reference data containing request) from the receiving database server.
is a block diagram that depicts the process for determining the type of transfer for large data from an LDBS, in an implementation. At step, the LDBS determines whether the receiving database server supports the automatic streaming of large data. The request received by the LDBS may contain an indication of such support by the receiving database server. For example, the request received for LOB column operation on Remote LDBS NodeA may contain an indication of support for the automatic streaming of LOB data. Remote Nodemay examine this indication and determine the existence of such support by Coordinatorat step.
If, at step, the process determines that there is support for the automatic streaming of large data, the process proceeds to step. Otherwise, the process proceeds to stepto generate a regular locator. At step, the process sends the locator to the receiving database server for the receiving database server to use the locator when fetching the large data of the LDBS. At step, the LDBS has to wait to receive a request for the large data (or a portion thereof) that includes the previously sent locator. Upon the receipt of such a request at step, the LDBS retrieves the large data from the storage and sends to the receiving database server at step. The wait, receipt, and sending cycle of stepsandmay have to be repeated for each portion of the large data until the large data is completely sent.
Unlike regular locator request-based transfer of large data, if it is determined that the automatic streaming is supported by the receiving database server at step, at step, a quasi-locator is generated. In an implementation, in addition to indication about the support for automatic streaming of large data, the quasi-locator contains metadata about the large data, such as the size, data type (JSON, XML) and/or character set of the large data. The indication for streaming and metadata may be sent together or separately to the receiving database server at step.
At step, without any request from the receiving database server to initiate the transfer of the large data (or a portion thereof), the LDBS initiates the streaming of portions until all portions are sent to the receiving database server. For example, continuing with, at step, Remote LDBS NodeA may generate a quasi-locator for the requested LOB data by Coordinator. In addition to the indication that Remote LDBS NodeA supports automatic streaming, the generated quasi-locator may include metadata about the large data, such as the size of the resulting LOB data, the LOB column type, and the character set (encoding) of the content of the LOB data. If the LOB column contains semi-structured JSON or XML data, the data type may be indicated along with the character set (ANSI, UTF-8, UTF-16). That way, the receiving database, Coordinator, may be able to seamlessly integrate the received LOB data into its query processing for efficiently performing the query for Client System.
After sending the quasi-locator with metadata about the LOB column at stepto Coordinatorand without receiving from Coordinatorany further request for the LOB data, Remote LDBS NodeA initiates the streaming of the LOB data to Coordinatorby sending portion(s) of the requested LOB data to Coordinator, at step.
Coordinatorreceives the quasi-locator at step. Based on the quasi-locator, Coordinatorreceives, processes and stores the streamed portions of the LOB column until the complete large data is transferred from Remote LDBS NodeA to Coordinator.
is a block diagram that depicts the process for receiving, processing and storing streamed large data from a remote LDBS, in one or more implementations. At step, the receiving database server receives a locator in response to the request for large data stored on the remote LDBS.
At step, the receiving database server determines whether the received object reference data is a quasi-locator for streaming the requested large data or a regular locator for future fetching of portions of the requested large data. The indication included in the received object reference data may indicate the type of locator and, thus, the type of transfer to be performed by the LDBS. Such an indication may be a bit set/unset, respectively, within the object reference data by the LDBS. Additionally or alternatively, the non-existence of the indication in the object reference data may also be an indication of the locator being a regular locator.
If, at step, the receiving database server determines, based on the indication or the lack thereof, that the received object reference data is a regular locator, then the process transitions to step, and the receiving server sends a separate additional request with the locator to fetch the large data (or portions thereof) from the LDBS.
Otherwise, if, at step, it is determined that the received object reference data contains an indication that the LDBS supports automatic streaming of the requested large data to the receiving database server, then the process determines that the object reference data is a quasi-locator. The process transitions to stepto process the streamed (portions of) large data.
Additionally, the quasi-locator may contain metadata about the size of the large data being streamed. In such an implementation, at step, the process determines whether the size of the large data, as indicated in the quasi-locator, is above or below a memory threshold. If the size is above the memory threshold (e.g., greater or equal), then the process allocates memory space for large data in a slower, disk-based storage of the receiving database server at step. Otherwise, the process allocates the memory space in a fast-access buffer storage of the receiving database server at step. The memory threshold may be selected based on the availability of fast-access memory storage and/or pre-defined (e.g., by user input).
As each portion of the large data is received from the LDBS at step, the portion is processed and stored in the allocated memory at stepuntil all the portions of the stream are received at step, and thereby, the large data is fully streamed to the receiving database server.
In an implementation, the large data may be processed by the receiving server before or after portion(s) of the large data is stored. Since the quasi-locator may contain metadata about the large data, the receiving database server may determine, based on the metadata, that additional processing is necessary for the client query execution.
In an implementation, the quasi-locator may include information on the character set encoding of the streamed large data. Accordingly, at stepor after step, when the receiving database server receives the large data or a portion thereof, the receiving database server may convert the large data from the current character set/encoding indicated in the quasi- locator to the desired character set/encoding for the query operation.
Once the large data is efficiently streamed to the receiving database server, the query requested by the client may be executed according to the execution plan at step. The stored large data may be cast into an appropriate type of temporary data structure of the receiving database server according to the execution plan of the client query.
For example, continuing with, at step, Coordinatorcompletes storing the received large data of the LOB column in the local storage and propagates the stored data as temporary LOB column data for the query execution. Having access to the received data, Coordinatormay execute the operation of the execution plan of the query that references the temporary LOB column at step. At step, Coordinatorreturns the result of the query execution to Client System. If the result of the query execution includes another LOB column, the same techniques described herein may be used to transfer the resulting LOB data to Client Systemat step. For this transfer of large data to Client System, Client Systemhas a receiving database server role, while Coordinatoris an LDBS.
Additionally, or alternatively, the coordinator database server that receives a query that references large data may be the database server that stores the large data and thus is the LDBS of the DDBMS. A query received by the LDBS may specify a predicate on its stored large data that is based on the evaluation performed on another database server of the DDBMS. The execution plan of such a query may contain a BIND operation on an LOB column stored locally with column(s) stored remotely on another shard database server. In such an example, the operation on the LOB and the LOB data itself are routed to the other database servers of the DDBMS for execution of the operation.
is a sequence diagram that depicts the transferring of large data to a remote database server for remote processing, in one or more implementations. At step, Coordinator LDBSreceives a request for query execution from Client System. At step, Coordinator LDBSmay determine to perform a large data operation on Remote NodeB by generating and selecting an execution plan with a BIND operation of a local temporary LOB of Coordinator LDBSwith a column on DatabaseB of Remote NodeB. At step, Coordinator LDBSsends a request to Remote NodeB to perform the operation on the LOB.
This request to perform the operation may include an indication of the capability of Coordinator LDBSfor streaming the large data without additional requests from Remote NodeB for such large data. Upon the receipt of the request, such as an LOB column, from Coordinator LDBSat step, Remote NodeB, at step, generates a response that includes Remote NodeB′s capability of receiving streaming portions of the large data without requests for each portion.
Similar to the techniques described above, Coordinator LDBSdetermines the support for the streaming of the large data and generates a quasi-locator for the large data to be transferred to the receiving database server. Coordinator LDBSmay perform one or more steps ofto generate and send a quasi-locator and stream the large data itself to the receiving Remote NodeB.
Accordingly, at step, Coordinator LDBSgenerates and sends object reference data and at step, streams temporary LOB data to Remote NodeB. By performing one or more steps of, Remote NodeB stores the LOB data at stepwithout sending any additional request to Coordinator LDBSfor the LOB data and executes the operation on the received LOB data at step.
Unknown
October 2, 2025
Browse 5M+ US patents with plain-English claim translations and AI-generated analysis.