In one aspect, there is provided a distributed database system. The distributed database system can include a plurality of hosts configured to store and/or manage data, a first query processing engine comprising a master node and an executor node, and a second query processing engine. The master node can be implemented on one or more programmable processors and configured to perform operations. The operations can include: generating an execution plan for a query on data that is stored at and/or managed by one or more of the plurality of hosts; determining to push down, to the second query processing engine, at least one data processing operation in the execution plan; and dispatching, to the executor node, at least a portion of the execution plan, the portion of the execution plan including the at least one data processing operation that is pushed down to the second query processing engine.
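The push-down determination summarized above can be illustrated with a minimal sketch. It assumes one possible reading of the criteria: an operation is pushed down when it needs no data from any other host and the second engine supports it. The names `Operation`, `should_push_down`, and `plan_portion_for`, the host names, and the set of supported operations are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """Hypothetical plan node: an operation and the hosts whose data it reads."""
    name: str
    hosts_needed: frozenset


def should_push_down(op, host, supported_ops):
    """Assumed rule: push down only if the operation is host-local and supported."""
    local_only = op.hosts_needed <= {host}  # needs no data from any other host
    return local_only and op.name in supported_ops


def plan_portion_for(host, plan, supported_ops):
    """Build the plan portion dispatched to the executor node on `host`.

    Each operation is tagged with the engine expected to perform it. Operations
    tagged "second_engine" would be forwarded by the executor node to the local
    instance of the second query processing engine.
    """
    return [
        (op.name, "second_engine" if should_push_down(op, host, supported_ops) else "executor")
        for op in plan
    ]


# Illustrative plan: the join needs data from two hosts, the filter only from host_a.
plan = [
    Operation("join", frozenset({"host_a", "host_b"})),
    Operation("filter", frozenset({"host_a"})),
]
portion = plan_portion_for("host_a", plan, supported_ops={"filter", "aggregate"})
print(portion)
```

In this sketch the join stays with the executor node because it requires data stored at another host, while the filter is marked for the second engine.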
Legal claims defining the scope of protection, as filed with the USPTO.
1. A distributed database system, comprising: a first query processing engine comprising a master node and a plurality of executor nodes; a second query processing engine comprising a plurality of instances of the second query processing engine; and a plurality of hosts storing data, each of the plurality of hosts including one of the plurality of executor nodes and one of the plurality of instances of the second query processing engine; and wherein the master node of the first query processing engine is implemented on one or more programmable processors and configured to perform operations comprising: generating an execution plan for a query on data that is stored at a host from the plurality of hosts, the execution plan including a first data processing operation and a second data processing operation; determining to push down, to the second query processing engine, the second data processing operation such that the second data processing operation is performed by an instance of the second query processing engine deployed at the host instead of an executor node of the first query processing engine deployed at the host, the master node determining to push down the second data processing operation based at least on the second data processing operation not requiring data stored at any other one of the plurality of hosts, and the master node determining not to push down the first data processing operation based at least on the first data processing operation requiring data stored at another one of the plurality of hosts; and dispatching, to the executor node, at least a portion of the execution plan, the portion of the execution plan including the first data processing operation such that the first data processing operation is performed by the executor node of the first query processing engine deployed at the host, and the portion of the execution plan further including the second data processing operation such that the second data processing operation is forwarded, by the executor node, to the instance of the second query processing engine deployed at the host to be performed by the instance of the second query processing engine deployed at the host.
2. The distributed database system of claim 1, wherein pushing down the second data processing operation includes rewriting the second data processing operation in structured query language (SQL).
3. The distributed database system of claim 1, wherein the second data processing operation is forwarded to the instance of the second query processing engine deployed at the host via an application programming interface (API).
4. The distributed database system of claim 1, wherein the master node determines to push down the second data processing operation based at least on the second data processing operation requiring data residing at the second query processing engine.
5. The distributed database system of claim 1, wherein the master node determines to push down the second data processing operation based at least on the second query processing engine supporting the second data processing operation.
6. The distributed database system of claim 1, wherein the master node determines to push down the second data processing operation based at least on the second data processing operation operating on a result of one or more other data processing operations on data residing at the second query processing engine.
7. The distributed database system of claim 1, wherein the first query processing engine comprises a distributed query processing engine, and wherein the second query processing engine comprises a distributed query processing engine or a non-distributed query processing engine.
8. A computer-implemented method, comprising: generating, at a master node of a first query processing engine, an execution plan for a query on data that is stored at a host from a plurality of hosts in a distributed database system, the execution plan including a first data processing operation and a second data processing operation, the distributed database system including the first query processing engine and a second query processing engine, the first query processing engine comprising the master node and a plurality of executor nodes, the second query processing engine comprising a plurality of instances of the second query processing engine, and each of the plurality of hosts including one of the plurality of executor nodes and one of the plurality of instances of the second query processing engine; determining to push down, to the second query processing engine, the second data processing operation such that the second data processing operation is performed by an instance of the second query processing engine deployed at the host instead of an executor node of the first query processing engine deployed at the host, the master node determining to push down the second data processing operation based at least on the second data processing operation not requiring data stored at any other one of the plurality of hosts, and the master node determining not to push down the first data processing operation based at least on the first data processing operation requiring data stored at another one of the plurality of hosts; and dispatching, to the executor node, at least a portion of the execution plan, the portion of the execution plan including the first data processing operation such that the first data processing operation is performed by the executor node of the first query processing engine deployed at the host, and the portion of the execution plan further including the second data processing operation such that the second data processing operation is forwarded, by the executor node, to the instance of the second query processing engine deployed at the host to be performed by the instance of the second query processing engine deployed at the host.
9. The method of claim 8, wherein pushing down the second data processing operation includes rewriting the second data processing operation in structured query language (SQL).
10. The method of claim 8, wherein the second data processing operation is forwarded to the instance of the second query processing engine deployed at the host via an application programming interface (API).
11. The method of claim 8, wherein the second data processing operation is determined to be pushed down to the second query processing engine based at least on the second data processing operation requiring data residing at the second query processing engine.
12. The method of claim 8, wherein the second data processing operation is determined to be pushed down to the second query processing engine based at least on the second data processing operation operating on a result of one or more other data processing operations on data residing at the second query processing engine.
13. The method of claim 8, wherein the first query processing engine comprises a distributed query processing engine, and wherein the second query processing engine comprises a distributed query processing engine or a non-distributed query processing engine.
14. A computer program product comprising a non-transitory machine-readable medium storing instructions that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising: generating, at a master node of a first query processing engine, an execution plan for a query on data that is stored at a host from a plurality of hosts in a distributed database system, the execution plan including a first data processing operation and a second data processing operation, the distributed database system including the first query processing engine and a second query processing engine, the first query processing engine comprising the master node and a plurality of executor nodes, the second query processing engine comprising a plurality of instances of the second query processing engine, and each of the plurality of hosts including one of the plurality of executor nodes and one of the plurality of instances of the second query processing engine; determining to push down, to the second query processing engine, the second data processing operation such that the second data processing operation is performed by an instance of the second query processing engine deployed at the host instead of an executor node of the first query processing engine deployed at the host, the master node determining to push down the second data processing operation based at least on the second data processing operation not requiring data stored at any other one of the plurality of hosts, and the master node determining not to push down the first data processing operation based at least on the first data processing operation requiring data stored at another one of the plurality of hosts; and dispatching, to the executor node, at least a portion of the execution plan, the portion of the execution plan including the first data processing operation such that the first data processing operation is performed by the executor node of the first query processing engine deployed at the host, and the portion of the execution plan further including the second data processing operation such that the second data processing operation is forwarded, by the executor node, to the instance of the second query processing engine deployed at the host to be performed by the instance of the second query processing engine deployed at the host.
15. The method of claim 8, wherein the master node determines to push down the second data processing operation based at least on the second query processing engine supporting the second data processing operation.
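The dependent claims on rewriting a pushed-down operation in SQL and forwarding it through an API can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: an in-memory SQLite database from Python's standard `sqlite3` module stands in for the local instance of the second query processing engine, and the functions `rewrite_filter_as_sql` and `forward_to_local_engine`, the table `sales`, and the column `amount` are hypothetical.

```python
import sqlite3


def rewrite_filter_as_sql(table, column, threshold):
    """Rewrite a hypothetical 'filter' operation as a parameterized SQL query."""
    for ident in (table, column):
        # Identifiers cannot be bound as parameters, so validate them first.
        if not ident.isidentifier():
            raise ValueError(f"unsafe identifier: {ident!r}")
    sql = f"SELECT {column} FROM {table} WHERE {column} > ? ORDER BY {column}"
    return sql, (threshold,)


def forward_to_local_engine(conn, sql, params):
    """Stand-in for the executor node calling the local engine instance's API."""
    return conn.execute(sql, params).fetchall()


# SQLite plays the role of the second engine instance on the same host (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?)", [(5,), (20,), (50,)])

sql, params = rewrite_filter_as_sql("sales", "amount", 10)
rows = forward_to_local_engine(conn, sql, params)
print(rows)
```

Here the executor node never scans the table itself. It hands the rewritten query to the co-located engine and receives only the filtered rows.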
September 23, 2016
April 28, 2020