Parallel Preparation of a Query Execution Plan in a Massively Parallel Processing Environment Based on Global and Low-Level Statistics

Published: December 18, 2018

Assignee: Not available in USPTO data

Inventors: Lukasz Gaza, Artur M. Gruszecki, Tomasz Kazalski, Konrad K. Skibski, Tomasz K. Stradomski

Technical Abstract

Patent Claims

17 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer-implemented method for preparation and execution of an optimal query execution plan in a massively parallel processing environment wherein individual processors have varying characteristics, the method comprising:
receiving, by one or more computer processors, a query involving one or more data tables, wherein the one or more data tables are stored on a plurality of processing nodes in the massively parallel processing environment;
determining, by one or more computer processors, whether any of the one or more data tables are too large to be broadcast;
responsive to determining any of the one or more data tables are too large to be broadcast, distributing, by one or more computer processors, the data tables determined to be too large to be broadcast to the plurality of processing nodes;
broadcasting, by the one or more computer processors, to the plurality of processing nodes, one or more of the data tables, determined not to be too large for broadcasting, and a pre-processed query, wherein the plurality of processing nodes are of differing processing capability;
receiving, by the one or more computer processors, candidate plans comprising a set of processing-node-specific query execution plans and an execution cost estimate associated with each processing-node-specific query execution plan, wherein the set of processing-node-specific query execution plans and associated execution cost estimates is prepared in parallel on the plurality of processing nodes based on global statistics related to more than one processing node, low level statistics related to individual processing nodes and adjustments, by the individual processing nodes, based on the differences between processing-node-specific query execution plans and wherein each of the processing-node-specific query execution plans is optimized by an associated one of the plurality of processing nodes for the one of the plurality of processing nodes;
selecting, by the one or more computer processors, an optimal query execution plan based on minimized execution cost, wherein the optimal query execution plan is one of the candidate plans; and
executing, by the one or more computer processors, the optimal query execution plan.
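Claim 1 can be read as a coordinator/worker flow: split tables into "broadcast" and "distribute" by size, let every node produce its own plan and cost estimate in parallel, then keep the cheapest candidate. The Python sketch below is a toy illustration under stated assumptions, not the patented implementation: the 64 MiB broadcast limit, the `Node` fields, the two join strategies and their cost formulas are all hypothetical, threads stand in for processing nodes, and executing the chosen plan is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical threshold: the claims say only "too large to be broadcast".
BROADCAST_LIMIT_BYTES = 64 * 1024 * 1024


@dataclass(frozen=True)
class Table:
    name: str
    size_bytes: int


@dataclass(frozen=True)
class Node:
    node_id: int
    rows_per_second: float   # nodes have differing processing capability
    local_rows: int          # rows of the distributed (large) tables held here
    local_data_sorted: bool  # hypothetical low-level statistic


@dataclass(frozen=True)
class Candidate:
    node_id: int
    plan: str
    estimated_seconds: float  # execution cost estimate (time)


def split_tables(tables: List[Table], limit: int = BROADCAST_LIMIT_BYTES) -> Tuple[List[Table], List[Table]]:
    """Return (broadcast, distribute) lists based on table size."""
    broadcast = [t for t in tables if t.size_bytes <= limit]
    distribute = [t for t in tables if t.size_bytes > limit]
    return broadcast, distribute


def plan_on_node(node: Node, query: str, broadcast: List[Table]) -> Candidate:
    """Toy node-local optimizer: cost two strategies and keep the cheaper one."""
    base = node.local_rows / node.rows_per_second
    costs = {
        "hash_join": base,
        # Hypothetical rule: a merge join is cheap only if local data is already sorted.
        "merge_join": base * (0.5 if node.local_data_sorted else 2.0),
    }
    best = min(costs, key=costs.get)
    names = ", ".join(t.name for t in broadcast) or "none"
    return Candidate(node.node_id, f"{best}: {query} (broadcast: {names})", costs[best])


def choose_plan(nodes: List[Node], tables: List[Table], query: str) -> Candidate:
    """Coordinator: split tables, plan on every node in parallel, keep the cheapest candidate."""
    # Large tables are assumed to be distributed already; Node.local_rows reflects them.
    broadcast, _distributed = split_tables(tables)
    # Threads stand in for the processing nodes planning concurrently.
    with ThreadPoolExecutor() as pool:
        candidates = list(pool.map(lambda n: plan_on_node(n, query, broadcast), nodes))
    return min(candidates, key=lambda c: c.estimated_seconds)
```

With one slow node holding unsorted data and one fast node holding sorted data, the fast node's merge-join candidate (1.0 s) wins over the slow node's hash-join candidate (4.0 s). Picking the single cheapest candidate is the literal reading of "selecting ... based on minimized execution cost"; claims 4 to 6 refine how candidates are compared.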

2. The computer-implemented method of claim 1, wherein execution cost comprises time required for execution.
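Read together, claims 1 and 2 select, from the candidates returned by the nodes, the plan with the smallest estimated execution time. In notation of our own (not the patent's), with $\mathcal{C}$ the set of candidate plans and $\hat{T}(p)$ the time estimate a node attached to plan $p$:

```latex
p^{*} = \arg\min_{p \in \mathcal{C}} \hat{T}(p)
```

For example, if the nodes return estimates of 4 s, 2.5 s and 7 s, then $p^{*}$ is the 2.5 s candidate.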

3. The computer-implemented method of claim 1, wherein low level statistics comprise statistics on the level of an individual processing node and data stored on the individual processing node.
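Claim 3 separates low-level statistics (one node and the data it stores) from the global statistics of claim 1. The sketch below is illustrative only: all class and field names are hypothetical, and the join-size formula is the common textbook estimate |R|·|S|/V(key) rather than anything the patent specifies. It shows why a node's own row count matters when data are skewed, while the system-wide distinct-key count still supplies the join selectivity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalStats:
    """Hypothetical statistics describing the whole system (more than one node)."""
    total_fact_rows: int
    node_count: int
    distinct_join_keys: int  # distinct join-key values, system-wide


@dataclass(frozen=True)
class NodeStats:
    """Hypothetical low-level statistics for one node and the data stored on it."""
    node_id: int
    local_fact_rows: int
    rows_per_second: float


def join_output_rows(fact_rows: float, dim_rows: int, distinct_keys: int) -> float:
    # Textbook equi-join estimate |R| * |S| / V(key); not taken from the patent.
    return fact_rows * dim_rows / distinct_keys


def estimate_global_only(g: GlobalStats, n: NodeStats, dim_rows: int) -> float:
    # Without low-level statistics, assume the fact table is spread evenly.
    fact_rows = g.total_fact_rows / g.node_count
    out = join_output_rows(fact_rows, dim_rows, g.distinct_join_keys)
    return (fact_rows + out) / n.rows_per_second


def estimate_with_local(g: GlobalStats, n: NodeStats, dim_rows: int) -> float:
    # Local row count (low-level) combined with system-wide key cardinality (global).
    out = join_output_rows(n.local_fact_rows, dim_rows, g.distinct_join_keys)
    return (n.local_fact_rows + out) / n.rows_per_second
```

With 1,000,000 fact rows over 4 nodes, a 1,000-row dimension table with 1,000 distinct keys, and a node that actually holds 700,000 rows at 100,000 rows/s, the global-only estimate is 5.0 s while the estimate using the node's own row count is 14.0 s, so ignoring the skew would understate that node's time almost threefold.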

4. The computer-implemented method of claim 1, wherein the candidate plans comprise whole-system-optimal query execution plans selected by the plurality of processing nodes.

5. The computer-implemented method of claim 4, wherein the whole-system-optimal query execution plans are selected by the plurality of processing nodes based on comparison, among the plurality of processing nodes, of a plurality of processing-node-optimal query execution plans.
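Claims 4 and 5 describe nodes comparing their node-optimal plans to pick plans that are best for the whole system. One way to sketch that comparison, assuming each node can cost any proposed plan on its own data and that the slowest node determines total runtime (both assumptions, not stated in the claims; the function names are hypothetical):

```python
from typing import Callable, List

CostFn = Callable[[int, str], float]  # (node_id, plan) -> estimated seconds on that node


def system_seconds(plan: str, nodes: List[int], cost: CostFn) -> float:
    # Assumption: nodes run their share in parallel, so the query finishes
    # when the slowest node finishes.
    return max(cost(node, plan) for node in nodes)


def whole_system_optimal(proposals: List[str], nodes: List[int], cost: CostFn) -> str:
    """Evaluate every node's proposed plan on all nodes; keep the best overall."""
    return min(proposals, key=lambda plan: system_seconds(plan, nodes, cost))


if __name__ == "__main__":
    # Illustrative estimates (seconds) per (node, plan).
    table = {(0, "hash_join"): 3.0, (1, "hash_join"): 5.0,
             (0, "merge_join"): 2.0, (1, "merge_join"): 6.0}
    print(whole_system_optimal(["merge_join", "hash_join"], [0, 1], lambda n, p: table[(n, p)]))  # hash_join
```

Node 0 on its own would prefer `merge_join` (2.0 s against 3.0 s), but node 1 runs it slowly, so `hash_join` finishes sooner for the system as a whole (5.0 s against 6.0 s). That gap between node-optimal and whole-system-optimal is what the comparison step is meant to catch.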

6. The computer-implemented method of claim 5, wherein the candidate plans are based on adjustments to the set of processing-node-optimal query execution plans, based on sharing of information concerning the set of processing-node-optimal query execution plans among the plurality of processing nodes.
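Claim 6 adds adjustments based on the nodes sharing information about their node-optimal plans, without saying what the adjustment is. The following is a purely hypothetical rule for illustration: once proposals are shared, each node may switch to a plan another node proposed if that costs it at most a small tolerance more than its own optimum (10% by default, an arbitrary choice), preferring the plan proposed by the most nodes, so the nodes converge on fewer distinct plans.

```python
from collections import Counter
from typing import Callable, Dict

CostFn = Callable[[int, str], float]  # (node_id, plan) -> estimated seconds on that node


def adjust_plans(own: Dict[int, str], cost: CostFn, tolerance: float = 0.10) -> Dict[int, str]:
    """One round of a hypothetical adjustment step after nodes share their plans.

    A node may adopt another node's plan if it costs at most `tolerance`
    (as a fraction) more than the node's own optimum.
    """
    votes = Counter(own.values())  # the shared information: which plans were proposed, how often
    adjusted = {}
    for node, plan in own.items():
        limit = cost(node, plan) * (1 + tolerance)
        # The node's own plan always qualifies, so this list is never empty.
        acceptable = [p for p in votes if cost(node, p) <= limit]
        # Most-proposed first, then cheapest locally, then name for a deterministic tie-break.
        adjusted[node] = min(acceptable, key=lambda p: (-votes[p], cost(node, p), p))
    return adjusted
```

For instance, a node whose own best plan costs 2.0 s would accept a more widely proposed plan costing it 2.1 s, but a node for which the alternative costs 6.0 s against 5.0 s keeps its own plan. Setting `tolerance=0.0` disables switching unless two plans tie exactly.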

7. A computer program product for preparation and execution of an optimal query execution plan in a massively parallel processing environment wherein individual processors have varying characteristics, the computer program product comprising: one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising:
program instructions to receive a query involving one or more data tables, wherein the one or more data tables are stored on a plurality of processing nodes in the massively parallel processing environment;
program instructions to determine whether any of the one or more data tables are too large to be broadcast;
responsive to determining any of the one or more data tables are too large to be broadcast, program instructions to distribute the data tables determined to be too large to be broadcast to the plurality of processing nodes;
program instructions to broadcast to the plurality of processing nodes, one or more of the data tables, determined not to be too large for broadcasting, and a pre-processed query, wherein the plurality of processing nodes are of differing processing capability;
program instructions to receive candidate plans comprising a set of processing-node-specific query execution plans and an execution cost estimate associated with each processing-node-specific query execution plan, wherein the set of processing-node-specific query execution plans and associated execution cost estimates is prepared in parallel on the plurality of processing nodes based on global statistics related to more than one processing node, low level statistics related to individual processing nodes and adjustments, by the individual processing nodes, based on the differences between processing-node-specific query execution plans and wherein each of the processing-node-specific query execution plans is optimized by an associated one of the plurality of processing nodes for the one of the plurality of processing nodes;
program instructions to select an optimal query execution plan based on minimized execution cost, wherein the optimal query execution plan is one of the candidate plans; and
program instructions to execute the optimal query execution plan.

8. The computer program product of claim 7, wherein execution cost comprises time required for execution.

9. The computer program product of claim 7, wherein low level statistics comprise statistics on the level of an individual processing node and data stored on the individual processing node.

10. The computer program product of claim 7, wherein the candidate plans comprise whole-system-optimal query execution plans selected by the plurality of processing nodes.

11. The computer program product of claim 7, wherein the whole-system-optimal query execution plans are selected by the plurality of processing nodes based on comparison, among the plurality of processing nodes, of a plurality of processing-node-optimal query execution plans.

12. The computer program product of claim 11, wherein the candidate plans are based on adjustments to the set of processing-node-optimal query execution plans, based on sharing of information concerning the set of processing-node-optimal query execution plans among the plurality of processing nodes.

13. A computer system for preparation and execution of an optimal query execution plan in a massively parallel processing environment wherein individual processors have varying characteristics, the computer system comprising: one or more processors; one or more computer readable storage media; and program instructions stored on the one or more computer readable storage media for execution by at least one of the one or more processors, the program instructions comprising:
program instructions to receive a query involving one or more data tables, wherein the one or more data tables are stored on a plurality of processing nodes in the massively parallel processing environment;
program instructions to determine whether any of the one or more data tables are too large to be broadcast;
responsive to determining any of the one or more data tables are too large to be broadcast, program instructions to distribute the data tables determined to be too large to be broadcast to the plurality of processing nodes;
program instructions to broadcast to the plurality of processing nodes, one or more of the data tables, determined not to be too large for broadcasting, and a pre-processed query, wherein the plurality of processing nodes are of differing processing capability;
program instructions to receive candidate plans comprising a set of processing-node-specific query execution plans and an execution cost estimate associated with each processing-node-specific query execution plan, wherein the set of processing-node-specific query execution plans and associated execution cost estimates is prepared in parallel on the plurality of processing nodes based on global statistics related to more than one processing node, low level statistics related to individual processing nodes and adjustments, by the individual processing nodes, based on the differences between processing-node-specific query execution plans and wherein each of the processing-node-specific query execution plans is optimized by an associated one of the plurality of processing nodes for the one of the plurality of processing nodes;
program instructions to select an optimal query execution plan based on minimized execution cost, wherein the optimal query execution plan is one of the candidate plans; and
program instructions to execute the optimal query execution plan.

14. The computer system of claim 13, wherein low level statistics comprise statistics on the level of an individual processing node and data stored on the individual processing node.

15. The computer system of claim 13, wherein the candidate plans comprise whole-system-optimal query execution plans selected by the plurality of processing nodes.

16. The computer system of claim 13, wherein the whole-system-optimal query execution plans are selected by the plurality of processing nodes based on comparison, among the plurality of processing nodes, of a plurality of processing-node-optimal query execution plans.

17. The computer system of claim 13, wherein the candidate plans are based on adjustments to the set of processing-node-optimal query execution plans, based on sharing of information concerning the set of processing-node-optimal query execution plans among the plurality of processing nodes.

Patent Metadata

Filing Date

Unknown
