Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain-English translation.
1. A computer-implemented method, comprising: selecting, during compile time of a query, one or more user-defined functions (UDFs) for which to perform one or more pilot runs; performing, by one or more computer processors, during the execution of the query and after completion of the compile time of the query, the one or more pilot runs of the one or more UDFs, comprising a pilot run associated with each of the one or more UDFs, and wherein each pilot run comprises at least partial execution of the associated UDF selected during compile time of the query and before execution of the query; collecting statistics on the one or more pilot runs of the one or more UDFs during performance of the one or more pilot runs of the one or more UDFs; and optimizing the query, after execution of the query begins, wherein the optimizing comprises: determining, during the execution of the query, an order of execution for predicates in the query based at least on part on the statistics on the one or more pilot runs of the one or more UDFs; wherein the statistics comprise information about performance of the one or more UDFs, and wherein the optimizing further comprises basing cardinality and cost estimates on the statistics collected during the one or more pilot runs of the one or more UDFs.
A computer-implemented method improves query performance by performing "pilot runs" of user-defined functions (UDFs) while a query executes. At compile time, the system selects which UDFs will get pilot runs. Then, after compilation and while the query is running, it partially executes each selected UDF and collects statistics on its performance, such as execution time. Based on these statistics, the system optimizes the query after execution has begun: it determines the order in which the query's predicates (WHERE-clause conditions) are evaluated and bases its cardinality and cost estimates on the pilot-run data. In this way, observed UDF performance, rather than compile-time assumptions alone, guides the query execution engine.
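The flow above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: each UDF is treated as a boolean predicate over rows, a pilot run evaluates it on a small sample, and the collected statistics are assumed to be pass rate (selectivity) and elapsed time. Predicates are assumed to be AND-ed and independent. The claim does not say how the predicate order is computed, so the classic rank heuristic for expensive predicates (ascending `cost_per_row / (1 - selectivity)`) is swapped in here. All names (`PilotStats`, `pilot_run`, `order_predicates`, `estimate`) are hypothetical.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]  # hypothetical row representation: column name -> value


@dataclass
class PilotStats:
    """Statistics collected while a UDF predicate runs on a sample of rows."""
    rows_seen: int
    rows_passed: int
    seconds: float

    @property
    def selectivity(self) -> float:
        # Fraction of sampled rows for which the predicate returned True.
        return self.rows_passed / self.rows_seen if self.rows_seen else 1.0

    @property
    def cost_per_row(self) -> float:
        # Average evaluation time per row observed during the pilot run.
        return self.seconds / self.rows_seen if self.rows_seen else 0.0


def pilot_run(udf: Callable[[Row], bool], sample: Sequence[Row]) -> PilotStats:
    """Partially execute a UDF: evaluate it only on a small sample of rows."""
    start = time.perf_counter()
    passed = sum(1 for row in sample if udf(row))
    return PilotStats(len(sample), passed, time.perf_counter() - start)


def order_predicates(stats: Dict[str, PilotStats]) -> List[str]:
    """Order AND-ed predicates by cost_per_row / (1 - selectivity), ascending.

    Cheap predicates that discard many rows run first; a predicate that
    never discards a row (selectivity 1.0) is sorted last.
    """
    def rank(name: str) -> float:
        s = stats[name]
        drop = 1.0 - s.selectivity
        return s.cost_per_row / drop if drop > 0 else math.inf

    return sorted(stats, key=rank)


def estimate(order: Sequence[str], stats: Dict[str, PilotStats],
             input_rows: int) -> Tuple[float, float]:
    """Cardinality and cost estimates for the ordered conjunction.

    Assumes independent predicates: each one is evaluated only on the rows
    that survived the predicates before it.
    """
    rows, cost = float(input_rows), 0.0
    for name in order:
        s = stats[name]
        cost += rows * s.cost_per_row
        rows *= s.selectivity
    return rows, cost
```

For example, with two UDFs that each pass half the sampled rows, one taking 1.0 s and the other 0.01 s per 100 rows, `order_predicates` puts the cheap one first, and `estimate` on 1,000 input rows returns an estimated cardinality of 250 rows.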
2. The method of claim 1, wherein performing the one or more pilot runs of the one or more UDFs comprises performing two or more pilot runs in parallel across two or more processors.
The query optimization method, as previously described, executes two or more UDF pilot runs in parallel, distributing them across multiple processors (for example, separate processor cores). Because statistics on several UDFs are gathered at the same time, the pilot runs add less overall delay and the data needed for query re-optimization arrives sooner.
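A minimal sketch of parallel pilot runs using Python's standard `concurrent.futures` module. The function names and the `(rows_seen, rows_passed, seconds)` statistics tuple are hypothetical. Threads are used so the example runs anywhere; in CPython, threads execute pure-Python UDFs in parallel only when the UDF releases the GIL (native code or I/O), so spreading CPU-bound UDFs across processors as the claim describes would call for `ProcessPoolExecutor` with picklable, module-level UDFs.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional, Sequence, Tuple

Row = Dict[str, object]
Stats = Tuple[int, int, float]  # (rows_seen, rows_passed, seconds)


def pilot_run(udf: Callable[[Row], bool], sample: Sequence[Row]) -> Stats:
    """Evaluate one UDF on a sample and record pass count and elapsed time."""
    start = time.perf_counter()
    passed = sum(1 for row in sample if udf(row))
    return len(sample), passed, time.perf_counter() - start


def parallel_pilot_runs(
    udfs: Dict[str, Callable[[Row], bool]],
    sample: Sequence[Row],
    max_workers: Optional[int] = None,
) -> Dict[str, Stats]:
    """Submit one pilot run per UDF to a worker pool and gather all results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(pilot_run, udf, sample)
                   for name, udf in udfs.items()}
        # result() waits for that pilot run to finish and re-raises any UDF error.
        return {name: fut.result() for name, fut in futures.items()}
```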
3. The method of claim 1, further comprising storing, in a database metastore, the statistics collected during the one or more pilot runs of the one or more UDFs, wherein the statistics collected during a first pilot run associated with a first UDF are stored in the database metastore in association with a first expression signature identifying the first UDF.
The query optimization method, as previously described, includes storing the performance statistics collected during the UDF pilot runs in a database metastore. This metastore acts as a central repository for pilot run data. Specifically, the statistics from a pilot run of a particular UDF are stored along with an "expression signature" that uniquely identifies that UDF. This allows the system to retrieve and reuse these statistics in subsequent query optimizations when the same UDF is encountered again.
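One way to picture the metastore is a table keyed by expression signature, sketched here with Python's built-in `sqlite3`. The claim does not define how a signature is computed; hashing a normalized `name(arg_types)` string with SHA-256 is an assumption, as are the table name `udf_pilot_stats` and its columns.

```python
import hashlib
import sqlite3
from typing import Optional, Sequence, Tuple

Stats = Tuple[int, int, float]  # (rows_seen, rows_passed, seconds)


def expression_signature(udf_name: str, arg_types: Sequence[str]) -> str:
    """Hypothetical signature: SHA-256 of a normalized 'name(type,...)' string."""
    args = ",".join(t.strip().lower() for t in arg_types)
    normalized = f"{udf_name.strip().lower()}({args})"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def open_metastore(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical pilot-statistics table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS udf_pilot_stats ("
        " signature TEXT PRIMARY KEY,"
        " rows_seen INTEGER NOT NULL,"
        " rows_passed INTEGER NOT NULL,"
        " seconds REAL NOT NULL)"
    )
    return conn


def store_stats(conn: sqlite3.Connection, signature: str, stats: Stats) -> None:
    # INSERT OR REPLACE keeps one row per signature; a newer pilot run overwrites it.
    conn.execute("INSERT OR REPLACE INTO udf_pilot_stats VALUES (?, ?, ?, ?)",
                 (signature, *stats))
    conn.commit()


def load_stats(conn: sqlite3.Connection, signature: str) -> Optional[Stats]:
    row = conn.execute(
        "SELECT rows_seen, rows_passed, seconds FROM udf_pilot_stats"
        " WHERE signature = ?", (signature,)
    ).fetchone()
    return tuple(row) if row else None
```

Because the signature normalizes case and whitespace, `expression_signature("Sentiment", ["VARCHAR"])` and `expression_signature(" sentiment ", ["varchar"])` map to the same metastore row, while a different argument list yields a different one.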
4. The method of claim 1, wherein the one or more UDFs comprise a first UDF, and wherein performing the one or more pilot runs of the one or more UDFs comprises: searching a database metastore for previous pilot-run data related to the first UDF; and performing a first pilot run associated with the first UDF if the previous pilot run data related to the first UDF is not found in the database metastore.
The query optimization method, as previously described, optimizes the execution of pilot runs themselves. Before running a pilot run for a particular UDF, the system searches the database metastore for previously collected pilot run data for that UDF. If no prior data is found, then the pilot run is executed. This avoids redundant pilot runs and leverages existing knowledge to speed up query optimization.
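The lookup-before-run step can be sketched as a get-or-compute function. `InMemoryMetastore` is a hypothetical dictionary-backed stand-in for a real database metastore, and the statistics tuple is an assumption; the function reports whether it actually performed a pilot run.

```python
import time
from typing import Callable, Dict, Optional, Sequence, Tuple

Row = Dict[str, object]
Stats = Tuple[int, int, float]  # (rows_seen, rows_passed, seconds)


class InMemoryMetastore:
    """Hypothetical stand-in for a database metastore keyed by expression signature."""

    def __init__(self) -> None:
        self._stats: Dict[str, Stats] = {}

    def find(self, signature: str) -> Optional[Stats]:
        return self._stats.get(signature)

    def store(self, signature: str, stats: Stats) -> None:
        self._stats[signature] = stats


def pilot_run(udf: Callable[[Row], bool], sample: Sequence[Row]) -> Stats:
    """Evaluate the UDF on a small sample, recording pass count and elapsed time."""
    start = time.perf_counter()
    passed = sum(1 for row in sample if udf(row))
    return len(sample), passed, time.perf_counter() - start


def stats_for_udf(
    metastore: InMemoryMetastore,
    signature: str,
    udf: Callable[[Row], bool],
    sample: Sequence[Row],
) -> Tuple[Stats, bool]:
    """Search the metastore first; perform a pilot run only when nothing is found.

    Returns the statistics and a flag telling whether a new pilot run happened.
    """
    cached = metastore.find(signature)
    if cached is not None:
        return cached, False          # previous pilot-run data found: reuse it
    stats = pilot_run(udf, sample)    # not found: perform the pilot run
    metastore.store(signature, stats)
    return stats, True
```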
5. The method of claim 4, wherein the one or more UDFs further comprise a second UDF, and wherein performing the one or more pilot runs of the one or more UDFs comprises: searching the database metastore for previous pilot-run data related to the second UDF; and opting not to perform a pilot run associated with the second UDF if the previous pilot run data related to the second UDF is found in the database metastore.
Building on claim 4, the query may also involve a second UDF. The system searches the database metastore for earlier pilot-run data on that UDF as well, and if the data is found, it chooses *not* to perform a new pilot run and reuses the existing statistics instead. This eliminates unnecessary pilot runs and further speeds up query optimization.
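Applied to every UDF selected at compile time, the same check splits the workload before any pilot run starts. In this hypothetical sketch, `metastore` is any mapping from expression signature to saved statistics; planning a repeated signature only once is an added assumption, not something the claim specifies.

```python
from typing import Iterable, List, Mapping, Set, Tuple


def plan_pilot_runs(
    selected: Iterable[str], metastore: Mapping[str, object]
) -> Tuple[List[str], List[str]]:
    """Split compile-time-selected UDF signatures into (run_pilot, reuse_stats).

    A signature already present in the metastore is reused; any other
    signature gets a new pilot run.
    """
    run_pilot: List[str] = []
    reuse: List[str] = []
    seen: Set[str] = set()
    for sig in selected:
        if sig in seen:
            continue  # assumption: the same UDF appearing twice is planned once
        seen.add(sig)
        (reuse if sig in metastore else run_pilot).append(sig)
    return run_pilot, reuse
```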
6. A system comprising: a memory having computer readable instructions; and one or more processors for executing the computer readable instructions, the computer readable instructions comprising: selecting, during compile time of a query, one or more user-defined functions (UDFs) for which to perform one or more pilot runs; and performing, during the execution of the query and after completion of the compile time of the query, the one or more pilot runs of the one or more UDFs, comprising a pilot run associated with each of the one or more UDFs, and wherein each pilot run comprises at least partial execution of the associated UDF selected during compile time of the query and before execution of the query; collecting statistics on the one or more pilot runs of the one or more UDFs during performance of the one or more pilot runs of the one or more UDFs; and optimizing the query, after execution of the query begins, wherein the optimizing comprises: determining, during the execution of the query, an order of execution for predicates in the query based at least on part on the statistics on the one or more pilot runs of the one or more UDFs; wherein the statistics comprise information about performance of the one or more UDFs, and wherein the optimizing further comprises basing cardinality and cost estimates on the statistics collected during the one or more pilot runs of the one or more UDFs.
A computer system dynamically optimizes database queries by running pilot runs of user-defined functions (UDFs). The system selects specific UDFs during query compilation for pilot runs. During query execution, these UDFs are partially executed, and performance statistics are collected. Based on these statistics, the system re-optimizes the query execution plan by re-ordering the execution of predicates and adjusting cardinality and cost estimates. This dynamic optimization uses real-time UDF performance data to guide the query execution engine during query execution.
7. The system of claim 6, wherein performing the one or more pilot runs of the one or more UDFs comprises performing two or more pilot runs in parallel across two or more processors.
The query optimization system, as previously described, executes two or more UDF pilot runs in parallel across multiple processors. Gathering statistics on several UDFs concurrently shortens the pilot-run phase, so the query can be re-optimized sooner.
8. The system of claim 6, the computer readable instructions further comprising storing, in a database metastore, the statistics collected during the one or more pilot runs of the one or more UDFs, wherein the statistics collected during a first pilot run associated with a first UDF are stored in the database metastore in association with a first expression signature identifying the first UDF.
The query optimization system, as previously described, stores the statistics collected during the UDF pilot runs in a database metastore. Statistics from a pilot run of a particular UDF are stored along with an "expression signature" that uniquely identifies that UDF. This allows the system to retrieve and reuse these statistics in subsequent query optimizations.
9. The system of claim 6, wherein the one or more UDFs comprise a first UDF, and wherein performing the one or more pilot runs of the one or more UDFs comprises: searching a database metastore for previous pilot-run data related to the first UDF; and performing a first pilot run associated with the first UDF if the previous pilot run data related to the first UDF is not found in the database metastore.
The query optimization system, as previously described, optimizes the execution of pilot runs. Before running a pilot run for a particular UDF, the system searches the database metastore for previously collected pilot run data for that UDF. If no prior data is found, then the pilot run is executed. This avoids redundant pilot runs and leverages existing knowledge to speed up query optimization.
10. The system of claim 9, wherein the one or more UDFs further comprise a second UDF, and wherein performing the one or more pilot runs of the one or more UDFs comprises: searching the database metastore for previous pilot-run data related to the second UDF; and opting not to perform a pilot run associated with the second UDF if the previous pilot run data related to the second UDF is found in the database metastore.
Continuing from the previous optimization, if the metastore contains data from a previous pilot run for a particular UDF, the system chooses *not* to perform a new pilot run. Instead, the system reuses the existing statistics. This eliminates unnecessary pilot runs and further speeds up the query optimization process.
11. A computer program product for optimizing a query, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform a method comprising: selecting, during compile time of a query, one or more user-defined functions (UDFs) for which to perform one or more pilot runs; performing, during the execution of the query and after completion of the compile time of the query, the one or more pilot runs of the one or more UDFs, comprising a pilot run associated with each of the one or more UDFs, and wherein each pilot run comprises at least partial execution of the associated UDF selected during compile time of the query and before execution of the query; collecting statistics on the one or more pilot runs of the one or more UDFs during performance of the one or more pilot runs of the one or more UDFs; and optimizing the query, after execution of the query begins, wherein the optimizing comprises: determining, during the execution of the query, an order of execution for predicates in the query based at least on part on the statistics on the one or more pilot runs of the one or more UDFs; wherein the statistics comprise information about performance of the one or more UDFs, and wherein the optimizing further comprises basing cardinality and cost estimates on the statistics collected during the one or more pilot runs of the one or more UDFs.
A computer program stored on a computer-readable medium optimizes database queries by running pilot runs of user-defined functions (UDFs). The program selects specific UDFs during query compilation for pilot runs. During query execution, these UDFs are partially executed, and performance statistics are collected. Based on these statistics, the program re-optimizes the query execution plan by re-ordering predicate execution and adjusting cardinality/cost estimates. The optimization uses real-time UDF performance data during query execution.
12. The computer program product of claim 11, wherein performing the one or more pilot runs of the one or more UDFs comprises performing two or more pilot runs in parallel across two or more processors.
The query optimization computer program, as previously described, enhances pilot run efficiency by executing two or more pilot runs of UDFs in parallel across multiple processors, reducing overhead and providing faster feedback for query re-optimization.
13. The computer program product of claim 11, the method further comprising storing, in a database metastore, the statistics collected during the one or more pilot runs of the one or more UDFs, wherein the statistics collected during a first pilot run associated with a first UDF are stored in the database metastore in association with a first expression signature identifying the first UDF.
The query optimization computer program, as previously described, stores the performance statistics collected during UDF pilot runs in a database metastore, together with an "expression signature" that uniquely identifies each UDF, so that the data can be retrieved and reused in subsequent query optimizations.
14. The computer program product of claim 11, wherein the one or more UDFs comprise a first UDF, and wherein performing the one or more pilot runs of the one or more UDFs comprises: searching a database metastore for previous pilot-run data related to the first UDF; and performing a first pilot run associated with the first UDF if the previous pilot run data related to the first UDF is not found in the database metastore.
The query optimization computer program, as previously described, optimizes the execution of pilot runs by searching the database metastore for previously collected pilot run data for a specific UDF, and running a pilot run only if no prior data is found, avoiding redundant runs and speeding up query optimization.
15. The computer program product of claim 14, wherein the one or more UDFs further comprise a second UDF, and wherein performing the one or more pilot runs of the one or more UDFs comprises: searching the database metastore for previous pilot-run data related to the second UDF; and opting not to perform a pilot run associated with the second UDF if the previous pilot run data related to the second UDF is found in the database metastore.
The query optimization computer program, as previously described, reuses existing pilot run data for a UDF if it's found in the database metastore, instead of performing a new pilot run, further speeding up the query optimization process.
December 5, 2017