US-6263345

Histogram synthesis modeler for a database query optimizer

Published: July 17, 2001
Assignee: not available in the USPTO data
Inventors: not available in the USPTO data
Technical Abstract

The invention provides a mechanism for using statistics, in connection with various database query cost modeling techniques, to more accurately estimate the number of rows and UECs (unique entry counts) that will be produced by relational operators and predicates in database systems. The ability to accurately estimate the number of rows and UECs returned by a relational operator and/or a predicate is fundamental to computing the cost of a query execution plan. This, in turn, drives the optimizer's ability to select the query plan best suited for the desired performance goal. According to the present invention, histogram statistics are synthesized bottom up from the leaf nodes to the root node of a query tree. Given input statistics in the form of histograms for each operand of a relational operator or predicate, the present inventive method and apparatus merge the input statistics in a way that simulates the effects of the run-time operator on the actual data, so as to produce a predicted row count and UEC for each histogram interval, representative of the data that actually will be produced by each such operator or predicate in the query tree. A database query optimizer may use these statistics to select and implement an optimal query plan.
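As a rough illustration of the bottom-up synthesis described in the abstract, the Python sketch below walks a small query tree in post-order and computes statistics for each node from the statistics of its children. The `Node` class, the `(lo, hi, rows, uec)` tuple layout, and the `semi_join` combiner are hypothetical names chosen for this example and are not taken from the patent. The combiner borrows the per-interval Semi-Join formulas from claim 8 and assumes both inputs already share the same interval boundaries.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# One histogram interval: (lower bound, upper bound, row count, UEC).
Interval = Tuple[float, float, float, float]
Histogram = List[Interval]


@dataclass
class Node:
    """Hypothetical query-tree node; leaves carry base-table histograms."""
    name: str
    combine: Optional[Callable[..., Histogram]] = None
    children: List["Node"] = field(default_factory=list)
    stats: Optional[Histogram] = None


def semi_join(left: Histogram, right: Histogram) -> Histogram:
    """Per-interval Semi-Join estimate (claim 8), inputs assumed aligned."""
    out = []
    for (lo, hi, l_rows, l_uec), (_, _, _, r_uec) in zip(left, right):
        uec = min(l_uec, r_uec)
        rows = l_rows * (uec / l_uec) if l_uec else 0.0
        out.append((lo, hi, rows, uec))
    return out


def synthesize(node: Node) -> Histogram:
    """Post-order walk: children first, then the node's own operator."""
    if not node.children:
        return node.stats  # leaf: statistics come from the base table
    child_stats = [synthesize(child) for child in node.children]
    node.stats = node.combine(*child_stats)
    return node.stats


left_scan = Node("scan_left", stats=[(0, 10, 100, 8)])
right_scan = Node("scan_right", stats=[(0, 10, 30, 4)])
root = Node("semi_join", combine=semi_join, children=[left_scan, right_scan])
synthesize(root)
```

Each parent only sees the histograms returned by its children, so the same walk works for deeper trees as long as every operator supplies its own combiner.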

Patent Claims
14 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A process for producing predictive information about execution of an operation on data in a relational database, the process performed by a data processing system and comprising: receiving statistical information about rows of data in the relational database, the statistical information existing for each of a plurality of intervals of the data, wherein the statistical information includes a UEC (unique entry count) for certain of the plural intervals; and manipulating the statistical information, in accordance with the operation, on an interval by interval basis, to yield new statistical information also having a plurality of intervals and reflecting the predicted effect of the operation on the data.

2. The process of claim 1, wherein the statistical information includes a row count for certain of the plural intervals.

3. A process for producing predictive information about execution of an operation on data in a relational database, the process performed by a data processing system and comprising: receiving statistical information about rows of data in the relational database, the statistical information existing for each of a plurality of intervals of the data, wherein the statistical information includes at least one row count and at least one UEC for certain of the plural intervals; and manipulating the statistical information, in accordance with the operation, on an interval by interval basis, to yield new statistical information also having a plurality of intervals and reflecting the predicted effect of the operation on the data.

4. The process of claim 3, wherein the operation includes the execution of a Join operator, and wherein the statistical information includes a first set of two histograms representing distributions of data in two database tables, each histogram having row counts and UECs reflecting the distribution of the data in the two databases.

5. The process of claim 4 wherein, if the original two histograms have different intervals, then the manipulating step further includes the step of normalizing the histograms to have the same number of intervals, each interval having the same lower and upper bounds.
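The normalizing step in claim 5 can be pictured with the following Python sketch, which re-cuts two histograms on the union of their interval boundaries so that both end up with identical intervals. The `Interval` class and the `split` and `normalize` functions are hypothetical names for this example. The sketch also assumes that rows and unique values are spread uniformly inside each original interval, which the claim does not state.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Interval:
    lo: float    # lower bound
    hi: float    # upper bound
    rows: float  # row count in the interval
    uec: float   # unique entry count in the interval


def split(hist: List[Interval], bounds: List[float]) -> List[Interval]:
    """Re-cut hist on sorted bounds, apportioning counts by overlap width."""
    out = []
    for lo, hi in zip(bounds, bounds[1:]):
        rows = uec = 0.0
        for iv in hist:
            overlap = min(hi, iv.hi) - max(lo, iv.lo)
            if overlap > 0:
                frac = overlap / (iv.hi - iv.lo)  # uniformity assumption
                rows += iv.rows * frac
                uec += iv.uec * frac
        out.append(Interval(lo, hi, rows, uec))
    return out


def normalize(a: List[Interval], b: List[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """Give both histograms the same intervals (union of all boundaries)."""
    bounds = sorted({x for iv in a + b for x in (iv.lo, iv.hi)})
    return split(a, bounds), split(b, bounds)


a = [Interval(0, 10, 100, 10)]
b = [Interval(0, 5, 20, 4), Interval(5, 10, 30, 2)]
norm_a, norm_b = normalize(a, b)
```

Here the single interval of `a` is divided at 5 to match `b`, with its counts halved on each side; `b` is already cut on those boundaries and comes back unchanged.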

6. The process of claim 4, wherein the manipulating step further includes the step of producing a second set of two histograms resulting from a cross-product of the first set of two histograms.

7. The process of claim 6, wherein the operator is an Inner Join and the manipulating step includes the application of the following formulas on each interval of the second set of two histograms:

    NumUec = MinOf(LeftUec, RightUec)
    NumRows = (LeftRowCount * RightRowCount) / MAXOF(LeftUec, RightUec) / XProwCount)
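A minimal Python sketch of the claim 7 formulas follows. The NumRows expression as printed has an unbalanced closing parenthesis, so the sketch adopts one possible reading: evaluate the divisions left to right. The function name `inner_join_interval` is hypothetical. The usage example further assumes that the cross-product step of claim 6 scales each side's interval row count by the other table's total row count and that XProwCount is the product of the two table totals; that interpretation is not confirmed by the claim text.

```python
def inner_join_interval(left_rows, right_rows, left_uec, right_uec, xp_row_count):
    """Return (NumRows, NumUec) for one interval, reading claim 7 left to right."""
    num_uec = min(left_uec, right_uec)
    max_uec = max(left_uec, right_uec)
    if max_uec == 0 or xp_row_count == 0:
        return 0.0, num_uec  # empty interval: nothing can join
    num_rows = (left_rows * right_rows) / max_uec / xp_row_count
    return num_rows, num_uec


# Assumed setup: the left interval holds 100 rows (10 unique values) of a
# 1000-row table; the right interval holds 50 rows (5 unique values) of a
# 200-row table.
left_total, right_total = 1000, 200
left_xp_rows = 100 * right_total         # left interval after cross-product scaling
right_xp_rows = 50 * left_total          # right interval after cross-product scaling
xp_row_count = left_total * right_total  # assumed meaning of XProwCount

result = inner_join_interval(left_xp_rows, right_xp_rows, 10, 5, xp_row_count)
```

Under these assumptions the scaling cancels out, and the estimate equals the unscaled product 100 * 50 divided by the larger UEC.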

8. The process of claim 6, wherein the operator is a Semi-Join and the following formulas are applied for each interval:

    NumUec = MINOF(LeftUec, RightUec)
    NumRows = LeftRowCount * (NumUec / LeftUec)
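The two Semi-Join formulas of claim 8 translate directly into a few lines of Python. The function name `semi_join_interval` and the zero-UEC guard are additions for this sketch and do not appear in the claim.

```python
def semi_join_interval(left_rows, left_uec, right_uec):
    """Return (NumRows, NumUec) for one interval of a Semi-Join."""
    if left_uec == 0:
        return 0.0, 0  # guard added for this sketch: empty left interval
    num_uec = min(left_uec, right_uec)
    num_rows = left_rows * (num_uec / left_uec)
    return num_rows, num_uec


# Left interval: 100 rows over 8 unique values; right interval: 4 unique values.
half_match = semi_join_interval(100, 8, 4)
# Right side has at least as many unique values as the left: all rows survive.
full_match = semi_join_interval(100, 4, 8)
```

The ratio NumUec / LeftUec acts as the fraction of left-side values that find a match, so the left row count is scaled by that fraction and never grows.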

9. The process of claim 4, wherein the new statistical information is passed to a parent operator.

10. The process of claim 3, wherein the operator is a Scan and wherein all predicates on the base table are applied to the associated intervals.
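To make the Scan case of claim 10 concrete, the Python sketch below applies a single range predicate of the form `column < c` to every interval of a base-table histogram. The function name `apply_less_than`, the `(lo, hi, rows, uec)` tuple layout, and the choice of predicate are illustrative assumptions. Scaling both the row count and the UEC by the covered fraction of an interval assumes uniformly distributed values, which the claim does not specify.

```python
def apply_less_than(hist, c):
    """Apply the predicate `column < c` to each (lo, hi, rows, uec) interval."""
    out = []
    for lo, hi, rows, uec in hist:
        if c <= lo:
            frac = 0.0                   # interval lies entirely above the cutoff
        elif c >= hi:
            frac = 1.0                   # interval lies entirely below the cutoff
        else:
            frac = (c - lo) / (hi - lo)  # partial overlap, uniformity assumed
        out.append((lo, hi, rows * frac, uec * frac))
    return out


base = [(0, 10, 100, 10), (10, 20, 40, 8)]
filtered = apply_less_than(base, 15)
```

The output keeps the same interval boundaries as the input, so a parent operator can consume it interval by interval without re-normalizing.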

11. A process for producing predictive information about execution of a query on data in a relational database, the process performed by a data processing system and comprising: creating a query data structure in memory for the query; selecting a current operation corresponding to a current node in the query data structure; receiving statistical information about rows of data in the relational database associated with at least one child of the current node, the statistical information existing for each of a plurality of intervals of the data, wherein the statistical information includes a UEC for certain of the plural intervals; and manipulating the statistical information, in accordance with the current operation, on an interval by interval basis, to yield statistical information for the current node, having a plurality of intervals and reflecting the predicted effect of the operation on the data.

12. The process of claim 11, wherein the query data structure is a query tree.

13. A data processing apparatus for producing predictive information about execution of a query on data in a relational database, the apparatus comprising: means for creating a query data structure in memory for the query; means for selecting a current operation corresponding to a current node in the query data structure; means for receiving statistical information about rows of data in the relational database associated with at least one child of the current node, the statistical information existing for each of a plurality of intervals of the data, wherein the statistical information includes a UEC for certain of the plural intervals; and means for manipulating the statistical information, in accordance with the current operation, on an interval by interval basis, to yield statistical information for the current node, having a plurality of intervals and reflecting the predicted effect of the operation on the data.

14. A database apparatus for producing predictive information about execution of a query on data in the database, the apparatus having a memory and comprising: a portion configured to create a query data structure in the memory, wherein the query data structure includes parent and child nodes; a portion configured to select a current operation corresponding to a current node in the query data structure; a portion configured to receive statistical information about rows of data in the database associated with at least one child of the current node, the statistical information existing for each of a plurality of intervals of the data, wherein the statistical information includes a UEC for certain of the plural intervals; and a portion configured to manipulate the statistical information, in accordance with the current operation, on an interval by interval basis, to yield statistical information for the current node, having a plurality of intervals and reflecting the predicted effect of the operation on the data.

Patent Metadata

Filing Date: September 28, 1998

Publication Date: July 17, 2001

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Histogram synthesis modeler for a database query optimizer” (US-6263345). https://patentable.app/patents/US-6263345