7249120

Method and Apparatus for Selecting Candidate Statistics to Estimate the Selectivity Value of the Conditional Selectivity Expression in Optimize Queries Based on a Set of Predicates That Each Reference a Set of Relational Database Tables

PublishedJuly 24, 2007
Assigneenot available in USPTO data we have
Technical Abstract

Patent Claims
53 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1

1. A computer implemented method for approximating a number of tuples returned by a database query to optimize queries on a computerized database that comprises a set of predicates that each reference a set of database tables, the method comprising the steps of: a) determining whether a first estimated selectivity value of the query is stored and returning the stored first estimated selectivity value to approximate the number of tuples returned by the query; b) if a first estimated selectivity value of the query is not stored, expressing the query as an input selectivity value; c) determining if the query is separable based on properties of the input selectivity value; d) if the query is separable: separating the input selectivity value to form a product of query selectivity factors; recursively determining a selectivity value for each query selectivity factor; combining the selectivity values for each query selectivity factor to determine a second estimated selectivity value of the query; e) if the query is not separable: atomically decomposing the input selectivity value to form a product that comprises a conditional selectivity expression, wherein the atomically decomposing step is repeated for one or more atomic decompositions of the input selectivity value to produce a plurality of alternative products that comprise corresponding conditional selectivity expressions; for each of the plurality of alternative products, matching a corresponding conditional selectivity expression with stored statistics to obtain statistics that can estimate a selectivity value of the conditional selectivity expression by: compiling a set of candidate statistics that can be used to estimate the selectivity value of the conditional selectivity expression; selecting candidate statistics to estimate the selectivity value of the conditional selectivity expression based on a selection criteria; and using the statistics to obtain an estimated selectivity value of the conditional selectivity expression; and combining the estimated selectivity values of the conditional selectivity expressions corresponding to each alternative product to determine a third estimated selectivity value of the query; f) storing in memory either the second or third estimated selectivity value of the query based on whether the query is separable.

2

2. The method of claim 1 comprising the step of multiplying the second or third estimated selectivity value of the query by a Cartesian product of the tables referenced by the predicates to obtain a cardinality of the query.

3

3. The method of claim 1 wherein the step of separating the input selectivity value to form a product of query selectivity factors is performed by separating the predicates that reference different sets of database tables to form a product of query selectivity factors that reference different sets of database tables.

4

4. The method of claim 1 wherein each product formed in step e) further comprises a query selectivity factor and wherein a selectivity value for each query selectivity factor is recursively determined.

5

5. The method of claim 1 wherein steps c) are recursively performed until a non-separable query selectivity that can only be decomposed into a single conditional selectivity expression results.

6

6. The method of claim 1 comprising the step of associating an error with the third estimated selectivity value that is based on an accuracy with which the statistic matched with a conditional selectivity expression can estimate the selectivity of the conditional selectivity expression.

7

7. The method of claim 6 comprising the step of combining the error associated with each conditional selectivity expression to obtain an estimated error for the third estimated selectivity value for the query.

8

8. The method of claim 1 wherein the stored statistics comprise histograms on results of previously executed query expressions.

9

9. The method of claim 1 wherein the selection criteria for a candidate statistic is determined by computing a number of independence assumptions that are made when the candidate is used to estimate the selectivity value of the conditional selectivity expression and the selection criteria is to select the candidate that results in the least number of independence assumptions.

10

10. The method of claim 1 wherein the selection criteria for a candidate statistic is determined by comparing the candidate statistic with a base statistic over the same column as the candidate statistic and assigning a difference value to the candidate statistic based on a level of difference between the candidate statistic and the base statistic.

11

11. The method of claim 1 wherein the step of compiling a set of candidate statistics is performed by including statistics that are on results of queries having the same tables referenced by the conditional selectivity expression or a subset of the tables referenced by the conditional selectivity expression and the same predicates over the tables referenced in the conditional selectivity expression or a subset of the predicates over the tables referenced in the conditional selectivity expressions.

12

12. The method of claim 1 wherein the step of atomically decomposing the input selectivity value is done by exhausting every alternative way of atomically decomposing the query.

13

13. The method of claim 1 wherein the steps of atomically decomposing the input selectivity value to form a products that comprises a conditional selectivity expressions is performed based on an optimizer search strategy.

14

14. The method of claim 1 wherein the query is disjunctive and comprising the step of transforming the disjunctive predicates into conjunctive predicates by performing a De Morgan transformation on the disjunctive query.

15

15. The method of claim 1 wherein the query comprises a GROUP BY predicate over a grouping column and wherein the query is transformed prior to performance of the method steps to return a number of distinct values in the grouping column.

16

16. The method of claim 15 wherein the step of atomically decomposing the input selectivity value is performed by considering decompositions that are induced by coalescing grouping.

17

17. The method of claim 15 wherein the step of atomically decomposing the input selectivity value is performed by considering decompositions that are induced by invariant grouping.

18

18. The method of claim 1 wherein the stored statistics comprise histograms built over computed columns in a query result.

19

19. For use with a database system, a computer readable storage medium having computer executable instructions stored thereon for performing method steps to approximate a number of tuples returned by a database query to optimize queries on the database system that comprises a set of predicates that each reference a set of database tables, the method comprising the steps of: a) determining whether a first estimated selectivity value of the query is stored and returning the stored first estimated selectivity value to approximate the number of tuples returned by the query; b) if a first estimated selectivity value of the query is not stored, expressing the query as an input selectivity value; c) determining if the query is separable based on properties of the input selectivity value; d) if the query is separable: separating the input selectivity value to form a product of query selectivity factors; recursively determining a selectivity value for each query selectivity factor; combining the selectivity values for each query selectivity factor to determine a second estimated selectivity value of the query; e) if the query is not separable: atomically decomposing the input selectivity value to form a product that comprises a conditional selectivity expression, wherein the atomically decomposing step is repeated for one or more atomic decompositions of the input selectivity value to produce a plurality of alternative products that comprise corresponding conditional selectivity expressions; for each of the plurality of alternative products, matching a corresponding conditional selectivity expression with stored statistics to obtain statistics that can estimate a selectivity value of the conditional selectivity expression by: compiling a set of candidate statistics that can be used to estimate the selectivity value of the conditional selectivity expression; selecting candidate statistics to estimate the selectivity value of the conditional selectivity expression based on a selection criteria; and using the statistics to obtain an estimated selectivity value of the conditional selectivity expression; and combining the estimated selectivity values of the conditional selectivity expressions corresponding to each alternative product to determine a third estimated selectivity value of the query; and f) storing in memory either the second or third estimated selectivity value of the query based on whether the query is separable.

20

20. The computer readable storage medium of claim 19 comprising the step of multiplying the second or third estimated selectivity value of the query by a Cartesian product of the tables referenced by the predicates to obtain a cardinality of the query.

21

21. The computer readable storage medium of claim 19 wherein the step of separating input selectivity value to form a product of query selectivity factors is performed by separating the predicates that reference different sets of database tables to form a product of query selectivity factors that reference different sets of database tables.

22

22. The computer readable storage medium of claim 19 wherein each product formed in step e) further comprises a query selectivity factor and wherein a selectivity value for each query selectivity factor is recursively determined.

23

23. The computer readable storage medium of claim 19 wherein steps c) are recursively performed until a non-separable query selectivity that can only be decomposed into a single conditional selectivity expression results.

24

24. The computer readable storage medium of claim 19 comprising the step of associating an error with the third estimated selectivity value that is based on an accuracy with which the statistic matched with a conditional selectivity expression can estimate the selectivity of the conditional selectivity expression.

25

25. The computer readable storage medium of claim 24 comprising the step of combining the error associated with each conditional selectivity expression to obtain an estimated error for the third estimated selectivity value for the query.

26

26. The computer readable storage medium of claim 19 wherein the stored statistics comprise histograms on results of previously executed query expressions.

27

27. The computer readable storage medium of claim 19 wherein the selection criteria for a candidate statistic is determined by computing a number of independence assumptions that are made when the candidate is used to estimate the selectivity value of the conditional selectivity expression and the selection criteria is to select the candidate that results in the least number of independence assumptions.

28

28. The computer readable storage medium of claim 19 wherein the selection criteria for a candidate statistic is determined by comparing the candidate statistic with a base statistic over the same column as the candidate statistic and assigning a difference value to the candidate statistic based on a level of difference between the candidate statistic and the base statistic.

29

29. The computer readable storage medium of claim 19 wherein the step of compiling a set of candidate statistics is performed by including statistics that are on results of queries having the same tables referenced by the conditional selectivity expression or a subset of the tables referenced by the conditional selectivity expression and the same predicates over the tables referenced in the conditional selectivity expression or a subset of the predicates over the tables referenced in the conditional selectivity expressions.

30

30. The computer readable storage medium of claim 19 wherein the step of atomically decomposing the input selectivity value is done by exhausting every alternative way of atomically decomposing the query.

31

31. The computer readable storage medium of claim 19 wherein the steps of atomically decomposing the input selectivity value to form a product that comprises a conditional selectivity expressions is performed based on an optimizer search strategy.

32

32. The computer readable storage medium of claim 19 wherein the query is disjunctive and comprising the step of transforming the disjunctive predicates into conjunctive predicates by performing a De Morgan transformation on the disjunctive query.

33

33. The computer readable storage medium of claim 19 wherein the query comprises a GROUP BY predicate over a grouping column and wherein the query is transformed prior to performance of the method steps to return a number of distinct values in the grouping column.

34

34. The computer readable storage medium of claim 33 wherein the step of atomically decomposing the input selectivity value is performed by considering decompositions that are induced by coalescing grouping.

35

35. The computer readable storage medium of claim 33 wherein the step of atomically decomposing the input selectivity value is performed by considering decompositions that are induced by invariant grouping.

36

36. The computer readable storage medium of claim 19 wherein the stored statistics comprise histograms built over computed columns in a query result.

37

37. A computer implemented method for approximating a number of tuples returned by a database query to optimize queries on a computerized database that comprises a set of predicates that each reference a set of database tables, the method comprising the steps of: a) expressing the query as an input selectivity value; b) determining if the query is separable based on properties of the input selectivity value; c) if the query is separable: separating the input selectivity value by separating the predicates that reference different sets of database tables to form a product of query selectivity factors that reference different sets of database tables; recursively determining a selectivity value for each query selectivity factor; combining the selectivity values for each query selectivity factor to determine a first estimated selectivity value of the query; d) if the query is not separable: atomically decomposing the input selectivity value to form a product that comprises a conditional selectivity expression, wherein the atomically decomposing step is repeated for one or more atomic decompositions of the input selectivity value to produce a plurality of alternative products that comprise corresponding conditional selectivity expressions; for each of the plurality of alternative products, matching a corresponding conditional selectivity expression with stored statistics to obtain statistics that can estimate a selectivity value of the conditional selectivity expression by: compiling a set of candidate statistics that can be used to estimate the selectivity value of the conditional selectivity expression; selecting candidate statistics to estimate the selectivity value of the conditional selectivity expression based on a selection criteria; and using the statistics to obtain an estimated selectivity value of the conditional selectivity expression; and combining the estimated selectivity values of the conditional selectivity expressions corresponding to each alternative product to determine a second estimated selectivity value of the query; and e) storing in memory either the first or second estimated selectivity value of the query based on whether the query is separable.

38

38. The method of claim 37 wherein the selection criteria for a candidate statistic is determined by computing a number of independence assumptions that are made when the candidate is used to estimate the selectivity of the conditional selectivity expression and the selection criteria is to select the candidate that results in the least number of independence assumptions.

39

39. The method of claim 37 wherein the selection criteria for a candidate statistic is determined by comparing the candidate statistic with a base statistic over the same column as the candidate statistic and assigning a difference value to the candidate statistic based on a level of difference between the candidate statistic and the base statistic.

40

40. The method of claim 37 wherein the step of compiling a set of candidate statistics is performed by including statistics that are on results of queries having the same tables referenced by the conditional selectivity expression or a subset of the tables referenced by the conditional selectivity expression and the same predicates over the tables referenced in the conditional selectivity expression or a subset of the predicates over the tables referenced in the conditional selectivity expression.

41

41. The method of claim 37 wherein each product formed in step d) further comprises a query selectivity factor and wherein a selectivity value for each query selectivity factor is recursively determined.

42

42. The method of claim 37 wherein steps b) are recursively performed until a non-separable query selectivity that can only be decomposed into a single conditional selectivity expression results.

43

43. The method of claim 37 comprising the step of associating an error with the second estimated selectivity value that is based on an accuracy with which the statistic matched with a conditional selectivity expression can estimate the selectivity of the conditional selectivity expression.

44

44. The method of claim 43 comprising the step of combining the error associated with each conditional selectivity expression to obtain an estimated error for the second estimated selectivity value for the query.

45

45. For use with a database system, a computer readable storage medium having computer executable instructions stored thereon for performing method steps to approximate a number of tuples returned by a database query to optimize queries on the database system that comprises a set of predicates that each reference a set of database tables, the method comprising the steps of: a) expressing the query as an input selectivity value; b) determining if the query is separable based on properties of the input selectivity value; c) if the query is separable: separating the input selectivity value by separating the predicates that reference different sets of database tables to form a product of query selectivity factors that reference different sets of database tables; recursively determining a selectivity value for each query selectivity factor; combining the selectivity values for each query selectivity factor to determine a first estimated selectivity value of the query; d) if the query is not separable: atomically decomposing the input selectivity value to form a product that comprises a conditional selectivity expression, wherein the atomically decomposing step is repeated for one or more atomic decompositions of the input selectivity value to produce a plurality of alternative products that comprise corresponding conditional selectivity expressions; for each of the plurality of alternative products, matching a corresponding conditional selectivity expression with stored statistics to obtain statistics that can estimate a selectivity value of the conditional selectivity expression by: compiling a set of candidate statistics that can be used to estimate the selectivity value of the conditional selectivity expression; selecting candidate statistics to estimate the selectivity value of the conditional selectivity expression based on a selection criteria; and using the statistics to obtain an estimated selectivity value of the conditional selectivity expression; and combining the estimated selectivity values of the conditional selectivity expressions corresponding to each alternative product to determine a second estimated selectivity value of the query; and e) storing in memory either the first or second estimated selectivity value of the query based on whether the query is separable.

46

46. The computer readable storage medium of claim 45 wherein the selection criteria for a candidate statistic is determined by computing a number of independence assumptions that are made when the candidate is used to estimate the selectivity of the conditional selectivity expression and the selection criteria is to select the candidate that results in the least number of independence assumptions.

47

47. The computer readable storage medium of claim 45 wherein the selection criteria for a candidate statistic is determined by comparing the candidate statistic with a base statistic over the same column as the candidate statistic and assigning a difference value to the candidate statistic based on a level of difference between the candidate statistic and the base statistic.

48

48. The computer readable storage medium of claim 45 wherein the step of compiling a set of candidate statistics is performed by including statistics that are on results of queries having the same tables referenced by the conditional selectivity expression or a subset of the tables referenced by the conditional selectivity expression and the same predicates over the tables referenced in the conditional selectivity expression or a subset of the predicates over the tables referenced in the conditional selectivity expression.

49

49. The computer readable storage medium of claim 45 wherein each product formed in step d) further comprises a query selectivity factor and wherein a selectivity value for each query selectivity factor is recursively determined.

50

50. The computer readable storage medium of claim 45 wherein steps b) are recursively performed until a non-separable query selectivity that can only be decomposed into a single conditional selectivity expression results.

51

51. The computer readable storage medium of claim 45 comprising the step of associating an error with the second estimated selectivity value that is based on an accuracy with which the statistic matched with a conditional selectivity expression can estimate the selectivity of the conditional selectivity expression.

52

52. The computer readable storage medium of claim 51 comprising the step of combining the error associated with each conditional selectivity expression to obtain an estimated error for the second estimated selectivity value for the query.

53

53. A computer system comprising one or more processors and memory for approximating a number of tuples returned by a database query to optimize queries on a computerized database that comprises a set of predicates that each reference a set of database tables comprising: a) means for determining whether a first estimated selectivity value of the query is stored and means for returning the stored first estimated selectivity value to approximate the number of tuples returned by the query; b) means for expressing the query as an input selectivity value if a first estimated selectivity value of the query is not stored; c) means for determining if the query is separable based on properties of the input selectivity value; d) means for separating the input selectivity value to form a product of query selectivity factors if the query is separable, further comprising: means for recursively determining a selectivity value for each query selectivity factor; means for combining the selectivity values for each query selectivity factor to determine a second estimated selectivity value of the query; e) means for atomically decomposing the input selectivity value to form a product that comprises a conditional selectivity expression if the query is not separable, wherein the atomically decomposing is repeated for one or more atomic decompositions of the input selectivity value to produce a plurality of alternative products that comprise corresponding conditional selectivity expressions, further comprising; means for matching a corresponding conditional selectivity expression for each of the plurality of alternative products with stored statistics to obtain statistics that can estimate a selectivity value of the conditional selectivity expression, comprising: means for compiling a set of candidate statistics that can be used to estimate the selectivity value of the conditional selectivity expression; means for selecting candidate statistics to estimate the selectivity value of the conditional selectivity expression based on a selection criteria; and means for using the statistics to obtain an estimated selectivity value of the conditional selectivity expression; and means for combining the estimated selectivity values of the conditional selectivity expressions corresponding to each alternative product to determine a third estimated selectivity value of the query; and f) means for storing in memory either the second or third estimated selectivity value of the query based on whether the query is separable.

Patent Metadata

Filing Date

Unknown

Publication Date

July 24, 2007

Inventors

Nicolas Bruno
Surajit Chaudhuri

Want to explore more patents?

Browse 5M+ US patents with plain-English claim translations and AI-generated analysis.

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “METHOD AND APPARATUS FOR SELECTING CANDIDATE STATISTICS TO ESTIMATE THE SELECTIVITY VALUE OF THE CONDITIONAL SELECTIVITY EXPRESSION IN OPTIMIZE QUERIES BASED ON A SET OF PREDICATES THAT EACH REFERENCE A SET OF RELATIONAL DATABASE TABLES” (7249120). https://patentable.app/patents/7249120

© 2026 Patentable. All rights reserved.

Patentable is a research and drafting-assistant tool, not a law firm, and does not provide legal advice. Documents we generate are drafts for review by a licensed patent attorney.