Term Vector Modeling of Database Workloads

Published: May 10, 2022

Assignee: not available in USPTO data we have

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method comprising: generating a first set of one or more workloads representing a first set of queries of a plurality of queries executed on at least one database; generating a first plurality of query vectors, in a multidimensional vector space, corresponding respectively to each of the first set of queries; calculating, based on the first plurality of query vectors, a first workload vector corresponding to a first aggregate of workload across the first set of workloads; generating a second set of one or more workloads representing a second set of queries of the plurality of queries; generating a second plurality of query vectors, in the multidimensional vector space, corresponding respectively to each of the second set of queries; calculating, based on the second plurality of query vectors, a second workload vector corresponding to a second aggregate of workload across the second set of workloads; and generating a similarity score between the first set of workloads and the second set of workloads based on the first workload vector and the second workload vector, wherein the method is performed by at least one device including a hardware processor.

2. The method as recited in claim 1, further comprising receiving selection criteria for generating the first set of workloads and the second set of workloads.

3. The method as recited in claim 2, wherein the selection criteria is selected from a group consisting of: a period of time over which to detect the plurality of queries executed on the at least one database; one or more timeframes within the period of time on which to base generation of the first set of workloads and the second set of workloads; one or more locations associated with the plurality of queries; one or more users associated with the plurality of queries; and selection of one or more particular databases from the at least one database.

4. The method as recited in claim 1, wherein each of the plurality of queries executed on the at least one database is associated with a SQL statement, wherein the first set of workloads comprises SQL identifiers of corresponding SQL statements associated with queries represented by the first set of workloads, and wherein the second set of workloads comprises SQL identifiers of corresponding SQL statements associated with queries represented by the second set of workloads.

5. The method as recited in claim 4, wherein each unique SQL identifier included in the first set of workloads and the second set of workloads is represented in a multidimensional vector space by a vector having a distinct dimension.

6. The method as recited in claim 1, wherein generating the first plurality of query vectors corresponding respectively to each of the first set of queries comprises: determining a first plurality of query identifiers corresponding respectively to each of the first set of queries; determining a first unique set of query identifiers from the first plurality of query identifiers; assigning unique dimensions in a multidimensional vector space corresponding respectively to each of the first unique set of query identifiers; and assigning values in each unique dimension in the multidimensional vector space to generate the first plurality of query vectors.

7. The method as recited in claim 6, wherein the values assigned in each unique dimension are weighted based on a number of times a corresponding query was executed within the first set of workloads.

8. The method as recited in claim 1, wherein the first workload vector is calculated by combining the first plurality of query vectors.

9. The method as recited in claim 1, wherein the similarity score is a cosine similarity between the first workload vector and the second workload vector.

10. The method as recited in claim 1, further comprising performing an action based on the similarity score, the action being selected from a group consisting of: adding the first set of workloads and the second set of workloads to a cluster which includes similar workloads, reserving resources for at least one database sufficient for processing the first set of workloads at an expected time of experiencing another set of workloads similar to the first set of workloads, identifying a pattern of similar sets of workloads across more than one database, and alerting an administrator about unexpected workload changes.

11. The method as recited in claim 1, further comprising: determining that the first set of workloads and the second set of workloads are equivalent workloads based on analysis of the similarity score with respect to a similarity threshold value.

12. The method as recited in claim 1, further comprising: generating a plurality of similarity score ranges comprising, at least: a first similarity score range indicating that the first set of workloads and the second set of workloads are equivalent workloads; and a second similarity score range indicating that the first set of workloads and the second set of workloads are dissimilar workloads; and comparing the similarity score to the plurality of similarity score ranges to determine a degree of similarity between the first set of workloads and the second set of workloads.

13. A system, comprising: one or more hardware processors; a non-transitory computer readable medium comprising instructions which, when executed by the one or more hardware processors, causes performance of operations comprising: generating a first set of one or more workloads representing a first set of queries of a plurality of queries executed on at least one database; generating a first plurality of query vectors, in a multidimensional vector space, corresponding respectively to each of the first set of queries; calculating, based on the first plurality of query vectors, a first workload vector corresponding to a first aggregate of workload across the first set of workloads; generating a second set of one or more workloads representing a second set of queries of the plurality of queries; generating a second plurality of query vectors, in the multidimensional vector space, corresponding respectively to each of the second set of queries; calculating, based on the second plurality of query vectors, a second workload vector corresponding to a second aggregate of workload across the second set of workloads; and generating a similarity score between the first set of workloads and the second set of workloads based on the first workload vector and the second workload vector.

14. The system as recited in claim 13, wherein the operations further comprise receiving selection criteria for generating the first set of workloads and the second set of workloads, the selection criteria being selected from a group consisting of: a period of time over which to detect the plurality of queries executed on the at least one database; one or more timeframes within the period of time on which to base generation of the first set of workloads and the second set of workloads; one or more locations associated with the plurality of queries; one or more users associated with the plurality of queries; and selection of one or more particular databases from the at least one database.

15. The system as recited in claim 13, wherein each of the plurality of queries executed on the database is associated with a SQL statement, wherein the first set of workloads comprises SQL identifiers of corresponding SQL statements associated with queries represented by the first set of workloads, and wherein the second set of workloads comprises SQL identifiers of corresponding SQL statements associated with queries represented by the second set of workloads.

16. The system as recited in claim 15, wherein each unique SQL identifier included in the first set of workloads and the second set of workloads is represented in a multidimensional vector space by a vector having a distinct dimension.

17. The system as recited in claim 13, wherein generating the first plurality of query vectors corresponding respectively to each of the first set of queries comprises: determining a first plurality of query identifiers corresponding respectively to each of the first set of queries; determining a first unique set of query identifiers from the first plurality of query identifiers; assigning unique dimensions in a multidimensional vector space corresponding respectively to each of the first unique set of query identifiers; and assigning values in each unique dimension in the multidimensional vector space to generate the first plurality of query vectors.

18. A non-transitory computer readable medium comprising instructions which, when executed by one or more hardware processors, causes performance of operations comprising: generating a first set of one or more workloads representing a first set of queries of a plurality of queries executed on at least one database; generating a first plurality of query vectors, in a multidimensional vector space, corresponding respectively to each of the first set of queries; calculating, based on the first plurality of query vectors, a first workload vector corresponding to a first aggregate of workload across the first set of workloads; generating a second set of one or more workloads representing a second set of queries of the plurality of queries; generating a second plurality of query vectors, in the multidimensional vector space, corresponding respectively to each of the second set of queries; calculating, based on the second plurality of query vectors, a second workload vector corresponding to a second aggregate of workload across the second set of workloads; and generating a similarity score between the first set of workloads and the second set of workloads based on the first workload vector and the second workload vector.

19. The non-transitory computer readable medium as recited in claim 18, wherein the operations further comprise: determining that the first set of workloads and the second set of workloads are equivalent workloads based on analysis of the similarity score with respect to a similarity threshold value.

20. The non-transitory computer readable medium as recited in claim 18, wherein generating the first plurality of query vectors corresponding respectively to each of the first set of queries comprises: determining a first plurality of query identifiers corresponding respectively to each of the first set of queries; determining a first unique set of query identifiers from the first plurality of query identifiers; assigning unique dimensions in a multidimensional vector space corresponding respectively to each of the first unique set of query identifiers; and assigning values in each unique dimension in the multidimensional vector space to generate the first plurality of query vectors.
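
Illustrative sketch (not part of the patent text)

The steps recited in claims 1, 6 through 9, 11 and 12 can be pictured with a small Python sketch. This is one possible reading, not the patented implementation: it assumes a workload is simply a list of SQL identifiers (one entry per execution), that each unique identifier gets its own dimension weighted by execution count, that query vectors are combined by summation, and that the score is cosine similarity. The function names, the sample identifiers, and the threshold values 0.9 and 0.5 are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def query_vectors(executions):
    """One sparse vector per unique SQL identifier (claims 6 and 7).

    Each vector is non-zero only in its own dimension; the value is the
    number of times that query ran in the workload (assumed weighting).
    """
    counts = Counter(executions)
    return [{sql_id: float(n)} for sql_id, n in counts.items()]


def combine(vectors):
    """Sum the query vectors into one workload vector (claim 8, assumed sum)."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # Counter.update adds values per key
    return dict(total)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse workload vectors (claim 9)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty workload is treated as similar to nothing
    return dot / (norm_a * norm_b)


def classify(score, equivalent_at=0.9, dissimilar_below=0.5):
    """Map a score onto ranges (claims 11 and 12); thresholds are hypothetical."""
    if score >= equivalent_at:
        return "equivalent"
    if score < dissimilar_below:
        return "dissimilar"
    return "similar"


# Hypothetical SQL identifiers observed in two timeframes.
morning = ["q1", "q1", "q2", "q3"]
evening = ["q1", "q2", "q2", "q4"]

first = combine(query_vectors(morning))
second = combine(query_vectors(evening))
score = cosine_similarity(first, second)
label = classify(score)
print(f"similarity={score:.3f} ({label})")
```

Because every dimension holds a non-negative execution count, the score in this sketch falls between 0 (no shared queries) and 1 (identical query mix), which is what makes fixed score ranges a workable way to label a pair of workloads.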

Patent Metadata

Filing Date

Unknown

Publication Date

May 10, 2022

Inventors

John Mark Beresniewicz
