A report cost estimation module is used with cloud storage systems that store and process client data. Reports on the cloud-stored client data consume significant CPU, memory, storage, and networking resources. A data analysis model dynamically estimates pricing for client data reports according to resource consumption by identifying a cluster group for the report and applying the regression model designated for that cluster group. The data analysis model can store the estimated and actual report costs and improve the estimated report costs using machine learning algorithms and crowdsourcing techniques. An accurate report price is provided to the client before the report is run, allowing the customer to carefully manage a client report budget.
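The flow summarized above (match the report to a cluster group, apply that group's designated regression model, quote the price, and run the report only on approval) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `ClusterGroup`, `nearest_cluster`, `estimate_cost`, `handle_request`, and `GROUPS` are hypothetical, the three-number parameter vector (data sources, operators, database size) is an assumed encoding, nearest-centroid matching is an assumed way to pick a group, and all coefficients are made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Assumed encoding: (number of data sources, number of operators, database size in GB).
Params = Sequence[float]
Model = Callable[[Params], float]


@dataclass
class ClusterGroup:
    centroid: List[float]  # representative parameters for the group
    model: Model           # regression model designated for the group


def nearest_cluster(params: Params, groups: Dict[str, ClusterGroup]) -> str:
    """Return the name of the group whose centroid is closest (squared Euclidean)."""
    def dist(name: str) -> float:
        return sum((p - c) ** 2 for p, c in zip(params, groups[name].centroid))
    return min(groups, key=dist)


def estimate_cost(params: Params, groups: Dict[str, ClusterGroup]):
    """Pick the matching group and apply its designated model before any report runs."""
    name = nearest_cluster(params, groups)
    return name, groups[name].model(params)


def handle_request(params, groups, approve, run_report):
    """Quote first; prepare the report only if the client approves the estimate."""
    _, quote = estimate_cost(params, groups)
    if not approve(quote):
        return {"status": "cancelled", "estimate": quote, "actual": None}
    actual = run_report(params)  # actual cost, kept alongside the estimate
    return {"status": "completed", "estimate": quote, "actual": actual}


# Illustrative groups with invented coefficients.
GROUPS = {
    "small": ClusterGroup([1.0, 2.0, 10.0], lambda p: 0.5),  # constant (zero-R style)
    "large": ClusterGroup(
        [5.0, 20.0, 500.0],
        lambda p: 0.2 * p[0] + 0.05 * p[1] + 0.01 * p[2],  # linear model
    ),
}
```

Returning both the estimate and the actual cost from `handle_request` mirrors the abstract's point that the two are stored together so later estimates can be improved.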
Legal claims defining the scope of protection, as filed with the USPTO.
1. A method for determining estimating costs for reports for data stored on cloud based storage comprising: providing a data analysis model having a processor, a clustering module with a plurality of cluster report groups, a database storing historic reports, estimated costs and actual costs for the historic reports, and a plurality of regression models wherein each of the plurality of cluster report groups has a designated regression model from the plurality of regression models; receiving a first set of report parameters by the data analysis model for a first report by the data analysis model from a first client computer; determining a first cluster report group that matches the first report parameters for the first report from the plurality of cluster report groups; determining a first designated regression model that produces most accurate estimated report costs compared to actual report costs for the first cluster report group; calculating a first estimated cost of the first report by the processor applying the first designated regression model to the first set of report parameters before the first report is run; transmitting the first estimated cost of the first report to the first client computer; receiving by the processor, an approval of the first estimated cost from the first client computer; and preparing by the processor, the first report after receiving the approval of the first estimated cost.
2. The method of claim 1 further comprising: receiving a second set of report parameters by the data analysis model for a second report by the data analysis model from a second client computer; determining a second cluster report group that matches the second report parameters for the second report from the plurality of cluster report groups; determining a second designated regression model that produces most accurate estimated report costs compared to actual report costs for the second cluster report group wherein the second designated regression model is different than the first designated regression model; calculating a second estimated cost of the second report by the processor applying the second designated regression model to the second set of report parameters before the second report is run; transmitting the second estimated cost of the first report to the first client computer; receiving a rejection of the second estimated cost; and canceling the second report after receiving the rejection of the second estimated cost.
3. The method of claim 1 wherein the plurality of regression models includes: a support vector machine, a regression tree, a linear regression, a multilayer perceptron, Gaussian processes and/or a zero-R classifier.
4. The method of claim 1 wherein the first set of report parameters includes: data sources, operators, and database size.
5. The method of claim 1 wherein the first set of report parameters includes data sources and operators.
6. The method of claim 1 further comprising: receiving a second set of report parameters by the data analysis model for a second report by the data analysis model from a second client computer; determining a second cluster report group having a second designated regression model from the plurality of cluster groups that matches the second report parameters for the second report, wherein the second designated regression model is different than the first designated regression model; determining a second designated regression model that produces most accurate estimated report costs compared to actual report costs for the second cluster report group; calculating a second estimated cost of the second report by the processor applying the second designated regression model to the second set of report parameters before the second report is run; receiving by the processor, an approval of the second estimated cost from the second client computer; and transmitting the second estimated cost of the second report to the second client computer.
7. The method of claim 6 further comprising: receiving approval of the second estimated cost from the second client computer before preparing the second report.
8. The method of claim 6 further comprising: storing the second estimated cost and an actual cost for preparing the second report in a database.
9. The method of claim 1 further comprising: providing report parameters for a plurality of prior reports by the data analysis model to the processor; grouping the plurality of prior reports into a plurality of cluster groups wherein the prior reports in each of the cluster groups have similar report parameters; calculating the estimated costs of each of the plurality of prior reports by processing the report parameters with plurality of regression models; determining the regression models that produce the estimated costs of the prior reports that most closely matches the actual costs of the reports for each of the plurality of prior reports; and identifying designated regression models for each of the cluster groups of the prior reports.
10. The method of claim 9 wherein the plurality of regression models include: a support vector machine, a regression tree, a linear regression, a multilayer perceptron, Gaussian processes and a zero-R classifier.
11. The method of claim 1 wherein the report parameters include one or more of: data sources, operators, and database size.
12. A cloud storage platform with a data analysis model comprising: a processor, a clustering module with a plurality of cluster report groups, a database storing historic reports, estimated costs and actual costs for the historic reports, and a plurality of regression models wherein each of the plurality of cluster report groups has a designated regression model from the plurality of regression models; wherein the processor receives a first set of report parameters by the data analysis model for a first report from a first client computer; the processor determines a first selected cluster report group that matches the first report parameters for the first report; the processor determines a first designated regression model that produces most accurate estimated report costs compared to actual report costs for the first cluster report group; the processor calculates a first estimated cost for the first report by applying the first designated regression model to the first set of report parameters before the first report is run; and the processor transmits the first estimated cost of the first report to the first client computer and receives approval of the first estimated cost from the first client computer before preparing the first report.
13. A computer program product comprising a non-transitory computer usable medium having machine readable code embodied therein for: providing a data analysis model having a clustering module with a plurality of cluster report groups, a database storing historic reports, estimated costs and actual costs for the historic reports, and a plurality of regression models wherein each of the plurality of cluster report groups has a designated regression model from the plurality of regression models; receiving a first set of report parameters by the data analysis model for a first report by the data analysis model from a first client computer; determining a first cluster report group that matches the first report parameters for the first report from the plurality of cluster report groups; determining a first designated regression model that produces most accurate estimated report costs compared to actual report costs for the first cluster report group; calculating a first estimated cost of the first report by the processor applying the first designated regression model to the first set of report parameters before the first report is run; transmitting the first estimated cost of the first report to the first client computer; receiving approval of the first estimated cost from the first client computer; and preparing the first report after receiving the approval of the first estimated cost.
14. The computer program product of claim 13 further comprising: receiving a second set of report parameters by the data analysis model for a second report by the data analysis model from a second client computer; determining a second cluster report group that matches the second report parameters for the second report from the plurality of cluster report groups; determining a second designated regression model that produces most accurate estimated report costs compared to actual report costs for the second cluster report group wherein the second designated regression model is different than the first designated regression model; calculating a second estimated cost of the second report by the processor applying the second designated regression model to the second set of report parameters before the second report is run; transmitting the second estimated cost of the first report to the first client computer; receiving a rejection of the second estimated cost; and canceling the second report after receiving the rejection of the second estimated cost.
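The training procedure recited in claim 9 (group prior reports by similar parameters, score several regression models against the actual costs, and designate the best model per group) might look roughly like the sketch below. It is an illustration under stated assumptions, not the claimed method: k-means is an assumed choice of clustering algorithm, only two of the listed model types are shown (a zero-R baseline and a one-variable linear regression on the last parameter, assumed here to be database size), mean absolute error is an assumed accuracy measure, and every function name is hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Report = Tuple[List[float], float]  # (report parameters, actual cost)
Model = Callable[[List[float]], float]


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Tiny k-means; seeds centroids with the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels


def fit_zero_r(reports: List[Report]) -> Model:
    """Zero-R baseline: always predict the mean actual cost of the group."""
    avg = mean(cost for _, cost in reports)
    return lambda params: avg


def fit_linear_on_size(reports: List[Report]) -> Model:
    """Least-squares line of cost against the last parameter (assumed: database size)."""
    xs = [p[-1] for p, _ in reports]
    ys = [c for _, c in reports]
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return lambda params: my + slope * (params[-1] - mx)


# Candidate models; dict order means ties go to the simpler zero-R model.
FITTERS = {"zero_r": fit_zero_r, "linear": fit_linear_on_size}


def mae(model: Model, reports: List[Report]) -> float:
    """Mean absolute error between estimated and actual costs."""
    return mean(abs(model(p) - c) for p, c in reports)


def designate_models(reports: List[Report], k: int) -> Dict[int, str]:
    """Cluster prior reports, then name the most accurate candidate for each cluster."""
    labels = kmeans([p for p, _ in reports], k)
    designated: Dict[int, str] = {}
    for j in range(k):
        members = [r for r, lab in zip(reports, labels) if lab == j]
        if not members:
            continue
        fitted = {name: fit(members) for name, fit in FITTERS.items()}
        designated[j] = min(fitted, key=lambda name: mae(fitted[name], members))
    return designated
```

For brevity the sketch scores each model on the same reports it was fitted to; a held-out split or cross-validation would give a fairer comparison between candidates.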
June 27, 2016
November 3, 2020