Embodiments of the present disclosure provide a method of analyzing data in a storage system, a storage system, and a computer program product. The method includes: in response to detecting a request for a data analytic job, obtaining target data for the data analytic job from a first storage device of the storage system. The method also includes storing the target data into a second storage device of the storage system that is assigned for data analysis, and performing the data analytic job using a data processing device and the second storage device in the storage system.
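The flow summarized above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the disclosed implementation: the `StorageDevice` class, the `handle_analytic_request` function, the dictionary-based job description, and the device names are all hypothetical, and the storage devices are modeled as in-memory dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class StorageDevice:
    """Hypothetical in-memory stand-in for a storage device."""
    name: str
    blocks: dict = field(default_factory=dict)
    assigned_for_analysis: bool = False


def handle_analytic_request(job, first, second):
    """Copy the job's target data to the analysis device, then run the job there."""
    if not second.assigned_for_analysis:
        raise RuntimeError(f"{second.name} is not assigned for data analysis")
    # Obtain the target data from the first storage device.
    target = {key: first.blocks[key] for key in job["target_keys"]}
    # Store it on the second storage device, which is assigned for data analysis.
    second.blocks.update(target)
    # Perform the job against the copy held on the second device.
    return job["analyze"]([second.blocks[key] for key in job["target_keys"]])


first = StorageDevice("first-device", blocks={"a": 3, "b": 4, "c": 5})
second = StorageDevice("second-device", assigned_for_analysis=True)
job = {"target_keys": ["a", "c"], "analyze": sum}  # hypothetical job description
result = handle_analytic_request(job, first, second)
print(result)
```

In this sketch the analysis reads only from the second device, so the first device remains free for storage work while the job runs.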
Legal claims defining the scope of protection, as filed with the USPTO.
1. A method of analyzing data in a storage system including a plurality of host computing devices (hosts), the method comprising: performing, by a storage virtual machine (VM) running on a first host of the storage system, a data storage job directed at a first storage device of the storage system; after performing the data storage job, orchestrating a data analytic job with respect to target data stored in the first storage device by: issuing a data analytic request by a scheduler of the storage system to a data analytic VM running on a second host of the storage system; in response to the data analytic VM receiving the data analytic request, obtaining metadata that indicates information about the target data including its position from a data retrieving VM running on the second host; issuing a data request for the target data using the metadata from the data analytic VM to the storage VM; obtaining, by the storage VM, the target data for the data analytic job from the first storage device; storing the obtained target data into a second storage device of the storage system that is assigned for data analysis, the second storage device being local to the second host; and performing the data analytic job by the data analytic VM, including the data analytic VM accessing the target data from the second storage device.
2. The method of claim 1, wherein performing the data analytic job comprises: creating a plurality of virtual machines for data analysis in the storage system; and scheduling the data analytic job onto the plurality of virtual machines.
3. The method of claim 2, further comprising: virtualizing the second storage device into a plurality of virtual storage devices; and allocating the plurality of virtual storage devices for the plurality of virtual machines.
4. The method of claim 3, wherein scheduling the data analytic job onto the plurality of virtual machines comprises: scheduling a first task of the data analytic job onto a first virtual machine of the plurality of virtual machines, the first virtual machine being associated with a virtual storage device of the plurality of virtual storage devices that stores the target data, and the first task directly analyzing the target data; and scheduling a second task of the data analytic job onto a second virtual machine of the plurality of virtual machines, the second task analyzing an intermediate result produced by the first task.
5. The method of claim 1, wherein the second storage device includes a cache device of the storage system, the method further comprising: in response to the data analytic VM receiving the data analytic request, transmitting an add command to the cache device to assign the cache device for data analysis; and in response to completion of the data analytic job, transmitting a remove command to the cache device to cease assignment of the cache device for the data analysis.
6. The method of claim 1, wherein the data analytic job includes a first task and a second task, the second task being based on an intermediate result produced by the first task, and the method further comprising: storing the intermediate result in a cache device of the storage system during the performing of the data analytic job.
7. The method of claim 1, further comprising the scheduler issuing the data analytic request in response to detection of completion of the data storage job.
8. The method of claim 1, further comprising: storing into the first storage device a result of the performing of the data analytic job.
9. The method of claim 1, wherein the data analytic job includes a MapReduce job.
10. The method of claim 1, wherein: the storage system is a backup system; the data storage job is a backup job configured to be performed within a backup time period; and issuing the data analytic request is performed in response to the scheduler detecting that the backup time period has expired.
11. The method of claim 1, wherein the data storage job includes one of: saving data to the first storage device; data replication from the first storage device; data de-duplication with respect to the first storage device; and data recovery from the first storage device.
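As an illustration of how the cache assignment of claim 5 and the two-task job of claims 6 and 9 might fit together, the following Python sketch brackets a small MapReduce-style word count with add and remove commands and keeps the intermediate result in the cache. It is a sketch under stated assumptions, not the claimed implementation: the `CacheDevice` class, its `command`, `put`, and `get` methods, the `assigned_for_analysis` helper, and the word-count job are all hypothetical.

```python
from collections import Counter
from contextlib import contextmanager


class CacheDevice:
    """Hypothetical cache device that accepts 'add' and 'remove' commands."""

    def __init__(self):
        self.assigned = False
        self.data = {}
        self.log = []

    def command(self, name):
        # 'add' assigns the cache for data analysis; 'remove' ceases the assignment.
        if name not in ("add", "remove"):
            raise ValueError(f"unknown command: {name}")
        self.assigned = name == "add"
        self.log.append(name)

    def put(self, key, value):
        if not self.assigned:
            raise RuntimeError("cache is not assigned for data analysis")
        self.data[key] = value

    def get(self, key):
        if not self.assigned:
            raise RuntimeError("cache is not assigned for data analysis")
        return self.data[key]


@contextmanager
def assigned_for_analysis(cache):
    """Send 'add' before the job and 'remove' once it completes or fails."""
    cache.command("add")
    try:
        yield cache
    finally:
        cache.command("remove")


def map_task(cache):
    # First task: analyze the target data directly and cache the intermediate result.
    pairs = [(word, 1) for line in cache.get("target") for word in line.split()]
    cache.put("intermediate", pairs)


def reduce_task(cache):
    # Second task: analyze the intermediate result produced by the first task.
    counts = Counter()
    for word, n in cache.get("intermediate"):
        counts[word] += n
    return dict(counts)


def run_job(cache, target_lines):
    with assigned_for_analysis(cache):
        cache.put("target", target_lines)  # target data stored for analysis
        map_task(cache)
        return reduce_task(cache)


cache = CacheDevice()
counts = run_job(cache, ["backup ok", "backup failed"])
print(counts)
```

A context manager is used here so that the remove command is sent even if a task raises, which is one way to keep the cache from staying assigned to analysis after the job ends.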
June 21, 2017
March 3, 2020