Method and Device for Processing Distributed Data, Solving the Problem of Manual Intervention by Data Analysts

Published: November 9, 2021

Assignee: Not available in the USPTO data

Inventors: Lingyun Gu, Zhipan Guo, Wei Wang, Jianye Liu


Patent Claims

6 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for processing distributed data, applied to a cluster server communicating with a distributed computing cluster, comprising the following operations:
integrating and configuring data analysis services of multiple users with different data analysis requirements into a distributed computing engine program to obtain an analysis service data package, wherein different data analysis services are distinguished by different class files in the analysis service data package;
configuring a distributed scheduler in the cluster server according to the analysis service data package, and calling the distributed scheduler to monitor a message content transmitted by a message middleware including multiple data analysis services to be executed; and
generating a distributed data execution plan according to the message content, and performing distributed scheduling calculation on the distributed data execution plan to obtain a distributed calculation result;
wherein the operation of performing distributed scheduling calculation on the distributed data execution plan to obtain a distributed calculation result comprises:
parsing plan information of the distributed data execution plan, the plan information including a data analysis service list, an analysis service type, and a task plan identifier corresponding to each data analysis service in the data analysis service list;
starting a first thread and a second thread according to the plan information, the first thread being for starting a target distributed computing engine program corresponding to the analysis service type, and obtaining a return status code of the target distributed computing engine program, the second thread being for obtaining log information of the target distributed computing engine program;
transmitting the data analysis service list into the target distributed computing engine program after starting the target distributed computing engine program corresponding to the analysis service type; and
loading data to be calculated corresponding to the task plan identifier from a predefined data source table, executing the data analysis service of the transmitted data analysis service list through the target distributed computing engine program, and performing the distributed scheduling calculation on the data to be calculated to obtain the distributed computing result;
and the operation of executing the data analysis service of the transmitted data analysis service list through the target distributed computing engine program, and performing the distributed scheduling calculation on the data to be calculated to obtain the distributed computing result comprises:
when the analysis service type is a retrospective analysis service type, searching whether there are target data analysis services in the data analysis service list that depend on other data analysis services through the target distributed computing engine program, wherein the other data analysis services do not exist in the data analysis service list;
when there are target data analysis services in the data analysis service list that depend on the other data analysis services, adding the other data analysis services to the data analysis service list; and
sorting the data analysis service list according to order of each message content in the pre-defined message content sorting list, and executing each data analysis service in the data analysis service list according to the sorting result, respectively scheduling the data to be calculated corresponding to each data analysis service to each computing node in the distributed computing cluster to execute the corresponding distributed computing task, to obtain the distributed computing result.
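The retrospective-analysis branch of claim 1 reduces to two steps: add any services that listed services depend on but that are missing from the list, then order the list by a predefined sequence. The following is a minimal Python sketch of those two steps only, not the patented implementation. The names `expand_dependencies` and `sort_services`, the dictionary form of the dependency graph, the transitive following of dependencies, and the example service names are all assumptions; the claim does not say how dependencies are recorded or whether they are followed more than one level deep.

```python
from collections import deque


def expand_dependencies(services, depends_on):
    """Append every service that a listed service depends on but that is
    missing from the list. Dependencies are followed transitively here,
    which is an assumption; the claim only says missing ones are added."""
    expanded = list(services)
    present = set(expanded)
    queue = deque(expanded)
    while queue:
        current = queue.popleft()
        for dep in depends_on.get(current, ()):
            if dep not in present:  # dependency not yet in the list: add it
                present.add(dep)
                expanded.append(dep)
                queue.append(dep)
    return expanded


def sort_services(services, predefined_order):
    """Order services by their position in a predefined ordering list.
    Services absent from that list go last; sorted() is stable, so they
    keep their relative order."""
    rank = {name: i for i, name in enumerate(predefined_order)}
    return sorted(services, key=lambda s: rank.get(s, len(rank)))


# Hypothetical example: risk_score needs customer_profile, which needs raw_ingest.
depends_on = {"risk_score": ["customer_profile"], "customer_profile": ["raw_ingest"]}
order = ["raw_ingest", "customer_profile", "risk_score", "churn_model"]
plan = sort_services(expand_dependencies(["risk_score", "churn_model"], depends_on), order)
print(plan)  # ['raw_ingest', 'customer_profile', 'risk_score', 'churn_model']
```

Sorting by a predefined list, as the claim recites, only yields a valid execution order if that list already places prerequisites before the services that need them; a topological sort is the usual alternative when no such list exists.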

2. The method of claim 1, wherein the operation of integrating and configuring data analysis services of multiple users with different data analysis requirements into a distributed computing engine program to obtain an analysis service data package comprises: defining each data analysis service as an interface service in the distributed computing engine, configuring a calculation logic corresponding to each interface service, and configuring a matrix data table returned by the calculation result of each interface service; and integrating and configuring each interface service according to each matrix data table to obtain the analysis service data package through the distributed computing engine.
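Claim 2 describes each analysis service as an interface service with its own calculation logic and a matrix data table for its result, bundled into one package in which services are told apart by class. One way to picture this in Python is an abstract base class plus a registry keyed by class name. `AnalysisService`, `result_columns`, `build_package`, and the two example services are hypothetical; the patent names no concrete engine or API, so this is an illustration of the structure rather than the claimed engine.

```python
from abc import ABC, abstractmethod


class AnalysisService(ABC):
    """Hypothetical interface service: calculation logic plus the column
    layout of the matrix data table its result is returned in."""

    result_columns: tuple = ()

    @abstractmethod
    def compute(self, rows):
        """Return result rows, each a tuple matching result_columns."""


class TotalBySite(AnalysisService):
    result_columns = ("site", "total")

    def compute(self, rows):
        totals = {}
        for site, amount in rows:
            totals[site] = totals.get(site, 0) + amount
        return sorted(totals.items())


class RowCount(AnalysisService):
    result_columns = ("count",)

    def compute(self, rows):
        return [(len(rows),)]


def build_package(*service_classes):
    """Integrate services into one package keyed by class name, mirroring
    the claim's use of class files to distinguish services."""
    package = {}
    for cls in service_classes:
        if not cls.result_columns:
            raise ValueError(f"{cls.__name__} declares no matrix data table layout")
        package[cls.__name__] = cls
    return package


package = build_package(TotalBySite, RowCount)
rows = [("a", 2), ("b", 5), ("a", 3)]
print(package["TotalBySite"]().compute(rows))  # [('a', 5), ('b', 5)]
```

Declaring the result layout on the class lets the packaging step reject a service before any data is processed, which is one reading of configuring the matrix data table ahead of integration.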

3. The method of claim 1, wherein the operation of generating a distributed data execution plan according to the message content comprises: when monitoring the message content transmitted by the message middleware, storing the message content and a transmission timestamp corresponding to the message content in a preset database, and setting an execution state of the message content to an unexecuted state; scanning the preset database every preset time interval, when it is found that there is a message content whose execution status is not executed in the preset database, and there is no execution program whose execution status is executing, sorting each message content in order of the transmission timestamp of the message content whose execution status is not executed, and generating a message content sorting list; respectively generating a distributed data execution plan for each message content according to the order of each message content in the message content sorting list; and when execution of the distributed data execution plan corresponding to any message content is completed, setting the execution state of the message content to show the message content is executed.
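The bookkeeping in claim 3 (persist each message with its transmission timestamp as unexecuted, scan periodically, process unexecuted messages in timestamp order when nothing is running, then mark each executed) maps naturally onto a small table. The sketch below uses Python's built-in `sqlite3` with an in-memory database and shows a single scan pass; a real scheduler would call `scan_once` every preset interval. The table layout, the intermediate `executing` state on messages used to detect a running program, and the `run_plan` callback standing in for plan generation and execution are assumptions, not details from the patent.

```python
import sqlite3

UNEXECUTED, EXECUTING, EXECUTED = "unexecuted", "executing", "executed"


def open_store():
    """Create a hypothetical in-memory message table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, content TEXT,"
        " sent_at REAL, state TEXT)"
    )
    return db


def on_message(db, content, sent_at):
    """Store a monitored message with its transmission timestamp as unexecuted."""
    db.execute(
        "INSERT INTO messages (content, sent_at, state) VALUES (?, ?, ?)",
        (content, sent_at, UNEXECUTED),
    )


def scan_once(db, run_plan):
    """One pass of the periodic scan: if nothing is executing, run a plan for
    each unexecuted message in timestamp order, marking it executed when done."""
    busy = db.execute(
        "SELECT COUNT(*) FROM messages WHERE state = ?", (EXECUTING,)
    ).fetchone()[0]
    if busy:
        return []
    # The message content sorting list: unexecuted messages by timestamp.
    pending = db.execute(
        "SELECT id, content FROM messages WHERE state = ? ORDER BY sent_at",
        (UNEXECUTED,),
    ).fetchall()
    done = []
    for msg_id, content in pending:
        db.execute("UPDATE messages SET state = ? WHERE id = ?", (EXECUTING, msg_id))
        run_plan({"message": content})  # hypothetical execution-plan shape
        db.execute("UPDATE messages SET state = ? WHERE id = ?", (EXECUTED, msg_id))
        done.append(content)
    return done


db = open_store()
on_message(db, "retrospective:batch-2", sent_at=20.0)
on_message(db, "retrospective:batch-1", sent_at=10.0)
print(scan_once(db, run_plan=lambda plan: None))
# ['retrospective:batch-1', 'retrospective:batch-2']
```

Because a message is only marked executed after its plan returns, a crash mid-plan leaves it in the `executing` state, which blocks later scans until someone intervenes; a production version would need a timeout or recovery rule for that case.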

4. The method of claim 1, wherein the operation of executing the data analysis service of the transmitted data analysis service list through the target distributed computing engine program, and performing the distributed scheduling calculation on the data to be calculated to obtain the distributed computing result comprises: when the analysis service type is a cache collision service type, traversing each data analysis service that needs to be cached in the data analysis service list through the target distributed computing engine program; and obtaining, according to the data analysis service, from a pre-defined cache table, collision cache data that belong to the data analysis service and are associated and matched with the data to be calculated, using the collision cache data corresponding to all data analysis services as the distributed calculation result.
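For the cache collision type in claim 4, the core operation is a lookup: for each service that is served from cache, find the cached entries belonging to that service that match the records being calculated. The Python sketch below assumes the predefined cache table is a list of `(service, key, value)` rows and that "associated and matched" means equal keys; both are guesses, since the claim does not define the match condition. The function names `index_cache` and `cache_collision` are hypothetical.

```python
from collections import defaultdict


def index_cache(cache_rows):
    """Index a hypothetical cache table of (service, key, value) rows by
    (service, key) so each lookup is a dictionary hit instead of a scan."""
    index = defaultdict(list)
    for service, key, value in cache_rows:
        index[(service, key)].append(value)
    return index


def cache_collision(services, cached_services, cache_index, data_keys):
    """Traverse the services that need caching and collect, per service, the
    cached values whose key matches a key in the data to be calculated."""
    result = {}
    for service in services:
        if service not in cached_services:
            continue  # this service is not served from the cache
        hits = {}
        for key in data_keys:
            values = cache_index.get((service, key))  # .get does not add entries
            if values:
                hits[key] = values
        result[service] = hits
    return result


cache_rows = [("blacklist", "u1", "hit"), ("blacklist", "u9", "hit"), ("geo", "u1", "CN")]
idx = index_cache(cache_rows)
print(cache_collision(["blacklist", "geo", "score"], {"blacklist", "geo"}, idx, ["u1", "u2"]))
# {'blacklist': {'u1': ['hit']}, 'geo': {'u1': ['CN']}}
```

The combined dictionary plays the role of the claim's distributed calculation result: the collision data of all cached services taken together.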

5. The method of claim 1, wherein the operation of executing the data analysis service of the transmitted data analysis service list through the target distributed computing engine program, and performing the distributed scheduling calculation on the data to be calculated to obtain the distributed computing result comprises: when the analysis service type is a script scoring service type, obtaining a scoring script corresponding to each data analysis service in the data analysis service list and all external files that the scoring script depends on through the target distributed computing engine program; and traversing the data to be calculated, calling the scoring script and all external files that the scoring script depends on to calculate the data to be calculated, to obtain the distributed calculation result.
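The script scoring type in claim 5 can be sketched as: for each service, fetch its scoring script and the external files the script depends on, then run the script over every record. In the Python sketch below, scripts are Python source strings that define a `score(record, files)` function, and external files are supplied as an in-memory mapping from file name to text; the script format, the names `load_scorer`, `run_scoring`, and `CREDIT_SCRIPT`, the file `weights.csv`, and the weighting logic are all hypothetical. The sketch runs script source with `exec`, which is only acceptable for trusted scripts.

```python
CREDIT_SCRIPT = '''
def score(record, files):
    # Parse "name,weight" lines from the external weights file.
    weights = {}
    for line in files["weights.csv"].splitlines():
        name, weight = line.split(",")
        weights[name] = float(weight)
    return sum(weights.get(k, 0.0) * v for k, v in record.items())
'''


def load_scorer(source):
    """Compile a scoring script and return its score() function.
    Running source with exec is unsafe for untrusted scripts."""
    namespace = {}
    exec(compile(source, "<scoring-script>", "exec"), namespace)
    return namespace["score"]


def run_scoring(services, scripts, file_store, records):
    """For each service, fetch its script and every external file it
    declares, then traverse the records and score each one."""
    results = {}
    for service in services:
        source, file_names = scripts[service]
        # Gather all dependencies first; a missing file raises KeyError here.
        files = {name: file_store[name] for name in file_names}
        score = load_scorer(source)
        results[service] = [score(record, files) for record in records]
    return results


scripts = {"credit": (CREDIT_SCRIPT, ["weights.csv"])}
file_store = {"weights.csv": "age,0.5\nvisits,2"}
records = [{"age": 30, "visits": 4}, {"age": 10}]
print(run_scoring(["credit"], scripts, file_store, records))  # {'credit': [23.0, 5.0]}
```

Fetching every external file before touching any record means a missing dependency fails fast, before partial scores are produced.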

6. A device for processing distributed data, applied to a cluster server communicating with a distributed computing cluster, comprising: software function modules stored in a non-transitory machine-readable storage medium and a processor, wherein when the software function modules are executed by the processor, the method for processing the distributed data of claim 1 is performed.

Patent Metadata

Filing Date: Unknown

