Managing Snapshots and Application State in Micro-Batch Based Event Processing Systems

Published: July 14, 2020

Assignee: Not available in USPTO data

Inventors: Hoyong Park, Sandeep Bishnoi, Prabhu Thukkaram, Santosh Kumar, Pavan Advani, Kunal Mulay, Jeffrey Toillion

Patent Claims

17 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for managing snapshots created from a Continuous Query Language (CQL) engine, comprising:
  receiving, by a computing device, a continuous query;
  applying, by the computing device, a Directed Acyclic Graph (DAG) transformation to the continuous query to generate a query plan for the continuous query, wherein the query plan is an ordered set of steps used to access data for processing of the continuous query;
  applying, by the computing device, a CQL transformation to the query plan to generate a transformed query plan;
  receiving, by a computing device, a micro-batch stream of input events related to an application;
  processing, by the computing device, the input events using the CQL engine to generate a set of output events related to the application, wherein the processing comprises:
    performing, by the CQL engine, incremental computation on each of the input events of the micro-batch stream for the continuous query based at least in part on the transformed query plan; and
    creating, by the CQL engine, output events for each of the input events of the micro-batch stream, wherein the set of output events comprise the output events for each of the input events of the micro-batch stream;
  generating, by the computing device and using a snapshot management algorithm implemented by the CQL engine, a snapshot of a current state of a system based at least in part on the set of output events related to the application;
  generating, by the computing device, a first directory structure to access snapshot information associated with the snapshot of the current state of the system;
  generating, by the computing device, a second directory structure to generate a list of snapshots associated with the current state of the system; and
  determining, by the computing device, based at least in part on the snapshot management algorithm, a process to get, add, or clean the list of snapshots associated with the current state of the system.
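The snapshot bookkeeping in claim 1 (one directory structure for snapshot information, a second for the list of snapshots, and get, add, and clean operations) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `SnapshotStore` class, the `data/` and `index/` directory names, the JSON encoding, and the `keep_last` retention rule are all hypothetical and are not taken from the patent.

```python
import json
import tempfile
from pathlib import Path


class SnapshotStore:
    """Hypothetical two-directory snapshot layout (an assumption, not the patent's)."""

    def __init__(self, root: Path) -> None:
        # First structure: one file of snapshot information per batch.
        self.data_dir = root / "data"
        # Second structure: one marker per snapshot, used only for listing.
        self.index_dir = root / "index"
        self.data_dir.mkdir(parents=True, exist_ok=True)
        self.index_dir.mkdir(parents=True, exist_ok=True)

    def add(self, batch_time: int, state: dict) -> None:
        # Write the state first, then the marker, so a listed snapshot has data.
        (self.data_dir / f"{batch_time}.json").write_text(json.dumps(state))
        (self.index_dir / str(batch_time)).touch()

    def list_snapshots(self) -> list[int]:
        # Snapshot identifiers in ascending batch-time order.
        return sorted(int(p.name) for p in self.index_dir.iterdir())

    def get(self, batch_time: int) -> dict:
        return json.loads((self.data_dir / f"{batch_time}.json").read_text())

    def clean(self, keep_last: int) -> list[int]:
        # Remove all but the newest keep_last snapshots; return what was removed.
        snapshots = self.list_snapshots()
        stale = snapshots[:-keep_last] if keep_last > 0 else snapshots
        for batch_time in stale:
            (self.index_dir / str(batch_time)).unlink()
            (self.data_dir / f"{batch_time}.json").unlink()
        return stale


with tempfile.TemporaryDirectory() as tmp:
    store = SnapshotStore(Path(tmp))
    for t in (1000, 2000, 3000):
        store.add(t, {"batch_time": t, "count": t // 1000})
    removed = store.clean(keep_last=2)
    remaining = store.list_snapshots()
    latest = store.get(remaining[-1])
```

Keeping the listing separate from the snapshot payloads means that enumerating snapshots never requires reading their contents, which is one plausible reason for having two structures.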

2. The method of claim 1, wherein the micro-batch stream is a continuous stream of data discretized into sub-second micro-batches.
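Discretizing a continuous stream into sub-second micro-batches, as in claim 2, might look like the sketch below. The `discretize` function, the `(timestamp_ms, payload)` event shape, and the 500 ms interval are assumptions chosen for illustration; events are assumed to arrive in timestamp order.

```python
from itertools import groupby


def discretize(events, batch_ms=500):
    """Group time-ordered (timestamp_ms, payload) pairs into fixed-width batches.

    Yields (batch_start_ms, [payloads]). Intervals with no events are skipped.
    """
    for batch_index, group in groupby(events, key=lambda e: e[0] // batch_ms):
        yield batch_index * batch_ms, [payload for _, payload in group]


events = [(0, "a"), (120, "b"), (510, "c"), (1490, "d"), (1500, "e")]
batches = list(discretize(events, batch_ms=500))
```

Here `batches` holds four micro-batches starting at 0, 500, 1000, and 1500 ms.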

3. The method of claim 1, further comprising storing, by the computing device, the set of output events related to the application in an output queue; and transmitting, by the computing device, the output events in the output queue when all of the input events have been processed.
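The queue-then-transmit behavior in claim 3 can be illustrated with a small sketch. The names `process_batch`, `process_event`, and `transmit` are hypothetical callbacks introduced here for illustration, not names from the patent.

```python
from collections import deque


def process_batch(input_events, process_event, transmit):
    """Buffer every output event, then transmit only after all inputs are processed."""
    output_queue = deque()
    for event in input_events:
        # process_event returns zero or more output events for one input event.
        output_queue.extend(process_event(event))
    # Every input event has been processed at this point, so drain the queue.
    sent = 0
    while output_queue:
        transmit(output_queue.popleft())
        sent += 1
    return sent


received = []
sent = process_batch([1, 2, 3], lambda e: [e * 2], received.append)
```

Deferring transmission until the whole micro-batch is processed means a downstream consumer never sees a partially processed batch.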

4. The method of claim 3, wherein the micro-batch stream comprises micro-batches of data or Resilient Distributed Datasets (RDDs).

5. The method of claim 4, wherein the processing of each of the input events comprises performing a computation on each of the input events based at least in part on the transformed query plan.
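Per-event computation with state carried across micro-batches, in the spirit of the incremental computation recited in claim 1 and the per-event computation in claim 5, can be sketched as follows. The `RunningAverage` operator and the `(key, value)` event shape are hypothetical examples, not operators named in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class RunningAverage:
    """Hypothetical stateful operator: per-key running average."""

    counts: dict = field(default_factory=dict)
    sums: dict = field(default_factory=dict)

    def on_event(self, key, value):
        # Update state incrementally instead of recomputing over all history.
        self.counts[key] = self.counts.get(key, 0) + 1
        self.sums[key] = self.sums.get(key, 0.0) + value
        return (key, self.sums[key] / self.counts[key])

    def on_batch(self, batch):
        # One output event is created for each input event in the micro-batch.
        return [self.on_event(key, value) for key, value in batch]


op = RunningAverage()
first = op.on_batch([("s1", 10.0), ("s1", 20.0)])
second = op.on_batch([("s1", 30.0), ("s2", 4.0)])
```

The second batch continues from the state left by the first, so the average for `s1` reflects all three of its events.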

6. The method of claim 5, wherein the continuous query includes pattern matching.
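As a rough illustration of why pattern matching interacts with application state, the sketch below detects a simple pattern (a run of strictly increasing values) whose partial matches must survive micro-batch boundaries. The `RisingRun` matcher and the pattern itself are assumptions made for this example; the claims do not specify any particular pattern.

```python
class RisingRun:
    """Emit a match whenever `length` consecutive values are strictly increasing."""

    def __init__(self, length=3):
        self.length = length
        self.run = []  # partial match carried across micro-batches

    def on_event(self, value):
        if self.run and value <= self.run[-1]:
            self.run = []  # pattern broken; restart from this value
        self.run.append(value)
        if len(self.run) == self.length:
            match = tuple(self.run)
            self.run = self.run[1:]  # slide so overlapping matches are found
            return match
        return None


matcher = RisingRun(length=3)
matches = []
for batch in ([5, 6], [7, 3, 4], [5, 9]):
    for value in batch:
        found = matcher.on_event(value)
        if found is not None:
            matches.append(found)
```

The first match spans two micro-batches, which is only possible because `matcher.run` persists between them; that partial-match state is the kind of information a snapshot would need to capture.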

7. A system, comprising:
  a memory configured to store computer-executable instructions; and
  a processor configured to access the memory and execute the computer-executable instructions to:
    receive a continuous query;
    apply a Directed Acyclic Graph (DAG) transformation to the continuous query to generate a query plan for the continuous query, wherein the query plan is an ordered set of steps used to access data for processing of the continuous query;
    apply a Continuous Query Language (CQL) transformation to the query plan to generate a transformed query plan such that a CQL engine can execute the continuous query using the transformed query plan;
    receive a micro-batch stream of input events related to an application;
    process the input events using the CQL engine to generate a set of output events related to the application, wherein the processing comprises:
      performing, by the CQL engine, incremental computation on each of the input events of the micro-batch stream for the continuous query based at least in part on the transformed query plan; and
      creating, by the CQL engine, output events for each of the input events of the micro-batch stream, wherein the set of output events comprise the output events for each of the input events of the micro-batch stream;
    generate, using a snapshot management algorithm implemented by the CQL engine, a snapshot of a current state of a system based at least in part on the set of output events related to the application;
    generate a first directory structure to access snapshot information associated with the snapshot of the current state of the system;
    generate a second directory structure to generate a list of snapshots associated with the current state of the system; and
    determine based at least in part on the snapshot management algorithm, a process to get, add, or clean the list of snapshots associated with the current state of the system.
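The claims describe the query plan as an ordered set of steps produced by a DAG transformation. One way to picture this, offered only as an interpretation, is a topological ordering of an operator graph. The operator names and the `to_query_plan` helper below are hypothetical; `graphlib.TopologicalSorter` is from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical operators; each maps to the set of operators it depends on.
operator_dag = {
    "filter": {"source"},
    "window": {"filter"},
    "aggregate": {"window"},
    "sink": {"aggregate"},
}


def to_query_plan(dag):
    """Return the operators in an order where every dependency runs first."""
    return list(TopologicalSorter(dag).static_order())


plan = to_query_plan(operator_dag)
```

For this linear chain there is exactly one valid ordering, from `source` through to `sink`.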

8. The system of claim 7, wherein the micro-batch stream is a continuous stream of data discretized into sub-second micro-batches.

9. The system of claim 7, wherein the computer-executable instructions are further executable to store the set of output events related to the application in an output queue; and transmit the output events in the output queue when all of the input events have been processed.

10. The system of claim 9, wherein the micro-batch stream comprises micro-batches of data or Resilient Distributed Datasets (RDDs).

11. The system of claim 10, wherein the processing of each of the input events comprises performing a computation on each of the input events based at least in part on the transformed query plan.

12. The system of claim 11, wherein the continuous query includes pattern matching.

13. A computer-readable medium storing computer-executable code that, when executed by a processor, cause the processor to perform operations comprising:
  receiving a continuous query;
  applying a Directed Acyclic Graph (DAG) transformation to the continuous query to generate a query plan for the continuous query, wherein the query plan is an ordered set of steps used to access data for processing of the continuous query;
  applying a Continuous Query Language (CQL) transformation to the query plan to generate a transformed query plan such that a CQL engine can execute the continuous query using the transformed query plan;
  receiving a micro-batch stream of input events related to an application;
  processing the input events based at least in part on the transformed query plan using the CQL engine to generate a set of output events related to the application, wherein the processing comprises:
    performing, by the CQL engine, incremental computation on each of the input events of the micro-batch stream for the continuous query based at least in part on the transformed query plan; and
    creating, by the CQL engine, output events for each of the input events of the micro-batch stream, wherein the set of output events comprise the output events for each of the input events of the micro-batch stream;
  generating, using a snapshot management algorithm implemented by the CQL engine, a snapshot of a current state of a system based at least in part on the set of output events related to the application;
  generating a first directory structure to access snapshot information associated with the snapshot of the current state of the system;
  generating a second directory structure to generate a list of snapshots associated with the current state of the system; and
  determining based at least in part on the snapshot management algorithm, a process to get, add, or clean the list of snapshots associated with the current state of the system.

14. The computer-readable medium of claim 13, wherein the micro-batch stream is a continuous stream of data discretized into sub-second micro-batches.

15. The computer-readable medium of claim 13, wherein the operations further comprise storing the set of output events related to the application in an output queue; and transmitting the output events in the output queue when all of the input events have been processed.

16. The computer-readable medium of claim 15, wherein the micro-batch stream comprises micro-batches of data or Resilient Distributed Datasets (RDDs).

17. The computer-readable medium of claim 16, wherein the processing of each of the input events comprises performing a computation on each of the input events based at least in part on the transformed query plan.

Patent Metadata

Filing Date: Unknown

Publication Date: July 14, 2020

Inventors: Hoyong Park, Sandeep Bishnoi, Prabhu Thukkaram, Santosh Kumar, Pavan Advani, Kunal Mulay, Jeffrey Toillion
