Dynamic Partitioning Techniques for Data Streams

Published: June 23, 2020

Assignee: Not available in the USPTO data we have

Inventors: Marvin Michael Theimer, Gaurav D. Ghare, John David Dunagan, Gregory M. Burgess, Ying Xiong

Technical Abstract

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A system, comprising: one or more computing devices comprising one or more respective hardware processors and memory and configured to: receive, from a client of a stream management service, an indication of one or more attributes for partitioning a data stream; determine a mapping of data records of the data stream to a plurality of partitions of the data stream based at least on different values of the one or more attributes of the data records indicated by the client of the stream management service; and receive individual ones of the data records of the data stream at two or more different ingestion, storage, or other nodes of the stream management service based at least on the mapping of the data records of the data stream to the plurality of partitions of the data stream.

2. The system as recited in claim 1, wherein to receive the indication of one or more attributes for partitioning the data stream, the one or more computing devices are configured to: implement one or more programmatic interfaces enabling the client to specify the one or more attributes for partitioning the data stream.

3. The system as recited in claim 2, wherein the one or more computing devices are configured to: implement the one or more programmatic interfaces as a graphical user interface, a web page, a web site, a command line interface, or an application programming interface.

4. The system as recited in claim 1, wherein to receive the indication of one or more attributes for partitioning the data stream, the one or more computing devices are configured to: receive an indication of one or more of: a partition key supplied by a source of a data record of the data stream, an identification of a source of a data record of the data stream, at least a portion of contents of a data record of the data stream, or a network address of a source of a data record of the data stream.

5. The system as recited in claim 1, wherein to receive individual ones of the data records at different nodes of the stream management service, the one or more computing devices are configured to: select, based at least on the mapping of the data records to the partitions, a node of an ingestion subsystem of the stream management service, a node of a storage subsystem of the stream management service, or a node of the retrieval subsystem of a stream management service to receive the individual ones of the data records.

6. The system as recited in claim 1, wherein the one or more computing devices are configured to: generate, corresponding to a given data record, a sequence number indicative of a position of the given data record within a record acquisition sequence at an ingestion node of the stream management service, wherein the ingestion node is selected to receive the given data record based at least on the mapping of the data records to the partitions.

7. The system as recited in claim 6, wherein the one or more computing devices are configured to: store data records of a given partition of the data stream in an order corresponding to respective sequence numbers of the data records.

8. A method, comprising: performing, by one or more computing devices of a stream management service: receiving, from a client of the stream management service, an indication of one or more attributes for partitioning a data stream; determining a mapping of data records of the data stream to a plurality of partitions of the data stream based at least on different values of the one or more attributes of the data records indicated by the client of the stream management service; and receiving individual ones of the data records of the data stream at two or more different ingestion, storage, or other nodes of the stream management service based at least on the mapping of the data records of the data stream to the plurality of partitions of the data stream.

9. The method as recited in claim 8, further comprising: implementing one or more programmatic interfaces enabling the client to specify the one or more attributes for partitioning the data stream.

10. The method as recited in claim 9, further comprising: implementing the one or more programmatic interfaces as a graphical user interface, a web page, a web site, a command line interface, or an application programming interface.

11. The method as recited in claim 8, wherein receiving the indication of one or more attributes for partitioning the data stream comprises: receiving an indication of one or more of: a partition key supplied by a source of a data record of the data stream, an identification of a source of a data record of the data stream, at least a portion of contents of a data record of the data stream, or a network address of a source of a data record of the data stream.

12. The method as recited in claim 8, further comprising: selecting, based at least on the mapping of the data records to the partitions, a node of an ingestion subsystem of the stream management service, a node of a storage subsystem of the stream management service, or a node of the retrieval subsystem of a stream management service to receive the individual ones of the data records.

13. The method as recited in claim 8, further comprising: generating, corresponding to a given data record, a sequence number indicative of a position of the given data record within a record acquisition sequence at an ingestion node of the stream management service, wherein the ingestion node is selected to receive the given data record based at least on the mapping of the data records to the partitions; and storing data records of a given partition of the data stream in an order corresponding to respective sequence numbers of the data records.

14. The method as recited in claim 13, wherein a given sequence number comprises an indication of a timestamp associated with ingestion of the particular data record, and an additional subsequence value.

15. A non-transitory computer-accessible storage medium storing program instructions that when executed on one or more processors cause the one or more processors to perform: receive, from a client of a stream management service, an indication of one or more attributes for partitioning a data stream; determine a mapping of data records of the data stream to a plurality of partitions of the data stream based at least on different values of the one or more attributes of the data records indicated by the client of the stream management service; and receive individual ones of the data records of the data stream at two or more different ingestion, storage, or other nodes of the stream management service based at least on the mapping of the data records of the data stream to the plurality of partitions of the data stream.

16. The non-transitory computer-accessible storage medium as recited in claim 15, wherein the instructions when executed on the one or more processors: implement one or more programmatic interfaces enabling the client to specify the one or more attributes for partitioning the data stream.

17. The non-transitory computer-accessible storage medium as recited in claim 16, wherein the instructions when executed on the one or more processors: implement the one or more programmatic interfaces as a graphical user interface, a web page, a web site, a command line interface, or an application programming interface.

18. The non-transitory computer-accessible storage medium as recited in claim 15, wherein to receive the indication of one or more attributes for partitioning the data stream, the instructions when executed on the one or more processors: receive an indication of one or more of: a partition key supplied by a source of a data record of the data stream, an identification of a source of a data record of the data stream, at least a portion of contents of a data record of the data stream, or a network address of a source of a data record of the data stream.

19. The non-transitory computer-accessible storage medium as recited in claim 15, wherein to receive individual ones of the data records at different nodes of the stream management service, the instructions when executed on the one or more processors: select, based at least on the mapping of the data records to the partitions, a node of an ingestion subsystem of the stream management service, a node of a storage subsystem of the stream management service, or a node of the retrieval subsystem of a stream management service to receive the individual ones of the data records.

20. The non-transitory computer-accessible storage medium as recited in claim 15, wherein the instructions when executed on the one or more processors: generate, corresponding to a given data record, a sequence number indicative of a position of the given data record within a record acquisition sequence at an ingestion node of the stream management service, wherein the ingestion node is selected to receive the given data record based at least on the mapping of the data records to the partitions; and store data records of a given partition of the data stream in an order corresponding to respective sequence numbers of the data records.
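
Illustrative sketch (not part of the claims). The claims above describe three steps: mapping records to partitions from client-chosen attribute values, selecting a node from that mapping, and assigning per-node sequence numbers that may combine a timestamp with a subsequence value. The Python below is one possible reading of those steps, not the patented implementation. The hash-based mapping, the partition count, the node names, and every function and class name are assumptions made for illustration only.

```python
import hashlib
import time
from collections import defaultdict
from typing import Sequence

NUM_PARTITIONS = 4                  # assumed partition count
NODES = ["ingest-0", "ingest-1"]    # hypothetical ingestion node names


def partition_for(record: dict, attributes: Sequence[str]) -> int:
    """Map a record to a partition from the values of the client-chosen attributes.

    Hashing the attribute values is an assumption; the claims only require
    that the mapping depend on those values.
    """
    key = "|".join(str(record.get(name, "")) for name in attributes)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def node_for(partition: int) -> str:
    """Select the node that receives records of a partition (assumed round-robin layout)."""
    return NODES[partition % len(NODES)]


class IngestionNode:
    """Assigns sequence numbers in acquisition order and stores records per partition."""

    def __init__(self, name: str, clock=time.time):
        self.name = name
        self._clock = clock
        self._last_ts = -1
        self._subseq = 0
        self.storage = defaultdict(list)  # partition -> [(sequence number, record)]

    def next_sequence_number(self):
        """Return (timestamp in ms, subsequence); the pair strictly increases on this node."""
        ts = int(self._clock() * 1000)
        if ts <= self._last_ts:
            # Same (or earlier) millisecond: reuse the timestamp, bump the subsequence.
            ts = self._last_ts
            self._subseq += 1
        else:
            self._last_ts = ts
            self._subseq = 0
        return (ts, self._subseq)

    def ingest(self, partition: int, record: dict):
        seq = self.next_sequence_number()
        # Sequence numbers only grow, so appending keeps each partition in sequence order.
        self.storage[partition].append((seq, record))
        return seq
```

In this sketch, two records that share the same values for the chosen attributes always land in the same partition, and records within a partition are read back in the order their sequence numbers were issued.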

Patent Metadata

Filing Date

Unknown

Publication Date

June 23, 2020

Inventors

Marvin Michael Theimer

Gaurav D. Ghare

John David Dunagan

Gregory M. Burgess

Ying Xiong
