Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A system, comprising: one or more computing devices comprising one or more respective hardware processors and memory and configured to: receive, from a client of a stream management service, an indication of one or more attributes for partitioning a data stream; determine a mapping of data records of the data stream to a plurality of partitions of the data stream based at least on different values of the one or more attributes of the data records indicated by the client of the stream management service; and receive individual ones of the data records of the data stream at two or more different ingestion, storage, or other nodes of the stream management service based at least on the mapping of the data records of the data stream to the plurality of partitions of the data stream.
The system relates to stream processing and data management, specifically addressing the challenge of efficiently partitioning and distributing data streams across multiple nodes in a stream management service. The system enables clients to specify attributes for partitioning a data stream, allowing the system to map data records to different partitions based on the values of those attributes. This partitioning ensures that data records are distributed across two or more nodes, such as ingestion, storage, or other processing nodes, according to the defined mapping. The system dynamically routes individual data records to the appropriate nodes based on their attribute values, optimizing data distribution and processing efficiency. By leveraging client-defined partitioning rules, the system enhances scalability and performance in stream management services, ensuring balanced workload distribution and efficient data handling. The approach supports flexible partitioning strategies, allowing clients to tailor data distribution to their specific requirements. This system is particularly useful in large-scale data processing environments where efficient partitioning and distribution of streaming data are critical for performance and resource utilization.
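As an illustration of the mapping described above, the following minimal Python sketch assumes a hash-based mapping from attribute values to partitions. The claim does not prescribe any particular mapping function; the name `partition_for`, the sample records, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
from typing import Sequence


def partition_for(record: dict, attributes: Sequence[str], num_partitions: int) -> int:
    """Map a record to a partition index from the values of the chosen attributes."""
    # Join the values of the client-chosen attributes into one key;
    # attributes missing from the record contribute an empty string.
    key = "|".join(str(record.get(name, "")) for name in attributes)
    # Use a stable hash (not Python's per-process salted hash()) so the
    # mapping is repeatable across processes and nodes.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Hypothetical records; the client has chosen "user_id" as the partitioning attribute.
records = [
    {"user_id": "u1", "event": "click"},
    {"user_id": "u2", "event": "view"},
    {"user_id": "u1", "event": "purchase"},
]
assignments = [partition_for(r, ["user_id"], 4) for r in records]
```

With this kind of mapping, the first and third records share a `user_id` and therefore land in the same partition, while records with different values may or may not share one.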
2. The system as recited in claim 1, wherein to receive the indication of one or more attributes for partitioning the data stream, the one or more computing devices are configured to: implement one or more programmatic interfaces enabling the client to specify the one or more attributes for partitioning the data stream.
This claim narrows claim 1 by specifying how the client's partitioning attributes reach the system. The computing devices implement one or more programmatic interfaces through which a client of the stream management service specifies the attribute or attributes used to partition the data stream. As in claim 1, the system then maps data records to partitions based on the values of those attributes and receives the records at different nodes according to that mapping. The claim does not limit what the attributes are or what form the interfaces take; claims 3 and 4 address those details.
3. The system as recited in claim 2, wherein the one or more computing devices are configured to: implement the one or more programmatic interfaces as a graphical user interface, a web page, a web site, a command line interface, or an application programming interface.
The system exposes the programmatic interfaces of claim 2, through which a client specifies the attributes for partitioning a data stream, in one or more concrete forms: a graphical user interface, a web page, a web site, a command line interface, or an application programming interface (API). A client can therefore choose partitioning attributes interactively, for example through a web page, or programmatically, for example through API calls. The claim covers only the form of the interface; the mapping of records to partitions and their distribution across nodes are as described in claim 1.
4. The system as recited in claim 1, wherein to receive the indication of one or more attributes for partitioning the data stream, the one or more computing devices are configured to: receive an indication of one or more of: a partition key supplied by a source of a data record of the data stream, an identification of a source of a data record of the data stream, at least a portion of contents of a data record of the data stream, or a network address of a source of a data record of the data stream.
A data processing system partitions a data stream into subsets based on specified attributes to improve data handling efficiency. The system receives a data stream containing multiple records from various sources and partitions the stream by analyzing one or more attributes associated with each record. These attributes may include a partition key provided by the data source, the identity of the data source, the content of the data record itself, or the network address of the source. By evaluating these attributes, the system dynamically organizes the data stream into distinct partitions, enabling targeted processing, storage, or analysis of the partitioned subsets. This approach enhances scalability and performance by allowing parallel processing of different partitions and optimizing resource allocation based on data characteristics. The system supports flexible partitioning criteria, accommodating various use cases such as real-time analytics, distributed computing, or data routing in networked environments. The partitioning mechanism ensures that records with similar attributes are grouped together, facilitating efficient data management and reducing processing overhead.
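The four kinds of partitioning attribute named in this claim can be sketched as a small Python enumeration with a helper that extracts the corresponding value from a record. This is only an illustrative sketch: the `DataRecord` fields, the `PartitionAttribute` names, and the `content_prefix` parameter (how much of the contents to use) are assumptions, not part of the claim.

```python
from dataclasses import dataclass
from enum import Enum


class PartitionAttribute(Enum):
    """The attribute kinds listed in the claim (names are hypothetical)."""
    PARTITION_KEY = "partition_key"    # key supplied by the record's source
    SOURCE_ID = "source_id"            # identification of the source
    CONTENTS = "contents"              # at least a portion of the record contents
    SOURCE_ADDRESS = "source_address"  # network address of the source


@dataclass
class DataRecord:
    partition_key: str
    source_id: str
    contents: bytes
    source_address: str


def partitioning_value(record: DataRecord, attribute: PartitionAttribute,
                       content_prefix: int = 8) -> str:
    """Return the value of the client-chosen attribute for one record."""
    if attribute is PartitionAttribute.PARTITION_KEY:
        return record.partition_key
    if attribute is PartitionAttribute.SOURCE_ID:
        return record.source_id
    if attribute is PartitionAttribute.CONTENTS:
        # Assumption: use the first `content_prefix` bytes as the "portion of contents".
        return record.contents[:content_prefix].hex()
    return record.source_address


record = DataRecord("k1", "sensor-7", b"abcdefghij", "10.0.0.5")
by_source = partitioning_value(record, PartitionAttribute.SOURCE_ID)
by_contents = partitioning_value(record, PartitionAttribute.CONTENTS)
```

The extracted value would then feed whatever mapping function the service uses to choose a partition.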
5. The system as recited in claim 1, wherein to receive individual ones of the data records at different nodes of the stream management service, the one or more computing devices are configured to: select, based at least on the mapping of the data records to the partitions, a node of an ingestion subsystem of the stream management service, a node of a storage subsystem of the stream management service, or a node of the retrieval subsystem of a stream management service to receive the individual ones of the data records.
This invention relates to a stream management service for processing and distributing data records across multiple nodes in a distributed computing environment. The system addresses the challenge of efficiently routing and managing data records in a scalable and fault-tolerant manner, ensuring that records are properly distributed and processed across different subsystems. The system includes one or more computing devices configured to receive data records and distribute them to various nodes within the stream management service. The distribution is based on a predefined mapping of data records to partitions, which ensures that records are routed to the appropriate nodes. The nodes may belong to different subsystems, such as an ingestion subsystem for receiving incoming data, a storage subsystem for storing data records, or a retrieval subsystem for accessing stored data. The selection of the node is determined by the mapping, ensuring that each record is directed to the correct subsystem for further processing. The system dynamically selects nodes based on the mapping, allowing for efficient load balancing and fault tolerance. This ensures that data records are processed and stored in a distributed manner, improving scalability and reliability. The invention enhances the performance of stream management services by optimizing the distribution of data records across different subsystems, reducing bottlenecks, and improving overall system efficiency.
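The node selection described above can be sketched in Python as a lookup from a record's partition to a node within a given subsystem. The node names, the `NODES` table, and the modulo assignment of partitions to nodes are hypothetical; the claim only requires that the selection be based at least on the partition mapping.

```python
from enum import Enum


class Subsystem(Enum):
    INGESTION = "ingestion"
    STORAGE = "storage"
    RETRIEVAL = "retrieval"


# Hypothetical node inventory for each subsystem of the service.
NODES = {
    Subsystem.INGESTION: ["ingest-0", "ingest-1"],
    Subsystem.STORAGE: ["store-0", "store-1", "store-2"],
    Subsystem.RETRIEVAL: ["read-0", "read-1"],
}


def select_node(partition: int, subsystem: Subsystem) -> str:
    """Pick the node in `subsystem` responsible for `partition`.

    Assumption: partitions are assigned to nodes round-robin by index.
    """
    nodes = NODES[subsystem]
    return nodes[partition % len(nodes)]


# A record mapped to partition 3 is handled by one node in each subsystem.
ingest_node = select_node(3, Subsystem.INGESTION)
storage_node = select_node(3, Subsystem.STORAGE)
```

Because the choice depends only on the partition, every record mapped to the same partition reaches the same node within a subsystem.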
6. The system as recited in claim 1, wherein the one or more computing devices are configured to: generate, corresponding to a given data record, a sequence number indicative of a position of the given data record within a record acquisition sequence at an ingestion node of the stream management service, wherein the ingestion node is selected to receive the given data record based at least on the mapping of the data records to the partitions.
This invention relates to data stream management systems, specifically addressing the challenge of efficiently tracking and processing data records in a distributed environment. The system involves one or more computing devices that manage the ingestion and processing of data records within a stream management service. The key innovation is the generation of a sequence number for each data record, which indicates its position within a record acquisition sequence at a specific ingestion node. This sequence number helps maintain order and traceability of data records as they are processed. The ingestion node responsible for receiving a given data record is selected based on a predefined mapping of data records to partitions, ensuring that records are distributed across the system in a controlled manner. This approach improves data consistency, reliability, and traceability in distributed stream processing environments. The sequence number generation and partition-based ingestion node selection work together to optimize data flow and processing efficiency, addressing issues related to record ordering and system scalability in large-scale data stream management.
7. The system as recited in claim 6, wherein the one or more computing devices are configured to: store data records of a given partition of the data stream in an order corresponding to respective sequence numbers of the data records.
The system relates to data processing and storage, specifically for managing data streams in a distributed computing environment. The problem addressed is the efficient and reliable storage of data records from a data stream, ensuring that records are stored in a sequence-preserving manner to maintain data integrity and consistency. The system includes one or more computing devices that process a data stream, which is divided into partitions. Each partition contains data records that are assigned sequence numbers to indicate their order within the stream. The computing devices are configured to store these data records in a storage system, maintaining the sequence order based on their respective sequence numbers. This ensures that records are stored in the same order they were generated or processed, which is critical for applications requiring strict ordering, such as financial transactions, event logging, or real-time analytics. The system may also include mechanisms for distributing the data stream across multiple computing devices, where each device handles a subset of partitions. The storage system may be a distributed file system, a database, or another storage solution capable of handling high-throughput data streams. The sequence-preserving storage ensures that downstream consumers of the data can reliably reconstruct the original order of records, even if the data is processed or stored across multiple nodes. This approach improves fault tolerance and scalability while maintaining data consistency.
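A minimal Python sketch of sequence-ordered storage follows. It assumes a simple counter at the ingestion node and an in-memory sorted list per partition; the class names `IngestionNode` and `PartitionStore` are hypothetical and stand in for whatever components a real service would use.

```python
import bisect
import itertools


class IngestionNode:
    """Assigns each acquired record its position in this node's acquisition sequence."""

    def __init__(self):
        self._counter = itertools.count(1)

    def acquire(self, payload: str):
        # The sequence number reflects the order in which records were acquired.
        return next(self._counter), payload


class PartitionStore:
    """Keeps the records of one partition ordered by sequence number."""

    def __init__(self):
        self._records = []  # list of (sequence_number, payload), kept sorted

    def put(self, sequence_number: int, payload: str) -> None:
        # insort keeps the list ordered even if writes arrive out of order.
        bisect.insort(self._records, (sequence_number, payload))

    def read_all(self):
        return [payload for _, payload in self._records]


node = IngestionNode()
first, second, third = node.acquire("a"), node.acquire("b"), node.acquire("c")

store = PartitionStore()
for seq, payload in (third, first, second):  # deliberately written out of order
    store.put(seq, payload)
stored = store.read_all()
```

Reading the partition back yields the records in acquisition order regardless of the order in which they were written.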
8. A method, comprising: performing, by one or more computing devices of a stream management service: receiving, from a client of the stream management service, an indication of one or more attributes for partitioning a data stream; determining a mapping of data records of the data stream to a plurality of partitions of the data stream based at least on different values of the one or more attributes of the data records indicated by the client of the stream management service; and receiving individual ones of the data records of the data stream at two or more different ingestion, storage, or other nodes of the stream management service based at least on the mapping of the data records of the data stream to the plurality of partitions of the data stream.
The method is carried out by one or more computing devices of a stream management service and addresses how to partition a data stream and distribute its records across multiple nodes. The service receives from a client an indication of one or more attributes to use for partitioning the stream. It then determines a mapping of data records to a plurality of partitions based at least on the different values those attributes take in the records. Individual data records are received at two or more different nodes of the service, such as ingestion, storage, or other nodes, according to that mapping. Because the mapping depends on attribute values, records that share the same values can be routed consistently to the same partition, and the work of ingesting and storing the stream can be spread across several nodes. The claim does not require any particular mapping function, nor that records be spread evenly across nodes.
9. The method as recited in claim 8, further comprising: implementing one or more programmatic interfaces enabling the client to specify the one or more attributes for partitioning the data stream.
This claim adds programmatic interfaces to the method of claim 8. The stream management service implements one or more programmatic interfaces that enable the client to specify the attribute or attributes used for partitioning the data stream. The service then determines the mapping of records to partitions from the values of those attributes and receives the records at different nodes accordingly, as in claim 8. The claim does not specify the form of the interfaces; claim 10 lists the forms they may take.
10. The method as recited in claim 9, further comprising: implementing the one or more programmatic interfaces as a graphical user interface, a web page, a web site, a command line interface, or an application programming interface.
This claim specifies the forms that the programmatic interfaces of claim 9 may take. The interfaces through which a client specifies the partitioning attributes are implemented as a graphical user interface (GUI), a web page, a web site, a command line interface (CLI), or an application programming interface (API). The claim covers only the form of the interface; partitioning and record distribution proceed as in claim 8.
11. The method as recited in claim 8, wherein receiving the indication of one or more attributes for partitioning the data stream comprises: receiving an indication of one or more of: a partition key supplied by a source of a data record of the data stream, an identification of a source of a data record of the data stream, at least a portion of contents of a data record of the data stream, or a network address of a source of a data record of the data stream.
This invention relates to data stream processing, specifically methods for partitioning data streams based on attributes of the data records within the stream. The problem addressed is efficiently distributing data records from a high-volume data stream across multiple processing nodes or storage locations to optimize performance, load balancing, or data organization. The method involves receiving a data stream containing multiple data records and an indication of one or more attributes for partitioning the stream. These attributes can include a partition key provided by the data source, an identifier of the data source, a portion of the data record's contents, or the network address of the data source. The system uses these attributes to determine how to partition the data stream, ensuring that records with similar attributes are routed to the same partition. This partitioning can be used for parallel processing, distributed storage, or other data management tasks. The approach allows for flexible partitioning based on different types of metadata or content, enabling customization for various use cases. The method ensures that data records are distributed in a way that maintains consistency and optimizes resource utilization.
12. The method as recited in claim 8, further comprising: selecting, based at least on the mapping of the data records to the partitions, a node of an ingestion subsystem of the stream management service, a node of a storage subsystem of the stream management service, or a node of the retrieval subsystem of a stream management service to receive the individual ones of the data records.
A stream management service processes and stores data records in a distributed system, where data is partitioned across multiple nodes for efficient ingestion, storage, and retrieval. A challenge in such systems is ensuring data records are routed to the correct nodes to maintain consistency and performance. This invention addresses that challenge by dynamically selecting nodes for data records based on their partition mappings. The system includes an ingestion subsystem for receiving data, a storage subsystem for persisting data, and a retrieval subsystem for accessing data. When processing data records, the system maps each record to a specific partition and then selects an appropriate node from one of the subsystems to handle the record. The selection is based on the partition mapping to ensure data is routed to the correct node, optimizing performance and consistency. This approach allows the system to efficiently distribute workloads across nodes while maintaining data integrity. The method applies to any stream management service where data partitioning is used to manage distributed data processing and storage.
13. The method as recited in claim 8, further comprising: generating, corresponding to a given data record, a sequence number indicative of a position of the given data record within a record acquisition sequence at an ingestion node of the stream management service, wherein the ingestion node is selected to receive the given data record based at least on the mapping of the data records to the partitions; and storing data records of a given partition of the data stream in an order corresponding to respective sequence numbers of the data records.
This invention relates to data stream management, specifically ensuring ordered processing of data records in a distributed system. The problem addressed is maintaining the correct sequence of data records when they are distributed across multiple partitions and ingestion nodes in a stream management service. Without proper sequencing, downstream applications may process records out of order, leading to errors or incorrect results. The method involves assigning sequence numbers to data records based on their position in the acquisition sequence at an ingestion node. The ingestion node is selected to receive a given data record based on a predefined mapping of data records to partitions. Once assigned, the sequence numbers ensure that data records within a given partition are stored and processed in the correct order. This approach allows for parallel ingestion of records across multiple nodes while preserving the sequential integrity of records within each partition. The sequence numbers are used to maintain order when storing records, ensuring that downstream systems receive data in the intended sequence. This method is particularly useful in distributed stream processing systems where data is partitioned across multiple nodes to improve scalability and performance.
14. The method as recited in claim 13, wherein a given sequence number comprises an indication of a timestamp associated with ingestion of the particular data record, and an additional subsequence value.
A system and method for managing data records in a distributed computing environment addresses challenges in tracking and processing large volumes of data efficiently. The invention focuses on assigning unique identifiers to data records to ensure accurate tracking, ordering, and retrieval. A sequence number is generated for each data record, incorporating a timestamp to mark the ingestion time and an additional subsequence value to further refine the ordering within the same timestamp. This dual-component sequence number allows for precise tracking of data records, even when multiple records are ingested at the same time. The subsequence value ensures that records with identical timestamps can be distinguished and processed in a consistent order. This method enhances data consistency, reduces conflicts during processing, and improves system reliability in distributed environments where multiple nodes may ingest data simultaneously. The invention is particularly useful in systems requiring high-throughput data processing, such as real-time analytics, financial transactions, or large-scale data storage solutions. By combining timestamp and subsequence values, the system ensures that data records are processed in a predictable and ordered manner, minimizing errors and improving overall system performance.
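The two-part sequence number described above can be sketched in Python as a (timestamp, subsequence) pair that sorts naturally. This is a sketch under stated assumptions: the millisecond resolution, the injectable `clock`, and the rule that the timestamp never moves backwards are illustrative choices, not details taken from the claim.

```python
import time
from typing import Callable, NamedTuple


class SequenceNumber(NamedTuple):
    timestamp_ms: int  # indication of the ingestion time
    subsequence: int   # distinguishes records that share a timestamp


class SequenceGenerator:
    """Issues increasing (timestamp, subsequence) pairs at one ingestion node."""

    def __init__(self, clock: Callable[[], int] = lambda: time.time_ns() // 1_000_000):
        self._clock = clock
        self._last_ts = -1
        self._sub = 0

    def next(self) -> SequenceNumber:
        # Assumption: clamp the clock so a backwards jump cannot reorder records.
        ts = max(self._clock(), self._last_ts)
        if ts == self._last_ts:
            self._sub += 1  # same timestamp: bump the subsequence value
        else:
            self._last_ts, self._sub = ts, 0
        return SequenceNumber(ts, self._sub)


# A fake clock makes the example deterministic: two records share timestamp 100.
ticks = iter([100, 100, 101])
gen = SequenceGenerator(clock=lambda: next(ticks))
a, b, c = gen.next(), gen.next(), gen.next()
```

Because named tuples compare field by field, `a < b < c` holds here even though `a` and `b` carry the same timestamp.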
15. A non-transitory computer-accessible storage medium storing program instructions that when executed on one or more processors cause the one or more processors to perform: receive, from a client of a stream management service, an indication of one or more attributes for partitioning a data stream; determine a mapping of data records of the data stream to a plurality of partitions of the data stream based at least on different values of the one or more attributes of the data records indicated by the client of the stream management service; and receive individual ones of the data records of the data stream at two or more different ingestion, storage, or other nodes of the stream management service based at least on the mapping of the data records of the data stream to the plurality of partitions of the data stream.
This invention relates to stream processing systems, specifically methods for partitioning data streams to optimize ingestion, storage, and processing across distributed nodes. The problem addressed is efficiently distributing data records in a stream across multiple nodes to balance workload, reduce latency, and improve scalability. Traditional systems often rely on static partitioning schemes, which may not adapt to dynamic data characteristics or client-specific requirements. The invention provides a system where a client of a stream management service specifies attributes for partitioning a data stream. These attributes define how data records should be distributed across partitions. The system then maps each data record to a partition based on the specified attributes, ensuring records with similar attribute values are routed to the same partition. This enables efficient processing, such as aggregations or filtering, on partitioned data. The data records are then ingested or stored across multiple nodes according to the partition mapping, allowing parallel processing and reducing bottlenecks. The approach allows dynamic partitioning based on client-defined criteria, improving flexibility and performance in distributed stream processing environments. By distributing records across nodes based on attribute values, the system ensures related data is co-located, optimizing query performance and resource utilization. This method is particularly useful in large-scale data processing applications where efficient partitioning is critical for scalability and low-latency processing.
16. The non-transitory computer-accessible storage medium as recited in claim 15, wherein the instructions when executed on the one or more processors: implement one or more programmatic interfaces enabling the client to specify the one or more attributes for partitioning the data stream.
This claim covers a non-transitory computer-accessible storage medium storing program instructions that build on claim 15. When executed, the instructions implement one or more programmatic interfaces that enable a client of the stream management service to specify the attribute or attributes used for partitioning a data stream. As in claim 15, the instructions then determine a mapping of data records to partitions based on the values of those attributes, and the records are received at different nodes of the service according to that mapping. The claim does not limit the form of the interfaces.
17. The non-transitory computer-accessible storage medium as recited in claim 13, wherein the instructions when executed on the one or more processors: implement the one or more programmatic interfaces as a graphical user interface, a web page, a web site, a command line interface, or an application programming interface.
This claim covers a non-transitory computer-accessible storage medium whose instructions, when executed by one or more processors, implement the programmatic interfaces for specifying partitioning attributes in one or more concrete forms: a graphical user interface (GUI), a web page, a web site, a command line interface (CLI), or an application programming interface (API). It is the storage-medium counterpart of claims 3 and 10. The claim covers only the form of the interface through which a client specifies the attributes for partitioning the data stream.
18. The non-transitory computer-accessible storage medium as recited in claim 15, wherein to receive the indication of one or more attributes for partitioning the data stream, the instructions when executed on the one or more processors: receive an indication of one or more of: a partition key supplied by a source of a data record of the data stream, an identification of a source of a data record of the data stream, at least a portion of contents of a data record of the data stream, or a network address of a source of a data record of the data stream.
This invention relates to data stream processing systems that partition data streams based on specified attributes. The problem addressed is efficiently distributing data records from a stream to different processing nodes or storage locations based on configurable partitioning criteria. The system receives a data stream and partitions it by analyzing one or more attributes of the data records. These attributes may include a partition key provided by the data source, the source identification, the content of the data record, or the network address of the source. The partitioning logic dynamically assigns data records to different partitions based on these attributes, enabling scalable and flexible data distribution. The system may also handle metadata associated with the data stream, such as schema information, to ensure proper partitioning. The invention improves data processing efficiency by allowing customizable partitioning strategies tailored to specific workloads or system architectures. This approach is particularly useful in distributed computing environments where data needs to be routed to specific nodes for processing or storage.
19. The non-transitory computer-accessible storage medium as recited in claim 15, wherein to receive individual ones of the data records at different nodes of the stream management service, the instructions when executed on the one or more processors: select, based at least on the mapping of the data records to the partitions, a node of an ingestion subsystem of the stream management service, a node of a storage subsystem of the stream management service, or a node of the retrieval subsystem of a stream management service to receive the individual ones of the data records.
This invention relates to a stream management service for processing data records in a distributed computing environment. The problem addressed is efficiently distributing and managing data records across different nodes of a stream management service, which includes ingestion, storage, and retrieval subsystems. The invention ensures that data records are routed to the appropriate nodes based on predefined mappings to partitions, optimizing performance and resource utilization. The system involves a non-transitory computer-accessible storage medium storing instructions that, when executed, enable a stream management service to receive data records at different nodes. The instructions select a node from the ingestion subsystem, storage subsystem, or retrieval subsystem to receive each data record based on a mapping of the data records to partitions. This selection process ensures that data records are directed to the most suitable node for processing, storage, or retrieval, depending on their partition assignment. The partitioning mechanism allows for scalable and efficient data handling, reducing bottlenecks and improving overall system performance. The invention is particularly useful in large-scale data processing environments where distributed systems must handle high volumes of data with low latency.
20. The non-transitory computer-accessible storage medium as recited in claim 15, wherein the instructions when executed on the one or more processors: generate, corresponding to a given data record, a sequence number indicative of a position of the given data record within a record acquisition sequence at an ingestion node of the stream management service, wherein the ingestion node is selected to receive the given data record based at least on the mapping of the data records to the partitions; and store data records of a given partition of the data stream in an order corresponding to respective sequence numbers of the data records.
This invention relates to stream processing systems, specifically methods for managing and ordering data records in a distributed stream management service. The problem addressed is ensuring ordered processing of data records in a distributed environment where records may arrive out of order due to network latency or parallel ingestion paths. The system assigns sequence numbers to data records at ingestion nodes, where each node is responsible for a subset of data partitions. A sequence number indicates the position of a record within the acquisition sequence at its assigned ingestion node. Records belonging to the same partition are stored in order based on their sequence numbers, ensuring correct processing order despite potential out-of-order arrival. The sequence numbering is tied to the partition mapping, meaning records mapped to the same partition receive sequential numbers from their respective ingestion nodes. This approach enables efficient and reliable stream processing by maintaining order within partitions while allowing parallel ingestion across nodes. The sequence numbers facilitate reconstruction of the correct order for downstream processing, even if records arrive at different times or via different paths. The system is particularly useful in high-throughput environments where data must be processed in a consistent and ordered manner.
June 23, 2020