US-9607065

Hierarchical coherency log for managing a distributed data storage system

Published: March 28, 2017

Assignee: not available in USPTO data

Inventors: not available in USPTO data

Technical Abstract

A system and method for managing distributed coherent datasets using a hierarchical change log is provided. In some embodiments, a distributed storage system is provided that includes a primary storage device containing a primary dataset and a mirror storage device containing a mirror dataset. The mirror dataset includes a coherent copy of the primary dataset. The distributed storage system further includes a hierarchical change log tracking a coherence state for the mirror dataset. The hierarchical change log includes a first sub-log and a second sub-log, and a block range of the first sub-log overlaps a block range of the second sub-log. The hierarchical change log may define a priority relationship between the first sub-log and the second sub-log governing the overlap. The first sub-log and the second sub-log may be independently configured and may be different in one of a representation and a block size.

Patent Claims

15 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A computing device comprising: a memory containing machine readable medium comprising machine executable code having stored thereon instructions for performing a method of managing a coherency relationship in a distributed storage environment; a processor coupled to the memory, the processor configured to execute the machine executable code to cause the processor to: controlling a mirror storage device containing a mirror dataset, wherein the mirror dataset includes a coherent copy of a primary dataset at a primary storage device, wherein controlling the mirror storage device includes implementing a hierarchical change log that tracks a coherence state for the mirror dataset, wherein the hierarchical change log includes a first sub-log and a second sub-log, wherein the first sub-log and the second sub-log are each independently configured based on one of: a property of a corresponding data transaction, a property of the mirror dataset, and a property of a query, and wherein a block range of the first sub-log overlaps a block range of the second sub-log; and performing and input/output (I/O) operation to modify a portion of the mirror dataset, wherein the I/O operation affects the block range, including determining that the I/O operation affects the block range in a higher-priority one of the first or second sub-logs and in response to the determining creating a third sub-log for the I/O operation.

Plain English Translation

A computing device manages data coherency in a distributed storage system containing a primary dataset on a primary storage device and a coherent mirror dataset on a mirror storage device. It uses a hierarchical change log to track the coherence state of the mirror. This log contains multiple sub-logs (first and second), independently configured based on properties like transaction type, dataset characteristics, or query type. These sub-logs can have overlapping block ranges. During an I/O operation that modifies a portion of the mirror dataset within the overlapping block range, the device checks if the operation affects a higher-priority sub-log. If it does, the device creates a third sub-log specifically for that I/O operation.
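As a rough illustration of the behavior described in Claim 1, the following Python sketch models sub-logs as block ranges with a numeric priority. The names `SubLog`, `HierarchicalChangeLog`, and `record_io`, the convention that a larger integer means higher priority, and the choice to give the third sub-log the next priority up are all assumptions made for illustration; the claim does not prescribe a data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class SubLog:
    """Tracks changed blocks inside the half-open block range [start, end)."""
    start: int
    end: int
    priority: int  # assumption: larger number means higher priority
    dirty: Set[int] = field(default_factory=set)

    def intersects(self, first: int, last: int) -> bool:
        """True if blocks first..last (inclusive) touch this sub-log's range."""
        return first < self.end and last >= self.start


@dataclass
class HierarchicalChangeLog:
    sublogs: List[SubLog] = field(default_factory=list)

    def record_io(self, first: int, last: int) -> Optional[SubLog]:
        """Handle an I/O that modifies blocks first..last (inclusive).

        If the I/O touches an existing sub-log, find the highest-priority
        one it touches and create a new sub-log for the I/O on top of it.
        I/O outside every tracked range is not modelled and returns None.
        """
        hits = [s for s in self.sublogs if s.intersects(first, last)]
        if not hits:
            return None
        top = max(hits, key=lambda s: s.priority)
        new = SubLog(first, last + 1, top.priority + 1,
                     set(range(first, last + 1)))
        self.sublogs.append(new)
        return new


log = HierarchicalChangeLog([
    SubLog(start=0, end=100, priority=0),    # first sub-log
    SubLog(start=50, end=150, priority=1),   # second sub-log, overlaps 50..99
])
third = log.record_io(60, 70)  # I/O lands in the overlapping range
```

In this sketch the existing sub-logs are left untouched and the new sub-log records only the blocks written by the I/O operation.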

Claim 2

Original Legal Text

2. The computing device of claim 1, wherein the hierarchical change log defines a priority relationship between the first sub-log and the second sub-log.

Plain English Translation

The computing device managing data coherency in a distributed storage system, as described in the previous claim, uses a hierarchical change log that defines a priority relationship between the first and second sub-logs. This priority determines which sub-log's information is considered more authoritative when their block ranges overlap. This priority relationship is used when an I/O operation affects an overlapping block range and needs to determine which sub-log to consult or update first.
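One way to picture the priority relationship is as a lookup rule: for a block inside the overlap, only the highest-priority sub-log covering that block is consulted. The sketch below assumes that rule, along with hypothetical names (`SubLog`, `authoritative`, `is_dirty`) and a larger-number-wins priority convention; none of these come from the claim text.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class SubLog(NamedTuple):
    name: str
    start: int              # first block covered
    end: int                # one past the last block covered
    priority: int           # assumption: larger number wins
    dirty: FrozenSet[int]   # blocks this sub-log records as out of sync


def authoritative(sublogs: List[SubLog], block: int) -> Optional[SubLog]:
    """Return the highest-priority sub-log whose range contains the block."""
    covering = [s for s in sublogs if s.start <= block < s.end]
    return max(covering, key=lambda s: s.priority, default=None)


def is_dirty(sublogs: List[SubLog], block: int) -> bool:
    """Answer a coherence query using only the authoritative sub-log."""
    owner = authoritative(sublogs, block)
    return owner is not None and block in owner.dirty


first = SubLog("first", 0, 100, priority=0, dirty=frozenset({10, 60}))
second = SubLog("second", 50, 150, priority=1, dirty=frozenset({70}))
logs = [first, second]
```

Under this assumed rule, block 60 is marked in the first sub-log but lies in the overlap, so the second sub-log's view of that block takes precedence.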

Claim 3

Original Legal Text

3. The computing device of claim 1, wherein the first sub-log and the second sub-log are different in one of a representation and a block size.

Plain English Translation

The computing device managing data coherency in a distributed storage system, as described in the first claim, uses a first sub-log and second sub-log that differ in either their representation or their block size. The different representations could mean one sub-log uses a bitmap while the other uses a sparse matrix, and the block sizes are the granularity at which the logs track changes. These differences are based on properties like transaction type, dataset characteristics, or query type for efficiency.

Claim 4

Original Legal Text

4. The computing device of claim 3, wherein the first sub-log includes a sparse matrix representation and wherein the second sub-log includes a bitmap representation.

Plain English Translation

The computing device from the third claim uses a first sub-log with a sparse matrix representation and a second sub-log with a bitmap representation. The sparse matrix is efficient for tracking infrequent changes across a large block range, while the bitmap is better for tracking frequent changes in a smaller block range. This allows the system to use the most efficient data structure for each type of data coherency tracking, optimizing memory usage and performance.
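The trade-off between the two representations can be sketched with two small classes that expose the same interface. This is a minimal illustration, not the patented implementation: `BitmapLog` and `SparseLog` are hypothetical names, a Python `set` stands in for a sparse matrix, and bounds checking is omitted.

```python
class BitmapLog:
    """Dense representation: one bit per block across the whole range."""

    def __init__(self, start: int, end: int, block_size: int):
        self.start = start
        self.block_size = block_size
        n_blocks = -(-(end - start) // block_size)  # ceiling division
        self.bits = bytearray((n_blocks + 7) // 8)

    def _index(self, offset: int) -> int:
        return (offset - self.start) // self.block_size

    def mark(self, offset: int) -> None:
        i = self._index(offset)
        self.bits[i // 8] |= 1 << (i % 8)

    def is_dirty(self, offset: int) -> bool:
        i = self._index(offset)
        return bool(self.bits[i // 8] & (1 << (i % 8)))


class SparseLog:
    """Sparse representation: stores only the indices of changed blocks."""

    def __init__(self, start: int, end: int, block_size: int):
        self.start = start
        self.end = end
        self.block_size = block_size
        self.entries = set()  # stand-in for a sparse matrix

    def mark(self, offset: int) -> None:
        self.entries.add((offset - self.start) // self.block_size)

    def is_dirty(self, offset: int) -> bool:
        return (offset - self.start) // self.block_size in self.entries


small = BitmapLog(start=0, end=1024, block_size=4)         # 256 blocks, 32 bytes
small.mark(10)

large = SparseLog(start=0, end=1 << 40, block_size=4096)   # huge range, one entry
large.mark(5000)
```

The bitmap's memory cost is fixed by the range and block size, while the sparse structure grows only with the number of changed blocks, which is the distinction the translation above draws.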

Claim 5

Original Legal Text

5. A method comprising: creating a first sub-log of a hierarchical change log containing a coherence state for a mirror relationship between a primary dataset and a mirror dataset, wherein the first sub-log is created based on one of: a property of a corresponding data transaction, a property of the mirror dataset, and a property of a query; in response to a first data transaction affecting the mirror dataset, creating a second sub-log of the hierarchical change log to track the data transaction, wherein a block range of the second sub-log overlaps a block range of the first sub-log, wherein there is a priority relationship between the first sub-log and the second sub-log, and wherein the second sub-log is created based on one of: the property of the corresponding data transaction, the property of the mirror dataset, and the property of the query; and performing a second data transaction to modify a portion of the mirror dataset, wherein the second data transaction affects the block range, including determining that the second data transaction affects the block range in a higher-priority one of the first or second sub-logs and in response to the determining creating a third sub-log for the second data transaction.

Plain English Translation

A method manages data coherency between a primary dataset and a mirror dataset using a hierarchical change log. First, it creates a first sub-log based on properties of data transactions, the mirror dataset, or queries. Then, in response to a first data transaction, a second sub-log is created to track that transaction. The block range of the second sub-log overlaps the block range of the first sub-log, and there's a defined priority relationship between them. Finally, if a second data transaction modifies a portion of the mirror dataset within that overlapping range, the method determines if the transaction affects the higher-priority sub-log. If so, it creates a third sub-log for the second data transaction.

Claim 6

Original Legal Text

6. The method of claim 5, wherein the first sub-log and the second sub-log are different in one of a representation and a block size.

Plain English Translation

The method from the fifth claim, managing data coherency between a primary dataset and a mirror dataset, uses a first sub-log and a second sub-log that differ in either their representation or their block size. The different representations could mean one sub-log uses a bitmap while the other uses a sparse matrix, and the block sizes are the granularity at which the logs track changes.

Claim 7

Original Legal Text

7. The method of claim 5, wherein the creating of the second sub-log includes optimizing the second sub-log based on one of: a property of the first data transaction, a property of the mirror dataset, and a property of a query of a sub-log.

Plain English Translation

The method from the fifth claim, managing data coherency between a primary dataset and a mirror dataset, optimizes the second sub-log's creation based on properties of the first data transaction, the mirror dataset, or queries of the sub-logs. This optimization could include choosing a specific representation (e.g., bitmap or sparse matrix), block size, or storage location for the second sub-log to improve performance or reduce storage costs.
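A selection step of this kind might look like the sketch below. The function `choose_config`, its parameters, the 1% threshold, and the 512-byte floor are all hypothetical values chosen for illustration; the claim says only that the sub-log is optimized based on such properties, not how.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubLogConfig:
    representation: str  # "bitmap" or "sparse"
    block_size: int      # bytes tracked per log entry


def choose_config(range_bytes: int, expected_dirty_bytes: int,
                  io_size: int) -> SubLogConfig:
    """Pick a representation and block size from transaction properties.

    Hypothetical heuristic: track at the I/O size, and use a sparse
    structure when less than 1% of the range is expected to change.
    """
    block_size = max(512, io_size)
    dirty_fraction = expected_dirty_bytes / range_bytes
    representation = "sparse" if dirty_fraction < 0.01 else "bitmap"
    return SubLogConfig(representation, block_size)


# A resync touching a few megabytes of a terabyte-scale mirror.
resync = choose_config(range_bytes=1 << 40, expected_dirty_bytes=4 << 20,
                       io_size=64 * 1024)
# A burst of small writes concentrated in a 1 GiB region.
burst = choose_config(range_bytes=1 << 30, expected_dirty_bytes=512 << 20,
                      io_size=4096)
```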

Claim 8

Original Legal Text

8. The method of claim 5, wherein the first data transaction includes a synchronization of the primary dataset and the mirror dataset, and wherein the second sub-log includes a sparse matrix representation of the coherence state.

Plain English Translation

The method from the fifth claim, managing data coherency between a primary dataset and a mirror dataset, involves a first data transaction that synchronizes the primary dataset and the mirror dataset. The second sub-log, created in response to this synchronization, uses a sparse matrix representation of the coherence state. This is suitable because a synchronization operation may only change a relatively small portion of a large dataset, making the sparse matrix more efficient than a bitmap.

Claim 9

Original Legal Text

9. The method of claim 5 further comprising flattening the first sub-log and the second sub-log.

Plain English Translation

The method from the fifth claim, managing data coherency between a primary dataset and a mirror dataset, further includes flattening the first sub-log and the second sub-log. Flattening combines the information from the two sub-logs into a single, unified representation of the coherence state. This can simplify subsequent operations and improve performance by eliminating the need to manage multiple sub-logs.
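Flattening can be sketched as a merge in which the higher-priority sub-log's state wins wherever the two ranges overlap. The `flatten` function and `SubLog` type below are hypothetical, and the merge rule is an assumption consistent with the priority relationship described earlier, not a rule stated in the claim.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class SubLog:
    start: int
    end: int       # exclusive
    priority: int  # assumption: larger number means higher priority
    dirty: Set[int] = field(default_factory=set)


def flatten(a: SubLog, b: SubLog) -> SubLog:
    """Merge two overlapping sub-logs into one.

    Inside the higher-priority sub-log's range its state wins; elsewhere
    the lower-priority sub-log's entries are kept unchanged.
    """
    low, high = sorted((a, b), key=lambda s: s.priority)
    kept = {blk for blk in low.dirty if not (high.start <= blk < high.end)}
    return SubLog(
        start=min(low.start, high.start),
        end=max(low.end, high.end),
        priority=low.priority,
        dirty=kept | high.dirty,
    )


first = SubLog(0, 100, priority=0, dirty={10, 60})
second = SubLog(50, 150, priority=1, dirty={70})
flat = flatten(first, second)
```

After the merge a single sub-log covers blocks 0 through 149, so later queries no longer need to resolve priorities between the two originals.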

Claim 10

Original Legal Text

10. The method of claim 9, wherein the flattening is performed after completing the first data transaction.

Plain English Translation

The method from the ninth claim, managing data coherency between a primary dataset and a mirror dataset, flattens the first sub-log and the second sub-log only after the first data transaction has completed. The first data transaction is the one that triggered creation of the second sub-log, so the two sub-logs are combined into a single representation once the transaction being tracked by the second sub-log is finished.

Claim 11

Original Legal Text

11. A non-transitory machine readable medium having stored thereon instructions for performing a method comprising machine executable code which when executed by at least one machine, causes the machine to perform operations comprising: initiating a first data transaction affecting a mirror dataset; modifying a higher-priority sub-log of a hierarchical change log in response to the first data transaction, the higher-priority sub-log containing a coherence state for a bit range of the mirror dataset, wherein the modifying extends a bit range of the higher-priority sub-log to overlap a bit range of a lower-priority sub-log of the hierarchical change log, and wherein the higher-priority sub-log and the lower-priority sub-log are each independently configured based on one of: a property of a corresponding data transaction, a property of the mirror dataset, and a property of a query; and performing a second data transaction to modify a portion of the mirror dataset, wherein the second data transaction affects the bit range, including determining that the second data transaction affects the bit range in the higher-priority sub-log and in response to the determining creating an additional sub-log for the second data transaction, wherein the additional sub-log has a priority higher than the higher-priority sub-log.

Plain English Translation

A non-transitory machine-readable medium stores instructions for managing data coherency. The instructions, when executed, initiate a first data transaction affecting a mirror dataset. In response, a higher-priority sub-log of a hierarchical change log is modified. This sub-log tracks the coherence state for a specific bit range of the mirror dataset. The modification extends the sub-log's bit range so that it overlaps the bit range of a lower-priority sub-log. The higher-priority and lower-priority sub-logs are each independently configured based on a property of the corresponding data transaction, the mirror dataset, or a query. When a second data transaction modifies a portion of the mirror dataset and is determined to affect the bit range in the higher-priority sub-log, the system creates an additional sub-log for the second data transaction, with a priority higher than that of the existing higher-priority sub-log.
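The range extension and the additional sub-log in Claim 11 can be sketched as two small steps. The function names `first_transaction` and `second_transaction`, the `SubLog` type, and the larger-number-wins priority convention are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SubLog:
    start: int
    end: int       # exclusive
    priority: int  # assumption: larger number means higher priority
    dirty: Set[int] = field(default_factory=set)


def overlaps(a: SubLog, b: SubLog) -> bool:
    return a.start < b.end and b.start < a.end


def first_transaction(high: SubLog, first: int, last: int) -> None:
    """Record first..last (inclusive) in `high`, growing its range to fit."""
    high.start = min(high.start, first)
    high.end = max(high.end, last + 1)
    high.dirty.update(range(first, last + 1))


def second_transaction(high: SubLog, first: int, last: int) -> Optional[SubLog]:
    """Create an additional, higher-priority sub-log if the span hits `high`."""
    if first < high.end and last >= high.start:
        return SubLog(first, last + 1, high.priority + 1,
                      set(range(first, last + 1)))
    return None  # other cases are outside this sketch


low = SubLog(0, 100, priority=0)
high = SubLog(100, 200, priority=1)
before = overlaps(high, low)          # ranges only touch at 100
first_transaction(high, 80, 120)      # extends `high` down to 80
after = overlaps(high, low)
extra = second_transaction(high, 90, 95)
```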

Claim 12

Original Legal Text

12. The non-transitory machine-readable medium of claim 11, wherein the data transaction includes a synchronization of the primary dataset and the mirror dataset, and wherein the sub-log includes a sparse matrix representation of the coherence state.

Plain English Translation

The non-transitory machine-readable medium from the eleventh claim manages data coherency. The data transaction is a synchronization of a primary dataset and a mirror dataset. The sub-log, which tracks the coherence state resulting from this synchronization, uses a sparse matrix representation. This representation efficiently captures the changes introduced during synchronization, especially when only a small portion of the dataset is modified.

Claim 13

Original Legal Text

13. The non-transitory machine-readable medium of claim 11, wherein the modifying of the sub-log configures the sub-log based on one of: a property of the data transaction, a property of the mirror dataset, and a property of a query.

Plain English Translation

The non-transitory machine-readable medium from the eleventh claim, managing data coherency, modifies the sub-log configuration based on a property of the data transaction, a property of the mirror dataset, or a property of a query. This dynamic configuration allows the system to optimize the sub-log's representation, block size, or storage location based on the specific characteristics of the data being tracked, improving performance and resource utilization.

Claim 14

Original Legal Text

14. The non-transitory machine-readable medium of claim 11, wherein the computer program has further instructions that carry out completing the data transaction and discarding the lower-priority sub-log after completing the data transaction.

Plain English Translation

The non-transitory machine-readable medium from the eleventh claim, managing data coherency, contains further instructions to complete the data transaction and discard the lower-priority sub-log after the transaction is complete. Discarding the lower-priority sub-log frees up resources and simplifies the overall change log management process once its information has been incorporated into a higher-priority log or flattened representation.

Claim 15

Original Legal Text

15. The non-transitory machine-readable medium of claim 11, wherein the sub-log and the lower-priority sub-log are different in one of a representation and a block size.

Plain English Translation

The non-transitory machine-readable medium from the eleventh claim, managing data coherency, stores a sub-log and a lower-priority sub-log that differ in either their representation or their block size. The different representations could mean one sub-log uses a bitmap while the other uses a sparse matrix, and the block sizes are the granularity at which the logs track changes.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F

Patent Metadata

Filing Date

April 26, 2013

Publication Date

March 28, 2017
