Systems, apparatuses, methods, and computer-readable storage mediums for performing lease-based fencing using a time-limited lease window. During the time-limited lease window, writes to a shared storage medium are permitted, while writes are denied for expired leases. When a successful heartbeat is generated for a primary storage controller, the lease window is extended for the primary storage controller from the time of a previous heartbeat. Accordingly, a prolonged stall between successive heartbeats by the primary storage controller will result in the newly extended lease being expired at the time it is granted. This scheme prevents a split brain scenario from occurring when a secondary storage controller takes over as the new primary storage controller in response to detecting the stall.
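The effect described above, where a lease extended after a long stall is already expired when granted, can be illustrated with a minimal sketch. This is not the patented implementation: the class `LeaseClock`, its method names, the `lease_duration` parameter, and the choice to grant no lease on the very first heartbeat are all assumptions made for illustration. The only behavior taken from the text is that each successful heartbeat extends the lease from the time of the previous heartbeat rather than from the current time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaseClock:
    """Hypothetical lease tracker for a primary storage controller."""
    lease_duration: float                    # assumed length of the lease window
    prev_heartbeat: Optional[float] = None   # time of the last successful heartbeat
    lease_expiry: float = float("-inf")      # no lease held initially (assumption)

    def on_heartbeat(self, now: float) -> None:
        # Extend the lease from the PREVIOUS heartbeat, not from `now`.
        if self.prev_heartbeat is not None:
            self.lease_expiry = self.prev_heartbeat + self.lease_duration
        self.prev_heartbeat = now

    def may_write(self, now: float) -> bool:
        # Writes are permitted only inside the lease window.
        return now < self.lease_expiry


if __name__ == "__main__":
    clock = LeaseClock(lease_duration=3.0)
    for t in (0.0, 1.0, 2.0):        # regular heartbeats
        clock.on_heartbeat(t)
    print(clock.may_write(2.5))      # True: lease runs until 1.0 + 3.0 = 4.0
    clock.on_heartbeat(10.0)         # heartbeat after a prolonged stall
    print(clock.may_write(10.0))     # False: new lease ended at 2.0 + 3.0 = 5.0
```

Because the extension is anchored to the earlier heartbeat, the stalled controller cannot regain write access merely by resuming its heartbeats late, which is what leaves room for a secondary controller to take over safely.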
Legal claims defining the scope of protection, as filed with the USPTO.
1. A storage system comprising: a plurality of storage controllers configured to couple to one or more client computer systems via one or more data communications links; and one or more storage devices coupled to the plurality of storage controllers; wherein a first storage controller of the plurality of storage controllers is configured to: generate heartbeats on intervals of a first period of time; responsive to generating a current heartbeat, issue a lease based on an extension calculated from a prior heartbeat; determine that a lease is invalid based on the lease extension calculated from the prior heartbeat being smaller than a takeover window calculated from the prior heartbeat; responsive to determining that the lease is invalid, operate as a secondary storage controller instead of a primary storage controller, wherein the first storage controller, operating as the secondary storage controller, is not allowed to perform a state-changing operation on the one or more storage devices; determine that the lease is valid based on the lease extension not being smaller than the takeover window; and responsive to determining that the lease is valid, perform a number of pending state-changing operations, wherein the number is less than or equal to a maximum number of state-changing operations that can be performed without causing data loss due to a failure of a fencing mechanism; recheck if the lease is valid responsive to performing the number of pending state-changing operations; and perform a second number of state-changing operations responsive to determining the lease is valid, wherein the second number is less than or equal to the maximum number.
2. The storage system as recited in claim 1, wherein the first storage controller is further configured to prevent state-changing operations from being performed without a valid lease, and wherein the second storage controller cannot perform any state-changing operations on the one or more storage devices without a valid lease.
3. The storage system as recited in claim 1, wherein the first storage controller is a primary storage controller, and wherein a second storage controller of the plurality of storage controllers is configured to take over as a new primary storage controller responsive to detecting that the first storage controller has not generated a heartbeat for a third period of time.
4. The storage system as recited in claim 3, wherein responsive to taking over as the new primary storage controller, the second storage controller is configured to: generate heartbeats on intervals of the first period of time; issue a new lease responsive to generating each heartbeat, wherein a beginning of the new lease is calculated from a prior heartbeat and has a duration of a second period of time; and determine if a lease is valid prior to performing a state-changing operation on a given storage device.
5. The storage system as recited in claim 1, wherein the lease has a duration of a second period of time.
6. The storage system as recited in claim 1, wherein the non-state-changing operations on the one or more storage devices include operations that read data from the one or more storage devices.
7. The storage system as recited in claim 1, wherein the first storage controller is further configured to send the heartbeat to the second storage controller via a data communications bus between the first storage controller and the second storage controller.
8. A plurality of storage controllers: wherein each of the storage controllers are configured to couple to one or more client computer systems via one or more data communications links and each of the storage controllers are also coupled to one or more storage devices via one or more data communications links, wherein a first storage controller is configured to: generate heartbeats on intervals of a first period of time; responsive to generating a current heartbeat, issue a lease based on an extension calculated from a prior heartbeat; determine that a lease is invalid based on the lease extension calculated from the prior heartbeat being smaller than a takeover window calculated from the prior heartbeat; responsive to determining that the lease is invalid, operate as a secondary storage controller instead of a primary storage controller, wherein the first storage controller, operating as the secondary storage controller, is not allowed to perform a state-changing operation on the one or more storage devices; determine that the lease is valid based on the lease extension not being smaller than the takeover window; and responsive to determining that the lease is valid, perform a number of pending state-changing operations, wherein the number is less than or equal to a maximum number of state-changing operations that can be performed without causing data loss due to a failure of a fencing mechanism; recheck if the lease is valid responsive to performing the number of pending state-changing operations; and perform a second number of state-changing operations responsive to determining the lease is valid, wherein the second number is less than or equal to the maximum number.
9. The plurality of storage controllers as recited in claim 8, wherein the first storage controller is further configured to prevent state-changing operations from being performed without a valid lease, and wherein the second storage controller cannot perform any state-changing operations on the one or more storage devices without a valid lease.
10. The plurality of storage controllers as recited in claim 8, wherein the lease has a duration of a second period of time.
11. The plurality of storage controllers as recited in claim 8, wherein a duration of the lease is calculated from a prior heartbeat generated two intervals prior to the current heartbeat.
12. The plurality of storage controllers as recited in claim 8, wherein at least a portion of said lease is for time that has already expired.
13. The plurality of storage controllers as recited in claim 8, wherein the non-state-changing operations on the one or more storage devices include operations that read data from the one or more storage devices.
14. The plurality of storage controllers as recited in claim 8, wherein the first storage controller is further configured to send the heartbeat to the second storage controller via a data communications bus between the first storage controller and the second storage controller.
15. A method comprising: generating, by a first storage controller in a storage system that includes a plurality of storage controllers, heartbeats on intervals of a first period of time, wherein the first storage controller and a second storage controller are each configured to couple to one or more client computer systems via one or more data communications links and also coupled to one or more storage devices via one or more data communications links, and wherein the first storage controller operates as a primary storage controller and the second storage controller operates as a secondary storage controller; responsive to generating a current heartbeat, issuing, by the first storage controller, a lease based on an extension calculated from a prior heartbeat; and determining that a lease is invalid based on the lease extension calculated from the prior heartbeat being smaller than a takeover window calculated from the prior heartbeat; responsive to determining that the lease is invalid, operating as the secondary storage controller instead of the primary storage controller, wherein the first storage controller, operating as the secondary storage controller, is not allowed to perform a state-changing operation on the one or more storage devices, and wherein the first storage controller of the plurality of storage controllers and a second storage controller of the plurality of storage controllers are configured to perform non-state-changing operations on the one or more storage devices; determining that the lease is valid based on the lease extension not being smaller than the takeover window; responsive to determining that the lease is valid, performing a number of pending state-changing operations, wherein the number is less than or equal to a maximum number of state-changing operations that can be performed without causing data loss due to a failure of a fencing mechanism; recheck if the lease is valid responsive to performing the number of pending state-changing operations; and perform a second number of state-changing operations responsive to determining the lease is valid, wherein the second number is less than or equal to the maximum number.
16. The method as recited in claim 15, further comprising preventing state-changing operations from being performed without a valid lease.
17. The method as recited in claim 15, wherein the lease has a duration of a second period of time.
18. The method as recited in claim 15, wherein the non-state-changing operations on the one or more storage devices include operations that read data from the one or more storage devices.
19. The method as recited in claim 15, wherein at least a portion of said lease is for time that has already expired.
20. The method of claim 15 further comprising sending, from the first storage controller to the second storage controller, the heartbeat via a data communications bus between the first storage controller and the second storage controller.
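The independent claims recite performing a bounded number of pending state-changing operations, rechecking the lease, and then performing a further bounded number. A minimal sketch of that loop follows, offered only as an illustration and not as the claimed implementation. The function `drain_pending`, the `lease_is_valid` callback, and the `max_batch` parameter are hypothetical names; `max_batch` stands in for the claimed maximum number of operations that can be performed without causing data loss if the fencing mechanism fails.

```python
from collections import deque
from typing import Callable, Deque


def drain_pending(
    pending: Deque[Callable[[], None]],
    lease_is_valid: Callable[[], bool],
    max_batch: int,
) -> int:
    """Apply pending operations in batches of at most `max_batch`.

    The lease is rechecked before every batch; processing stops as soon
    as the lease is found invalid. Returns the number of operations applied.
    """
    applied = 0
    while pending:
        if not lease_is_valid():
            break  # no valid lease: stop issuing state-changing operations
        for _ in range(min(max_batch, len(pending))):
            operation = pending.popleft()
            operation()  # hypothetical state-changing operation
            applied += 1
    return applied
```

Bounding each batch limits how many writes can land after a lease silently expires, so the worst case is one batch of at most `max_batch` operations between two validity checks.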
July 24, 2014
May 21, 2019