10579476

Using Alternate Recovery Actions for Initial Recovery Actions in a Computing System

Published: March 3, 2020
Assignee: not available in USPTO data we have

Patent Claims
22 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A computer program product for performing a recovery action upon detecting an error in a computing system having a persistent storage, the computer program product comprising a computer readable storage medium having computer readable program code embodied therein that is executable to perform operations, the operations comprising: maintaining an initial recovery table providing initial recovery actions to perform for errors detected in the computing system; receiving an alternate recovery table including at least one alternate recovery action for at least one of the initial recovery actions, wherein an alternative recovery action provided for an initial recovery action specifies a different recovery path involving at least one of a different action and a different component in the computing system than involved in the initial recovery action, wherein the initial recovery table and the alternate recovery table are stored in the persistent storage to be maintained through a reboot of the computing system; detecting an error in the computing system; determining whether to use the initial recovery action in the initial recovery table for the detected error or the alternate recovery action in the alternate recovery table for the initial recovery action; and using the initial recovery action or alternate recovery action determined to use to address the detected error.

Plain English Translation

This invention relates to error recovery in computing systems, specifically improving the flexibility and adaptability of recovery mechanisms. The problem addressed is the rigidity of traditional error recovery systems, which often rely on predefined recovery actions that may not be optimal for all scenarios or system configurations. The invention provides a dynamic approach to error recovery by allowing alternate recovery actions to be introduced alongside initial recovery actions, enabling more flexible and context-aware error handling. The system maintains an initial recovery table stored in persistent storage, which defines standard recovery actions for detected errors. An alternate recovery table can be received and stored, containing modified or alternative recovery actions for specific errors. These alternate actions may involve different steps, components, or paths than the initial actions, allowing for customization based on system conditions or user preferences. Both tables persist through system reboots, ensuring continuity. When an error is detected, the system evaluates whether to use the initial or alternate recovery action. The decision may be based on predefined rules, system state, or other criteria. The selected action is then executed to address the error. This approach enhances system resilience by enabling adaptive recovery strategies without requiring system-wide changes or reboots. The invention is particularly useful in environments where error recovery must be tailored to specific configurations or operational contexts.
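The flow in claim 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the error codes, the action names, the use of a JSON file as the "persistent storage", and the `use_alternate` argument are all hypothetical and do not come from the patent.

```python
import json

# Hypothetical error codes and action names, chosen only for illustration.
INITIAL_TABLE = {
    "path_error": "failover_to_second_adaptor",
    "unit_error": "failover_to_second_unit",
}


def save_tables(path, initial, alternate):
    """Persist both tables; a JSON file stands in for persistent storage."""
    with open(path, "w") as f:
        json.dump({"initial": initial, "alternate": alternate}, f)


def load_tables(path):
    """Reload both tables, for example after a reboot."""
    with open(path) as f:
        data = json.load(f)
    return data["initial"], data["alternate"]


def choose_action(error, initial, alternate, use_alternate):
    """Return the recovery action to perform for a detected error.

    The alternate table is keyed by the initial action it stands in for.
    """
    initial_action = initial[error]
    if use_alternate and initial_action in alternate:
        return alternate[initial_action]
    return initial_action
```

Keying the alternate table by initial action rather than by error mirrors the claim language, which provides an alternate action "for" an initial recovery action.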

Claim 2

Original Legal Text

2. The computer program product of claim 1 , wherein the operations further comprise: maintaining at least one flag indicating whether to use the initial recovery action in the initial recovery table for the detected error or the alternate recovery action in the alternate recovery table, wherein the determining whether to use the initial or alternate recovery action comprises determining whether the flag indicates to use the initial recovery action or the alternate recovery action.

Plain English Translation

This claim adds a flag to the recovery mechanism of claim 1. The system keeps an initial recovery table of initial recovery actions and an alternate recovery table of alternate recovery actions, and it maintains at least one flag that indicates which of the two to use. When an error is detected, the system reads the flag: if the flag indicates the initial recovery action, the action from the initial recovery table is performed; if it indicates the alternate recovery action, the action from the alternate recovery table is performed instead. The flag therefore selects between two predefined recovery paths. The claim does not describe the alternate action as a fallback that is tried only after the initial action fails, and it does not say how or when the flag is updated; the dependent claims address that (for example, claim 3 sets the flag when an alternate recovery table is received).

Claim 3

Original Legal Text

3. The computer program product of claim 2 , wherein the operations further comprise: setting the flag to use the at least one alternate recovery action in the alternate recovery table to recover from an error in response to receiving the alternate recovery table.

Plain English Translation

This invention relates to error recovery in computer systems, specifically a method for dynamically updating error recovery actions using an alternate recovery table. The problem addressed is the inflexibility of traditional error recovery mechanisms, which rely on preconfigured recovery actions that may not be optimal for all error scenarios or system states. The invention involves a computer program product that includes operations for managing error recovery. When an alternate recovery table is received, the system sets a flag to indicate that the alternate recovery actions in the table should be used instead of default recovery actions. The alternate recovery table contains one or more recovery actions that are dynamically provided, allowing the system to adapt its error recovery strategy based on current conditions or external inputs. This approach improves system resilience by enabling more flexible and context-aware error handling. The alternate recovery table may be generated by an external system or a monitoring component within the same system, and it can include actions tailored to specific error types, system configurations, or environmental factors. By dynamically updating recovery actions, the system can avoid rigid, preprogrammed responses that may not be effective in all situations. This method enhances reliability and reduces downtime by ensuring that the most appropriate recovery actions are applied when errors occur.
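A minimal sketch of the flag behaviour in claims 2 and 3 follows. The class name `RecoveryManager`, its methods, and the action names are hypothetical; the only behaviour taken from the claims is that a flag selects between the two tables and that receiving an alternate recovery table sets the flag.

```python
class RecoveryManager:
    """Illustrative only: one flag selects between the two tables."""

    def __init__(self, initial_table):
        self.initial_table = dict(initial_table)
        self.alternate_table = {}
        self.use_alternate = False  # the flag of claim 2

    def receive_alternate_table(self, alternate_table):
        """Store the table and set the flag in response (claim 3)."""
        self.alternate_table = dict(alternate_table)
        self.use_alternate = True

    def action_for(self, error):
        """Pick the action for a detected error by checking the flag."""
        initial_action = self.initial_table[error]
        if self.use_alternate:
            # Fall through to the initial action if no alternate is listed.
            return self.alternate_table.get(initial_action, initial_action)
        return initial_action
```

Before any alternate table arrives the manager returns initial actions; after `receive_alternate_table` it returns the alternate for every initial action that has one.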

Claim 4

Original Legal Text

4. The computer program product of claim 2 , wherein the at least one flag comprises a plurality of flags, one for each of the initial recovery actions indicating whether to perform the initial recovery action or the alternate recovery action in the alternate recovery table provided for the initial recovery action, wherein the determining whether to use the initial recovery action in the initial recovery table for the detected error or the alternate recovery action in the alternate recovery table for the initial recovery action comprises determining whether the flag for the initial recovery action for detected error indicates to use the initial recovery action or the alternate recovery action.

Plain English Translation

This invention relates to error recovery in computer systems, specifically a method for dynamically selecting between initial and alternate recovery actions based on configurable flags. The problem addressed is the rigidity of traditional error recovery systems, which often rely on predefined recovery actions that may not be optimal for all scenarios. The invention improves upon this by providing a flexible recovery mechanism that allows system administrators or automated processes to choose between multiple recovery options for a given error. The system includes an initial recovery table containing predefined recovery actions for detected errors and an alternate recovery table containing alternative recovery actions for each initial action. A plurality of flags, one for each initial recovery action, determines whether to execute the initial action or its corresponding alternate action. When an error is detected, the system checks the flag associated with the initial recovery action for that error. If the flag indicates the alternate action should be used, the system retrieves and executes the alternate action from the alternate recovery table instead of the initial action. This allows for dynamic adaptation of recovery strategies based on system conditions, historical performance, or other factors without requiring changes to the core recovery logic. The invention enhances system reliability by enabling more tailored and effective error recovery responses.
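The per-action flags of claim 4 can be pictured as one boolean stored next to each initial recovery action. The sketch below assumes a hypothetical `RecoveryEntry` record and invented error and action names; the patent does not prescribe this layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RecoveryEntry:
    """One row: an initial action, its optional alternate, and its own flag."""
    initial_action: str
    alternate_action: Optional[str] = None
    use_alternate: bool = False


def select_action(table: Dict[str, RecoveryEntry], error: str) -> str:
    """Check the flag belonging to the initial action for this error."""
    entry = table[error]
    if entry.use_alternate and entry.alternate_action is not None:
        return entry.alternate_action
    return entry.initial_action
```

With one flag per entry, switching a single recovery action to its alternate leaves every other action on its initial path.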

Claim 5

Original Legal Text

5. The computer program product of claim 1 , wherein each of the at least one alternate recovery action is provided for one of the initial recovery actions when application of the initial recovery action would result in a data integrity or data loss exposure when applied to address the detected error in the computing system.

Plain English Translation

This claim narrows claim 1 by stating when an alternate recovery action is provided. Each alternate recovery action in the alternate recovery table is supplied for an initial recovery action whose application to the detected error would result in a data integrity or data loss exposure. In other words, the alternate recovery table is a safeguard: it covers those initial recovery actions that are known to be unsafe and gives the system a different recovery path that still addresses the error. Initial recovery actions that carry no such exposure need no alternate. The claim does not recite logging, tracking of effectiveness, or learning from historical data.

Claim 6

Original Legal Text

6. The computer program product of claim 1 , wherein the operations further comprise: receiving a code load to update code for the at least one of the initial recovery actions for which the at least one alternate recovery action is provided in the alternate recovery table, wherein the code load fixes a data integrity or data loss exposure in the at least one of the initial recovery actions; and return to using the at least one of the initial recovery actions to which the code load is applied from using the alternate recovery action for the initial recovery action after applying the code load.

Plain English Translation

This invention relates to managing recovery actions in computing systems where an initial recovery action has a defect that exposes the system to data integrity problems or data loss. While the defect is outstanding, the alternate recovery table supplies an alternate recovery action that is used in place of the affected initial recovery action. The system later receives a code load that updates the code for that initial recovery action and fixes the data integrity or data loss exposure. After the code load is applied, the system stops using the alternate recovery action and returns to using the updated initial recovery action. The claim recites receiving and applying the code load and then switching back; it does not recite a separate step that verifies the fix. This lets the system keep a working recovery path while a fix is developed and deployed, and then resume the original recovery path once the exposure is removed.
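The return to the initial recovery action after a code load can be sketched as clearing a per-action flag. Everything named here (`apply_code_load`, `select_action`, the flag dictionary, the action names) is a hypothetical illustration, not the patent's implementation.

```python
def select_action(error, initial_table, alternate_table, use_alternate):
    """Pick the alternate only while the flag for the initial action is set."""
    initial_action = initial_table[error]
    if use_alternate.get(initial_action, False):
        return alternate_table.get(initial_action, initial_action)
    return initial_action


def apply_code_load(use_alternate, fixed_actions):
    """Return new flags with the flag cleared for each fixed initial action.

    `fixed_actions` is the set of initial actions whose code the load updated.
    """
    return {
        action: flag and action not in fixed_actions
        for action, flag in use_alternate.items()
    }
```

Only the actions named in the code load return to their initial path; any other action that is still exposed keeps using its alternate.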

Claim 7

Original Legal Text

7. The computer program product of claim 1 , wherein the computing system includes a first processing unit that accesses a storage through a first device adaptor and a second processing unit that accesses the storage through a second device adaptor, wherein for an error in a path from the first processing unit to the storage, the initial recovery action comprises a failover for the first processing unit to use the second device adaptor and the alternate recovery action comprises a failover from the first processing unit to the second processing unit.

Plain English Translation

This invention relates to fault-tolerant computing systems that keep storage accessible when hardware fails. The system includes a first processing unit that accesses a storage through a first device adaptor and a second processing unit that accesses the same storage through a second device adaptor. For an error in the path from the first processing unit to the storage, two recovery actions are defined. The initial recovery action is a failover in which the first processing unit keeps running but reaches the storage through the second device adaptor. The alternate recovery action is a failover from the first processing unit to the second processing unit. The alternate action is a substitute for the initial action, not a second step taken after the initial action fails: as in claim 1, the system determines which of the two to use for the detected error and performs that one. The invention is useful in high-availability environments, such as enterprise storage servers, where continuous access to data is critical and where a known problem with one recovery path must be avoided until it is fixed.
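The two recovery paths recited in claim 7 can be compared with a small model. The labels `PU1`, `PU2`, `DA1`, `DA2`, the `AccessPath` record, and the `use_alternate` argument are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPath:
    """Which processing unit is serving I/O, and through which adaptor."""
    processing_unit: str
    device_adaptor: str


# Hypothetical labels: each unit normally uses its own device adaptor.
HOME_ADAPTOR = {"PU1": "DA1", "PU2": "DA2"}


def failover_to_second_adaptor(path):
    """Initial action: same processing unit, the other device adaptor."""
    other = "DA2" if path.device_adaptor == "DA1" else "DA1"
    return AccessPath(path.processing_unit, other)


def failover_to_second_unit(path):
    """Alternate action: the other processing unit, via its own adaptor."""
    other_unit = "PU2" if path.processing_unit == "PU1" else "PU1"
    return AccessPath(other_unit, HOME_ADAPTOR[other_unit])


def recover_path_error(path, use_alternate):
    """Perform exactly one of the two actions; they are not chained."""
    action = failover_to_second_unit if use_alternate else failover_to_second_adaptor
    return action(path)
```

Starting from `AccessPath("PU1", "DA1")`, the initial action yields `PU1` on `DA2`, while the alternate action yields `PU2` on `DA2`.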

Claim 8

Original Legal Text

8. The computer program product of claim 1 , wherein the computing system includes a first processing unit that accesses a storage through a first device adaptor and a second processing unit that accesses the storage through a second device adaptor, wherein for an error at the first processing unit, the initial recovery action comprises a failover from the first processing unit to the second processing unit and the alternate recovery action comprises a reboot of the first processing unit.

Plain English Translation

This invention relates to fault-tolerant computing systems designed to enhance reliability and availability. The computing system includes a first processing unit that accesses a storage through a first device adaptor and a second processing unit that accesses the same storage through a second device adaptor. For an error at the first processing unit, two recovery actions are defined. The initial recovery action is a failover from the first processing unit to the second processing unit. The alternate recovery action is a reboot of the first processing unit. The reboot is a substitute for the failover, not a follow-up that runs when the failover fails: as in claim 1, the system determines which of the two actions to use for the detected error and performs that one. This is useful when the failover path has a known problem, because the system can recover the failing unit in place instead. The invention is aimed at high-availability environments where uninterrupted operation is critical.

Claim 9

Original Legal Text

9. A computer program product for performing a recovery action, the computer program product comprising a computer readable storage medium having computer readable program code embodied therein that is executable to perform operations, the operations comprising: maintaining an initial recovery table providing initial recovery actions to perform for errors detected in a computing system including a first processing unit that accesses a storage through a first device adaptor and a second processing unit that accesses the storage through a second device adaptor; receiving an alternate recovery table including at least one alternate recovery action for at least one of the initial recovery actions, wherein an alternative recovery action provided for an initial recovery action specifies a different recovery path involving at least one of a different action and a different component in the computing system than involved in the initial recovery action; detecting an error in a defective processing unit comprising one of the first or second processing unit having an error; and determining whether to use the initial recovery action in the initial recovery table comprising a first action with respect to the defective processing unit or the alternate recovery action in the alternate recovery table comprising a second action different from the first action with respect to the defective processing unit; and using the initial recovery action or alternate recovery action determined to use to address the detected error.

Plain English Translation

This invention relates to error recovery in computing systems with multiple processing units and storage access paths. The problem addressed is the rigidity of traditional error recovery mechanisms, which rely on predefined recovery actions that may not suit every system configuration or error scenario. The computing system has a first processing unit that accesses a storage through a first device adaptor and a second processing unit that accesses the storage through a second device adaptor. An initial recovery table defines the initial recovery actions to perform for detected errors. An alternate recovery table can be received that contains at least one alternate recovery action for at least one of the initial recovery actions; each alternate action specifies a different recovery path, using a different action, a different component, or both. When an error is detected in one of the processing units (the defective processing unit), the system determines whether to use the initial recovery action, which is a first action with respect to the defective unit, or the alternate recovery action, which is a different second action with respect to the same unit. The chosen action is then performed to address the error. Unlike claim 1, this claim does not require the tables to be kept in persistent storage.

Claim 10

Original Legal Text

10. The computer program product of claim 9 , wherein when the first action comprises a warmstart or a quiescing of I/O for the defective processing unit, the second action comprises a failover from the defective processing unit to another of the first or second processing unit, wherein when the first action comprises a shutdown of the defective processing unit, the second action comprises a reboot of the defective processing unit.

Plain English Translation

This claim narrows claim 9 by pairing specific initial recovery actions (the "first action") with specific alternate recovery actions (the "second action") for a defective processing unit. When the initial recovery action is a warmstart or a quiescing of I/O for the defective processing unit, the alternate recovery action is a failover from the defective processing unit to the other processing unit. When the initial recovery action is a shutdown of the defective processing unit, the alternate recovery action is a reboot of the defective processing unit. In each pair the alternate action replaces the initial action when the system determines that the alternate should be used; the initial action does not trigger the alternate, and the two are not performed in sequence. The pairing is useful in high-availability environments because it gives each potentially unsafe initial action a defined substitute that still restores service.
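The pairings recited in claim 10 amount to a small lookup from initial action to alternate action. The string identifiers and the `use_alternate` argument below are hypothetical names for this sketch.

```python
# Initial (first) action -> alternate (second) action, as paired in claim 10.
ALTERNATE_FOR_INITIAL = {
    "warmstart": "failover_to_other_unit",
    "quiesce_io": "failover_to_other_unit",
    "shutdown": "reboot",
}


def action_for_defective_unit(initial_action, use_alternate):
    """Return the one action to perform on the defective processing unit."""
    if use_alternate:
        # An initial action with no listed alternate is used unchanged.
        return ALTERNATE_FOR_INITIAL.get(initial_action, initial_action)
    return initial_action
```

The function returns a single action per error, which reflects that the alternate stands in for the initial action and is not run after it.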

Claim 11

Original Legal Text

11. A system comprising: a processor; a persistent storage; a computer readable storage medium having computer readable program code embodied therein that when executed by the processor performs operations, the operations comprising: maintaining an initial recovery table providing initial recovery actions to perform for errors detected in the system; receiving an alternate recovery table including at least one alternate recovery action for at least one of the initial recovery actions, wherein an alternative recovery action provided for an initial recovery action specifies a different recovery path involving at least one of a different action and a different component in the system than involved in the initial recovery action, wherein the initial recovery table and the alternate recovery table are stored in the persistent storage to be maintained through a reboot of the system; detecting an error in the computing system; determining whether to use the initial recovery action in the initial recovery table for the detected error or the alternate recovery action in the alternate recovery table for the initial recovery action; and using the initial recovery action or alternate recovery action determined to use to address the detected error.

Plain English Translation

The system relates to error recovery in computing systems, specifically providing flexibility in recovery mechanisms. The problem addressed is the rigidity of traditional error recovery systems, which often rely on predefined recovery actions that may not be optimal for all scenarios or system configurations. The system includes a processor, persistent storage, and a computer-readable storage medium with program code that, when executed, performs operations for dynamic error recovery. The system maintains an initial recovery table containing predefined recovery actions for detected errors. Additionally, it receives an alternate recovery table that includes at least one alternate recovery action for one or more of the initial actions. An alternate recovery action specifies a different recovery path, involving either a different action or a different system component than the initial action. Both tables are stored in persistent storage to ensure they remain available even after a system reboot. When an error is detected, the system determines whether to use the initial recovery action or the alternate recovery action. The chosen action is then executed to address the error. This approach allows for customization and optimization of recovery processes based on system-specific requirements or environmental conditions, improving system resilience and adaptability.

Claim 12

Original Legal Text

12. The system of claim 11 , wherein the operations further comprise: maintaining at least one flag indicating whether to use the initial recovery action in the initial recovery table for the detected error or the alternate recovery action in the alternate recovery table, wherein the determining whether to use the initial or alternate recovery action comprises determining whether the flag indicates to use the initial recovery action or the alternate recovery action.

Plain English Translation

A system for error recovery in computing environments addresses the challenge of efficiently handling errors by providing flexible recovery mechanisms. The system includes a primary recovery table storing initial recovery actions for detected errors and an alternate recovery table storing alternative recovery actions. When an error occurs, the system determines whether to apply the initial or alternate recovery action based on a configurable flag. This flag allows dynamic selection between the two recovery options, enabling system administrators to adapt recovery strategies without modifying the underlying tables. The system ensures robust error handling by maintaining separate tables for different recovery actions and using a flag to control which action is executed. This approach improves system reliability by providing multiple recovery paths and allowing runtime adjustments to error handling behavior. The system is particularly useful in environments where error conditions may require different responses based on operational context or historical performance.

Claim 13

Original Legal Text

13. The system of claim 11 , wherein each of the at least one alternate recovery action is provided for one of the initial recovery actions when application of the initial recovery action would result in a data integrity or data loss exposure when applied to address the detected error in the system.

Plain English Translation

This claim narrows the system of claim 11 by stating when an alternate recovery action is provided. Each alternate recovery action in the alternate recovery table is supplied for an initial recovery action whose application to the detected error would result in a data integrity or data loss exposure. The alternate action addresses the same error through a different recovery path that avoids the exposure. The claim does not recite the system evaluating the consequences of each option at run time; the exposure is the reason an alternate is placed in the table, and the choice between initial and alternate action is made as described in claim 11. This matters most where data integrity is critical, for example in storage systems that back financial, healthcare, or enterprise applications.

Claim 14

Original Legal Text

14. The system of claim 11 , wherein the operations further comprise: receiving a code load to update code for the at least one of the initial recovery actions for which the at least one alternate recovery action is provided in the alternate recovery table, wherein the code load fixes a data integrity or data loss exposure in the at least one of the initial recovery actions; and return to using the at least one of the initial recovery actions to which the code load is applied from using the alternate recovery action for the initial recovery action after applying the code load.

Plain English Translation

This invention relates to systems for managing recovery actions in computing environments where an initial recovery action has a data integrity or data loss exposure. The alternate recovery table provides an alternate recovery action for each such initial recovery action, and the system uses the alternate as a temporary substitute. When a code load is received that updates the code for the affected initial recovery action and fixes the exposure, the system applies the code load and then returns to using the updated initial recovery action instead of the alternate. The claim covers receiving the code load and switching back; it does not recite automated detection of the exposure or automated deployment of the alternate actions. The result is that recovery stays available while an initial action is known to be unsafe, and the original recovery path is restored once it has been fixed.

Claim 15

Original Legal Text

15. The system of claim 11 , wherein the system includes a first processing unit that accesses a storage through a first device adaptor and a second processing unit that accesses the storage through a second device adaptor, wherein for an error in a path from the first processing unit to the storage, the initial recovery action comprises a failover for the first processing unit to use the second device adaptor and the alternate recovery action comprises a failover from the first processing unit to the second processing unit.

Plain English Translation

This invention relates to a fault-tolerant computing system designed to improve the reliability of storage access. The system includes a first processing unit that accesses a storage through a first device adaptor and a second processing unit that accesses the storage through a second device adaptor. For an error in the path from the first processing unit to the storage, two recovery actions are defined. The initial recovery action is a failover in which the first processing unit switches to the second device adaptor to keep its access to the storage. The alternate recovery action is a failover from the first processing unit to the second processing unit. The two actions are alternatives rather than tiers: as in claim 11, the system determines which one to use for the detected error and performs that one, so the alternate is not a step taken only after the initial action fails. The system is useful in mission-critical applications where uninterrupted access to storage is essential.

Claim 16

Original Legal Text

16. A method for performing a recovery action upon detecting an error in a computing system, comprising: maintaining an initial recovery table providing initial recovery actions to perform for errors detected in the computing system; receiving an alternate recovery table including at least one alternate recovery action for at least one of the initial recovery actions, wherein an alternative recovery action provided for an initial recovery action specifies a different recovery path involving at least one of a different action and a different component in the computing system than involved in the initial recovery action, wherein the initial recovery table and the alternate recovery table are stored in a persistent storage to be maintained through a reboot of the computing system; detecting an error in the computing system; determining whether to use the initial recovery action in the initial recovery table for the detected error or the alternate recovery action in the alternate recovery table for the initial recovery action; and using the initial recovery action or alternate recovery action determined to use to address the detected error.

Plain English Translation

This invention relates to error recovery in computing systems, specifically improving the flexibility and effectiveness of recovery actions. The problem addressed is the rigidity of traditional error recovery mechanisms, which often rely on predefined actions that may not be optimal for all scenarios or system configurations. The solution involves a dynamic approach to error recovery by allowing alternate recovery paths to be introduced alongside initial recovery actions. The method maintains an initial recovery table that lists predefined recovery actions for errors detected in the computing system. An alternate recovery table can be received, which includes at least one alternate recovery action for one or more of the initial actions. These alternate actions specify different recovery paths, either by performing a different action or involving a different system component than the initial action. Both tables are stored persistently, ensuring they remain available even after a system reboot. When an error is detected, the system determines whether to use the initial recovery action or the alternate recovery action for that error. The chosen action is then executed to address the error. This approach allows for more adaptable error recovery, enabling system administrators or automated processes to customize recovery strategies based on specific system conditions or requirements. The persistent storage ensures that recovery configurations remain intact across system restarts, maintaining reliability.
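The method described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the patented implementation: the error codes, action names, JSON file format, and the `use_alternate` parameter are all hypothetical, and a temporary file stands in for the persistent storage.

```python
import json
import os
import tempfile

# Hypothetical error codes and action names, chosen only for illustration.
INITIAL_TABLE = {"E_PATH": "reset_adaptor", "E_CACHE": "flush_cache"}
# The alternate table is keyed by the initial action it can stand in for.
ALTERNATE_TABLE = {"reset_adaptor": "failover_to_peer"}


def save_tables(path, initial, alternate):
    """Write both tables to a file so they survive a restart."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"initial": initial, "alternate": alternate}, f)


def load_tables(path):
    """Reload both tables, as a system might after rebooting."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["initial"], data["alternate"]


def choose_action(error, initial, alternate, use_alternate):
    """Return the recovery action to run for a detected error."""
    action = initial[error]
    if use_alternate and action in alternate:
        return alternate[action]
    return action


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "recovery_tables.json")
        save_tables(path, INITIAL_TABLE, ALTERNATE_TABLE)
        initial, alternate = load_tables(path)  # simulated reboot
    return (
        choose_action("E_PATH", initial, alternate, use_alternate=True),
        choose_action("E_PATH", initial, alternate, use_alternate=False),
        choose_action("E_CACHE", initial, alternate, use_alternate=True),
    )
```

In this sketch an error whose initial action has no alternate entry (here `E_CACHE`) always falls back to the initial action, which matches the claim's wording that alternates exist for "at least one" of the initial actions rather than all of them.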

Claim 17

Original Legal Text

17. The method of claim 16, further comprising: maintaining at least one flag indicating whether to use the initial recovery action in the initial recovery table for the detected error or the alternate recovery action in the alternate recovery table, wherein the determining whether to use the initial or alternate recovery action comprises determining whether the flag indicates to use the initial recovery action or the alternate recovery action.

Plain English Translation

This invention relates to error recovery in computing environments, specifically to selecting between an initial and an alternate recovery action when an error is detected. The problem addressed is the need for a simple, adjustable way to choose which recovery action runs. The method maintains at least one flag indicating whether to use the initial recovery action, stored in the initial recovery table, or the alternate recovery action, stored in the alternate recovery table. When an error is detected, the system checks the flag and executes the action it indicates. Changing the flag therefore switches between the two predefined recovery actions without modifying the error detection or recovery logic. The claim does not say how the flag is set; performance data, error frequency, or an administrator's configuration are possible inputs. The result is a selection step that reflects the current state or requirements of the system.
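A minimal sketch of the flag check follows. It assumes one flag per initial action, which is one possible reading of "at least one flag"; the class name `RecoveryConfig`, its fields, and the action names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryConfig:
    initial: dict    # error code -> initial recovery action
    alternate: dict  # initial recovery action -> alternate recovery action
    # Assumed layout: one flag per initial action; True means "use the alternate".
    use_alternate: dict = field(default_factory=dict)

    def select(self, error):
        """Pick the action for a detected error by consulting the flag."""
        action = self.initial[error]
        if self.use_alternate.get(action, False) and action in self.alternate:
            return self.alternate[action]
        return action


config = RecoveryConfig(
    initial={"E_PATH": "reset_adaptor"},
    alternate={"reset_adaptor": "failover_to_peer"},
)
before = config.select("E_PATH")             # flag unset: initial action
config.use_alternate["reset_adaptor"] = True
after = config.select("E_PATH")              # flag set: alternate action
```

Keeping the flag separate from both tables means the choice can change without rewriting either table, which is the property the translation above highlights.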

Claim 18

Original Legal Text

18. The method of claim 16, wherein each of the at least one alternate recovery action is provided for one of the initial recovery actions when application of the initial recovery action would result in a data integrity or data loss exposure when applied to address the detected error in the computing system.

Plain English Translation

This invention relates to error recovery in computing systems, specifically to cases where an initial recovery action could compromise data integrity or cause data loss. Building on the method of claim 16, each alternate recovery action is provided for an initial recovery action whose use, when applied to the detected error, would result in a data integrity or data loss exposure. The alternate action addresses the same error through a different recovery path that avoids that exposure, so recovery does not leave the system in a worse state than the error itself. The claim does not describe how the exposure is identified; it only ties the existence of an alternate action to a known exposure in the corresponding initial action. This is particularly useful where data reliability is critical, such as databases, file systems, or distributed storage.

Claim 19

Original Legal Text

19. The method of claim 16, further comprising: receiving a code load to update code for the at least one of the initial recovery actions for which the at least one alternate recovery action is provided in the alternate recovery table, wherein the code load fixes a data integrity or data loss exposure in the at least one of the initial recovery actions; and return to using the at least one of the initial recovery actions to which the code load is applied from using the alternate recovery action for the initial recovery action after applying the code load.

Plain English Translation

This invention relates to managing recovery actions in computing environments where an initial recovery action has a known defect. The problem addressed is preserving data integrity and preventing data loss while that defect remains unfixed. The alternate recovery table supplies alternate recovery actions that are used in place of the affected initial recovery actions. When a code update (a code load) is received that fixes the data integrity or data loss exposure in an initial recovery action, the system applies the update and returns to using that initial recovery action instead of its alternate. The alternate action thus serves as a temporary workaround for the period between discovery of the exposure and delivery of the fix. The solution is particularly useful in environments where data integrity is critical, such as financial systems, healthcare databases, or enterprise storage.
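The return to the initial action after a code load could look roughly like this. The sketch assumes a per-action flag records whether the alternate is in use; the table contents and the function names `select` and `apply_code_load` are hypothetical, and the code load itself is reduced to a set of action names it fixes.

```python
# Hypothetical tables: error code -> initial action, initial action -> alternate.
INITIAL = {"E_PATH": "reset_adaptor", "E_CACHE": "flush_cache"}
ALTERNATE = {"reset_adaptor": "failover_to_peer", "flush_cache": "quiesce_then_flush"}


def select(error, flags):
    """Return the alternate action when its flag is set, else the initial action."""
    action = INITIAL[error]
    if flags.get(action, False) and action in ALTERNATE:
        return ALTERNATE[action]
    return action


def apply_code_load(flags, fixed_actions):
    """Return new flags with the alternate switched off for every fixed action.

    Actions the code load does not fix keep whatever flag they had.
    """
    return {
        action: (flag and action not in fixed_actions)
        for action, flag in flags.items()
    }


# Both initial actions are assumed to have an exposure, so both alternates are on.
flags = {"reset_adaptor": True, "flush_cache": True}
# A code load arrives that fixes only the adaptor reset.
flags_after = apply_code_load(flags, fixed_actions={"reset_adaptor"})
```

Only the fixed action reverts; `flush_cache` keeps its alternate until a later code load addresses it, mirroring the claim's "to which the code load is applied" wording.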

Claim 20

Original Legal Text

20. The method of claim 16, wherein the computing system includes a first processing unit that accesses a storage through a first device adaptor and a second processing unit that accesses the storage through a second device adaptor, wherein for an error in a path from the first processing unit to the storage, the initial recovery action comprises a failover for the first processing unit to use the second device adaptor and the alternate recovery action comprises a failover from the first processing unit to the second processing unit.

Plain English Translation

This invention relates to fault-tolerant computing systems designed to maintain data access and processing continuity when hardware fails. The problem addressed is a failure in the connection between a processing unit and storage, which can disrupt critical applications and data availability. The computing system includes a first processing unit that accesses the storage through a first device adaptor and a second processing unit that accesses the storage through a second device adaptor. For an error in the path from the first processing unit to the storage, the two recovery actions differ in scope. The initial recovery action is a failover in which the first processing unit switches to the second device adaptor and continues accessing the storage through that alternative path. The alternate recovery action is a failover from the first processing unit to the second processing unit, which takes over the workload. Under claim 16, the method determines which of the two actions to use; the claim does not require that the alternate run only after the initial action fails. This provides redundancy at both the storage access path and the processing unit, which suits high-availability settings such as enterprise servers, cloud computing, and other mission-critical applications.
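The difference between the two failovers can be sketched with a toy model. Everything here is an assumption made for illustration: the `ProcessingUnit` class, the unit and adaptor names, and the `use_alternate` switch that stands in for whatever selection step the system uses.

```python
from dataclasses import dataclass


@dataclass
class ProcessingUnit:
    name: str
    adaptor: str        # device adaptor currently used to reach the storage
    active: bool = True


def adaptor_failover(unit, other_adaptor):
    """Initial action: the same unit keeps running but uses the other adaptor."""
    unit.adaptor = other_adaptor


def unit_failover(failed, survivor):
    """Alternate action: the second unit takes over and keeps its own adaptor."""
    failed.active = False
    survivor.active = True


def recover_path_error(first, second, use_alternate):
    """Handle an error in the path from `first` to the storage."""
    if use_alternate:
        unit_failover(first, second)
    else:
        adaptor_failover(first, second.adaptor)
    # Report which units are still serving I/O and through which adaptor.
    return [(u.name, u.adaptor) for u in (first, second) if u.active]


initial_result = recover_path_error(
    ProcessingUnit("PU1", "DA1"), ProcessingUnit("PU2", "DA2"), use_alternate=False
)
alternate_result = recover_path_error(
    ProcessingUnit("PU1", "DA1"), ProcessingUnit("PU2", "DA2"), use_alternate=True
)
```

With the initial action both units stay active and share the second adaptor; with the alternate action only the second unit remains active. That is the "different component" distinction the independent claim draws between the two recovery paths.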

Claim 21

Original Legal Text

21. A system for accessing a storage, comprising: a first processing unit that accesses the storage through a first device adaptor; a second processing unit that accesses the storage through a second device adaptor; a computer readable storage medium having computer readable program code embodied therein that when executed by one of the first processing unit and the second processing unit performs operations, the operations comprising: maintaining an initial recovery table providing initial recovery actions to perform for errors detected in the first processing unit and the second processing unit; receiving an alternate recovery table including at least one alternate recovery action for at least one of the initial recovery actions, wherein an alternative recovery action provided for an initial recovery action specifies a different recovery path involving at least one of a different action and a different component than involved in the initial recovery action; detecting an error in a defective processing unit comprising one of the first or second processing unit having an error; and determining whether to use the initial recovery action in the initial recovery table comprising a first action with respect to the defective processing unit or the alternate recovery action in the alternate recovery table comprising a second action different from the first action with respect to the defective processing unit; and using the initial recovery action or alternate recovery action determined to use to address the detected error.

Plain English Translation

The system is designed for accessing a storage system using multiple processing units, each connected to the storage through separate device adaptors. The system includes a first and second processing unit, each capable of accessing shared storage. A computer-readable storage medium contains program code that, when executed, performs operations to manage error recovery. The system maintains an initial recovery table that defines standard recovery actions for errors detected in either processing unit. Additionally, the system can receive an alternate recovery table that provides modified recovery actions, where each alternate action specifies a different recovery path than the initial action. This path may involve different steps or components. When an error is detected in one of the processing units, the system determines whether to use the initial recovery action or the alternate recovery action. The chosen action is then executed to address the error. This approach allows for flexible error handling, enabling customized recovery procedures based on specific system conditions or requirements. The system ensures robust storage access by dynamically selecting the most appropriate recovery method for detected errors.

Claim 22

Original Legal Text

22. A method for performing a recovery action upon detecting an error in a computing system, comprising: maintaining an initial recovery table providing initial recovery actions to perform for errors detected in the computing system including a first processing unit that accesses a storage through a first device adaptor and a second processing unit that accesses the storage through a second device adaptor; receiving an alternate recovery table including at least one alternate recovery action for at least one of the initial recovery actions, wherein an alternative recovery action provided for an initial recovery action specifies a different recovery path involving at least one of a different action and a different component in the computing system than involved in the initial recovery action; detecting an error in a defective processing unit comprising one of the first or second processing unit having an error; and determining whether to use the initial recovery action in the initial recovery table comprising a first action with respect to the defective processing unit or the alternate recovery action in the alternate recovery table comprising a second action different from the first action with respect to the defective processing unit; and using the initial recovery action or alternate recovery action determined to use to address the detected error.

Plain English Translation

This invention relates to error recovery in computing systems with multiple processing units and storage access paths. The problem addressed is the inflexibility of traditional error recovery mechanisms, which rely on predefined recovery actions that may not be optimal for all system configurations or error scenarios. The method maintains an initial recovery table that specifies the initial recovery actions for errors detected in a computing system. The system includes a first and a second processing unit, each accessing the storage through its own device adaptor. An alternate recovery table is received, containing at least one alternate recovery action for at least one of the initial actions. Each alternate action specifies a different recovery path, involving a different action or a different component, to address the same error. When an error is detected in one of the processing units (the defective processing unit), the method determines whether to use the initial recovery action, a first action taken with respect to that unit, or the alternate recovery action, a different second action taken with respect to the same unit. The chosen action is then executed to resolve the error. This lets recovery strategies be adapted to system-specific requirements. Unlike claim 16, this claim does not require the tables to be kept in persistent storage.

Patent Metadata

Filing Date

Unknown

Publication Date

March 3, 2020

Inventors

Matthew G. Borlick
Lokesh M. Gupta
Karl A. Nielsen


Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “USING ALTERNATE RECOVERY ACTIONS FOR INITIAL RECOVERY ACTIONS IN A COMPUTING SYSTEM” (10579476). https://patentable.app/patents/10579476

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/10579476. See llms.txt for full attribution policy.
