US-12612854-B2

Method and system for predicting the lifespan of electric submersible pumps using random-forest machine-learning

Published: April 28, 2026
Assignee: not available in USPTO data we have
Inventors: not available in USPTO data we have
Technical Abstract

A method for predicting a lifespan of an electric submersible pump (ESP) involves obtaining data associated with the ESP, the data originating from different categories, predicting, using a machine learning model, based on the data, a remaining expected life of the ESP, and reporting the remaining expected life.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method comprising:

2. The method of, wherein the ESP operational data comprises ESP operational parameters, environmental parameters, design parameters, historical data, or equipment specifications.

3. The method of, further comprising reporting the remaining expected life by identifying that the remaining expected life is below a specified threshold value.

4. The method of, further comprising determining features in the ESP operational data that have a highest impact on extending the remaining expected life of the ESP.

5. The method of, wherein determining the features comprises at least one selected from a group consisting of limiting an idle time of the ESP prior to active service, optimizing parameters for the ESP to operate efficiently, and optimize a design of the ESP.

Detailed Description

Complete technical specification and implementation details from the patent document.

Many oil wells are artificially lifted via electric submersible pumps (ESPs). ESP failures often result in significant production deferment until workover is completed to replace the failed ESPs. The lifespan of ESPs may be negatively affected by many factors such as high pressure, high temperature, sour oil environment, etc. Accordingly, predicting the expected remaining lifespan of an ESP is nontrivial.

This summary is provided to introduce a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in limiting the scope of the claimed subject matter.

In general, in one aspect, embodiments relate to a method for predicting a lifespan of an electric submersible pump (ESP), the method comprising: obtaining data associated with the ESP, the data originating from a plurality of different categories; predicting, using a machine learning model, based on the data, a remaining expected life of the ESP; and reporting the remaining expected life.

In general, in one aspect, embodiments relate to a system for predicting a lifespan of an electric submersible pump (ESP), the system comprising: a plurality of sensors configured to measure first parameters associated with the ESP; a database configured to store second parameters associated with the ESP; and a prediction engine configured to: obtain data associated with the ESP, the data originating from a plurality of different categories and the data comprising the first parameters and the second parameters; predict, using a machine learning model, based on the data, a remaining expected life of the ESP; and report the remaining expected life.

In general, in one aspect, embodiments relate to a non-transitory machine-readable medium comprising a plurality of machine-readable instructions executed by one or more processors, the plurality of machine-readable instructions causing the one or more processors to perform operations comprising: obtaining data associated with an ESP, the data originating from a plurality of different categories; predicting, using a machine learning model, based on the data, a remaining expected life of the ESP; and reporting the remaining expected life.

In light of the structure and functions described above, embodiments of the invention may include respective means adapted to carry out various steps and functions defined above in accordance with one or more aspects and any one of the embodiments of one or more aspects described herein.

Other aspects and advantages of the claimed subject matter will be apparent from the following description and the appended claims.

In the following detailed description of embodiments of the disclosure, numerous specific details are set forth in order to provide a more thorough understanding of the disclosure. However, it will be apparent to one of ordinary skill in the art that the disclosure may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.

Throughout the application, ordinal numbers (e.g., first, second, third, etc.) may be used as an adjective for an element (i.e., any noun in the application). The use of ordinal numbers is not to imply or create any particular ordering of the elements nor to limit any element to being only a single element unless expressly disclosed, such as using the terms “before”, “after”, “single”, and other such terminology. Rather, the use of ordinal numbers is to distinguish between the elements. By way of an example, a first element is distinct from a second element, and the first element may encompass more than one element and succeed (or precede) the second element in an ordering of elements.

Many oil wells are artificially lifted via electric submersible pumps (ESPs). ESP failures often result in significant production deferment until workover is completed to replace the failed ESPs. As failures and breakdown seem inevitable with continuous ESP operations, being able to predict failures, and thereby being able to reduce any substantial downtime and relative maintenance costs, would be highly beneficial. However, with the lifespan of ESPs potentially being negatively affected by many factors such as high pressure, high temperature, sour oil environment, etc., predicting the expected remaining lifespan of an ESP is nontrivial.

Embodiments of the disclosure provide a prediction of the expected remaining lifespan of ESPs. A machine learning model is used for the prediction. In one embodiment of the disclosure, the machine learning model is a random forest model used to predict the lifespan and detect premature ESP failures. The disclosed embodiments can alert the user to potential failures ahead of time, allowing the user to plan in advance and avoid production losses by formulating mitigation strategies to prolong ESP lifespan. While the presence of H2S, together with high pressure and/or temperature, results in particularly harsh environments for ESPs and adversely affects their integrity and reliability, the machine learning model may also be used for other applications, e.g., in non-sour environments.

Embodiments of the disclosure may be used to accurately predict failure, optimize ESP operation, and predict ESP health, thus making it easy to schedule ESP replacement when needed. Embodiments of the disclosure thereby help reduce the loss of production that would occur due to sudden, unexpected ESP failures. Unlike other predictive methods that can be computationally demanding, embodiments of the disclosure are computationally efficient, while providing a high degree of robustness and accuracy. They further generalize well, with a relatively low overall variance and a low bias. Additional details are subsequently provided, after an introductory discussion of well environments.

A well environment () in accordance with embodiments of the disclosure is shown. The well environment () includes a hydrocarbon reservoir (“reservoir”) () located in a subsurface hydrocarbon-bearing formation () and a well system (). The hydrocarbon-bearing formation () may include a porous or fractured rock formation that resides underground, beneath the earth's surface (“surface”) (). In the case of the well system () being a hydrocarbon well, the reservoir () may include a portion of the hydrocarbon-bearing formation (). The hydrocarbon-bearing formation () and the reservoir () may include different layers of rock having varying characteristics, such as varying degrees of permeability, porosity, and resistivity. In the case of the well system () being operated as a production well, the well system () may facilitate the extraction of hydrocarbons (or “production”) from the reservoir (). In the case of the well system () being operated as an injection well, the well system () may be used in a tertiary recovery method to displace the produced hydrocarbons and/or to maintain the pressure profile of the reservoir ().

In some embodiments, the well system () includes a wellbore (), a well sub-surface system (), a well surface system (), and a well monitoring and control system (). The well monitoring and control system () may monitor and/or control various operations of the well system (), such as well production operations, well completion operations, well maintenance operations, and reservoir monitoring, assessment and development operations. In one or more embodiments, the well monitoring and control system () is configured to operate and/or monitor the electric submersible pump (ESP) (), as further discussed below. In some embodiments, the well monitoring and control system () includes a computer system that is the same as or similar to the computer system () described below and in the accompanying description.

The wellbore () may include a bored hole that extends from the surface () into a target zone of the hydrocarbon-bearing formation (), such as the reservoir (). An upper end of the wellbore (), terminating at or near the surface (), may be referred to as the “up-hole” end of the wellbore (), and a lower end of the wellbore, terminating in the hydrocarbon-bearing formation (), may be referred to as the “downhole” end of the wellbore (). The wellbore () may facilitate the circulation of drilling fluids during drilling operations, the flow of hydrocarbon production (“production”) () (e.g., oil and gas) from the reservoir () to the surface () during production operations, the injection of substances (e.g., water) into the hydrocarbon-bearing formation () or the reservoir () during injection operations, or the communication of monitoring devices (e.g., logging tools) into the hydrocarbon-bearing formation () or the reservoir () during monitoring operations (e.g., during in situ logging operations).

In one or more embodiments, the well system () is an artificially lifted well system with an ESP () supporting production (). The ESP () may be any type of submersible pump, e.g., a multistage centrifugal pump. Stages may be stacked based on the operating requirements of the well system (). Many different factors, including the environmental conditions in the wellbore (), may result in mechanical and/or electrical failures within several ESP parts, thereby affecting run life.

In one or more embodiments, during operation of the well system (), the well monitoring and control system () monitors and controls the ESP (). In one or more embodiments, the monitoring and control system () performs operations of the methods described below in reference to the flowcharts. Software instructions in the form of computer readable program code to perform the operations in accordance with embodiments may be stored, in whole or in part, temporarily or permanently, on a non-transitory computer readable medium. The well monitoring and control system () may further collect and record wellhead data for the well system () and other data regarding downhole equipment and downhole sensors. The wellhead data may include, for example, a record of measurements of wellhead pressure (P) (e.g., including flowing wellhead pressure (FWHP)), wellhead temperature (T) (e.g., including flowing wellhead temperature), wellhead production rate (R) over some or all of the life of the well system (), and/or water cut data. In some embodiments, the measurements are recorded in real-time, and are available for review or use within seconds, minutes or hours of the condition being sensed (e.g., the measurements are available within 1 hour of the condition being sensed). In such an embodiment, the wellhead data may be referred to as “real-time” wellhead data. Real-time wellhead data may enable an operator of the well to assess a relatively current state of the well system (), and make real-time decisions regarding development of the well system () and the reservoir (), such as on-demand adjustments in regulation of production flow from the well or injection flow to the well.

In some embodiments, the well surface system () includes a wellhead (). The wellhead () may include a rigid structure installed at the “up-hole” end of the wellbore (), at or near where the wellbore () terminates at the Earth's surface (). The wellhead () may include structures for supporting (or “hanging”) casing and production tubing extending into the wellbore (). Production () may flow through the wellhead (), after exiting the wellbore () and the well sub-surface system (), including, for example, the casing and the production tubing.

In some embodiments, the well surface system () includes a surface sensing system (). The surface sensing system () may include sensor devices for sensing characteristics of substances, including production (), passing through or otherwise located in the well surface system (). The characteristics may include, for example, pressure, temperature and flow rate of production () flowing through the wellhead (), or other conduits of the well surface system (), after exiting the wellbore ().

While various configurations of hardware components and/or software components are shown, other configurations may be used without departing from the scope of the disclosure. For example, various components may be combined to create a single component. As another example, the functionality performed by a single component may be performed by two or more components.

A system () for predicting the lifespan of an ESP, in accordance with one or more embodiments, is shown. The system () includes a prediction engine () that operates on data () associated with the operation of the ESP to generate a prediction () of the lifespan of the ESP. In one or more embodiments, the prediction () is made by a machine learning algorithm (). The machine learning algorithm, in one embodiment, is a random forest model, which may operate as a supervised machine learning algorithm performing a regression to predict the timeline of a failure event for the ESP.

A random forest model is an ensemble machine learning algorithm that uses multiple decision trees to make predictions. The architecture of random forest models is unique in that it combines multiple decision trees to reduce the risk of overfitting and improve the overall generalization of the model and the accuracy of predictions, in comparison to individual trees. This is based on the idea that multiple “weak learners” can combine to create a “strong learner.” Each individual classifier is considered a “weak learner,” while the group of classifiers functioning together is regarded as a “strong learner.” This approach allows random forests to effectively capture complex relationships and interactions between features, resulting in better predictive performance.

Each of the multiple decision trees operates on a different subset of the same dataset, and the results are then combined to improve the overall accuracy of the predictions. In other words, instead of relying on a single decision tree, the random forest gathers predictions from each tree and makes a final prediction based on the average of these predictions (for regression) or on the majority of these predictions (for classification).
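As an illustration of the aggregation described above, the following minimal sketch fits a scikit-learn random forest regressor and checks that its prediction matches the average of its individual trees. The synthetic data, feature count, and hyperparameters are assumptions made for this example only and are not taken from the patent.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): 200 samples, 4 generic features,
# with a target loosely playing the role of "days until failure".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 500 + 40 * X[:, 0] - 25 * X[:, 1] ** 2 + rng.normal(scale=5, size=200)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Predict with every tree separately, then average across trees.
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
manual_mean = per_tree.mean(axis=0)

# The forest's own regression output is the same average.
forest_pred = forest.predict(X[:5])
print(np.allclose(manual_mean, forest_pred))
```

For a regression target such as remaining life in days, averaging is the natural way to combine the trees; majority voting applies when the trees output class labels.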

The architecture of a random forest model is suitable for predicting the failure of an ESP because it is capable of capturing complex and non-linear relationships between predictors and a target variable. The predictors may originate from different categories. Data () associated with the ESP may be collected for these predictors and may serve as inputs to the prediction engine (). Examples of the different categories and predictors in these categories include, but are not limited to:

Data () associated with the ESP may vary based on the specific ESP system and the data available. These and other data may be acquired in real-time or near-real-time, e.g., by sensors (). Other data may be obtained from a database (). The choice of predictors in the data () to be used by the prediction engine () may affect the predictive performance and may be part of the training of the machine learning algorithm () as described below.

Those skilled in the art will appreciate that the data () mentioned above are some examples of the input variables that may be used to build the random forest machine learning model to predict ESP lifespans in sour, high-pressure, and high-temperature environments. These inputs are fed into each node of a decision tree to build the random forest model. Feeding more inputs into a random forest predictive model can increase the complexity of the model and potentially lead to better predictions. On one hand, more inputs can provide the model with more information fundamental to the performance of the ESP, and potentially improve its accuracy. On the other hand, a larger feature space increases the risk of overfitting, of the model relying too heavily on any one input, and of the model fitting noise contained in irrelevant inputs, so additional processing is required. Specifically, feature selection and cross-validation may be performed on the data. Cross-validation involves splitting the data into multiple training and validation sets and testing the random forest (ML) model on each of these. For example, cross-validation may be performed using the cross_val_score function from the scikit-learn library using Python software available in Anaconda.
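The cross-validation step mentioned above might look like the following sketch, which uses scikit-learn's `cross_val_score` with a random forest regressor. The synthetic data, the five folds, and the choice of mean absolute error as the score are assumptions for illustration; the patent does not specify them.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for ESP features and time-to-failure in days (assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = 400 + 60 * X[:, 0] + 30 * X[:, 1] + rng.normal(scale=10, size=300)

model = RandomForestRegressor(n_estimators=100, random_state=1)

# 5-fold cross-validation: the model is refit on 4 folds and scored on the 5th.
# scikit-learn reports errors as negative scores, so the sign is flipped below.
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
mae_per_fold = -scores

print("MAE per fold:", mae_per_fold.round(1))
print("Mean MAE:", mae_per_fold.mean().round(1))
```

A large spread between folds would suggest that the model's performance depends strongly on which samples it was trained on, which is one sign of overfitting.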

The prediction () of remaining expected life of the ESP is the output of the prediction engine () when operating on the data () associated with the ESP. The prediction () of the remaining expected life of the ESP may be a number, such as the number of days remaining until failure, rather than a specific date or time of the anticipated failure or a more general measure of pump deterioration.

Flowcharts in accordance with one or more embodiments are described next. One or more steps may be performed by one or more components (e.g., the well monitoring and control system () described above). While the various steps are presented and described sequentially, one of ordinary skill in the art will appreciate that some or all of the steps may be executed in different orders, may be combined or omitted, and some or all of the steps may be executed in parallel. Furthermore, the blocks may be performed actively or passively.

A method for training a random forest model, in accordance with one or more embodiments, is shown.

The training may be achieved by training the machine learning model on training data where the target variable is the time until failure, and the input features are related to ESPs and their environments.

In Step, the training data associated with the operation of ESPs are obtained. The training data may include data associated with any of the predictors in any of the categories as previously described. For example, the training data may be historical data recorded from ESPs as they were operating over time. The historical data may also include a documentation of failures of these ESPs, thereby allowing the training data to be used for a supervised training of the machine learning algorithm. To ensure good generalization, the training data may be comprehensive and may include data from different well environments, for different ESPs, etc. In other words, training data that accurately and completely covers the lifetime of the ESPs is obtained. The training data may include features that are based on any combination of the parameters previously discussed.

In Step, the training data is pre-processed. The training data may be corrupted, noisy, or incomplete, making it difficult to build a robust model. Accordingly, the training data may be pre-processed to remove errors, outliers, or missing values. The pre-processing may further involve feature engineering. The feature engineering may identify the most influential features that contribute to ESP failure, to improve the accuracy of the model. Less relevant or irrelevant features may be removed from the training data. Different tools may be used to identify the relevance and characteristics of features.

For example, a heatmap may be used to visually represent and analyze the relationship between two variables using a color-coded grid. The heatmap may provide insights into the strength, direction, and shape of the relationship between the two variables. Also, histograms may be used to visually explore the distribution of the data and to help identify patterns and relationships between variables.
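A heatmap of this kind is typically a color-coded rendering of a pairwise correlation matrix. The sketch below computes only the underlying matrix with NumPy; the feature names and the synthetic relationship between temperature and run life are hypothetical and chosen purely for illustration.

```python
import numpy as np

# Hypothetical features and target (assumptions, not from the patent).
rng = np.random.default_rng(2)
n = 500
temperature = rng.normal(120, 10, n)
pressure = rng.normal(3000, 200, n)
run_days = 2000 - 8 * temperature + rng.normal(0, 30, n)

names = ["temperature", "pressure", "run_days"]
data = np.column_stack([temperature, pressure, run_days])

# Pearson correlation between every pair of columns (3 x 3 matrix).
corr = np.corrcoef(data, rowvar=False)

for name, row in zip(names, corr):
    print(f"{name:12s}", np.round(row, 2))
```

In this synthetic example the temperature column correlates strongly and negatively with run life while pressure does not, which is the kind of pattern a heatmap would make visible at a glance.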

The pre-processing of the training data may also involve a data transformation that involves converting the training data into a suitable format for the training of the machine learning model. The data transformation may involve, for example, scaling or normalizing of features.
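The cleaning and transformation steps described above could be sketched as follows. The interquartile-range outlier rule, the standardization via `StandardScaler`, and the synthetic two-column data are assumptions chosen for this example; the patent does not prescribe particular techniques.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two hypothetical sensor columns with one missing value and one spike.
rng = np.random.default_rng(3)
X = rng.normal(loc=[100.0, 3000.0], scale=[10.0, 200.0], size=(200, 2))
X[5, 0] = np.nan        # simulate a missing reading
X[17, 1] = 50_000.0     # simulate a sensor spike

# 1) Drop rows that contain missing values.
X_clean = X[~np.isnan(X).any(axis=1)]

# 2) Drop rows with any value outside 1.5 * IQR of its column.
q1, q3 = np.percentile(X_clean, [25, 75], axis=0)
iqr = q3 - q1
inside = ((X_clean >= q1 - 1.5 * iqr) & (X_clean <= q3 + 1.5 * iqr)).all(axis=1)
X_clean = X_clean[inside]

# 3) Standardize each column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X_clean)

print(X.shape, "->", X_scaled.shape)
```

In practice the scaler would be fitted on the training data only and then reused for new data, so that the same transformation is applied at prediction time.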

In Step, the random forest model is trained. Bagging (bootstrap aggregating) may be used for the training. The training involves a random split of the training data into a training data set, a validation data set, and a test data set. The ratio for the random split may be, for example, 80:10:10.
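One way to obtain an 80:10:10 split is to call scikit-learn's `train_test_split` twice, as in the sketch below. The data are random placeholders and the two-stage approach is an assumption for illustration, not necessarily how the split is performed in the patent.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data (assumption): 1000 samples, 5 features.
rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 5))
y = rng.normal(size=1000)

# First hold out 20% of the data, then halve that part into validation and test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.2, random_state=4
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=4
)

print(len(X_train), len(X_val), len(X_test))  # 800 100 100
```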

The training data set may be used to train the random forest model. The algorithm uses the data in the training data set to learn patterns and relationships between the features and the target variable.

The validation data set may be used to fine-tune the hyperparameters of the model. The hyperparameters are parameters that are not learned by the model during training, but rather set by the user. These control the behavior of the algorithm and can have a significant impact on the performance of the model. The validation data set is used to test different combinations of hyperparameters and select the ones that result in the best performance.

The test data set may be used to evaluate the final performance of the random forest model. Once the hyperparameters have been selected using the validation data set, the model is trained again using both the training and validation data sets, and the test data set is used to evaluate its performance. The test data set is a completely independent set of data that the model has never seen before and is used to estimate how the model will perform on new, unseen data. By splitting the training data into a training data set, a validation data set, and a test data set, a robust and reliable random forest model that can generalize well to new, unseen data may be obtained.
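The workflow described above (select hyperparameters on the validation set, retrain on training plus validation data, evaluate once on the test set) can be sketched as follows. The candidate hyperparameter values, the use of mean absolute error, and the synthetic data are assumptions for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption).
rng = np.random.default_rng(5)
X = rng.normal(size=(600, 5))
y = 300 + 50 * X[:, 0] - 20 * X[:, 1] + rng.normal(scale=10, size=600)

X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=5)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=5)

# Illustrative hyperparameter candidates (assumption).
candidates = [
    {"max_depth": d, "n_estimators": n} for d in (3, 8, None) for n in (50, 100)
]

def val_error(params):
    """Train on the training set and return the error on the validation set."""
    model = RandomForestRegressor(random_state=5, **params).fit(X_train, y_train)
    return mean_absolute_error(y_val, model.predict(X_val))

best = min(candidates, key=val_error)

# Retrain on training + validation data, then score once on the untouched test set.
final = RandomForestRegressor(random_state=5, **best).fit(
    np.vstack([X_train, X_val]), np.concatenate([y_train, y_val])
)
test_mae = mean_absolute_error(y_test, final.predict(X_test))
print(best, round(test_mae, 1))
```

Keeping the test set out of both training and hyperparameter selection is what makes the final error a fair estimate of performance on unseen data.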

The random split may be performed using sampling with replacement. Training the random forest model entails building multiple decision trees, where each tree is constructed using a random subset of features and a random subset of the training data set. During the training process, the decision trees are iteratively built by splitting the data at each node based on the feature that best separates the data points. The splitting may be performed such that the impurity of the data points at each node is minimized, e.g., based on a mean-squared error. The random forest model may be trained until the desired number of decision trees is built, with each tree grown to the maximum depth. Once trained, each tree makes a prediction based on its own set of decision rules. The final prediction is based on the average or majority vote of the individual tree predictions.
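To make the node-splitting criterion concrete, the sketch below searches a single feature for the threshold that minimizes the weighted mean-squared error of the two child nodes, and also shows how a bootstrap sample (sampling with replacement) can be drawn. The function name `best_split` and the toy data are hypothetical; real implementations also search over a random subset of features at each node.

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, weighted_mse) of the best binary split of y on feature x."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_thr, best_cost = None, np.inf
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue  # equal values cannot be separated by a threshold
        left, right = y[:i], y[i:]
        # Child variances weighted by child size: the MSE after the split.
        cost = (len(left) * left.var() + len(right) * right.var()) / len(y)
        if cost < best_cost:
            best_thr, best_cost = (x[i] + x[i - 1]) / 2, cost
    return best_thr, best_cost

# Toy data: two clearly separated groups of run lives.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([100.0, 110.0, 105.0, 300.0, 310.0, 305.0])

thr, cost = best_split(x, y)
print(thr, round(cost, 2))  # 6.5 16.67

# Bootstrap sample for one tree: draw row indices with replacement.
rng = np.random.default_rng(6)
boot_idx = rng.integers(0, len(x), size=len(x))
print(best_split(x[boot_idx], y[boot_idx]))
```

Because each tree sees a different bootstrap sample, the chosen thresholds differ from tree to tree, which is what gives the ensemble its diversity.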

The described approach helps to avoid unstable models that cannot adapt to the addition of new data as well as overfit models that do not generalize well. The use of class weights in this method gives importance to minority classes when handling imbalanced data. Unlike individual decision trees, the trees in the forest do not need to be pruned. During prediction, every tree makes its own prediction. The predictions are then combined, e.g., by voting or by averaging the individual outcomes.

In Step, the performance of the random forest model is evaluated. Evaluation of the performance may involve a model selection performed to ensure that the model accurately captures the relationship between the input features and the time until failure. Further, the trained random forest model is validated using the test data set to ensure that it generalizes well to new cases.

The preceding steps form a training iteration. After completion of a training iteration, the relevance of the features in the training data may be evaluated. To determine the relevance of a feature, a measure called “feature importance” is used. It is calculated by considering the reduction in impurity at a node, weighted by the probability of reaching that node. This probability is calculated by dividing the number of samples that reach that node by the total number of samples. A higher feature importance value indicates a more important feature. Less relevant or irrelevant features may be eliminated in a subsequently performed training iteration. In other words, these steps may be repeated, e.g., with irrelevant features removed.
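The feature-importance evaluation described above might be sketched with scikit-learn's impurity-based `feature_importances_` attribute, as below. The feature names, the synthetic data, and the 0.05 cut-off are hypothetical values chosen for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical feature names; the last two columns are pure noise (assumption).
rng = np.random.default_rng(7)
names = ["intake_temp", "motor_current", "noise_a", "noise_b"]
X = rng.normal(size=(500, 4))
y = 400 - 60 * X[:, 0] + 30 * X[:, 1] + rng.normal(scale=5, size=500)

forest = RandomForestRegressor(n_estimators=100, random_state=7).fit(X, y)

# Impurity-based importances: one value per feature, normalized to sum to 1.
importances = forest.feature_importances_
for name, value in sorted(zip(names, importances), key=lambda p: -p[1]):
    print(f"{name:14s} {value:.3f}")

# Keep only features above an illustrative cut-off for the next iteration.
threshold = 0.05
keep = importances >= threshold
kept_names = [n for n, k in zip(names, keep) if k]
X_reduced = X[:, keep]
print(kept_names, X_reduced.shape)
```

The reduced feature matrix would then be used to retrain the model in the next training iteration.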

The execution of the method, including training of the random forest with different sets of hyperparameters, may end when a random forest model achieves the desired performance, or when no further performance improvements can be achieved. The model selection may be used to select the best-performing model (based on specified performance metrics) of the models that have been generated by repeatedly performing these steps. Metrics used for evaluation may include, for example, the mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), and R-squared.
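The four metrics named above can be computed as in the following sketch, using scikit-learn's metric functions on a small hand-made example. The true and predicted values are invented for illustration only.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Invented remaining-life values in days (assumption).
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 320.0, 380.0])

mse = mean_squared_error(y_true, y_pred)    # mean of squared errors
mae = mean_absolute_error(y_true, y_pred)   # mean of absolute errors
rmse = np.sqrt(mse)                         # same units as the target (days)
r2 = r2_score(y_true, y_pred)               # 1 - SS_res / SS_tot

print(mse, mae, round(rmse, 2), round(r2, 3))  # 250.0 15.0 15.81 0.98
```

MAE and RMSE are expressed in days and are therefore the easiest to interpret for a remaining-life prediction, while R-squared indicates how much of the variation in run life the model explains.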

After completion of the training, a random forest model is available to perform prediction of the timeline of a failure event for an ESP, as discussed below.

A method for predicting the lifespan of an ESP, in accordance with one or more embodiments, is shown.

In Step, data associated with the ESP are obtained. The data may include parameters as those used for the training of the random forest model. Some of the parameters may be obtained from sensors, e.g., in real-time or near-real-time. Other parameters may be obtained from databases.

In Step, the data associated with the ESP are pre-processed. The pre-processing may be performed analogously to the pre-processing described for the training method.

In Step, the expected remaining life of the ESP is predicted. The prediction is performed using the random forest model operating on the data associated with the ESP. Given the data, the model applies the set of decision trees in the random forest to the data, and the prediction is generated by aggregating the predictions of the individual trees by calculating the mean. The timeline of failure events in this context refers to the predicted output values for each time point in the future, based on the input features provided to the model.

In Step, the remaining expected life is reported. The value may be reported to a user. A warning or notification may be provided when the remaining expected life drops below a specified threshold value.
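The threshold-based reporting described above could be sketched as follows. The `LifeReport` type, the `report_remaining_life` function, the ESP identifiers, and the 90-day default threshold are all hypothetical names and values introduced for this example.

```python
from dataclasses import dataclass

@dataclass
class LifeReport:
    esp_id: str
    remaining_days: float
    warning: bool

def report_remaining_life(esp_id, remaining_days, threshold_days=90.0):
    """Flag the ESP when its predicted remaining life is below the threshold."""
    return LifeReport(esp_id, remaining_days, remaining_days < threshold_days)

# Hypothetical predictions for two pumps.
reports = [
    report_remaining_life("ESP-A", 240.0),
    report_remaining_life("ESP-B", 35.5),
]

for r in reports:
    status = "WARNING: below threshold" if r.warning else "ok"
    print(f"{r.esp_id}: {r.remaining_days:.1f} days remaining ({status})")
```

In a deployed system the warning flag would typically trigger a notification to the operator so that a workover can be scheduled before the pump fails.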

In Step, actions may be taken, based on various actionable insights resulting from the execution of the method. These actions are subsequently discussed.



Cite as: Patentable. “Method and system for predicting the lifespan of electric submersible pumps using random-forest machine-learning” (US-12612854-B2). https://patentable.app/patents/US-12612854-B2
