A method is provided that includes generating a visual environment for interactive development of a machine learning (ML) model. The method includes accessing observations of data each of which includes values of independent variables and a dependent variable, and performing an interactive exploratory data analysis (EDA) of the values of a set of the independent variables. The method includes performing an interactive feature construction and selection based on the interactive EDA, and in which select independent variables are selected as or transformed into a set of features for use in building a ML model to predict the dependent variable. The method includes building the ML model using a ML algorithm, the set of features, and a training set produced from the set of features and observations of the data. And the method includes outputting the ML model for deployment to predict the dependent variable for additional observations of the data.
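The steps named in the abstract (explore the data, choose features, build a model, output it for deployment) can be sketched in a few lines. This is a minimal, non-interactive illustration only: the observations, the variable names `temp`, `vibration`, and `label`, the helper functions, and the nearest-centroid classifier standing in for "a ML algorithm" are all assumptions, not anything the patent specifies.

```python
import json
from statistics import mean, pstdev

# Hypothetical observations: two independent variables and a dependent variable "label".
observations = [
    {"temp": 70.0, "vibration": 0.2, "label": 0},
    {"temp": 72.0, "vibration": 0.3, "label": 0},
    {"temp": 95.0, "vibration": 0.9, "label": 1},
    {"temp": 98.0, "vibration": 1.1, "label": 1},
]

def summarize(observations, variables):
    """EDA step: per-variable mean and population standard deviation."""
    summary = {}
    for name in variables:
        values = [obs[name] for obs in observations]
        summary[name] = {"mean": mean(values), "stdev": pstdev(values)}
    return summary

def build_training_set(observations, features, target):
    """Produce feature rows X and target values y from the observations."""
    X = [[obs[name] for name in features] for obs in observations]
    y = [obs[target] for obs in observations]
    return X, y

def fit_nearest_centroid(X, y):
    """Build step: one centroid per class (an assumed stand-in algorithm)."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [row for row, row_label in zip(X, y) if row_label == label]
        centroids[label] = [mean(column) for column in zip(*rows)]
    return centroids

def predict(centroids, row):
    """Return the label whose centroid is closest to the row."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

def export_model(centroids, features):
    """Output step: serialize the model so it can be deployed elsewhere."""
    payload = {"features": features,
               "centroids": {str(k): v for k, v in centroids.items()}}
    return json.dumps(payload)

stats = summarize(observations, ["temp", "vibration"])   # explore
features = ["temp", "vibration"]                          # select (here: by hand)
X, y = build_training_set(observations, features, "label")
model = fit_nearest_centroid(X, y)                        # build
artifact = export_model(model, features)                  # output
```

In the claimed system the selection of `features` would come from user input through the GUI rather than a hard-coded list.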
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
2. The apparatus of claim 1, wherein the system is an aircraft, and the plurality of observations of the data is flight data for plurality of flights of the aircraft, for each flight of which the values of the plurality of independent variables are measurements of a plurality of properties recorded by an airborne flight recorder from a plurality of sensors or avionic systems during the flight, and the value of the dependent variable is an indication of a condition of the aircraft during the flight.
5. The apparatus of claim 1, wherein the apparatus being caused to perform the interactive feature construction and selection includes being caused to apply the one or more of the select independent variables to a transformation to produce a feature of the set of features, the one or more of the select independent variables or the transformation being selected based on user input via the GUI.
9. The method of claim 8, wherein the system is an aircraft, and the plurality of observations of the data is flight data for plurality of flights of the aircraft, for each flight of which the values of the plurality of independent variables are measurements of a plurality of properties recorded by an airborne flight recorder from a plurality of sensors or avionic systems during the flight, and the value of the dependent variable is an indication of a condition of the aircraft during the flight.
12. The method of claim 8, wherein performing the interactive feature construction and selection includes applying the one or more of the select independent variables to a transformation to produce a feature of the set of features, the one or more of the select independent variables or the transformation being selected based on user input via the GUI.
16. The non-transitory computer-readable storage medium of claim 15, wherein the system is an aircraft, and the plurality of observations of the data is flight data for plurality of flights of the aircraft, for each flight of which the values of the plurality of independent variables are measurements of a plurality of properties recorded by an airborne flight recorder from a plurality of sensors or avionic systems during the flight, and the value of the dependent variable is an indication of a condition of the aircraft during the flight.
19. The non-transitory computer-readable storage medium of claim 15, wherein the feature construction and selection is an interactive feature construction and selection, and the apparatus being caused to perform the feature construction and selection includes being caused to apply the one or more of the select independent variables to a transformation to produce a feature of the set of features, the one or more of the select independent variables or the transformation being selected based on user input via the GUI.
This invention relates to a system for interactive feature construction and selection in machine learning or data analysis. The problem addressed is the need for users to dynamically construct and refine features from raw data to improve model performance, often requiring iterative experimentation with different transformations and variable selections. The system provides a graphical user interface (GUI) that allows users to interactively select independent variables from a dataset and apply mathematical or statistical transformations to generate new features. The transformations can include operations such as scaling, normalization, polynomial expansion, or domain-specific functions. Users can provide input through the GUI to guide the selection of variables and transformations, enabling real-time adjustments to the feature set. The system then evaluates the constructed features to assess their impact on model performance, allowing users to refine the feature engineering process iteratively. This interactive approach reduces the trial-and-error involved in traditional feature engineering, making the process more efficient and user-driven. The invention is particularly useful in applications where domain expertise is required to define meaningful features, such as in scientific research, financial modeling, or healthcare analytics.
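As a rough illustration of applying one or more selected independent variables to a transformation to produce a feature, the sketch below keeps a small registry of named transformations and applies whichever one a user selection names. The registry, the `selection` dictionary (standing in for GUI input), and the column names `fuel_flow` and `airspeed` are hypothetical.

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def square(values):
    """Degree-2 polynomial term of a single variable."""
    return [v ** 2 for v in values]

def ratio(numerators, denominators):
    """Element-wise ratio of two variables."""
    return [n / d for n, d in zip(numerators, denominators)]

# Hypothetical registry of transformations a GUI might offer by name.
TRANSFORMS = {"min_max": min_max_scale, "square": square, "ratio": ratio}

def construct_feature(columns, selection):
    """Apply the selected transformation to the selected variables.

    `selection` stands in for user input, e.g.
    {"variables": ["fuel_flow"], "transform": "min_max"}.
    """
    inputs = [columns[name] for name in selection["variables"]]
    return TRANSFORMS[selection["transform"]](*inputs)

# Hypothetical column-oriented data.
columns = {
    "fuel_flow": [10.0, 20.0, 30.0],
    "airspeed": [100.0, 200.0, 400.0],
}

scaled = construct_feature(
    columns, {"variables": ["fuel_flow"], "transform": "min_max"})
flow_per_speed = construct_feature(
    columns, {"variables": ["fuel_flow", "airspeed"], "transform": "ratio"})
```

Keeping the transformation and the variables as separate parts of the selection mirrors the claim language, in which either one may be chosen based on user input.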
22. The apparatus of claim 3, wherein the apparatus is further caused to build a version of the machine learning model using the machine learning algorithm, the modified set of features, and a modified training set produced from the modified set of features and the plurality of observations of the data.
This invention relates to machine learning systems that optimize model training by dynamically adjusting feature sets and training data. The problem addressed is the inefficiency of traditional machine learning approaches that rely on static feature selection and training datasets, which can lead to suboptimal model performance or excessive computational costs. The apparatus includes a machine learning model, a machine learning algorithm, and a set of features derived from a dataset containing multiple observations. The apparatus is configured to modify the set of features based on predefined criteria, such as feature importance or redundancy, to create a refined feature set. It then generates a modified training set by processing the original observations using the refined features. The apparatus further builds a version of the machine learning model using the machine learning algorithm, the refined feature set, and the modified training set. This dynamic adaptation improves model accuracy and efficiency by ensuring the training process uses only the most relevant features and corresponding data. The invention is particularly useful in scenarios where computational resources are limited or where feature relevance changes over time.
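One way to picture building "a version" of the model from a modified feature set: keep the observations and the algorithm fixed, regenerate the training set from the new features, and fit again. The sketch below assumes features are small functions of an observation and uses a 1-nearest-neighbor classifier as a stand-in algorithm; the data and the names `temp`, `pressure`, and `fault` are hypothetical.

```python
def make_training_set(observations, features, target):
    """Evaluate every feature function on every observation."""
    X = [[fn(obs) for fn in features.values()] for obs in observations]
    y = [obs[target] for obs in observations]
    return X, y

def fit_1nn(X, y):
    """'Training' a 1-nearest-neighbor model just stores the labeled rows."""
    return list(zip(X, y))

def predict_1nn(model, row):
    """Return the label of the stored row closest to `row`."""
    def squared_distance(pair):
        return sum((a - b) ** 2 for a, b in zip(pair[0], row))
    return min(model, key=squared_distance)[1]

def build_version(observations, features, target, algorithm):
    """Build one model version from a feature set and the same observations."""
    X, y = make_training_set(observations, features, target)
    return {"feature_names": list(features), "model": algorithm(X, y)}

# Hypothetical observations with a dependent variable "fault".
observations = [
    {"temp": 70.0, "pressure": 30.0, "fault": 0},
    {"temp": 75.0, "pressure": 31.0, "fault": 0},
    {"temp": 95.0, "pressure": 29.0, "fault": 1},
]

# Original feature set.
features_v1 = {
    "temp": lambda obs: obs["temp"],
    "pressure": lambda obs: obs["pressure"],
}
v1 = build_version(observations, features_v1, "fault", fit_1nn)

# Modified feature set: drop raw pressure, add a derived ratio.
features_v2 = {
    "temp": lambda obs: obs["temp"],
    "temp_per_pressure": lambda obs: obs["temp"] / obs["pressure"],
}
v2 = build_version(observations, features_v2, "fault", fit_1nn)
```

Both versions are built from the same observations with the same algorithm, so any difference in their predictions comes from the change in features.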
23. The method of claim 10 further comprising building a version of the machine learning model using the machine learning algorithm, the modified set of features, and a modified training set produced from the modified set of features and the plurality of observations of the data.
24. The apparatus of claim 17, wherein the apparatus is further caused to build a version of the machine learning model using the machine learning algorithm, the modified set of features, and a modified training set produced from the modified set of features and the plurality of observations of the data.
October 25, 2018
November 15, 2022