Methods and systems for intrusion detection using federated learning. In some examples, a system includes a central server configured for generating an initial security model and distributing the initial security model to edge nodes. Each edge node is configured for executing an intrusion detection system. The system includes a first subset of edge nodes each of which is configured for training a respective local security model using captured security data during live operation. The system includes a second subset of edge nodes each of which is not configured for training a local security model. The system includes fog nodes, each fog node being on a communications path between at least one edge node and the central server. At least a first fog node is configured for training a respective local security model for one or more of the edge nodes from the second subset of edge nodes.
. A system comprising:
. The system of, wherein at least one of the central server and the first fog node is configured for determining that the one or more of the edge nodes from the second subset of edge nodes lacks at least one computing resource and, in response, configuring the first fog node for training the respective local security model for the one or more of the edge nodes from the second subset of edge nodes.
. The system of, wherein determining that the one or more of the edge nodes from the second subset of edge nodes lacks at least one computing resource comprises determining that the one or more of the edge nodes from the second subset of edge nodes lacks one or more of: sufficient processing resources, sufficient memory resources, or sufficient communications resources for training and transmitting a local security model within a specified timeframe.
. The system of, wherein determining that the one or more of the edge nodes from the second subset of edge nodes lacks at least one computing resource comprises determining that the one or more of the edge nodes from the second subset of edge nodes has an inconsistent connection to a data communications network connecting to the central server.
. The system of, wherein determining that the one or more of the edge nodes from the second subset of edge nodes lacks at least one computing resource comprises querying the one or more of the edge nodes from the second subset of edge nodes.
. The system of, wherein the at least one fog node is configured for receiving captured security data during live operation from the one or more of the edge nodes from the second subset of edge nodes.
. The system of, wherein collecting the local security models from the first subset of edge nodes and the first fog node comprises periodically querying the first subset of edge nodes for the local security models.
. The system of, wherein the captured security data comprises one or more of: global positioning system (GPS) location, internet protocol (IP) address, and user metadata.
. The system of, wherein selecting a first local security model from the collected local security models to replace the initial security model comprises selecting the first local security model based on one or more of: detection accuracy, computation runtime, resource usage rate, and recall score.
. The system of, wherein generating the initial security model comprises generating a plurality of adversarial examples using a generative adversarial network and injecting the adversarial examples into the training data.
. A method comprising:
. The method of, wherein at least one of the central server and the first fog node is configured for determining that the one or more of the edge nodes from the second subset of edge nodes lacks at least one computing resource and, in response, configuring the first fog node for training the respective local security model for the one or more of the edge nodes from the second subset of edge nodes.
. The method of, wherein determining that the one or more of the edge nodes from the second subset of edge nodes lacks at least one computing resource comprises determining that the one or more of the edge nodes from the second subset of edge nodes lacks one or more of: sufficient processing resources, sufficient memory resources, or sufficient communications resources for training and transmitting a local security model within a specified timeframe.
. The method of, wherein determining that the one or more of the edge nodes from the second subset of edge nodes lacks at least one computing resource comprises determining that the one or more of the edge nodes from the second subset of edge nodes has an inconsistent connection to a data communications network connecting to the central server.
. The method of, wherein determining that the one or more of the edge nodes from the second subset of edge nodes lacks at least one computing resource comprises querying the one or more of the edge nodes from the second subset of edge nodes.
. The method of, wherein the at least a first fog node is configured for receiving captured security data during live operation from the one or more of the edge nodes from the second subset of edge nodes.
. The method of, wherein collecting the local security models from the first subset of edge nodes and the first fog node comprises periodically querying the first subset of edge nodes for the local security models.
. The method of, wherein the captured security data comprises one or more of: global positioning system (GPS) location, internet protocol (IP) address, and user metadata.
. The method of, wherein selecting a first local security model from the collected local security models to replace the initial security model comprises selecting the first local security model based on one or more of: detection accuracy, computation runtime, resource usage rate, and recall score.
. The method of, wherein generating the initial security model comprises generating a plurality of adversarial examples using a generative adversarial network and injecting the adversarial examples into the training data.
. The method of, wherein applying compression to the collected local security models from the first subset of edge nodes and the first fog node comprises periodically querying the first subset of edge nodes for the local security models.
. The method of, wherein applying compression to the replaced initial security model trained on the second set of fog nodes comprises periodically querying the second subset of edge nodes for updating the local security model on the central server.
. The method of, wherein the first and second fog nodes perform local security model updates using collected local security model updates from the edge nodes.
. The method of, wherein the first and second fog nodes perform aggregation of local security model updates using collected local security model updates from the edge nodes.
. The method of, wherein the first and second fog nodes perform aggregation of local security model updates using generated local security model updates on the at least a first fog node.
. A system for intrusion detection using federated learning, the system comprising:
This application claims benefit of U.S. Provisional Application Ser. No. 63/438,722, filed on Jan. 12, 2023, the disclosure of which is incorporated herein by reference in its entirety.
This specification relates generally to methods, systems, and computer readable media for intrusion detection using federated learning.
An Internet of Things (IoT) device is a physical device that is connected to the internet and can collect, transmit, and sometimes act on data. Such devices are typically embedded with sensors and other components that allow them to gather data about their surroundings, such as temperature, humidity, or motion. They also typically can communicate this data to other devices or systems over the internet, either through wired or wireless connections.
Some common examples of IoT devices include smart thermostats, smart appliances, wearable fitness trackers, and security cameras. These devices often have a specific function or purpose, such as controlling the temperature of a room or tracking physical activity. They can also be integrated with other systems or devices, such as smart home systems or mobile apps, to provide additional functionality or control.
IoT devices can be used in a variety of applications, including home automation, energy management, healthcare, and transportation. They have the potential to greatly improve efficiency and convenience by automating tasks and providing real-time data and control over a wide range of systems and devices. However, they also raise concerns about security, privacy, and the potential for misuse or abuse of the data they collect and transmit.
This document describes methods and systems for intrusion detection using federated learning. In some examples, a system includes a central server configured for generating an initial security model trained on training data characterizing a plurality of different computer security threats and distributing the initial security model to edge nodes. Each edge node comprises at least one processor configured for executing an intrusion detection system using the initial security model. The system includes a first subset of edge nodes each of which is configured for training a respective local security model using captured security data during live operation. The system includes a second subset of edge nodes each of which is not configured for training a local security model. The system includes fog nodes, each fog node being on a communications path between at least one edge node and the central server. At least a first fog node comprises a processor configured for training a respective local security model for one or more of the edge nodes from the second subset of edge nodes. The central server is configured for updating the intrusion detection systems of the edge nodes by: collecting the local security models from the first subset of edge nodes and the first fog node; selecting a first local security model from the collected local security models to replace the initial security model; and distributing the first local security model to the edge nodes, causing each of the edge nodes to execute the intrusion detection system using the first local security model.
The computer systems described in this specification may be implemented in hardware, software, firmware, or any combination thereof. In some examples, the computer systems may be implemented using a computer readable medium having stored thereon computer executable instructions that when executed by the processor of a computer control the computer to perform steps. Examples of suitable computer readable media include non-transitory computer readable media, such as disk memory devices, chip memory devices, programmable logic devices, and application specific integrated circuits. In addition, a computer readable medium that implements the subject matter described herein may be located on a single device or computing platform or may be distributed across multiple devices or computing platforms.
Internet-of-Things (IoT) devices have become increasingly popular for use in many industries. These devices collect information about their surroundings and can perform some function based on the collected data. IoT devices are used to make life easier along with automating industry actions so a human does not need to always be present to perform the action. It is estimated that by 2025, there will be about 55.7 billion IoT devices connected, generating upwards of 80 zettabytes (one zettabyte=one trillion gigabytes).
IoT devices are used in many industries, such as smart agriculture, smart healthcare, and smart homes. Much like computers and smartphones, IoT devices run on an operating system and use popular communication channels. For example, computers tend to use Windows, OS X, or Linux, while mobile phones tend to use iOS or Android. IoT devices, similar to personal computers, oftentimes use Windows or Linux, with Linux representing the majority of IoT operating systems. Although these devices are capable of running the operating systems that are on more powerful computers, the devices may not have the resources to adequately train deep learning models on-device.
As IoT devices generate large amounts of data, this data must also be processed. This processing tends to happen on a centralized server, to which the data from the IoT devices are transmitted. This process takes a lot of time and memory capacity. As the number of IoT devices in a system increases, the volume of data that is sent over various communication channels increases, requiring more robust communication methods. IoT devices in different locations performing different functions also increase the heterogeneity of the collected data. IoT-based architectures, such as the blockchain architecture, have been proposed to handle the distributed nature of IoT devices on a network. In addition to the need to maintain these communication networks for IoT devices, people, businesses, and governments want to keep their data private. Federated learning (FL) is used to protect data privacy and mitigate misuse of data. By keeping the sensitive data on the collection device and training a model on this data, the model can be distributed to other areas for use without needing to transmit any of the sensitive data.
Some works use FL to propose deep learning (DL)-based intrusion detection systems (IDS) incorporating IoT devices, although a large limitation of these works is that the architecture is based on a simulation rather than on actual IoT devices. It is not uncommon for IoT devices to lack the processing power or memory capacity to train deep learning models of the needed size. We mitigate this limitation of edge IoT devices by utilizing fog nodes within the fog layer that will handle local model training in place of the edge devices that cannot train local models on-device. The fog layer will also collect available local models trained by edge devices, or by the fog nodes themselves, for more reliable transfer to the cloud layer for further processing.
With many IoT devices, the bandwidth required to communicate between devices also needs to be large. Compression methods such as quantization have been used to reduce communication overhead between edge devices and the central server for FL, such as with the FedPaq algorithm. Quantization is a lossy compression method. We will use lossless compression for floating-point values, as mentioned in a couple of works. Using compression, we can reduce the communication overhead significantly, decreasing the time and bandwidth needed to transfer the models between layers.
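The difference between the two compression families mentioned above can be illustrated with a minimal sketch. It assumes model weights are available as a flat list of Python floats, uses the standard-library `zlib` codec as a stand-in for whichever lossless floating-point compressor is actually employed, and uses a simple 8-bit uniform quantizer as a stand-in for lossy schemes. All function names are hypothetical and none of this is taken from the specification or from FedPaq itself.

```python
import struct
import zlib


def compress_lossless(weights):
    """Pack floats as little-endian float64 and deflate them (illustrative codec choice)."""
    raw = struct.pack(f"<{len(weights)}d", *weights)
    return zlib.compress(raw, 9)


def decompress_lossless(payload):
    """Inverse of compress_lossless: every weight is recovered bit for bit."""
    raw = zlib.decompress(payload)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


def quantize_8bit(weights):
    """Lossy: map each weight onto one of 256 evenly spaced levels."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # avoid a zero step when all weights are equal
    codes = bytes(round((w - lo) / scale) for w in weights)
    return codes, lo, scale


def dequantize_8bit(codes, lo, scale):
    """Approximate reconstruction; the rounding error is not recoverable."""
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.1 * i - 1.0 for i in range(50)]
    exact = decompress_lossless(compress_lossless(weights))
    approx = dequantize_8bit(*quantize_8bit(weights))
    print("lossless round trip exact:", exact == weights)
    print("quantized round trip exact:", approx == weights)
```

The lossless path trades a smaller size reduction for an exact model on the receiving layer, while quantization bounds the per-weight error by half a quantization step.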
Consider a large company with many locations. All locations access shared data through a centralized server. The devices connected within the organization network can be identified as edge devices or edge nodes. Together, the interconnected devices, including the central server, form a giant IoT network. To safeguard the edge nodes from malicious cyberattacks, an intrusion detection system (IDS) is the most effective defense mechanism available.
A state-of-the-art IDS is known for its highly trained machine learning model. The model requires a large amount of data and substantial computational resources to train and test. The model is most likely to be trained on the centralized server, which usually contains a large amount of data and has abundant computational resources. The data is collected from each location at varying time intervals. After setting a cutoff timestamp, all data collected prior to the timestamp is used to train the IDS model. The trained IDS model is then tested and deployed on the next queued security update to all edge nodes in all locations. The whole process can be viewed as a security update.
Unfortunately, the process from data collection to final model deployment and setup is not swift. Each step in the process may require a lengthy period. If new cyberattack data are collected after the cutoff timestamp, in order to create a new model that can detect the new cyberattacks, the newly collected data have to be merged with the existing dataset to train the new model. Because of the large time gaps that typically exist between security updates, the edge nodes within such an IoT network can be left vulnerable to new attacks.
To deploy a new IDS, a machine learning model typically must be written, trained, and tested many times (generally alpha and beta versions) before it is officially installed. This entire process can be time consuming: on average, a net operations engineer requires 33% more time identifying and troubleshooting network issues, 30% more time detecting vulnerabilities and threats to take remediation steps and rectify such issues, and 37% more time analyzing and exploring advancements in automation. To shorten the time between a system update and a new deployment, one strategy is the use of a responsive framework that consists of several machine learning models that are updated in real time.
Federated learning is a type of machine learning in which multiple devices or systems work together to train a shared model, without the need to centralize the data used for training. In federated learning, each device or system has its own data, and the shared model is trained by sending model updates back and forth between the devices and a central server. The devices each use their own data to train a local version of the model, and then send the updates to the central server, which aggregates the multiple local models to create a single global, or shared, model.
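As an illustration of the aggregation step described above, the following minimal sketch applies federated averaging, a widely used aggregation rule in which each local model contributes in proportion to the number of samples it was trained on. This is only one possible rule and is not stated to be the one used in this specification; the function name, the flat-list weight representation, and the sample counts are assumptions for illustration.

```python
def federated_average(local_models, sample_counts):
    """Combine local weight vectors into one global vector.

    local_models: list of equal-length lists of floats (one per device).
    sample_counts: number of training samples behind each local model.
    """
    if not local_models or len(local_models) != len(sample_counts):
        raise ValueError("need one sample count per local model")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    size = len(local_models[0])
    if any(len(model) != size for model in local_models):
        raise ValueError("all local models must have the same shape")

    global_model = [0.0] * size
    for weights, count in zip(local_models, sample_counts):
        share = count / total  # this device's share of all training samples
        for i, w in enumerate(weights):
            global_model[i] += w * share
    return global_model


if __name__ == "__main__":
    # Two devices; the second trained on three times as much data.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Only the weight vectors and sample counts cross the network; the training data itself never leaves the devices.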
One of the main benefits of federated learning is that it allows for the training of models on a large amount of data without the need to centralize or transfer that data. This can be particularly useful in situations where the data is sensitive or proprietary, or where it is impractical or infeasible to transfer the data due to its size or location. Federated learning can also be more efficient and faster than traditional machine learning approaches, as the model can be trained in parallel on multiple devices.
Using federated learning on a network with a large number of devices can incur high communication costs. In addition to using federated learning, the communication costs of transferring large amounts of local security model updates are lowered by compressing the local security model updates when transferring between the edge layer and fog layer, and also when transferring between the fog layer and the cloud layer.
is a block diagram of an example system for intrusion detection using federated learning. The arrangement shows the topology of the federated learning-based intrusion detection system (FLIDS). The FLIDS contains three distinct layers: a Cloud Layer, a Fog Layer, and an Edge Layer. Between adjacent layers, there is efficient bidirectional communication.
The cloud layer includes a central server. The fog layer includes a number of fog nodes, and the edge layer includes a number of edge nodes. The fog nodes can be located on communications paths between the edge nodes and the central server. The central server is configured for generating an initial security model trained on training data characterizing a plurality of different computer security threats and distributing the initial security model to the edge nodes and, typically, the fog nodes too.
The edge nodes can be IoT devices. Some of the edge nodes are configured for capturing data during live operation and for training a local security model using the captured data. Some other edge nodes are not configured for training a local model; for example, some edge nodes may lack computing resources for training a local security model.
The fog nodes can be IoT devices. Fog nodes are typically configured solely as data communication intermediaries. Some of the fog nodes may be routers, switches, or wireless access points. At least one of the fog nodes is configured to train local security model updates for one or more of the edge nodes that are not configured for training local security model updates. This can be useful where a fog node has more computational resources than the edge nodes. This fog node is in communication with one or more of the edge nodes by either a wired or wireless data communication path.
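The central server is configured for updating the IDS model of the edge nodes by: collecting the local security models from the first subset of edge nodes and the first fog node; selecting a first local security model from the collected local security models to replace the initial security model; and distributing the first local security model to the edge nodes, causing each of the edge nodes to execute the intrusion detection system using the first local security model.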
is a block diagram of a model for intrusion detection using a generative adversarial network (GAN). It comprises two models, a generator and a discriminator, to produce hyper-realistic data samples that closely resemble the real data samples, which the discriminator is already trained to classify with high accuracy. As the generator produces generated data samples, the discriminator classifies the generated samples as normal or attack samples. The generated samples, along with their corresponding classification decisions, are used to further train both the generator and the discriminator. When the generator is able to generate attack samples that the discriminator classifies as “attack,” the GAN is able to produce realistic attack samples to train the DL-based IDS models.
The central server can use a GAN to generate the initial security model. The model uses training data including real data samples of computer security threats (e.g., malware files) and network security threats, and a generator for generating adversarial examples and injecting the adversarial examples into the training data.
GANs are a type of machine learning model that can be used to train an intrusion detection system. GANs usually comprise two sub-models working in tandem, a generator and a discriminator. The generator generates synthetic data to trick the discriminator into classifying the synthetic data as real data, while the discriminator attempts to distinguish the synthetic data from real data.
To use GANs to train an intrusion detection system, one can first collect a large dataset of both normal and malicious network traffic. The generator can then be trained to generate synthetic normal traffic, while the discriminator is trained to distinguish between the synthetic normal traffic and the real malicious traffic.
During training, the generator and discriminator are both optimized simultaneously. The generator is trained to generate synthetic data that is increasingly difficult for the discriminator to distinguish from real data, while the discriminator is trained to become better at distinguishing between the synthetic and real data.
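For reference, the simultaneous optimization described above is commonly written as the standard GAN minimax objective. This formulation comes from the general GAN literature and is not stated in this specification. Here G denotes the generator, D the discriminator, p_data the distribution of real samples, and p_z the noise distribution fed to the generator.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator raises V by assigning high probability to real samples and low probability to generated ones, while the generator lowers V by producing samples that the discriminator scores as real.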
Once the GAN has been trained, it can be used to generate synthetic normal traffic for use in testing the intrusion detection system. The intrusion detection system can then be trained on this synthetic normal traffic and real malicious traffic, allowing it to learn to identify malicious traffic in a more robust and efficient way.
shows the direction of flow for the data generated on the edge devices and also for the global model from the cloud server. The edge devices are either transferring locally trained deep learning (DL) models or batches of encrypted local training data to the fog layer. The fog layer is either receiving compressed local security model updates from edge devices capable of training DL models or training data from the edge devices unable to train DL models. Fog nodes connected to edge devices that cannot train local models will train models in place of the edge devices. The fog layer then transfers losslessly compressed models to the cloud layer. The cloud layer produces a global model, compresses the model, and distributes the model to the fog layer, which then distributes the compressed global model to the edge devices. The edge devices decompress the global model and execute the functions of the global model.
shows the direction of flow of the locally trained models and the global model. Edge devices that are capable of training models locally compress their respective models before transferring the compressed models to the fog layer. Edge devices that are incapable of training models locally and rely on fog nodes to train their models will instead have the fog nodes perform the lossless compression on the trained models. When the cloud layer begins distributing the current global model, the cloud server will use lossless compression methods to compress the global model before transferring it to the fog layer. The fog nodes then distribute the compressed global model to their respective edge devices. The edge devices decompress the global model and use the model for intrusion detection.
is a block diagram of an example central server. The central server includes one or more processors and memory storing instructions for the processors. The central server includes a model generator configured for generating an initial security model trained on training data characterizing computer security threats. For example, the model generator can use an adversarial example generator and training data to train the initial security model.
The initial model, in some examples, comprises two parts. Firstly, the system uses GAN-based adversarial attacks against a black box IDS while still preserving the functional behavior of the network traffic. The training method is adversarial training, which injects adversarial examples into the training data. This helps the federated learning model to learn possible adversarial perturbations. The generator adds perturbations in an attempt to fool the discriminator while the discriminator learns to identify real or fake flows.
Secondly, the initial model can include another layer for anomaly detection. When only one type of adversarial training method is used, the model becomes robust only to the adversarial samples it was trained against, making it only as effective as a signature-based IDS. (A signature-based IDS typically monitors inbound network traffic to find sequences and patterns that match a particular attack signature.) Using the training method of the present application, the system can include both (a) a generative federated learning-based model and (b) a linear machine learning model, both to be injected into the IDS pipeline. Therefore, as disclosed herein, both known and unknown adversarial perturbations are identified and mitigated.
The central server also includes a model distributor configured for distributing security models to edge nodes. For example, the model distributor may maintain a list of network addresses for devices that are subscribed to receiving security updates, and the model distributor can transmit the initial model and updated models to devices on the list, e.g., over a data communications network such as the Internet.
The central server includes a local model collector and selector configured for updating intrusion detection systems of edge nodes by collecting local security models from edge nodes and fog nodes; selecting a new local security model from the collected local security models to replace the initial security model; and distributing the new local security model to the edge nodes, causing each of the edge nodes to execute the intrusion detection system using the new local security model. For example, the local model collector and selector can be configured to periodically query the edge nodes and fog nodes for local models, or the edge nodes and fog nodes can provide the local models on a rolling basis. Selecting the new local security model can include selecting a “best” local model based on one or more of: detection accuracy, computation runtime, resource usage rate, and recall score.
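One way to picture the selection of a “best” local model from the four listed criteria is a simple weighted score, sketched below. The specification does not say how the criteria are combined, so the scoring formula, the weights, the `max_runtime` normalization, and all names here are hypothetical choices made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelReport:
    node_id: str               # edge node or fog node that trained the model
    detection_accuracy: float  # 0..1, higher is better
    recall: float              # 0..1, higher is better
    runtime_seconds: float     # computation runtime, lower is better
    resource_usage: float      # 0..1 fraction of node resources, lower is better


def score(report, max_runtime=60.0):
    """Hypothetical weighting: reward accuracy and recall, penalize cost."""
    runtime_penalty = min(report.runtime_seconds / max_runtime, 1.0)
    return (0.4 * report.detection_accuracy
            + 0.4 * report.recall
            - 0.1 * runtime_penalty
            - 0.1 * report.resource_usage)


def select_best(reports, max_runtime=60.0):
    """Return the report of the collected local model with the highest score."""
    if not reports:
        raise ValueError("no local models were collected")
    return max(reports, key=lambda r: score(r, max_runtime))


if __name__ == "__main__":
    collected = [
        ModelReport("edge-a", 0.90, 0.85, 30.0, 0.5),
        ModelReport("fog-1", 0.97, 0.95, 20.0, 0.4),
    ]
    print(select_best(collected).node_id)
```

A deployment could just as well rank on a single criterion or apply hard thresholds first; the sketch only shows that the listed metrics are sufficient inputs for an automated choice.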
andare block diagrams of example edge nodes.is a block diagram of an example edge nodethat trains a local model, andis a block diagram of an example edge nodethat does not train a local model, where a fog node instead performs the model training.
shows an edge node having one or more processors and memory storing instructions for the processors. The edge node includes at least one edge node function which is configured for performing a task during live operation of the edge node, e.g., collecting, processing, and transmitting data. The edge node also includes an intrusion detection system which is configured to detect and, in some cases, stop computer security attacks against the edge node using a security model. The intrusion detection system initially uses a model provided by the central server and later uses a local model trained on the edge node.
The edge node includes a data collector and a local model generator. The data collector collects data during live operation of the edge node; for example, the data collector can collect data from one or more sensors or network traffic from a communications system that is a component of the edge node. The local model generator is configured for generating a local security model using the initial security model and the collected data. The intrusion detection system provides live feedback to the local model on real data collected at the edge node during live operation. The local model generator can be configured to use adaptive transfer learning. Transfer learning can include reusing the knowledge in source tasks to improve the learning of a target task, and adaptive transfer learning can include adapting a transfer learning process as it proceeds.
shows an edge node that lacks the local model generator. Generating the local model can, in some cases, require computing resources for generating the model under specified target conditions, e.g., a specified amount of time. In some cases, the edge node lacks one or more of: sufficient processing resources, sufficient memory resources, or sufficient communications resources for training and transmitting a local security model within a specified timeframe. In some cases, the edge node has an inconsistent connection to a data communications network connecting to the central server.
Instead, the edge node has a fog node coordinator. The fog node coordinator communicates with a fog node, e.g., by sending the data from the data collector to the fog node, so that the fog node can generate a local model using the collected data. The fog node coordinator can, in some examples, be configured for finding a suitable fog node, e.g., by selecting a fog node from a list of network addresses of fog nodes, or by polling fog nodes to find one that is available or has a more reliable communications path. The fog node coordinator can provide the collected data in any appropriate manner, e.g., periodically, such as when a threshold amount of data has been received, or in response to a query from the fog node.
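The threshold-based and query-based hand-off of collected data can be sketched as a small buffer on the edge node. This is a minimal illustration under stated assumptions: the class name, the `send` callback (standing in for whatever transport reaches the fog node), and the default batch size are hypothetical, and encryption of the batch is omitted.

```python
class FogNodeCoordinator:
    """Buffers captured samples and forwards them to a fog node in batches."""

    def __init__(self, send, batch_size=3):
        self._send = send              # hypothetical callable that transmits one batch
        self._batch_size = batch_size  # threshold amount of data that triggers a send
        self._buffer = []

    def record(self, sample):
        """Store one captured sample; forward a batch once the threshold is reached."""
        self._buffer.append(sample)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered, e.g. in response to a query from the fog node."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._send(batch)


if __name__ == "__main__":
    sent_batches = []
    coordinator = FogNodeCoordinator(sent_batches.append, batch_size=2)
    for packet in ["pkt-1", "pkt-2", "pkt-3"]:
        coordinator.record(packet)
    coordinator.flush()  # fog node asked for the remainder
    print(sent_batches)
```

Batching keeps a resource-constrained node from opening a connection per sample, which matters most for nodes with inconsistent connectivity.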
is a block diagram of an example fog node. The fog node includes one or more processors and memory storing instructions for the processors. The fog node includes at least one fog node function configured for, e.g., routing or monitoring data communications traffic. The fog node can include an intrusion detection system.
The fog node includes a data collector and a local model generator. The data collector and local model generator are configured for receiving data from one or more edge nodes that do not generate local models, e.g., because they lack sufficient computing resources. The fog node includes an edge node coordinator configured for communicating with edge nodes that use the fog node for generating local security models.
In some examples, the edge node coordinator is configured for finding edge nodes that are not building models and establishing communications with those edge nodes. For example, the edge node coordinator may have a list of network addresses of edge nodes (e.g., transmitted by the central server) and can be configured to query each of the edge nodes to determine if the edge node is generating a local model or if the edge node lacks sufficient resources to generate a local model. The edge node can, e.g., transmit a message stating that it is not generating a local model, or the edge node can transmit a message specifying one or more computing resources so that the edge node coordinator can determine whether the edge node can generate the local model or whether the fog node will generate the local model.
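The decision made from an edge node's reported resources can be sketched as a simple comparison against training requirements. The specification names the resource categories (processing, memory, communications, connection consistency) but gives no thresholds or message format, so the field names, units, and minimum values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceReport:
    """Hypothetical reply of an edge node to the coordinator's query."""
    cpu_cores: int
    free_memory_mb: int
    uplink_kbps: int
    connection_stable: bool


@dataclass(frozen=True)
class TrainingRequirements:
    """Illustrative minimums for training and transmitting a model in time."""
    min_cpu_cores: int = 2
    min_free_memory_mb: int = 512
    min_uplink_kbps: int = 256


def missing_resources(report, requirements):
    """List the resource categories in which the edge node falls short."""
    missing = []
    if report.cpu_cores < requirements.min_cpu_cores:
        missing.append("processing")
    if report.free_memory_mb < requirements.min_free_memory_mb:
        missing.append("memory")
    if report.uplink_kbps < requirements.min_uplink_kbps:
        missing.append("communications")
    if not report.connection_stable:
        missing.append("stable connection")
    return missing


def assign_trainer(report, requirements=TrainingRequirements()):
    """Return 'edge' if the node can train locally, otherwise 'fog'."""
    return "fog" if missing_resources(report, requirements) else "edge"


if __name__ == "__main__":
    print(assign_trainer(ResourceReport(4, 2048, 1000, True)))
    print(assign_trainer(ResourceReport(1, 128, 1000, True)))
```

Returning the list of missing categories, rather than only a yes/no answer, lets the coordinator log why a node was delegated to the fog layer.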
The corresponding figure shows the direction of flow for the data generated on the edge devices and for the global model from the cloud server. The edge devices transfer either locally trained Deep Learning (DL) models or batches of encrypted local training data to the fog layer. The fog layer accordingly receives either compressed local security model updates from edge devices capable of training DL models or training data from edge devices unable to train DL models. Fog nodes connected to edge devices that cannot train local models train models in place of those edge devices. The fog layer then aggregates the local security model updates, whether received from edge nodes that can train DL models or produced by the fog layer itself from training data received from edge devices unable to train DL models, and transfers losslessly compressed models to the cloud layer. The cloud layer produces a global model, compresses the model, and distributes the model to the fog layer, which then distributes the compressed global model to the edge devices. The edge devices decompress the global model and execute the functions of the global model.
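The aggregate-then-compress step at the fog layer can be illustrated with the short sketch below. The description does not specify an aggregation rule or a compression codec, so the sketch assumes a sample-weighted average of model weights (in the style of federated averaging) and uses JSON serialization with `zlib` as one example of lossless compression. The function names and the dictionary-of-lists weight layout are hypothetical.

```python
import json
import zlib
from typing import Dict, List, Sequence, Tuple

# Hypothetical weight layout: layer name -> flat list of parameters.
Weights = Dict[str, List[float]]


def aggregate(updates: Sequence[Tuple[Weights, int]]) -> Weights:
    """Sample-weighted average of model updates (an assumed aggregation rule).

    Each update is (weights, number_of_training_samples). All updates are
    assumed to share the same layer names and layer sizes.
    """
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("at least one update with a positive sample count is required")
    reference = updates[0][0]
    return {
        name: [
            sum(weights[name][i] * n for weights, n in updates) / total
            for i in range(len(values))
        ]
        for name, values in reference.items()
    }


def compress_model(weights: Weights) -> bytes:
    """Losslessly compress a model for transfer to the next layer."""
    return zlib.compress(json.dumps(weights, sort_keys=True).encode("utf-8"))


def decompress_model(blob: bytes) -> Weights:
    """Inverse of compress_model; recovers the weights exactly."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

The same `compress_model` and `decompress_model` pair could serve the downstream direction, where the cloud layer compresses the global model and the edge devices decompress it.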
In some examples, the fog node is a dedicated device or system of devices for generating local models for edge nodes. The fog node can be a location-based hub that performs model generation for edge nodes within a certain distance (physical distance or network distance). Since many edge devices lack computation power, a location-based hub that has relatively higher computation power than the edge devices acts as an intermediary. The location-based fog hub performs training to generate local security models for those devices that do not meet the necessary specifications.
For the resource-constrained edge devices, data can be encrypted and grouped into batches before being uploaded to the fog hub. Several edge devices can be monitored by the fog hub, which can act as a controller. The controller can be configured with a secure mechanism to fend off backdoor and other standard attacks, and it can also be configured to monitor traffic in real time and apply adaptive learning and transfer learning under the overall scope of federated learning. This setup extends the decentralized concept to the distributed federated learning framework. Because the central server is no longer a single point of failure, inserting fog layers helps general operations proceed as normal even if certain nodes are under attack from intruders.
In this manner, a robust adaptive transfer learning model can be placed on the fog layer. The fog layer can be responsible for converting and organizing incoming data based on the transmission protocols. As new live data arrives on the edge devices, e.g., periodically, a new batch of data is sent to the fog hub for adaptive transfer learning. The best model on the fog node (e.g., where the fog node serves multiple edge nodes and selects one as described above with respect to the central server) will be uploaded to the central server for further evaluation.
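Selecting the best model on a fog node that serves several edge nodes might look like the sketch below. The selection criterion is not restated in this passage, so the sketch assumes a caller-supplied scoring function (for example, accuracy on a held-out validation set) where a higher score is better. The name `select_best_model` and the `evaluate` callable are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple, TypeVar

M = TypeVar("M")  # any model representation


def select_best_model(
    candidates: Dict[str, M],
    evaluate: Callable[[M], float],
) -> Optional[Tuple[str, M, float]]:
    """Return (edge_node_id, model, score) for the highest-scoring candidate.

    `candidates` maps an edge node identifier to the model trained for it.
    `evaluate` is an assumed scoring function; higher is better.
    Returns None when there are no candidates.
    """
    best: Optional[Tuple[str, M, float]] = None
    for edge_id, model in candidates.items():
        score = evaluate(model)
        if best is None or score > best[2]:
            best = (edge_id, model, score)
    return best
```

The returned model is the one the fog node would then upload to the central server for further evaluation.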
Although specific examples and features have been described above, these examples and features are not intended to limit the scope of the present disclosure, even where only a single example is described with respect to a particular feature. Examples of features provided in the disclosure are intended to be illustrative rather than restrictive unless stated otherwise. The above description is intended to cover such alternatives, modifications, and equivalents as would be apparent to a person skilled in the art having the benefit of this disclosure.
April 14, 2026