The invention discloses a cross-architecture automated detection method and system for third-party components and their security risks, comprising: identifying and reversing the firmware of an IoT device and classifying the resulting reverse products into binary and non-binary files; disassembling the binary files to mine their semantic information; converting the non-binary files into string text files; building a database containing third-party components and their known CVEs; and automatically scanning the string text files with pattern matching to collect the third-party components in the IoT device firmware and to collect and retrieve the vulnerabilities of the corresponding third-party components. By organically combining the semantic information of the vulnerability assembly code with that of the IoT device firmware assembly code, cross-architecture, deep-learning-based similarity comparison is realized, and vulnerabilities of specific patterns are mined and verified automatically. The invention does not require access to the firmware source code, and the detection process is automated, greatly reducing the difficulty and workload of manual analysis.
A cross-architecture automated detection method for third-party components and security risks thereof, comprising the following steps:
The cross-architecture automated detection method for third-party components and their security risks according to claim …, wherein the scan matching in the second sub-step of the second step uses a cross-line matching rule.
The cross-architecture automated detection method for third-party components and their security risks according to claim …, wherein a subset of the complete IoT device firmware dataset is taken as a test object, the number of rules and the scanning-time ratio obtained under different numbers of lines are compared, and the optimal number of lines is selected as the cross-line count for the final cross-line matching rule.
The cross-architecture automated detection method for third-party components and their security risks according to claim …, wherein the third step comprises:
The cross-architecture automated detection method for third-party components and their security risks according to claim …, wherein the third step further comprises:
A cross-architecture automated detection system for third-party components and their security risks, comprising:
This application claims priority of Chinese Application No. 2021114036901, filed Nov. 24, 2021, all of which is hereby incorporated by reference.
The present invention relates to the field of Internet of Things (IoT) protocol testing technology, in particular to a cross-architecture automated detection method and system for third-party components and security risks thereof.
At present, the number of IoT devices is growing rapidly. Gartner predicted that the number of IoT devices in use worldwide would grow to 20.8 billion by 2020. Firmware is a special type of software that provides low-level control of the device hardware in a large number of IoT devices. From this point of view, the security of IoT devices depends to a large extent on the security of their firmware.
IoT device firmware is software embedded in IoT hardware that performs functions similar to those of an operating system. This means that once a firmware vulnerability in an IoT device is exploited, an attacker can often take control of the entire device. Attackers frequently exploit IoT security vulnerabilities not to attack the device itself but to use it as a springboard for a variety of malicious activities, paving the way for subsequent distributed denial-of-service attacks, malware distribution, spam delivery, click fraud, credit card theft, and more.
There are two main types of firmware security vulnerabilities in IoT devices. The first type consists of vulnerabilities in the firmware itself, mainly code defects and programming errors introduced during development, insecure configuration information, and disclosure of sensitive information and keys. The second type consists of vulnerabilities in integrated third-party software, mainly arising from the use of unmaintained or outdated third-party software that lacks security attention and vulnerability review and therefore poses security risks.
The Chinese patent publication CN113515457A discloses a method for detecting the firmware security of IoT devices, comprising: obtaining the firmware information of the IoT device to be detected, the firmware information including the firmware ID and version number; and finding the corresponding boot files, web components, and vulnerability information according to the firmware information. A simulation module simulates the web page from the found boot files and web components to obtain the web page to be detected. A vulnerability verification module then uses the vulnerability number and exp code in the found vulnerability information to simulate an attack on that web page. The result information of the simulated attack is obtained and checked for the presence of the vulnerability; if a vulnerability is confirmed, the firmware information, vulnerability information, and result information are displayed.
IoT device firmware is still characterized by being closed-source, highly customized, and massive in scale, and there is no unified platform or tool that can be applied directly to the security analysis of firmware from different manufacturers and different devices. Firmware lacks an open, unified standard; this is inseparable from the fact that it is embedded directly in hardware and that the hardware of different manufacturers is highly customized, so that even firmware for different devices from the same manufacturer often varies greatly. For commercial and security reasons, manufacturers also tend not to disclose the structure and code of their firmware, which makes top-down code analysis techniques difficult to apply. Dynamic analysis is constrained by the high cost of physical equipment, so reverse analysis has become the common method of firmware analysis. In addition, different devices may adopt different hardware architectures, resulting in firmware for multiple architectures, which poses great challenges to manual reverse analysis. How to implement general cross-architecture firmware security analysis is therefore one of the key research directions.
In view of the technical deficiencies in cross-architecture automated firmware security analysis of IoT devices, the present invention provides a cross-architecture automated detection method and system for third-party components and their security risks. The method and system reverse-unpack and analyze the firmware without needing to obtain the firmware source code. The detection process is automated, which greatly reduces the difficulty and workload of manual analysis. By augmenting the training dataset of the similarity model, the approach can easily be extended to additional architectures and is already applicable to many of the current mainstream architectures.
The technical solutions of the present invention are as follows:
A cross-architecture automated detection method for third-party components and their security risks, including the following steps:
Preferably, step (1) comprises:
Preferably, step (2) comprises:
Further preferably, the scan matching in step (2-2) is based on cross-row matching rules.
If the scan matching of step (2-2) used a single row as the unit, a third-party component name and its version number that are split across rows could not be captured together, resulting in omissions. Cross-row matching rules, by contrast, are a statistically based collection strategy that gathers several consecutive lines of strings so that the component name and the version number are included together, avoiding such omissions.
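The cross-row idea can be sketched in Python. This is a minimal illustration under assumptions: `cross_row_matches` is a hypothetical helper, rows are joined with a single space, and the sample OpenSSL-style pattern stands in for a real rule from the rule list.

```python
import re

def cross_row_matches(lines, pattern, window=1):
    """Join every run of `window` consecutive lines and apply `pattern` to it.

    With window=1 this is ordinary row-by-row matching; larger windows let a
    component name and a version string on adjacent rows match together.
    """
    regex = re.compile(pattern)
    hits = set()  # overlapping windows can see the same match more than once
    for start in range(len(lines)):
        chunk = " ".join(lines[start:start + window])
        hits.update(m.group(0) for m in regex.finditer(chunk))
    return hits
```

With `window=1`, a name on one row and its version on the next are missed; with `window=2` they are joined and matched, which is exactly the omission the cross-row rule is meant to avoid.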
Further, a subset of the complete IoT device firmware dataset is taken as the test object, the number of rules and the scanning-time ratio obtained under different numbers of rows are compared, and the best number of rows is selected as the cross-row count for the final cross-row matching rule.
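The selection of the row count can be sketched as below. The exact criterion is not stated, so the one used here (keep the row counts that reach a given share of the best rule yield, then pick the cheapest) is an assumption, `select_window` is a hypothetical name, and the measurements in the usage note are invented placeholders, not experimental results.

```python
def select_window(measurements, coverage=1.0):
    """Pick the cross-row line count from benchmark measurements.

    measurements maps a line count to (rule_count, time_ratio), where time_ratio
    is scanning time relative to single-row scanning. Hypothetical criterion:
    keep the counts whose rule yield reaches `coverage` of the best yield, then
    choose the cheapest of them (fewer lines on a tie).
    """
    best = max(rules for rules, _ in measurements.values())
    eligible = [n for n, (rules, _) in measurements.items() if rules >= coverage * best]
    return min(eligible, key=lambda n: (measurements[n][1], n))
```

For example, with made-up measurements `{1: (120, 1.0), 2: (150, 1.4), 3: (158, 1.9), 4: (158, 2.6), 5: (158, 3.4)}`, the default `coverage=1.0` selects 3 rows, while `coverage=0.9` trades a few rules for speed and selects 2.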
Preferably, step (3) comprises:
Further, step (3) comprises:
The present invention further provides a cross-architecture automated detection system for third-party components and their security risks, comprising:
Compared with the prior art, the present invention has the following beneficial effects:
(1) The cross-architecture automated detection technology of the present invention for third-party components and their security risks in IoT device firmware performs firmware security analysis based on reverse analysis and deep learning. The specific process is: reverse-unpacking the IoT device firmware; converting the reversed files to strings and disassembling them; detecting third-party components and their security risks by pattern matching in combination with the vulnerability database; and detecting third-party components and their security risks by similarity comparison in combination with the vulnerability database. Throughout the process there is no need to obtain the firmware source code and no dependence on a physical IoT device. The scanning and detection process is fully automated, which makes up for the shortcomings of source-based static analysis and device-based dynamic analysis; the approach can be applied directly to security analysis of closed-source IoT device firmware and can efficiently detect and mine third-party components and their security risks.
(2) The method and system of the invention can extend their detection capability to other architectures by modifying only some of their units, achieving cross-architecture firmware vulnerability mining. To apply the method to IoT device firmware built for a new architecture, the source code of the third-party components needs to be compiled in the compilation toolchain of that architecture, and the intermediate binary products need to be disassembled and added to the training set of the similarity comparison model training unit. The matching-based third-party component risk detection module is essentially unaffected by the architecture.
(3) The present invention adapts the word vector model of natural language processing to code similarity detection and applies it innovatively to firmware security analysis: when the similarity reaches a certain threshold, the firmware is considered to contain the corresponding security risk. At the same time, the results of this module and of the matching-based third-party component risk detection module corroborate each other, and the accuracy of the results is checked through verification.
The present invention is described in further detail below in combination with the attached drawings and embodiments. It should be noted that the following embodiments are intended to facilitate understanding of the present invention and do not limit it in any way.
The present invention provides a cross-architecture automated detection method for third-party components and their security risks in IoT device firmware. By decompressing and disassembling the firmware and combining the results with a self-built vulnerability information base, the method compares and analyzes the feature information of third-party components and the semantic information of code, automatically collects and retrieves the third-party components of the firmware, and collects, compares, and retrieves the CVEs (Common Vulnerabilities and Exposures) of the corresponding third-party components. The method includes the following steps:
Step 3: By organically combining the semantic information of the vulnerability assembly code with the semantic information of the firmware assembly code, cross-architecture, deep-learning-based similarity comparison is realized, and vulnerabilities of specific patterns are mined and verified automatically. This step comprises the following sub-steps:
The present invention implements the above steps with four modules, as shown in the drawings: an IoT device firmware reverse extraction module, a third-party component and CVE information database module, a matching-based third-party component risk detection module, and a similarity-comparison-based third-party component risk detection module.
(1) The role of the IoT device firmware reverse extraction module is to identify the architecture, file system, and compression method of a given IoT device firmware image from its compression features, and to reverse (unpack) the firmware according to the identified compression method.
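Identifying the compression method and file system from compression features can be illustrated with a signature scan, similar in spirit to tools such as binwalk. The sketch below is an assumption, not the module's actual implementation: the signature table is a small hand-picked set of well-known magic numbers, and `scan_signatures` is a hypothetical helper that only locates candidates; it does not unpack anything.

```python
# Illustrative signature table (magic bytes -> format); a real scanner needs many
# more entries plus header validation to suppress false positives.
SIGNATURES = {
    b"\x27\x05\x19\x56": "U-Boot uImage header",
    b"\x1f\x8b\x08": "gzip (deflate)",
    b"\xfd7zXZ\x00": "xz",
    b"\x5d\x00\x00": "LZMA (weak heuristic)",
    b"hsqs": "SquashFS (little-endian)",
    b"sqsh": "SquashFS (big-endian)",
    b"\x45\x3d\xcd\x28": "CramFS (little-endian)",
}

def scan_signatures(blob: bytes):
    """Return sorted (offset, label) pairs for every known magic found in the image."""
    hits = []
    for magic, label in SIGNATURES.items():
        offset = blob.find(magic)
        while offset != -1:
            hits.append((offset, label))
            offset = blob.find(magic, offset + 1)
    return sorted(hits)
```

The offsets tell a subsequent extraction step where each compressed stream or file-system image begins.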
Using the identified file system, the reversed files are classified and extracted: the binary executable files are collected and classified by architecture and binary type, and the non-binary files are converted to string text files.
The binary files obtained above are disassembled according to their architecture (ARM, X86 or MIPS) and converted into assembly code for the corresponding architecture. The assembly code is divided into finer-grained structural units with the function as the unit, and the resulting function units are arranged in logical order, completing the mining of the semantic information of the IoT device firmware.
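A possible way to split the extracted files into binaries and non-binaries, and to turn the non-binaries into string text, is sketched below. It assumes binaries are ELF files, reads the architecture from the ELF `e_machine` field, and imitates the Unix `strings` tool with a minimum run length of 4; the function names are hypothetical.

```python
import re
import struct

# ELF e_machine values for the architectures named here (plus 64-bit variants).
EM_NAMES = {3: "x86", 8: "MIPS", 40: "ARM", 62: "x86-64", 183: "AArch64"}

def elf_arch(data: bytes):
    """Return an architecture label if `data` is an ELF file, otherwise None."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        return None
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if data[5] == 1 else ">"
    # e_machine is the 2-byte field at offset 18, after e_ident (16) and e_type (2).
    (machine,) = struct.unpack_from(endian + "H", data, 18)
    return EM_NAMES.get(machine, f"e_machine={machine}")

def to_strings(data: bytes, min_len: int = 4) -> str:
    """Mimic `strings`: keep runs of printable ASCII at least `min_len` long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return "\n".join(m.group().decode("ascii") for m in pattern.finditer(data))

def classify(files):
    """files: {path: bytes}. Binaries are grouped by architecture; others become string text."""
    binaries, texts = {}, {}
    for path, data in sorted(files.items()):
        arch = elf_arch(data)
        if arch is None:
            texts[path] = to_strings(data)
        else:
            binaries.setdefault(arch, []).append(path)
    return binaries, texts
```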
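Dividing assembly code into function units can be sketched by parsing a GNU `objdump -d` listing, where each function starts with a header such as `00001000 <main>:` and each instruction line has tab-separated address, raw-byte and instruction fields. The parser below is an illustrative assumption (no particular disassembler is named in the description); it keeps functions in listing order and drops byte-only continuation lines.

```python
import re

# objdump prints a function header as "<hex address> <symbol>:".
FUNC_HEADER = re.compile(r"^[0-9a-fA-F]+ <([^>]+)>:\s*$")

def split_functions(listing: str):
    """Split an `objdump -d` listing into {function_name: [instruction, ...]}."""
    functions = {}  # dicts keep insertion order, i.e. the order in the listing
    current = None
    for line in listing.splitlines():
        header = FUNC_HEADER.match(line)
        if header:
            current = header.group(1)
            functions[current] = []
            continue
        # Instruction lines look like "    1000:\t<bytes> \t<mnemonic>\t<operands>".
        parts = line.split("\t")
        if current is None or len(parts) < 3 or not parts[0].strip().endswith(":"):
            continue  # section banners, blank lines, byte-only continuation lines
        instruction = " ".join(p.strip() for p in parts[2:] if p.strip())
        functions[current].append(instruction)
    return functions
```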
(2) The role of the third-party component and CVE information database module is to build, from public CVE information databases, a mapping from third-party components and their versions to CVE information, and from there to the functions and files where each vulnerability is located. This module provides data support to the matching-based third-party component risk detection module and the similarity-comparison-based third-party component risk detection module.
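The component, version, CVE and location mapping could be held in a small relational table, as in the sketch below using Python's built-in `sqlite3`. The schema, column names and helper functions are hypothetical, and the CVE IDs in the test data are deliberately fake placeholders such as `CVE-0000-0001`, not real vulnerability data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE component_cve (
    component     TEXT NOT NULL,
    version       TEXT NOT NULL,
    cve_id        TEXT NOT NULL,
    file_path     TEXT,  -- source file containing the vulnerable code
    function_name TEXT   -- vulnerable function, used later by the similarity module
);
CREATE INDEX idx_component_version ON component_cve (component, version);
"""

def build_db(rows):
    """Create an in-memory database from (component, version, cve_id, file, function) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO component_cve VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn

def lookup(conn, component, version):
    """Return (cve_id, file_path, function_name) rows for one component version."""
    return conn.execute(
        "SELECT cve_id, file_path, function_name FROM component_cve "
        "WHERE component = ? AND version = ? ORDER BY cve_id",
        (component, version),
    ).fetchall()
```

The matching-based module would query by (component, version), while the similarity-based module would use `function_name` to know which function to extract from the compiled component.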
(3) The role of the matching-based third-party component risk detection module is to retrieve the features of third-party components in IoT device firmware using a string pattern matching strategy and, together with the third-party component and CVE information database module, to effectively detect and report the third-party components in IoT device firmware. The workflow of this module is shown in the drawings.
The coarse-grained detection unit automatically scans the string files obtained by module (1), performs a first, coarse-grained match on the greedy matching principle using the third-party component name as the feature, and records the third-party components used in the firmware. The rule extraction unit extracts the strings containing third-party components and their version numbers, classifies them by combination pattern, summarizes all strings sharing the same pattern into a regular expression, and adds it to the rule list.
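The coarse-grained pass and the rule extraction unit can be sketched as follows. How the "combination pattern" is defined is not specified, so this sketch assumes it is the short separator between component name and version (for example a space plus `v`, or a hyphen); the version regex and function names are also assumptions. Python's regex quantifiers are greedy by default, which loosely corresponds to the greedy matching principle.

```python
import re

# Assumed shape of a version number: dotted digits with an optional letter suffix.
VERSION = r"\d+(?:\.\d+)+[a-z]?"

def coarse_scan(text, component_names):
    """First pass: which known component names occur anywhere in the string file."""
    lowered = text.lower()
    return sorted(name for name in component_names if name.lower() in lowered)

def extract_rules(text, component):
    """Rule extraction: for every line where `component` is followed by a version,
    record the separator between them (the assumed 'combination pattern'); each
    distinct separator yields one regex with `name` and `version` groups."""
    finder = re.compile(rf"{re.escape(component)}(\W{{1,3}}v?)({VERSION})", re.IGNORECASE)
    rules = set()
    for line in text.splitlines():
        match = finder.search(line)
        if match:
            separator = re.escape(match.group(1))
            rules.add(rf"(?P<name>{re.escape(component)}){separator}(?P<version>{VERSION})")
    return sorted(rules)
```

For a string file containing `BusyBox v1.24.1` and `busybox-1.24.1`, `coarse_scan` reports `BusyBox` and `extract_rules` yields two rules, one per separator; the rules are meant to be applied case-insensitively.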
Based on the resulting, more complete list of matching rules, the fine-grained detection unit automatically scans the string files obtained by module (1), performs a second, fine-grained match on the greedy matching principle using the third-party component name and version number as features, and records the third-party components used in the firmware and their version numbers. Combined with the CVE database established in step 2, the CVEs of the third-party components in the firmware are then retrieved.
(4) The similarity-comparison-based third-party component risk detection module uses a BLSTM (bidirectional long short-term memory) neural network model from natural language processing to compute and compare the semantic information of the IoT device firmware and of the vulnerability functions, and, together with the third-party component and CVE information database module, achieves effective detection and risk reporting of the third-party components in IoT device firmware. The workflow of this module is shown in the drawings.
The vulnerability location unit compiles the source code of the corresponding versions of the third-party components in the compilation toolchain environments of different architectures (X86, ARM and MIPS) and extracts the binary files containing the vulnerability from the intermediate binary executables. It disassembles each binary file according to its architecture into assembly code for that architecture, divides the assembly code into finer-grained structural units with the function as the unit, and extracts the function in which the vulnerability is located.
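The fine-grained pass and the CVE lookup can be sketched in the same style. Here `rules_by_component` is assumed to hold the regexes produced by rule extraction (each with a `version` named group), and `cve_index` is a hypothetical in-memory stand-in for the CVE database keyed by `(component, version)`; the CVE IDs used in testing are fake placeholders.

```python
import re

def fine_scan(text, rules_by_component):
    """Second pass: apply every extracted rule and collect (component, version) pairs."""
    found = set()
    for component, rules in rules_by_component.items():
        for rule in rules:
            for match in re.finditer(rule, text, re.IGNORECASE):
                found.add((component, match.group("version")))
    return sorted(found)

def cve_report(found, cve_index):
    """Attach the known CVE IDs (possibly none) to each detected (component, version) pair."""
    return {pair: sorted(cve_index.get(pair, [])) for pair in found}
```

A detected component with no entry in the index still appears in the report with an empty list, so the inventory of third-party components is kept even when no CVE is known.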
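The compile-then-disassemble part of the vulnerability location unit might be scripted as below. The toolchain prefixes follow common Debian/Ubuntu cross-compiler naming and are assumptions about the build host, as are the `-O2` flag and the output layout; a real third-party component would normally be built through its own build system rather than a single `gcc -c` call. `build_plan` and `run_plan` are hypothetical helpers, and only `build_plan` works without the toolchains installed.

```python
import subprocess
from pathlib import Path

# Assumed cross-toolchain prefixes on the build host; adjust to the local setup.
TOOLCHAINS = {
    "x86": "",                     # native gcc/objdump
    "ARM": "arm-linux-gnueabi-",
    "MIPS": "mips-linux-gnu-",
}

def build_plan(source: str, out_dir: str = "build"):
    """Return {arch: [compile_cmd, disassemble_cmd]} for one source file."""
    plan = {}
    stem = Path(source).stem
    for arch, prefix in TOOLCHAINS.items():
        obj = str(Path(out_dir) / f"{stem}.{arch.lower()}.o")
        plan[arch] = [
            [f"{prefix}gcc", "-c", "-O2", source, "-o", obj],
            [f"{prefix}objdump", "-d", obj],
        ]
    return plan

def run_plan(plan):
    """Execute the plan and return {arch: objdump listing} (requires the toolchains)."""
    listings = {}
    for arch, (compile_cmd, disasm_cmd) in plan.items():
        Path(compile_cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(compile_cmd, check=True)
        result = subprocess.run(disasm_cmd, check=True, capture_output=True, text=True)
        listings[arch] = result.stdout
    return listings
```

The resulting listings can then be split into functions and filtered down to the vulnerable function recorded in the CVE database.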
The similarity comparison model training unit uses the resulting assembly code as training data for deep-learning-based similarity comparison. Assembly code derived from the same source code but compiled by different compilation toolchains is labeled as similar, and assembly code compiled from different source code is labeled as not similar. Operators and operands in the assembly code are treated as words in natural language, assembly instructions as sentences, and semantic comparison across instruction architectures as natural language translation. The BLSTM neural network is trained on this assembly code training data to obtain a similarity calculation model, which takes as input two assembly code sequences (possibly from different architectures) and outputs a similarity result.
The similarity comparison model calculation unit, given the function-level assembly code of the IoT device firmware obtained in step 1 and the function-level assembly code produced by the vulnerability location unit, compares the assembly code of the vulnerability function with that of each firmware function one by one; that is, each pair is input into the similarity calculation model obtained by the training unit to obtain their similarity. Finally, the several most similar firmware functions are selected as potential security risks and a report is output.
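The labeling rule and the "words and sentences" view of assembly code can be sketched as follows. The tokenizer's split characters and the `(source_function_id, arch, instructions)` sample format are assumptions, and the BLSTM itself is not shown; the output is the kind of labeled pair list such a model would be trained on.

```python
import itertools
import re

def tokenize(instruction: str):
    """Treat one instruction as a sentence: mnemonic and operands become words."""
    return [t for t in re.split(r"[\s,\[\]{}()]+", instruction.lower()) if t]

def make_pairs(samples):
    """samples: list of (source_function_id, arch, [instruction, ...]).

    Same source function built by different toolchains -> label 1 (similar);
    different source functions -> label 0 (not similar).
    """
    pairs = []
    for (src_a, arch_a, code_a), (src_b, arch_b, code_b) in itertools.combinations(samples, 2):
        if src_a == src_b and arch_a == arch_b:
            continue  # identical build: no cross-architecture signal
        label = 1 if src_a == src_b else 0
        pairs.append(([tokenize(i) for i in code_a], [tokenize(i) for i in code_b], label))
    return pairs
```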
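The one-by-one comparison and the selection of the most similar functions can be sketched as below. The trained BLSTM model is replaced by a clearly labeled placeholder (multiset Jaccard similarity over instruction mnemonics), which is not the method described here; `k` and `threshold` are hypothetical parameters standing for "several of the most similar functions" and the similarity threshold mentioned among the beneficial effects.

```python
import heapq
from collections import Counter

def placeholder_similarity(a, b):
    """Stand-in for the trained model: multiset Jaccard similarity of mnemonics."""
    ca = Counter(ins.split()[0] for ins in a if ins.strip())
    cb = Counter(ins.split()[0] for ins in b if ins.strip())
    union = sum((ca | cb).values())
    return sum((ca & cb).values()) / union if union else 0.0

def rank_candidates(vuln_func, firmware_funcs, similarity=placeholder_similarity,
                    k=3, threshold=0.5):
    """Score every firmware function against the vulnerable one and keep the
    top-k candidates whose score reaches the threshold, best first."""
    scored = ((similarity(vuln_func, code), name) for name, code in firmware_funcs.items())
    return [(name, score) for score, name in heapq.nlargest(k, scored) if score >= threshold]
```

Passing the trained model's scoring function as `similarity` would leave the ranking logic unchanged.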
Implementation
To further demonstrate the effect of the present invention, experiments were conducted on firmware from many different manufacturers, devices, and architectures. We downloaded IoT device firmware from the official websites of multiple manufacturers and recorded the manufacturer, device, version, and architecture of each firmware image. In addition, we pre-imported the relevant data into the third-party component and CVE information database module from the official CVE website and the official websites of the third-party components. We then applied the present invention to several of the downloaded firmware images to detect the third-party components of the IoT device firmware and their security risks, and the results were verified manually.
The firmware information and experimental data are shown in Table 1. The firmware covers four vendors, two different IoT devices, and two different architectures, demonstrating that the invention can be adapted to different vendors, devices, and architectures. The experimental results show that the present invention reported a total of 108 security risks associated with third-party components in five firmware images; after manual verification, 102 of them were confirmed as valid. Manual reverse analysis, by comparison, takes a great deal of time, and the time spent depends on the analyst's code analysis experience and proficiency, whereas the mining process of the invention is independent of analyst experience, takes less time, and is more efficient. Dynamic analysis, in turn, requires additional funds to purchase the corresponding equipment and cannot be applied to large-scale firmware scenarios.
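Treating the 102 manually confirmed reports as true positives and the remaining reports as false positives, the reported figures correspond to a precision of roughly 94%. Recall cannot be derived from these numbers, because the number of risks the method missed is not reported.

```latex
\text{precision} = \frac{102}{108} = \frac{17}{18} \approx 94.4\%, \qquad
\text{false-positive share} = \frac{108 - 102}{108} = \frac{6}{108} \approx 5.6\%
```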
Thus, according to the experiments, the present invention can detect security risks associated with third-party components in the firmware of the IoT device efficiently and automatically.
The above embodiments describe the technical solution and beneficial effects of the invention in detail. It should be understood that the above is only a specific embodiment of the invention and is not intended to limit it; any modifications, additions, and equivalent substitutions made within the scope of the principles of the present invention shall fall within the scope of protection of the invention.
March 17, 2026