One or more unit-test cases are generated from a monolingual code corpus and the generated unit-test cases are filtered to generate a corpus of unit-test cases which have acceptability scores exceeding one or more predefined thresholds. One or more of the code samples of the monolingual code corpus are translated from a source language to a target language using a pretrained Large Language Model and the generated unit-test cases are translated from the source language to the target language. The LLM-translated code samples are validated using the translated unit-test cases and a parallel-data training corpus comprising the LLM-translated code samples that pass the validation is created. The pretrained large language model (LLM) is fine-tuned using the parallel-data training corpus, a given code segment is translated using the fine-tuned large language model (LLM), the translated given code segment is tested and the tested given code segment is deployed.
Legal claims defining the scope of protection, as filed with the USPTO.
. A method comprising:
. The method of, further comprising:
. The method of, wherein the filtering of the generated unit-test cases further comprises retaining the generated unit-test cases that have toolkit-metrics exceeding the one or more predefined thresholds.
. The method of, further comprising performing code translation of the given segment of code in the source language to the target language using the Large Language Model.
. The method of, wherein the large language model is a multi-billion parameter machine learning model pretrained in an unsupervised manner.
. The method of, wherein the parallel-data training corpus comprises functionally-equivalent implementation of logic in multiple programming languages.
. The method of, wherein the generated unit-test cases are expected input-output pairs with assert statements.
. The method of, further comprising repeating the translating of the one or more of the code samples of the monolingual code corpus, the validating, and the creating operations for all code samples of the monolingual code corpus.
. The method of, further comprising retraining the large language model using the parallel-data training corpus.
. The method of, further comprising repeating the translating of the one or more of the code samples of the monolingual code corpus, the validating, and the creating operations for code samples of the monolingual code corpus that failed the validation operation.
. The method of, further comprising verifying that the code samples of the monolingual code corpus and the translated code are functionally equivalent using the translated unit-test cases.
. The method of, wherein the acceptability scores are one or more of a coverage score and a mutation score.
. The method of, further comprising running the deployed tested given code segment.
. A computer program product, comprising:
. An apparatus comprising:
. The apparatus of, the operations further comprising:
. The apparatus of, wherein the filtering of the generated unit-test cases further comprises retaining the generated unit-test cases that have toolkit-metrics exceeding the one or more predefined thresholds.
. The apparatus of, the operations further comprising performing code translation of the given segment of code in the source language to the target language using the large language model.
. The apparatus of, the operations further comprising retraining the large language model using the parallel-data training corpus.
. The apparatus of, the operations further comprising verifying that the code samples of the monolingual code corpus and the translated code are functionally equivalent using the translated unit-test cases.
Complete technical specification and implementation details from the patent document.
The present invention relates generally to the electrical, electronic and computer arts and, more particularly, to computer-aided software design, machine translation, and generative modeling.
Code translation is the task of translating source code from one language to another. It is primarily utilized in application modernization, where applications built in legacy languages, such as Common Business Oriented Language (COBOL) and Formula Translation (FORTRAN), need to be re-written in a modern language. Application modernization and code translation is a task-intensive process, whereby large teams of programmers work, potentially for years, to complete the modernization task. Machine learning (ML) systems that can learn from training data to translate code can greatly reduce the time and effort that a manual conversion of the applications would require. Training these machine learning systems for code translation, however, requires a large amount of parallel data; that is, samples of code implementing the same functionality in different languages. Availability of this data is quite limited for low-resource languages (such as COBOL and FORTRAN), thus limiting the availability of ML code translation systems for the task.
Artificial Intelligence for Code (AI4Code) aims to integrate recent advances in artificial intelligence (AI) to various sub-tasks in the programming domain and is an emerging field of focus for both the research and the business community. Code translation is one such avenue, where the aim is to translate code from one programming language to another. It is useful to modernize code bases written in legacy programming languages to a modern programming language, among others. Long periods of time and large amounts of money can be required to modernize an existing code base and, more recently, a government system implemented in the legacy language (COBOL) slowed the disbursement of government benefits. A large amount of code written in COBOL is in use today, and the U.S. Government Accountability Office recently urged multiple agencies to modernize their critical legacy technology.
Principles of the invention provide systems and techniques for large language models for creating a multi-lingual, low-resource code translation dataset. In one aspect, an exemplary method includes the operations of generating, using at least one hardware processor, one or more unit-test cases from a monolingual code corpus; filtering, using the at least one hardware processor, the generated unit-test cases to generate a corpus of unit-test cases which have acceptability scores exceeding one or more predefined thresholds; translating, using the at least one hardware processor, one or more of the code samples of the monolingual code corpus from a source language to a target language using a pretrained Large Language Model (LLM); translating, using the at least one hardware processor, the generated unit-test cases from the source language to the target language; validating, using the at least one hardware processor, the LLM-translated code samples using the translated unit-test cases; creating, using the at least one hardware processor, a parallel-data training corpus comprising the LLM-translated code samples that pass the validation; fine-tuning, using the at least one hardware processor, the pretrained large language model (LLM) using the parallel-data training corpus; translating, using the at least one hardware processor, a given code segment using the fine-tuned large language model (LLM); testing, using the at least one hardware processor, the translated given code segment; and facilitating, using the at least one hardware processor, deployment of the tested given code segment in the target language.
In one aspect, a computer program product comprises one or more tangible computer-readable storage media and program instructions stored on at least one of the one or more tangible computer-readable storage media, the program instructions executable by a processor, the program instructions comprising generating one or more unit-test cases from a monolingual code corpus; filtering the generated unit-test cases to generate a corpus of unit-test cases which have acceptability scores exceeding one or more predefined thresholds; translating one or more of the code samples of the monolingual code corpus from a source language to a target language using a pretrained Large Language Model (LLM); translating the generated unit-test cases from the source language to the target language; validating the LLM-translated code samples using the translated unit-test cases; creating a parallel-data training corpus comprising the LLM-translated code samples that pass the validation; fine-tuning the pretrained large language model (LLM) using the parallel-data training corpus; translating a given code segment using the fine-tuned large language model (LLM); testing the translated given code segment; and facilitating deployment of the tested given code segment in the target language.
In one aspect, a system comprises a memory and at least one processor, coupled to the memory, and operative to perform operations comprising generating one or more unit-test cases from a monolingual code corpus; filtering the generated unit-test cases to generate a corpus of unit-test cases which have acceptability scores exceeding one or more predefined thresholds; translating one or more of the code samples of the monolingual code corpus from a source language to a target language using a pretrained Large Language Model (LLM); translating the generated unit-test cases from the source language to the target language using a rules-based translator; validating the LLM-translated code samples using the translated unit-test cases; creating a parallel-data training corpus comprising the LLM-translated code samples that pass the validation; fine-tuning the pretrained large language model (LLM) using the parallel-data training corpus; translating a given code segment using the fine-tuned large language model (LLM); testing the translated given code segment; and facilitating deployment of the tested given code segment in the target language.
As used herein, “facilitating” an action includes performing the action, making the action easier, helping to carry the action out, or causing the action to be performed. Thus, by way of example and not limitation, instructions executing on a processor might facilitate an action carried out by instructions executing on a remote processor, by sending appropriate data or commands to cause or aid the action to be performed. Where an actor facilitates an action by other than performing the action, the action is nevertheless performed by some entity or combination of entities.
Techniques as disclosed herein can provide substantial beneficial technical effects. Some embodiments may not have these potential advantages and these potential advantages are not necessarily required of all embodiments. By way of example only and without limitation, one or more embodiments may provide one or more of:
These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
It is to be appreciated that elements in the figures are illustrated for simplicity and clarity. Common but well-understood elements that may be useful or necessary in a commercially feasible embodiment may not be shown in order to facilitate a less hindered view of the illustrated embodiments.
Principles of inventions described herein will be in the context of illustrative embodiments. Moreover, it will become apparent to those skilled in the art given the teachings herein that numerous modifications can be made to the embodiments shown that are within the scope of the claims. That is, no limitations with respect to the embodiments shown and described herein are intended or should be inferred.
Generally, techniques, methods, and systems for utilizing and fine-tuning large language models for generating a multi-lingual, low-resource code translation training dataset and/or for performing code translation are disclosed. It is recognized that AI can help developers more quickly achieve, for example, an application modernization task, by translating code to a target language, and then relying on the developer to verify and fix the translated code (rather than translating the code from scratch). Training these AI models, however, requires parallel-data, that is, functional code in multiple programming languages, which is exceptionally rare. This lack of parallel-data is exacerbated for low-resource or legacy languages, such as COBOL and FORTRAN. In one example embodiment, the generalization capabilities of large language models (LLMs; for example, multi-billion parameter machine learning models trained in an unsupervised manner on data from the Internet) are used along with automated unit-test case creators (generation tools) to create parallel-data for code translation, such as between multiple programming languages, including low-resource languages such as COBOL, FORTRAN, and the like. Generally, “large language model (LLM)” is used herein in its ordinary sense, namely, a language model including a neural network with many parameters (typically billions of weights or more), trained on large quantities of unlabeled text using self-supervised learning or semi-supervised learning. In this context, “large” is not a relative term but has a known meaning for the skilled artisan.
In one example embodiment, a system for generating parallel corpora for code translation in multiple programming languages is created. The corpora contain functionally-equivalent implementations of logic in multiple programming languages. Conventional techniques perform training using multiple, monolingual corpora (training a code-translation model using unsupervised training and using only the monolingual corpus of code, or potentially multiple, monolingual corpora). The conventional techniques also perform training using parallel corpora by creating a parallel corpus from the code of projects that have implementations in multiple programming languages. (Functions that perform the same calculations or the same tasks are paired together to create the dataset.)
The drawings include a workflow for an example method for utilizing and fine-tuning an LLM to generate a parallel corpus for training a code translator and for performing code translation, in accordance with an example embodiment, and a flowchart for an example method for utilizing and fine-tuning an LLM capable of generating a parallel corpus for training a code translator and for performing code translation, in accordance with an example embodiment. In one example embodiment, code samples in various programming languages are collected (operation). For example, code samples in Java may be collected from conventional code databases, a codebase, and the like. A monolingual code corpus is extracted from the collected code samples (operation). The code samples and the monolingual code corpus may be obtained, for example, using a structured query language (SQL) query to identify the code samples from one or more available databases of code samples.
Unit-test cases are generated from the monolingual code corpus created in the preceding operation, utilizing known toolkits (operation). For example, unit-test cases may be generated for the Java code samples of the monolingual code corpus using the known toolkits. The generated unit-test cases are filtered to cases which pass a functionality test and which have high toolkit-metrics, such as in regard to mutation scores and the like, to retain only the high-quality test cases (operation). In particular, each function is tested using a number of different tests and corresponding results. In one example embodiment, criteria of the high-quality test cases include coverage (such as how many of the statements in the program have been executed), mutation (such as whether the test will return a different output when a fault is created in a program), and the like.
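The mutation criterion described above can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the function `add`, the three hand-written mutants, and the two test suites are hypothetical, and real toolkits generate mutants automatically. The sketch only shows how a mutation score (fraction of faulty variants that a test suite detects) separates a weak suite from a strong one.

```python
from typing import Callable, List, Tuple

# Hypothetical function under test (assumption for illustration only).
def add(a: int, b: int) -> int:
    return a + b

# Hand-written "mutants": copies of add() with a deliberately injected fault.
def mutant_sub(a: int, b: int) -> int:
    return a - b            # operator replaced

def mutant_off_by_one(a: int, b: int) -> int:
    return a + b + 1        # constant fault

def mutant_abs(a: int, b: int) -> int:
    return a + abs(b)       # differs from add() only when b is negative

# A unit-test case is modeled as an expected input-output pair.
IOPair = Tuple[Tuple[int, int], int]

def passes(fn: Callable[[int, int], int], cases: List[IOPair]) -> bool:
    """True if fn returns the expected output for every input-output pair."""
    return all(fn(*args) == expected for args, expected in cases)

def mutation_score(mutants: List[Callable[[int, int], int]],
                   cases: List[IOPair]) -> float:
    """Fraction of mutants 'killed', i.e. failing at least one test case."""
    killed = sum(1 for mutant in mutants if not passes(mutant, cases))
    return killed / len(mutants)

mutants = [mutant_sub, mutant_off_by_one, mutant_abs]
weak_suite: List[IOPair] = [((0, 0), 0)]
strong_suite: List[IOPair] = [((0, 0), 0), ((2, 3), 5), ((2, -3), -1)]

weak_score = mutation_score(mutants, weak_suite)      # only one mutant is detected
strong_score = mutation_score(mutants, strong_suite)  # every mutant is detected
```

Both suites pass on the unmodified `add`, so a functionality test alone cannot tell them apart; the mutation score can.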
A pretrained Large Language Model (LLM) is used to translate code samples of the monolingual code corpus from a source language, such as Java, to translated code in a target language, such as COBOL (operation). In one example embodiment, the Large Language Model was pretrained in an unsupervised manner and not for any particular task; for example, it is trained to predict the next word given the previous words. The LLM may or may not have been pretrained using code samples, or code samples in the source or target programming languages. Prompt engineering is used to specify the translation task to the LLM. (In prompt engineering, the input to the AI system includes a description of the task to be performed. Prompt engineering is conventionally based on a prompt-based dataset and a language model trained with prompt-based learning.) For example, the following prompt engineering may be utilized:
The generated unit-test cases in the source language, such as Java, are translated to the target language, such as COBOL (operation). The generated unit-test cases are expected input-output pairs with assert statements. It is noted that the translation of the unit-test cases to the translated unit-test cases is generally easier than the translation of the code samples of the monolingual code corpus due to their simplicity. Thus, in example embodiments, a rule-based translator having inference conditions is utilized to translate the unit-test cases. (The skilled person is familiar with rule-based translators, processes that translate text based on a set of defined rules.) In general, the translation of the unit-test cases using the rule-based translator is markedly more accurate than the LLM translation of the sample code of the monolingual code corpus, at least at this point in the process, since the LLM has not yet been fine-tuned for the task of code translation. Thus, the LLM-translated code is validated using the more reliable translated unit-test cases (operation). For example, the Java samples of the monolingual code corpus translated to COBOL by the LLM (the translated code) are tested. Those which pass the corresponding translated unit-test cases are retained and a Java⇔COBOL data pair of code samples is created for each (operation). The COBOL samples which fail their corresponding translated unit-test cases are excluded from the final parallel corpora at this time. In one example embodiment, the COBOL samples which fail all of their corresponding translated unit-test cases are excluded from the final parallel corpora at this time. In one example embodiment, the COBOL samples which fail any of their corresponding translated unit-test cases are excluded from the final parallel corpora at this time.
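The two exclusion variants described above (exclude a sample that fails any test, or only a sample that fails all tests) can be sketched as follows. This is a minimal illustration under stated assumptions: per-sample test outcomes are assumed to be already available as lists of booleans, and the names `retain`, `build_parallel_corpus`, and the policy strings are hypothetical, not part of the disclosure.

```python
from typing import Dict, List, Tuple

def retain(results: List[bool], policy: str = "exclude_on_any_failure") -> bool:
    """Decide whether a translated sample enters the parallel corpus.

    results holds one boolean per translated unit-test case (True = passed).
    """
    if not results:
        return False                 # assumption: untested samples are not trusted
    if policy == "exclude_on_any_failure":
        return all(results)          # a single failing test excludes the sample
    if policy == "exclude_on_all_failures":
        return any(results)          # excluded only if every test fails
    raise ValueError(f"unknown policy: {policy}")

def build_parallel_corpus(source: Dict[str, str],
                          translated: Dict[str, str],
                          test_results: Dict[str, List[bool]],
                          policy: str) -> List[Tuple[str, str]]:
    """Pair each source sample with its translation when the policy retains it."""
    return [(source[sid], translated[sid])
            for sid, results in test_results.items()
            if retain(results, policy)]

# Hypothetical placeholders standing in for real Java and COBOL code samples.
source = {"s1": "java-1", "s2": "java-2", "s3": "java-3"}
translated = {"s1": "cobol-1", "s2": "cobol-2", "s3": "cobol-3"}
test_results = {"s1": [True, True], "s2": [True, False], "s3": [False, False]}

strict_pairs = build_parallel_corpus(source, translated, test_results,
                                     "exclude_on_any_failure")
lenient_pairs = build_parallel_corpus(source, translated, test_results,
                                      "exclude_on_all_failures")
```

The strict policy yields a smaller but cleaner corpus; the lenient policy keeps partially correct translations, which may or may not be desirable for fine-tuning.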
In one or more embodiments, the translating, validating, and pair-creating operations are repeated for the entire monolingual code corpus to create a parallel corpus for code translation (a decision block and corresponding logic are omitted from the flowchart to avoid clutter).
In one example embodiment, the large language model is then fine-tuned (retrained) using the created parallel corpus (operation). In one example embodiment, the code translation process is repeated on the code samples of the monolingual code corpus that failed their corresponding translated unit-test cases, thereby increasing the number of parallel data-points in the parallel corpus (operation). In one or more embodiments, the performance of this step depends, for example, on the size of the LLM and the availability of compute resources. For example, for larger LLMs or black-boxed LLMs, it might not be possible to fine-tune the LLM if access is restricted by an application programming interface (API). (A black-box LLM means an LLM that cannot be fine-tuned. For example, a black-box LLM may be hosted in a manner that does not allow users to change the LLM; users can only use the LLM.) Given the teachings herein, it would be apparent to the skilled artisan that various combinations of operations of the method can be repeated multiple times as desired to further refine the LLM and/or further augment the parallel corpus. In one example embodiment, as part of the validation operation, the translated unit-test cases are executed on both the original code samples of the monolingual code corpus and the translated code to verify that the two pieces of code are functionally equivalent.
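The repeat-after-fine-tuning loop described above can be sketched as below. All callables are hypothetical stand-ins: `translate` represents the LLM, `validate` represents running the translated unit-test cases, and `fine_tune` represents retraining on the corpus collected so far. The toy definitions at the end merely make the sketch executable; they do not model a real LLM.

```python
from typing import Callable, List, Tuple

Translator = Callable[[str], str]
Pair = Tuple[str, str]

def grow_parallel_corpus(samples: List[str],
                         translate: Translator,
                         validate: Callable[[str, str], bool],
                         fine_tune: Callable[[Translator, List[Pair]], Translator],
                         rounds: int = 2) -> Tuple[List[Pair], List[str]]:
    """Translate, validate, fine-tune, then retry only the samples that failed."""
    corpus: List[Pair] = []
    pending = list(samples)
    for _ in range(rounds):
        if not pending:
            break
        failed: List[str] = []
        for src in pending:
            tgt = translate(src)
            if validate(src, tgt):
                corpus.append((src, tgt))    # validated pair joins the corpus
            else:
                failed.append(src)           # kept for the next round
        translate = fine_tune(translate, corpus)  # retrain on validated pairs
        pending = failed
    return corpus, pending

# Toy stand-ins (assumptions): "translation" is upper-casing, and the base
# model only handles short inputs until it has been "fine-tuned".
def base_translate(src: str) -> str:
    return src.upper() if len(src) <= 3 else "??"

def validate(src: str, tgt: str) -> bool:
    return tgt == src.upper()

def fine_tune(model: Translator, corpus: List[Pair]) -> Translator:
    return lambda src: src.upper()

corpus, still_failing = grow_parallel_corpus(
    ["ab", "abcdef"], base_translate, validate, fine_tune)
```

With a black-box LLM that cannot be fine-tuned, `fine_tune` could simply return the model unchanged, in which case later rounds only help if translation is non-deterministic.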
In one example embodiment, code translation of a given segment of code from the source language to a given target language is performed using the LLM to, for example, perform application modernization (operation).
In experiments, approximately 130,000 Java samples were obtained from a publicly-accessible code repository, and a publicly available tool that automatically generates test cases with assertions for classes written in Java code was used to generate the unit-test cases for the Java samples. A large language model was used to translate these samples to Python and COBOL. Using the described procedure, the following were created:
It is highlighted that, on a publicly-accessible code repository, there are only about 2,500 COBOL documents, but through the disclosed procedure, about 10 times more COBOL data was created. Thus, example embodiments provide a pipeline that generates parallel data for low-resource languages based on the use of large language models and automated test generation tools. In one example embodiment, programming language translation is performed by supplying an engineered prompt to a large language model. In one example embodiment, the translation results are verified based on automated test generation for the source language and a rule-based conversion to the target language. It is noted that a model pretrained specifically for code translation would not work well for a low-resource language due to the large training data requirements of such models; the disclosed LLM technique enables support for a low-resource language as it does not require a large amount of training data in the given language.
Given the discussion thus far, it will be appreciated that, in general terms, an exemplary method, according to an aspect of the invention, includes the operations of generating, using at least one hardware processor, one or more unit-test cases from a monolingual code corpus (operation); filtering, using the at least one hardware processor, the generated unit-test cases to generate a corpus of unit-test cases which have acceptability scores exceeding one or more predefined thresholds (operation); translating, using the at least one hardware processor, one or more of the code samples of the monolingual code corpus from a source language to a target language using a pretrained Large Language Model (LLM) (operation); translating, using the at least one hardware processor, the generated unit-test cases from the source language to the target language (operation); validating, using the at least one hardware processor, the LLM-translated code samples using the translated unit-test cases (operation); creating, using the at least one hardware processor, a parallel-data training corpus comprising the LLM-translated code samples that pass the validation (operation); fine-tuning, using the at least one hardware processor, the pretrained large language model (LLM) using the parallel-data training corpus; translating, using the at least one hardware processor, a given code segment using the fine-tuned large language model (LLM); testing, using the at least one hardware processor, the translated given code segment; and facilitating, using the at least one hardware processor, deployment of the tested given code segment in the target language. In one example embodiment, the tested given code segment is deployed as a replacement for a legacy application.
In one example embodiment, one or more code samples in one or more programming languages are collected (operation); and the monolingual code corpus is extracted from the collection of code samples (operation).
In one example embodiment, the filtering of the generated unit-test cases further comprises retaining the generated unit-test cases that have toolkit-metrics exceeding one or more predefined thresholds. The skilled artisan can use heuristics to define the appropriate predefined thresholds depending on the domain of interest. For example, a predefined threshold can be a mutation score of greater than 90% and/or a coverage score of greater than 80%.
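As a minimal sketch of the threshold filter, the following Python snippet applies the example thresholds given above (mutation score greater than 90%, coverage score greater than 80%). The `ScoredTest` record, its field names, and the choice to require both thresholds together are assumptions made for illustration; the text permits either or both criteria.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ScoredTest:
    """Hypothetical record for one generated unit-test case and its metrics."""
    name: str
    passes_on_source: bool   # functionality test against the original code
    coverage: float          # fraction of statements executed, 0.0 to 1.0
    mutation: float          # fraction of injected faults detected, 0.0 to 1.0

def filter_tests(tests: List[ScoredTest],
                 min_coverage: float = 0.80,
                 min_mutation: float = 0.90) -> List[ScoredTest]:
    """Keep tests that pass on the source code and exceed both thresholds."""
    return [t for t in tests
            if t.passes_on_source
            and t.coverage > min_coverage
            and t.mutation > min_mutation]

candidates = [
    ScoredTest("test_a", True, 0.95, 0.93),    # kept
    ScoredTest("test_b", True, 0.85, 0.70),    # mutation score too low
    ScoredTest("test_c", False, 0.99, 0.99),   # fails the functionality test
    ScoredTest("test_d", True, 0.80, 0.95),    # coverage not strictly above 80%
]
kept_names = [t.name for t in filter_tests(candidates)]
```

The comparisons are strict to match the "greater than" wording; a domain with sparse test data might loosen the defaults.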
In one example embodiment, code translation of a given segment of code from the source language to the target language is performed using the Large Language Model (operation).
In one example embodiment, the Large Language Model is a multi-billion parameter machine learning model pretrained in an unsupervised manner.
In one example embodiment, the parallel-data training corpus comprises functionally-equivalent implementation of logic in multiple programming languages.
In one example embodiment, the generated unit-test cases are expected input-output pairs with assert statements.
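Because such test cases are simple assert statements over input-output pairs, a rule-based translator can handle them with a handful of rewrite rules. The sketch below is an assumption-laden illustration, not the disclosed translator: it targets Python rather than COBOL for brevity, the three regular-expression rules cover only JUnit-style `assertEquals`, `assertTrue`, and `assertFalse` lines, and the expected value in `assertEquals` is assumed to contain no comma. The `Calc` calls in the usage lines are hypothetical.

```python
import re

# Hypothetical rewrite rules: (pattern for a Java assertion, Python template).
_RULES = [
    (re.compile(r"^\s*assertEquals\(\s*([^,]+?)\s*,\s*(.+)\)\s*;\s*$"),
     r"assert \2 == \1"),
    (re.compile(r"^\s*assertTrue\(\s*(.+)\)\s*;\s*$"),
     r"assert \1"),
    (re.compile(r"^\s*assertFalse\(\s*(.+)\)\s*;\s*$"),
     r"assert not (\1)"),
]

def translate_assert(java_line: str) -> str:
    """Translate one Java assertion line using the first matching rule."""
    for pattern, template in _RULES:
        match = pattern.match(java_line)
        if match:
            return match.expand(template)
    # No rule applies: fail loudly rather than emit a guessed translation.
    raise ValueError(f"no rule matches: {java_line!r}")

equals_py = translate_assert("assertEquals(5, Calc.add(2, 3));")
true_py = translate_assert("  assertTrue(Calc.isEven(4));")
false_py = translate_assert("assertFalse(Calc.isEven(3));")
```

Raising on unmatched input reflects the role of the translated tests as the more reliable side of the validation: a test that cannot be translated deterministically is better dropped than approximated.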
In one example embodiment, the translating of the one or more of the code samples of the monolingual code corpus, the validating, and the creating operations are repeated for all code samples of the monolingual code corpus.
In one example embodiment, the large language model is retrained using the parallel-data training corpus (operation).
In one example embodiment, the translating of the one or more of the code samples of the monolingual code corpus, the validating, and the creating operations are repeated for code samples of the monolingual code corpus that failed the validation operation (operation).
In one example embodiment, the code samples of the monolingual code corpus and the translated code are verified as being functionally equivalent using the translated unit-test cases (operation).
In one example embodiment, the acceptability scores are one or more of a coverage score and a mutation score.
In one example embodiment, a deployed legacy application is run.
In one aspect, a computer program product comprises one or more tangible computer-readable storage media and program instructions stored on at least one of the one or more tangible computer-readable storage media, the program instructions executable by a processor, the program instructions comprising generating one or more unit-test cases from a monolingual code corpus (operation); filtering the generated unit-test cases to generate a corpus of unit-test cases which have acceptability scores exceeding one or more predefined thresholds (operation); translating one or more of the code samples of the monolingual code corpus from a source language to a target language using a pretrained Large Language Model (LLM) (operation); translating the generated unit-test cases from the source language to the target language (operation); validating the LLM-translated code samples using the translated unit-test cases (operation); creating a parallel-data training corpus comprising the LLM-translated code samples that pass the validation (operation); fine-tuning the pretrained large language model (LLM) using the parallel-data training corpus; translating a given code segment using the fine-tuned large language model (LLM); testing the translated given code segment; and facilitating deployment of the tested given code segment in the target language.
In one aspect, an apparatus comprises a memory and at least one processor, coupled to the memory, and operative to perform operations comprising generating one or more unit-test cases from a monolingual code corpus (operation); filtering the generated unit-test cases to generate a corpus of unit-test cases which have acceptability scores exceeding one or more predefined thresholds (operation); translating one or more of the code samples of the monolingual code corpus from a source language to a target language using a pretrained Large Language Model (LLM) (operation); translating the generated unit-test cases from the source language to the target language using a rules-based translator (operation); validating the LLM-translated code samples using the translated unit-test cases (operation); creating a parallel-data training corpus comprising the LLM-translated code samples that pass the validation (operation); fine-tuning the pretrained large language model (LLM) using the parallel-data training corpus; translating a given code segment using the fine-tuned large language model (LLM); testing the translated given code segment; and facilitating deployment of the tested given code segment in the target language.
Various aspects of the present disclosure are described by narrative text, flowcharts, block diagrams of computer systems and/or block diagrams of the machine logic included in computer program product (CPP) embodiments. With respect to any flowcharts, depending upon the technology involved, the operations can be performed in a different order than what is shown in a given flowchart. For example, again depending upon the technology involved, two operations shown in successive flowchart blocks may be performed in reverse order, as a single integrated step, concurrently, or in a manner at least partially overlapping in time.
A computer program product embodiment (“CPP embodiment” or “CPP”) is a term used in the present disclosure to describe any set of one, or more, storage media (also called “mediums”) collectively included in a set of one, or more, storage devices that collectively include machine readable code corresponding to instructions and/or data for performing computer operations specified in a given CPP claim. A “storage device” is any tangible device that can retain and store instructions for use by a computer processor. Without limitation, the computer readable storage medium may be an electronic storage medium, a magnetic storage medium, an optical storage medium, an electromagnetic storage medium, a semiconductor storage medium, a mechanical storage medium, or any suitable combination of the foregoing. Some known types of storage devices that include these mediums include: diskette, hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or Flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile disk (DVD), memory stick, floppy disk, mechanically encoded device (such as punch cards or pits/lands formed in a major surface of a disc) or any suitable combination of the foregoing. A computer readable storage medium, as that term is used in the present disclosure, is not to be construed as storage in the form of transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide, light pulses passing through a fiber optic cable, electrical signals communicated through a wire, and/or other transmission media. 
As will be understood by those of skill in the art, data is typically moved at some occasional points in time during normal operations of a storage device, such as during access, de-fragmentation or garbage collection, but this does not render the storage device as transitory because the data is not transitory while it is stored.
Computing environmentcontains an example of an environment for the execution of at least some of the computer code involved in performing the inventive methods, such as code translation system. In addition to block, computing environmentincludes, for example, computer, wide area network (WAN), end user device (EUD), remote server, public cloud, and private cloud. In this embodiment, computerincludes processor set(including processing circuitryand cache), communication fabric, volatile memory, persistent storage(including operating systemand block, as identified above), peripheral device set(including user interface (UI) device set, storage, and Internet of Things (IoT) sensor set), and network module. Remote serverincludes remote database. Public cloudincludes gateway, cloud orchestration module, host physical machine set, virtual machine set, and container set.
COMPUTER may take the form of a desktop computer, laptop computer, tablet computer, smart phone, smart watch or other wearable computer, mainframe computer, quantum computer or any other form of computer or mobile device now known or to be developed in the future that is capable of running a program, accessing a network or querying a database, such as remote database. As is well understood in the art of computer technology, and depending upon the technology, performance of a computer-implemented method may be distributed among multiple computers and/or between multiple locations. On the other hand, in this presentation of computing environment, detailed discussion is focused on a single computer, specifically computer, to keep the presentation as simple as possible. Computer may be located in a cloud, even though it is not shown in a cloud in the figure. On the other hand, computer is not required to be in a cloud except to any extent as may be affirmatively indicated.
PROCESSOR SET includes one, or more, computer processors of any type now known or to be developed in the future. Processing circuitry may be distributed over multiple packages, for example, multiple, coordinated integrated circuit chips. Processing circuitry may implement multiple processor threads and/or multiple processor cores. Cache is memory that is located in the processor chip package(s) and is typically used for data or code that should be available for rapid access by the threads or cores running on processor set. Cache memories are typically organized into multiple levels depending upon relative proximity to the processing circuitry. Alternatively, some, or all, of the cache for the processor set may be located “off chip.” In some computing environments, processor set may be designed for working with qubits and performing quantum computing.
Computer readable program instructions are typically loaded onto computer to cause a series of operational steps to be performed by processor set of computer and thereby effect a computer-implemented method, such that the instructions thus executed will instantiate the methods specified in flowcharts and/or narrative descriptions of computer-implemented methods included in this document (collectively referred to as “the inventive methods”). These computer readable program instructions are stored in various types of computer readable storage media, such as cache and the other storage media discussed below. The program instructions, and associated data, are accessed by processor set to control and direct performance of the inventive methods. In computing environment, at least some of the instructions for performing the inventive methods may be stored in block in persistent storage.
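By way of non-limiting illustration only, the following is a minimal sketch of the kind of program instructions that might implement one step of the inventive methods, namely validating translated code samples against translated unit-test cases and retaining the passing samples as parallel data. The function names `passes_tests` and `build_parallel_corpus` are hypothetical and do not appear in the disclosure. The sketch assumes, purely so that it is runnable, that the target language is Python and that each unit-test case is a string of assert statements; a practical embodiment would execute the target-language code in an isolated sandbox rather than with `exec`.

```python
from typing import Iterable, List, Tuple


def passes_tests(candidate_source: str, test_source: str) -> bool:
    """Return True if the candidate code loads and every assert in test_source holds.

    Hypothetical helper: exec() is used only to keep the sketch self-contained
    and must not be applied to untrusted code outside a sandbox.
    """
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)  # define the translated function(s)
        exec(test_source, namespace)       # run the assert-style unit-test cases
    except Exception:
        # Syntax errors, runtime errors, and failed asserts all count as failure.
        return False
    return True


def build_parallel_corpus(
    samples: Iterable[Tuple[str, str, str]],
) -> List[Tuple[str, str]]:
    """Keep (source_code, translated_code) pairs whose translation passes its tests.

    Each sample is (source_code, translated_code, translated_tests).
    """
    return [
        (source, translated)
        for source, translated, tests in samples
        if passes_tests(translated, tests)
    ]


if __name__ == "__main__":
    samples = [
        # A translation that preserves the logic of the source sample.
        ("int add(int a, int b) { return a + b; }",
         "def add(a, b):\n    return a + b\n",
         "assert add(1, 2) == 3\n"),
        # A faulty translation that the unit-test case rejects.
        ("int add(int a, int b) { return a + b; }",
         "def add(a, b):\n    return a - b\n",
         "assert add(1, 2) == 3\n"),
    ]
    corpus = build_parallel_corpus(samples)
    print(len(corpus))
```

In this sketch only the first sample is retained, because the second translation fails its assert statement and is therefore excluded from the parallel data.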
COMMUNICATION FABRIC is the signal conduction path that allows the various components of computer to communicate with each other. Typically, this fabric is made of switches and electrically conductive paths, such as the switches and electrically conductive paths that make up busses, bridges, physical input/output ports and the like. Other types of signal communication paths may be used, such as fiber optic communication paths and/or wireless communication paths.
VOLATILE MEMORY is any type of volatile memory now known or to be developed in the future. Examples include dynamic type random access memory (RAM) or static type RAM. Typically, volatile memory is characterized by random access, but this is not required unless affirmatively indicated. In computer, the volatile memory is located in a single package and is internal to computer, but, alternatively or additionally, the volatile memory may be distributed over multiple packages and/or located externally with respect to computer.
PERSISTENT STORAGE is any form of non-volatile storage for computers that is now known or to be developed in the future. The non-volatility of this storage means that the stored data is maintained regardless of whether power is being supplied to computer and/or directly to persistent storage. Persistent storage may be a read only memory (ROM), but typically at least a portion of the persistent storage allows writing of data, deletion of data and re-writing of data. Some familiar forms of persistent storage include magnetic disks and solid state storage devices. Operating system may take several forms, such as various known proprietary operating systems or open source Portable Operating System Interface-type operating systems that employ a kernel. The code included in block typically includes at least some of the computer code involved in performing the inventive methods.
PERIPHERAL DEVICE SET includes the set of peripheral devices of computer. Data communication connections between the peripheral devices and the other components of computer may be implemented in various ways, such as Bluetooth connections, Near-Field Communication (NFC) connections, connections made by cables (such as universal serial bus (USB) type cables), insertion-type connections (for example, secure digital (SD) card), connections made through local area communication networks and even connections made through wide area networks such as the internet. In various embodiments, UI device set may include components such as a display screen, speaker, microphone, wearable devices (such as goggles and smart watches), keyboard, mouse, printer, touchpad, game controllers, and haptic devices. Storage is external storage, such as an external hard drive, or insertable storage, such as an SD card. Storage may be persistent and/or volatile. In some embodiments, storage may take the form of a quantum computing storage device for storing data in the form of qubits. In embodiments where computer is required to have a large amount of storage (for example, where computer locally stores and manages a large database) then this storage may be provided by peripheral storage devices designed for storing very large amounts of data, such as a storage area network (SAN) that is shared by multiple, geographically distributed computers. IoT sensor set is made up of sensors that can be used in Internet of Things applications. For example, one sensor may be a thermometer and another sensor may be a motion detector.
NETWORK MODULE is the collection of computer software, hardware, and firmware that allows computer to communicate with other computers through WAN. Network module may include hardware, such as modems or Wi-Fi signal transceivers, software for packetizing and/or de-packetizing data for communication network transmission, and/or web browser software for communicating data over the internet. In some embodiments, network control functions and network forwarding functions of network module are performed on the same physical hardware device. In other embodiments (for example, embodiments that utilize software-defined networking (SDN)), the control functions and the forwarding functions of network module are performed on physically separate devices, such that the control functions manage several different network hardware devices. Computer readable program instructions for performing the inventive methods can typically be downloaded to computer from an external computer or external storage device through a network adapter card or network interface included in network module.
WAN is any wide area network (for example, the internet) capable of communicating computer data over non-local distances by any technology for communicating computer data, now known or to be developed in the future. In some embodiments, the WAN may be replaced and/or supplemented by local area networks (LANs) designed to communicate data between devices located in a local area, such as a Wi-Fi network. The WAN and/or LANs typically include computer hardware such as copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and edge servers.
END USER DEVICE (EUD) is any computer system that is used and controlled by an end user (for example, a customer of an enterprise that operates computer), and may take any of the forms discussed above in connection with computer. EUD typically receives helpful and useful data from the operations of computer. For example, in a hypothetical case where computer is designed to provide a recommendation to an end user, this recommendation would typically be communicated from network module of computer through WAN to EUD. In this way, EUD can display, or otherwise present, the recommendation to an end user. In some embodiments, EUD may be a client device, such as thin client, heavy client, mainframe computer, desktop computer and so on.
March 17, 2026