Automated Generation and Integration of an Optimized Regular Expression

Published: July 15, 2025

Assignee: not available in the USPTO data we have

Inventors: Vinu VARGHESE, Nirav Jagdish Sampat, Balaji Janarthanam, Anil Kumar, Shikhar Srivastava, Kunal Jaiwant Kharsadia, Saran Prasad

Patent Claims

19 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A system comprising:
a decoder is configured to:
receive an input data corresponding to a programming language;
obtain a plurality of embeddings using Embeddings from Language Models (ELMO) which facilitates computing word vectors on top of a two-layered bidirectional language model (biLM) which includes forward pass and backward pass;
convert the input data into the plurality of embeddings, wherein each embedding corresponds to a vector based representation of an attribute corresponding to the input data;
a processor including an artificial intelligence (AI) engine to generate an optimized regular expression (Regex) based on at least one of the type of the programming language and a corresponding library in the database, wherein the AI engine comprises a learning model associated with the AI engine for reinforced learning, wherein the learning model of the AI engine is automatically updated based on an information of the corresponding library in the database, the processor comprising:
a Regex optimizer to:
receive a generated regular expression corresponding to the plurality of embeddings, wherein the generated regular expression is at least one of an existing regular expression from a database and a newly generated regular expression from a Regex generator associated with the processor;
parse the generated regular expression into a plurality of sub-blocks;
classify, using a neural network based classifier associated with the AI engine, the plurality of sub-blocks to obtain a plurality of classified sub-blocks;
evaluate, using the neural network based classifier, a quantifier class for each classified sub-block to identify a corresponding computationally expensive class, wherein the neural network based classifier includes an input layer, multiple hidden layers and an output layer;
perform, using at least two reinforcement learning based deep neural network associated with the AI engine, an iterative analysis based on actor-critic algorithm to obtain a plurality of optimized sub-blocks associated with a minimum computation time, wherein one of the at least two reinforcement learning pertains to an actor and the another reinforcement learning pertains to a critic for executing the actor-critic algorithm, wherein the iterative analysis comprises replacing the quantifier class with a substitute class and performing iterative computation cycles based on an updated learning model for evaluating the corresponding computation time with respect to the substitute class by using a replace function corresponding to the programming language, wherein evaluating the corresponding computation time comprises:
verifying if computation loops of the iterative computation cycles corresponding to the substitute class exceed a pre-defined threshold; and
discarding the computation loops and identifying a new substitute class that requires minimum time or has minimum computation loops if the computation time for the substitute class exceeds the pre-defined threshold; and
combine the plurality of optimized sub-blocks to obtain the optimized regular expression.
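The actor-critic iteration recited in claim 1 can be pictured with a deliberately small sketch. It is not the patented implementation: the claim calls for two deep neural networks, whereas this example swaps in a tabular actor (softmax preferences over substitute classes) and a scalar critic (a running value estimate). The function name `actor_critic_select`, the learning rates, and the fixed cost table are all assumptions made for illustration; a real system would presumably derive the reward from measured computation time.

```python
import math
import random


def softmax(prefs):
    """Turn raw preferences into a probability distribution."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_select(costs, steps=2000, actor_lr=0.1, critic_lr=0.1, seed=0):
    """Pick a substitute class with a one-state actor-critic loop.

    `costs` maps each candidate substitute class to a hypothetical
    computation cost; the reward is the negative cost.
    """
    rng = random.Random(seed)
    actions = list(costs)
    prefs = [0.0] * len(actions)  # actor: one preference per substitute
    value = 0.0                   # critic: estimate of the expected reward

    for _ in range(steps):
        probs = softmax(prefs)
        chosen = rng.choices(range(len(actions)), weights=probs)[0]
        reward = -costs[actions[chosen]]

        # Critic: the advantage is how much better the reward was than expected.
        advantage = reward - value
        value += critic_lr * advantage

        # Actor: policy-gradient step on the softmax preferences.
        for j in range(len(actions)):
            grad = (1.0 if j == chosen else 0.0) - probs[j]
            prefs[j] += actor_lr * advantage * grad

    best = max(range(len(actions)), key=lambda j: prefs[j])
    return actions[best]


# Hypothetical costs for three ways of writing the same sub-block.
example_costs = {".*": 5.0, "[^-]*": 1.0, ".*?": 3.0}
print(actor_critic_select(example_costs))
```

The critic's estimate acts as a baseline, so substitutes that cost more than expected are pushed down and cheaper ones are pushed up, which mirrors the penalize-and-retry behaviour the claim describes.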

2. The system as claimed in claim 1, wherein the system comprises a simulator coupled to the processor, the simulator to: perform simulation to generate, based on the optimized regular expression, an output in the form of filtered results comprising an endorsed result corresponding to a least value of corresponding processing time, wherein the system facilitates automated integration of the endorsed result within an application module pertaining to the programming language.

3. The system as claimed in claim 2, wherein the simulation comprises generation of a plurality of test cases and execution of the test cases based on the optimized regular expression received from the Regex optimizer wherein the simulation includes key performance indicators (KPI) of the optimized regular expression including calculation of at least one of a performance and number of matches based on a file type of the input data.

4. The system as claimed in claim 3, wherein the simulator generates the plurality of test cases based on at least one of a file type, a file size, a type of the programming language, and an optimization type corresponding to the input data, and wherein the plurality of test cases are executed, using the AI engine, based on the optimized regular expression, to obtain the output, wherein the system facilitates evaluating the processing time for each execution based on which the endorsed result is selected.
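As a rough illustration of the simulator in claims 2 to 4, the sketch below runs each candidate expression over a set of test cases, records two KPIs (number of matches and elapsed time), and endorses the candidate with the least processing time. The names `Kpi`, `simulate`, and `endorse` are hypothetical, and the test cases are supplied by hand here rather than generated from file type or size as the claims describe.

```python
import re
import time
from dataclasses import dataclass


@dataclass
class Kpi:
    pattern: str
    matches: int    # total number of matches across all test cases
    seconds: float  # wall-clock processing time for the whole run


def simulate(patterns, test_cases):
    """Execute every candidate pattern against every test case."""
    results = []
    for pattern in patterns:
        compiled = re.compile(pattern)
        start = time.perf_counter()
        matches = sum(len(compiled.findall(text)) for text in test_cases)
        elapsed = time.perf_counter() - start
        results.append(Kpi(pattern, matches, elapsed))
    return results


def endorse(results):
    """Return the result with the least processing time."""
    return min(results, key=lambda kpi: kpi.seconds)


results = simulate([r"\d+", r"[0-9]+"], ["a1 b22", "333"])
print(endorse(results).pattern)
```

Comparing match counts across candidates before endorsing one is a cheap way to confirm that a faster expression still matches the same input.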

5. The system as claimed in claim 1, wherein the system comprises the (Regex) generator to generate the newly generated regular expression using the AI engine.

6. The system as claimed in claim 1, wherein the Regex generator comprises a transformer architecture including a trained model utilizing a softmax function to generate the generated regular expression in the programming language.

7. The system as claimed in claim 1, wherein the system comprises a pre-execution recommender coupled to the decoder and the processor, the pre-execution recommender to: generate, using the plurality of embeddings, an automated recommendation pertaining to a requirement for generating the optimized regular expression, wherein the automated recommendation pertains to a recommendation of the existing regex in the database, wherein based on the recommended existing regex, the AI engine provides at least one of a negative recommendation to perform the automated generation of the optimized regular expression and a positive recommendation to utilize the existing regular expression.

8. The system as claimed in claim 7, wherein the automated recommendation is based on verification of one or more parameters of the plurality of embeddings with pre-stored parameters in the database.
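One way to picture the pre-execution recommender of claims 7 and 8 is a similarity lookup against stored embeddings. The claims only say that parameters of the embeddings are verified against pre-stored parameters, so the cosine-similarity measure, the `min_similarity` cutoff, and the `recommend` function below are assumptions, not details from the patent.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_embedding, stored, min_similarity=0.9):
    """Return ("positive", regex) to reuse a stored regex,
    or ("negative", None) to trigger fresh generation."""
    best_regex, best_score = None, -1.0
    for regex, embedding in stored.items():
        score = cosine(query_embedding, embedding)
        if score > best_score:
            best_regex, best_score = regex, score
    if best_score >= min_similarity:
        return ("positive", best_regex)
    return ("negative", None)


# Hypothetical two-dimensional embeddings for two stored expressions.
stored = {r"\d+": [1.0, 0.0], r"\w+": [0.0, 1.0]}
print(recommend([0.9, 0.1], stored))
```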

9. The system as claimed in claim 1, wherein the input data corresponds to at least one of a plain language text, a file type, a file size, a type of the programming language, an existing regular expression, and an application content, wherein the application content includes a uniform resource locator, and wherein the plain language text is English language text.

10. The system as claimed in claim 1, wherein the decoder executes a pre-trained artificial intelligence model to obtain the plurality of embeddings.

11. The system as claimed in claim 1, wherein the Regex optimizer parses the generated regex by using a split function corresponding to the programming language, and wherein the Regex optimizer replaces the quantifier class with the substitute class by using a replace function corresponding to the programming language.
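For a language such as Python, the split and replace functions mentioned in claim 11 might look like the sketch below. The `SUB_BLOCK` tokenizer is a simplifying assumption: it only recognises non-nested parenthesised groups, which is enough to show the idea but far from a full regex parser. The helper names are hypothetical.

```python
import re

# Assumption: a sub-block is a non-nested group with an optional quantifier.
SUB_BLOCK = re.compile(r"(\([^()]*\)[*+?]?)")


def split_sub_blocks(pattern):
    """Split a pattern into sub-blocks, keeping the groups themselves."""
    # re.split keeps captured delimiters; drop any empty strings.
    return [part for part in SUB_BLOCK.split(pattern) if part]


def replace_quantifier(block, expensive, substitute):
    """Swap an expensive quantifier class for a substitute class."""
    return block.replace(expensive, substitute)


def combine(blocks):
    """Join the sub-blocks back into one expression."""
    return "".join(blocks)


blocks = split_sub_blocks(r"^(\d+)-(.*)-(\w+)$")
blocks = [replace_quantifier(b, ".*", "[^-]*") for b in blocks]
optimized = combine(blocks)
print(optimized)
```

Here the greedy `.*` in the middle group is replaced with `[^-]*`, which cannot run past the next hyphen and so avoids backtracking on that delimiter.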

12. The system as claimed in claim 1, wherein the optimization is performed by an optimizer algorithm associated with the Regex optimizer, wherein the iterative analysis is performed using reinforcement learning embedded within the reinforcement learning based deep neural network, and wherein the reinforced learning comprises the learning model associated with the AI engine.

13. The system as claimed in claim 12, wherein information pertaining to the optimized regular expression is stored in the database such that based on the stored information, the learning model of the AI engine is automatically updated, and wherein based on the updated learning model, the Regex optimizer automatically prioritizes and selects a suitable library for each sub-block during the iterative computation cycles, such that number of required iterative computation cycles reduces by using the updated learning model.

14. The system as claimed in claim 1, wherein the corresponding computation time in the iterative analysis is evaluated to verify if computation loops of the iterative computation cycles corresponding to the substitute class exceeds the pre-defined threshold, wherein the computation loops are discarded upon exceeding the pre-defined threshold and the new substitute class is evaluated to identify the substitute class that requires minimum time for the corresponding computation loops to obtain the optimized sub-blocks including minimum computation loops, the computation loops are discarded based on penalization of the computation loops by a reinforcement logic associated with the reinforcement learning based deep neural network.
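The threshold check in claim 14 can be sketched as a filter followed by a minimum. Python's `re` module does not expose a count of matching steps, so this example uses wall-clock time as a stand-in for "computation loops"; that substitution, the `repeats` parameter, and the function names are assumptions. The `measure` argument is injectable so the selection logic can be exercised with fixed costs.

```python
import re
import time


def measure_seconds(pattern, text, repeats=50):
    """Time repeated searches of `pattern` over `text`."""
    compiled = re.compile(pattern)
    start = time.perf_counter()
    for _ in range(repeats):
        compiled.search(text)
    return time.perf_counter() - start


def pick_substitute(candidates, text, threshold, measure=measure_seconds):
    """Discard candidates over the threshold, then return the cheapest one.

    Returns None when every candidate exceeds the threshold.
    """
    kept = {}
    for pattern in candidates:
        cost = measure(pattern, text)
        if cost > threshold:
            continue  # over budget: discard and move on to the next substitute
        kept[pattern] = cost
    if not kept:
        return None
    return min(kept, key=kept.get)


print(pick_substitute([r"\d+", r"[0-9]+"], "order 12345", threshold=10.0))
```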

15. The system as claimed in claim 1, wherein the reinforcement learning based deep neural network is trained with multiple libraries corresponding to the programming language, and wherein the reinforcement learning based deep neural network comprises plurality of hidden layers including optimizer and loss function corresponding to a sparse categorical cross entropy function.

16. The system as claimed in claim 15, wherein the database is a knowledge database comprising the multiple libraries corresponding to the programming language, and wherein the database comprises optimization-based data and execution-based data corresponding to pre-stored regular expression.

17. A method for obtaining optimized regular expression, the method comprising:
receiving, by a processor, an input data corresponding to a programming language;
obtaining, by the processor, a plurality of embeddings using Embeddings from Language Models (ELMO) which facilitates computing word vectors on top of a two-layered bidirectional language model (biLM) which includes forward pass and backward pass;
converting, by the processor, the input data into the plurality of embeddings, wherein each embedding corresponds to a vector based representation of an attribute corresponding to the input data;
generating, by an Artificial Intelligence (AI) engine associated with the processor, an optimized regular expression (Regex) based on at least one of the type of the programming language and a corresponding library in the database, wherein the AI engine comprises a learning model associated with the AI engine for reinforced learning, wherein the learning model of the AI engine is automatically updated based on an information of the corresponding library in the database, wherein generating the optimized regular expression comprises:
receiving, by the processor, the generated regular expression, from the Artificial Intelligence (AI) engine, corresponding to the plurality of embeddings, wherein the generated regular expression is at least one of an existing regular expression from a database and a newly generated regular expression;
parsing, by the processor, the generated regular expression into a plurality of sub-blocks;
classifying, by the processor associated with a neural network-based classifier, the plurality of sub-blocks to obtain a plurality of classified sub-blocks;
evaluating, by the processor, a quantifier class for each classified sub-block to identify a corresponding computationally expensive class;
performing, by the processor executing using at least a two reinforcement learning based deep neural network associated with the AI engine, an iterative analysis based on an actor-critic algorithm to obtain a plurality of optimized sub-blocks associated with a minimum computation time, wherein one of the at least two reinforcement learning pertains to an actor and the another reinforcement learning pertains to a critic for executing the actor-critic algorithm, wherein the iterative analysis comprises replacing the quantifier class with a substitute class and performing iterative computation cycles based on an updated learning model for evaluating the corresponding computation time with respect to the substitute class by using a replace function corresponding to the programming language, wherein evaluating the corresponding computation time comprises:
verifying if computation loops of the iterative computation cycles corresponding to the substitute class exceed a pre-defined threshold;
discarding the computation loops and identifying a new substitute class that requires minimum time or has minimum computation loops if the computation time for the substitute class exceeds the pre-defined threshold; and
combining, by the processor, the plurality of optimized sub-blocks to obtain the optimized regular expression.

18. The method as claimed in claim 17, wherein the method comprises: generating, using the plurality of embeddings, an automated recommendation pertaining to a requirement for generating the optimized regular expression, wherein the automated recommendation pertains to a recommendation of the existing regex in the database, wherein based on the recommended existing regex, the AI engine provides at least one of a negative recommendation to perform the automated generation of the optimized regular expression and a positive recommendation to utilize the existing regular expression.

19. The method as claimed in claim 18, wherein the method comprises: performing simulation, by the processor, based on the optimized regular expression, to generate an output in the form of filtered results comprising an endorsed result corresponding to a least value of corresponding processing time; and integrating automatically, by the processor, the endorsed result within an application module pertaining to the programming language.

Patent Metadata

Filing Date: Unknown

Publication Date: July 15, 2025

Inventors: Vinu VARGHESE, Nirav Jagdish Sampat, Balaji Janarthanam, Anil Kumar, Shikhar Srivastava, Kunal Jaiwant Kharsadia, Saran Prasad
