A platform for automated design of gene-editing experiments includes one or more processing units and a non-transitory computer-readable storage device. The storage device contains instructions that, when executed, configure the processing units to perform a method. The method includes receiving a meta request with information about a requested gene-editing experiment, configuring an ordered list of tasks via a reasoning framework, and implementing tasks via a Task Executor module utilizing state machines. The Task Executor connects to external APIs, provides instructions to a User-Proxy Agent module, and receives user input. The User-Proxy Agent forms prompts based on current state instructions, user requests, interaction history, and API results to determine appropriate actions. The platform outputs recommendations responsive to the meta request.
A platform for automated design of gene-editing experiments, comprising:
The platform of, wherein the reasoning framework comprises a large language model configured to decompose the meta request into the ordered list of tasks.
The platform of, wherein the large language model is trained using a dataset comprising curated question-and-answer pairs derived from gene-editing discussions.
The platform of, wherein the large language model is fine-tuned using a technique selected from the group consisting of full parameter fine-tuning and quantized low-rank adaptation (QLoRA) fine-tuning.
A method for selecting gene editing delivery methods, comprising:
The method of, further comprising categorizing the user inputs into one of a plurality of predefined biological categories, wherein the literature search is performed based on the categorized biological category.
The method of, wherein the plurality of predefined biological categories comprises: mammalian in vivo, mammalian embryos, mammalian primary cells or stem cells ex vivo, mammalian cell lines with strong evidence of high-efficiency transfection, mammalian cell lines or organoids without strong evidence of high-efficiency transfection, human in vivo or human embryos, and bacteria, viruses, and other organisms.
The method of, wherein ranking the candidate delivery methods comprises:
A method for training a gene editing model, comprising:
The method of, wherein preprocessing the dataset comprises:
The method of, wherein fine-tuning the pre-trained language model comprises using a technique selected from the group consisting of full parameter fine-tuning and quantized low-rank adaptation (QLoRA) fine-tuning.
The method of, further comprising:
A method for gene editing inference, comprising:
The method of, wherein retrieving relevant information from the curated knowledge base comprises:
The method of, wherein synthesizing the answer comprises:
A method for designing guide RNA for gene editing, comprising:
The method of, wherein applying the chain-of-table methodology comprises:
A system for automated design of gene-editing experiments, comprising:
The system of, wherein the reasoning framework comprises a large language model trained on a dataset of curated question-answer pairs derived from gene-editing discussions.
The system of, wherein the large language model is fine-tuned using a technique selected from the group consisting of full parameter fine-tuning and quantized low-rank adaptation (QLoRA) fine-tuning.
This application claims priority to U.S. Provisional Patent Application No. 63/571,707, filed Mar. 29, 2024, which is hereby incorporated by reference in its entirety.
The present disclosure relates to artificial intelligence systems for biological research, and more particularly to a large language model-based agent system for automating the design of CRISPR gene-editing experiments.
Genome engineering technology has transformed biomedical research by enabling precise modifications to genetic information. This field encompasses various techniques for altering DNA sequences within living organisms, with applications ranging from basic scientific research to potential therapeutic interventions. Among these techniques, CRISPR-Cas systems have emerged as a widely adopted tool due to their efficiency and versatility.
CRISPR, which stands for Clustered Regularly Interspaced Short Palindromic Repeats, was originally discovered as part of bacterial immune systems. Researchers later adapted this natural mechanism into a programmable gene-editing tool. The CRISPR-Cas system typically consists of a guide RNA (gRNA) that directs a Cas nuclease to a specific DNA sequence, where it can make targeted modifications.
As the field of genome engineering has advanced, researchers have developed various CRISPR-based techniques beyond simple gene knockout. These include methods for activating or repressing gene expression (CRISPRa/i), introducing precise base changes without double-strand breaks (base editing), and making small insertions or deletions (prime editing). Each of these approaches has its own set of considerations and design parameters.
Designing effective gene-editing experiments requires a deep understanding of both the CRISPR technology and the biological system under investigation. Researchers must consider factors such as the choice of CRISPR system, guide RNA design, delivery method, and potential off-target effects. Additionally, validating the results of gene-editing experiments often involves complex molecular biology techniques and data analysis.
The complexity of gene-editing experimental design can present challenges, particularly for researchers who are new to the field or working with unfamiliar biological systems. There is a general interest in tools and resources that can assist in streamlining the experimental design process, potentially reducing the time and resources required to plan and execute gene-editing studies.
Artificial intelligence and machine learning approaches have shown promise in various areas of biological research. These computational methods can process large amounts of data and potentially identify patterns or make predictions that may not be immediately apparent to human researchers. There is ongoing exploration of how such approaches might be applied to enhance the design and execution of gene-editing experiments.
As the field of genome engineering continues to evolve, there is a general focus on improving the efficiency, specificity, and accessibility of gene-editing techniques. This includes efforts to refine existing tools, develop new methodologies, and create resources that can support researchers in designing and implementing gene-editing experiments across a wide range of applications.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
According to an aspect of the present disclosure, a platform for automated design of gene-editing experiments is provided. The platform includes one or more processing units and a non-transitory computer-readable storage device operably coupled to the one or more processing units. The non-transitory computer-readable storage device contains instructions that, when executed, configure the one or more processing units to, collectively, perform a method. The method includes receiving a meta request, the meta request including information about a requested gene-editing experiment. The method further includes configuring, via a reasoning framework, an ordered list of tasks required to achieve the requested gene-editing experiment based on the information. The reasoning framework is configured to sequentially send each task in the ordered list of tasks, and optionally a previous result, to a Task Executor module and to receive a result from the Task Executor module responsive to the task. The method also includes implementing, via the Task Executor module, a task received from the reasoning framework. The Task Executor module utilizes state machines to decompose sub-goals and is configured to connect to one or more external application programming interfaces (APIs) by sending an API call to a Tool Provider module and receiving a result, to provide instructions to a User-Proxy Agent module and receive user input responsive to the instructions from the User-Proxy Agent module, and to send feedback to the User-Proxy Agent module based on the task and/or user input. The method further includes forming a prompt, via the User-Proxy Agent module, based on an instruction inherent to a current state from the Task Executor module, a request made by the user, a history of past interactions within the current task session, results from external APIs, or a combination thereof, and using the prompt with the User-Proxy Agent module to determine an appropriate next action.
The current state encapsulates a description of a current task and any input required from a user. The platform is configured to output one or more recommendations responsive to the meta request.
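The sequential hand-off between the reasoning framework and the Task Executor module may be easier to follow as code. The following Python sketch is illustrative only: it assumes a fixed, hypothetical task list (`select_crispr_system`, `design_guide_rna`, `select_delivery_method`, `plan_validation`) in place of the LLM planner, and a stub executor in place of the state-machine executor, so it shows only how each task and the previous result are passed along in order.

```python
from typing import Callable, Optional


# HYPOTHETICAL planner: the disclosure uses an LLM to decompose the meta request.
# This stub returns a fixed ordered task list for illustration only.
def plan_tasks(meta_request: str) -> list[str]:
    tasks = ["select_crispr_system", "design_guide_rna"]
    if "delivery" in meta_request.lower():
        tasks.append("select_delivery_method")
    tasks.append("plan_validation")
    return tasks


# HYPOTHETICAL executor stub: a real Task Executor would run a state machine,
# call tools, and exchange messages with the User-Proxy Agent.
def execute_task(task: str, previous_result: Optional[str]) -> str:
    return f"{task} done (previous: {previous_result})"


def run_meta_request(
    meta_request: str,
    executor: Callable[[str, Optional[str]], str] = execute_task,
) -> list[str]:
    """Send each planned task, in order, to the executor and chain the results."""
    results: list[str] = []
    previous: Optional[str] = None
    for task in plan_tasks(meta_request):
        previous = executor(task, previous)  # this result is passed to the next task
        results.append(previous)
    return results  # basis for the recommendations returned to the user
```

Passing the executor in as a parameter keeps the planning loop independent of how each task is actually carried out.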
According to other aspects of the present disclosure, the platform may include one or more of the following features. The reasoning framework may comprise a large language model configured to decompose the meta request into the ordered list of tasks. The large language model may be trained using a dataset comprising curated question-and-answer pairs derived from gene-editing discussions. The large language model may be fine-tuned using a technique selected from the group consisting of full parameter fine-tuning and quantized low-rank adaptation (QLoRA) fine-tuning.
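For readers unfamiliar with QLoRA, the low-rank update it trains can be written as below. This is the standard LoRA formulation from the general literature, not a detail of this disclosure. In QLoRA the frozen base weights W0 are held in 4-bit precision and only the small adapter matrices A and B are trained, whereas full parameter fine-tuning updates every entry of W0.

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)

% Trainable parameters per adapted matrix: r(d + k) instead of dk.
% Example: d = k = 4096, r = 16 gives 16 \cdot 8192 = 131{,}072
% versus 4096^2 = 16{,}777{,}216, i.e. 1/128 of the full matrix.
```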
According to another aspect of the present disclosure, a method for selecting gene editing delivery methods is provided. The method includes extracting parameters from user inputs related to gene editing, performing a literature search based on the extracted parameters, ranking candidate delivery methods using citations from the literature search results, and outputting a ranked list of candidate delivery methods.
According to other aspects of the present disclosure, the method for selecting gene editing delivery methods may include one or more of the following features. The method may further comprise categorizing the user inputs into one of a plurality of predefined biological categories, wherein the literature search is performed based on the categorized biological category. The plurality of predefined biological categories may comprise: mammalian in vivo, mammalian embryos, mammalian primary cells or stem cells ex vivo, mammalian cell lines with strong evidence of high-efficiency transfection, mammalian cell lines or organoids without strong evidence of high-efficiency transfection, human in vivo or human embryos, and bacteria, viruses, and other organisms. Ranking the candidate delivery methods may comprise retrieving a predefined set of delivery methods associated with the categorized biological category, calculating a score for each delivery method based on the number of citations from the literature search results, and ordering the delivery methods based on the calculated scores.
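The citation-based ranking step can be sketched as follows. The seven categories are taken from the text above; the per-category delivery-method lists are HYPOTHETICAL placeholders, since the disclosure says such predefined sets exist but does not enumerate them. Treating a search result as "citing" a method whenever its text mentions the method's name is likewise a simplifying assumption.

```python
# The seven biological categories listed in the disclosure.
CATEGORIES = (
    "mammalian in vivo",
    "mammalian embryos",
    "mammalian primary cells or stem cells ex vivo",
    "mammalian cell lines with strong evidence of high-efficiency transfection",
    "mammalian cell lines or organoids without strong evidence of high-efficiency transfection",
    "human in vivo or human embryos",
    "bacteria, viruses, and other organisms",
)

# HYPOTHETICAL category -> predefined delivery methods (illustrative entries only).
DELIVERY_METHODS = {
    "mammalian in vivo": ["AAV", "lipid nanoparticles", "hydrodynamic injection"],
    "mammalian cell lines with strong evidence of high-efficiency transfection": [
        "lipofection",
        "electroporation",
        "lentivirus",
    ],
}


def rank_delivery_methods(category: str, search_results: list[str]) -> list[tuple[str, int]]:
    """Score each predefined method by the number of search results citing it."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    scores: dict[str, int] = {}
    for method in DELIVERY_METHODS.get(category, []):
        # Simplification: a result "cites" a method if its text mentions the name.
        scores[method] = sum(method.lower() in text.lower() for text in search_results)
    # Highest score first; ties broken alphabetically so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For example, with three abstracts of which two mention AAV and one mentions lipid nanoparticles, the function would return AAV first with a score of 2.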
According to another aspect of the present disclosure, a method for training a gene editing model is provided. The method includes obtaining a dataset of gene editing discussions from a public forum, preprocessing the dataset to extract question-answer pairs, fine-tuning a pre-trained language model using the extracted question-answer pairs, and storing the fine-tuned model for subsequent use in gene editing tasks.
According to other aspects of the present disclosure, the method for training a gene editing model may include one or more of the following features. Preprocessing the dataset may comprise anonymizing personal information in the discussions, extracting question-answer pairs from individual discussion threads, and filtering the extracted pairs to remove irrelevant or low-quality content. Fine-tuning the pre-trained language model may comprise using a technique selected from the group consisting of full parameter fine-tuning and quantized low-rank adaptation (QLoRA) fine-tuning. The method may further comprise evaluating the fine-tuned model using a test set of gene editing questions and comparing the performance of the fine-tuned model to the pre-trained model on the test set.
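The three preprocessing steps (anonymize, extract question-answer pairs per thread, filter) might look like the sketch below. The thread format (a list of posts with `author` and `text` keys), the regular expressions, the keyword list, and the minimum answer length are all HYPOTHETICAL choices for illustration; the disclosure does not specify them.

```python
import re

# Rough patterns for e-mail addresses and @-mentions (illustrative, not exhaustive).
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
MENTION_RE = re.compile(r"(?<!\w)@\w+")

# HYPOTHETICAL relevance vocabulary used to drop off-topic threads.
KEYWORDS = ("crispr", "cas9", "grna", "sgrna", "guide rna", "base edit", "prime edit")


def anonymize(text: str) -> str:
    """Replace e-mail addresses first, then remaining @-mentions."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return MENTION_RE.sub("[USER]", text)


def extract_pairs(thread: list[dict]) -> list[tuple[str, str]]:
    """Pair the opening post (question) with each reply from a different author."""
    if not thread:
        return []
    question = thread[0]
    return [
        (anonymize(question["text"]), anonymize(post["text"]))
        for post in thread[1:]
        if post["author"] != question["author"]
    ]


def keep(pair: tuple[str, str], min_answer_words: int = 5) -> bool:
    """Filter out irrelevant questions and very short (low-quality) answers."""
    question, answer = pair
    relevant = any(k in question.lower() for k in KEYWORDS)
    substantial = len(answer.split()) >= min_answer_words
    return relevant and substantial


def preprocess(threads: list[list[dict]]) -> list[tuple[str, str]]:
    return [pair for thread in threads for pair in extract_pairs(thread) if keep(pair)]
```

The resulting pairs would then be formatted into the prompt/response template expected by the chosen pre-trained model before fine-tuning.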
According to another aspect of the present disclosure, a method for gene editing inference is provided. The method includes receiving a gene editing query, processing the query using a model trained with fine-tuning on gene editing discussions, retrieving relevant information from a curated knowledge base of gene editing literature, synthesizing an answer based on the processed query and retrieved information, and outputting the synthesized answer.
According to other aspects of the present disclosure, the method for gene editing inference may include one or more of the following features. Retrieving relevant information from the curated knowledge base may comprise embedding the gene editing query and documents in the knowledge base into semantic vectors, performing a similarity search to identify the most relevant documents based on cosine similarity between the query vector and document vectors, and summarizing the identified relevant documents in relation to the gene editing query. Synthesizing the answer may comprise combining information from the processed query, the retrieved relevant information, and a response generated by a fine-tuned large language model trained on gene editing discussions, and generating a concise answer that addresses the specific aspects of the gene editing query.
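The similarity-search step can be illustrated with cosine similarity over vectors. Here a bag-of-words term count is used as a HYPOTHETICAL stand-in for the semantic embedding model, which the disclosure does not name; a deployed system would use a learned embedding model and typically a vector index. The summarization and answer-synthesis steps are not shown.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a semantic embedding: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k documents most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```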
According to another aspect of the present disclosure, a method for designing guide RNA for gene editing is provided. The method includes receiving a user request for guide RNA design, extracting relevant parameters from the user request, accessing a pre-designed guide RNA table, applying a chain-of-table methodology to process the pre-designed guide RNA table based on the extracted parameters, selecting guide RNA sequences from the processed table, and outputting the selected guide RNA sequences.
According to other aspects of the present disclosure, the method for designing guide RNA for gene editing may include one or more of the following features. Applying the chain-of-table methodology may comprise selecting rows from the pre-designed guide RNA table where specified columns match given values, ordering the selected rows based on values in a specified column, and returning a top number of rows from the ordered selection.
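The three table operations named above (select matching rows, order by a column, return the top rows) compose naturally as a chain. In this minimal sketch they are applied in a fixed order; in the disclosure the chain is driven by parameters extracted from the user request. The column names (`guide_id`, `gene`, `species`, `score`) and all values in the example table are HYPOTHETICAL and do not represent real guide designs.

```python
from typing import Any

Row = dict[str, Any]


def select_rows(table: list[Row], **matches: Any) -> list[Row]:
    """Keep rows whose specified columns equal the given values."""
    return [row for row in table if all(row.get(col) == val for col, val in matches.items())]


def order_by(table: list[Row], column: str, descending: bool = True) -> list[Row]:
    """Sort rows by one column (highest first by default)."""
    return sorted(table, key=lambda row: row[column], reverse=descending)


def top_n(table: list[Row], n: int) -> list[Row]:
    return table[:n]


def chain_of_table(table: list[Row], params: dict[str, Any], n: int = 2) -> list[Row]:
    """Select -> order -> top-n, using parameters extracted from the request."""
    selected = select_rows(table, gene=params["gene"], species=params["species"])
    return top_n(order_by(selected, "score"), n)


# HYPOTHETICAL pre-designed guide table; identifiers and scores are placeholders.
EXAMPLE_TABLE: list[Row] = [
    {"guide_id": "g1", "gene": "TP53", "species": "human", "score": 0.62},
    {"guide_id": "g2", "gene": "TP53", "species": "human", "score": 0.91},
    {"guide_id": "g3", "gene": "TP53", "species": "mouse", "score": 0.99},
    {"guide_id": "g4", "gene": "TP53", "species": "human", "score": 0.75},
    {"guide_id": "g5", "gene": "BRCA1", "species": "human", "score": 0.88},
]
```

With `{"gene": "TP53", "species": "human"}` and `n=2`, the chain keeps g1, g2, and g4, orders them by score, and returns g2 and g4.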
According to another aspect of the present disclosure, a system for automated design of gene-editing experiments is provided. The system includes a User-Proxy Agent module configured to interact with a user and process user inputs, a reasoning framework configured to decompose a gene editing request into an ordered list of tasks, a Task Executor module configured to implement tasks using state machines, a Tool Provider module configured to connect to external APIs, and a non-transitory computer-readable storage device containing instructions that, when executed, cause the system to perform a method. The method includes receiving a gene editing request from the user via the User-Proxy Agent module, decomposing the request into tasks using the reasoning framework, sequentially executing the tasks using the Task Executor module, and outputting gene editing experiment design recommendations to the user via the User-Proxy Agent module.
According to other aspects of the present disclosure, the system for automated design of gene-editing experiments may include one or more of the following features. The reasoning framework may comprise a large language model trained on a dataset of curated question-answer pairs derived from gene-editing discussions. The large language model may be fine-tuned using a technique selected from the group consisting of full parameter fine-tuning and quantized low-rank adaptation (QLoRA) fine-tuning.
The foregoing general description of the illustrative embodiments and the following detailed description thereof are merely exemplary aspects of the teachings of this disclosure and are not restrictive.
The following description sets forth exemplary aspects of the present disclosure. It should be recognized, however, that such description is not intended as a limitation on the scope of the present disclosure. Rather, the description also encompasses combinations and modifications to those exemplary aspects described herein.
The present disclosure relates to a large language model agent for automated design of gene-editing experiments. The system may provide assistance to researchers in planning and executing complex gene-editing tasks. In some cases, the system may utilize a combination of natural language processing, domain-specific knowledge, and external tools to guide users through various aspects of gene-editing experimental design.
The system may operate in three distinct modes to accommodate different user needs and experimental scenarios. In some cases, a Meta Mode may provide predefined workflows for common gene-editing tasks. An Auto Mode may offer more flexibility by generating customized task lists based on user inputs. A QA Mode may allow users to ask specific questions and receive targeted information throughout the experimental design process.
In some cases, the system may integrate domain knowledge from curated databases, published literature, and expert-designed protocols. This integration may enable the system to provide up-to-date and relevant guidance for gene-editing experiments. The system may also incorporate external tools and APIs to perform specialized tasks such as guide RNA design or off-target prediction.
The large language model agent may be designed to assist researchers across various stages of gene-editing experiments, including but not limited to selecting appropriate CRISPR systems, designing guide RNAs, choosing delivery methods, and planning validation experiments. By leveraging natural language interactions, the system may aim to make complex gene-editing techniques more accessible to researchers with varying levels of expertise.
In some cases, a system for automated design of gene-editing experiments may include multiple components working together to process and execute gene editing requests. An accompanying figure illustrates a block diagram of such a system.
The system may include a computing device. In some cases, the computing device may have a device housing. The device housing may contain various components of the computing device. In some cases, the computing device may include a processing unit. The processing unit may be configured to execute instructions and perform computations necessary for the automated design of gene-editing experiments.
In some cases, the computing device may also include a memory. The memory may be operably coupled to the processing unit. The memory may store data and instructions that may be accessed and executed by the processing unit. Additionally, the computing device may include a storage. The storage may provide non-transitory computer-readable storage for larger amounts of data and long-term storage of instructions.
The system may also include a remote computing device. In some cases, the remote computing device may include a processing unit. The processing unit may be configured to interact with a user by processing user inputs and providing outputs related to gene-editing experiment design.
In some cases, the system may further include a service provider. The service provider may provide tools that can be utilized by the system, for example via API calls. The service provider may utilize a processing unit. The processing unit may be configured to provide additional computational resources or specialized services for gene-editing experiment design.
The computing device, remote computing device, and service provider may be interconnected to enable communication and data flow between the devices. This interconnection may allow for distributed processing and storage capabilities in the automated design of gene-editing experiments.
In some cases, the system may include a reasoning framework. The reasoning framework may be configured to decompose a gene editing request into an ordered list of tasks. This decomposition may allow for systematic processing of complex gene-editing experiment designs.
The system may also include a Tool Provider module. In some cases, the Tool Provider module may be configured to connect to external APIs. This connection may allow the system to access additional tools and resources for gene-editing experiment design.
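One way to picture the Tool Provider module is as a registry that maps tool names to callables wrapping external APIs and returns results in a uniform envelope. The class, its methods, and the `gc_content` example tool below are HYPOTHETICAL; the local GC-content function only stands in for a remote service such as a guide-design or off-target API.

```python
from typing import Any, Callable


class ToolProvider:
    """HYPOTHETICAL Tool Provider: maps tool names to callables wrapping external APIs."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **kwargs: Any) -> dict[str, Any]:
        """Invoke a tool and wrap the outcome so callers never see a raw exception."""
        if name not in self._tools:
            return {"ok": False, "error": f"unknown tool: {name}"}
        try:
            return {"ok": True, "result": self._tools[name](**kwargs)}
        except Exception as exc:  # report API failures back to the Task Executor
            return {"ok": False, "error": str(exc)}


def gc_content(sequence: str) -> float:
    """Local stand-in for an external sequence-analysis API."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


provider = ToolProvider()
provider.register("gc_content", gc_content)
```

Returning an error envelope instead of raising lets the Task Executor decide whether to retry, ask the user, or move on.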
In some cases, the system may include a User-Proxy Agent module. The User-Proxy Agent module may be configured to interact with a user and process user inputs. This interaction may facilitate user-friendly operation of the system for automated design of gene-editing experiments.
The system may also include a Task Executor module. In some cases, the Task Executor module may be configured to implement tasks using state machines. This implementation may allow for structured execution of the tasks required for gene-editing experiment design.
In some cases, the non-transitory computer-readable storage device may contain instructions. When executed, these instructions may cause the system to perform methods for automated design of gene-editing experiments. The methods may include receiving a gene editing request, decomposing the request into tasks, sequentially executing the tasks, and outputting gene editing experiment design recommendations.
A further figure illustrates a block diagram of a system for automated design of gene-editing experiments. The system may include various components that work together to process and execute gene editing requests.
In some cases, the system may receive a meta request. The meta request may include information about a requested gene-editing experiment. This information may be used to initiate the automated design process.
The system may include an LLM planner. In some cases, the LLM planner may be configured to decompose the meta request into an ordered list of tasks. The LLM planner may be connected to one or more task modules. These task modules may represent different types of tasks that can be performed in the gene-editing experiment design process.
In some cases, the LLM planner may be configured to sequentially send each task in the ordered list of tasks to a task executor. The LLM planner may also optionally send a previous result to the task executor. The task executor may be configured to implement tasks received from the LLM planner.
The task executor may utilize state machines to decompose sub-goals. In some cases, the task executor may consider a current task. The current task may encapsulate a description of the task and any input required from a user.
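A task executor built on a state machine can be sketched as a set of states, each carrying an instruction and a flag for whether user input is required. The state names and instructions below describe a HYPOTHETICAL guide RNA design task; the disclosure does not enumerate the states used for any particular task, and real transitions could branch on the input received.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class State:
    name: str
    instruction: str           # description of the current sub-goal
    needs_user_input: bool     # whether the user must supply something here
    next_state: Optional[str]  # None marks a terminal state


# HYPOTHETICAL states for a guide RNA design task (linear for simplicity).
GUIDE_DESIGN_STATES = {
    "ask_target": State("ask_target", "Ask which gene and species to target.", True, "ask_edit_type"),
    "ask_edit_type": State("ask_edit_type", "Ask which kind of edit is wanted.", True, "select_guides"),
    "select_guides": State("select_guides", "Select guides from the pre-designed table.", False, None),
}


class TaskExecutor:
    def __init__(self, states: dict[str, State], start: str) -> None:
        self.states = states
        self.current = states[start]
        self.collected: dict[str, str] = {}

    def step(self, user_input: Optional[str] = None) -> Optional[State]:
        """Record input for the current state (if required) and advance one state."""
        if self.current.needs_user_input:
            if not user_input:
                raise ValueError(f"state {self.current.name!r} requires user input")
            self.collected[self.current.name] = user_input
        if self.current.next_state is None:
            return None  # task finished; self.collected holds the gathered inputs
        self.current = self.states[self.current.next_state]
        return self.current
```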
The task executor may be configured to connect to one or more external application programming interfaces (APIs). In some cases, the task executor may send an API call to the API and receive an API response. This interaction may allow the system to access external tools and resources for gene-editing experiment design.
In some cases, the task executor may be configured to provide instructions to an LLM agent. The LLM agent may act as a user-proxy agent, interacting with the user and processing user inputs. The task executor may receive user input from the LLM agent responsive to the instructions.
The LLM agent may generate LLM output, which may be presented to the user. In some cases, the LLM agent may receive a user response. The user response may be processed to generate the user input, which may be passed to the task executor.
The task executor may be configured to provide feedback to the LLM agent. This feedback may be based on the task and/or user input. The feedback may help guide the user through the gene-editing experiment design process.
In some cases, the LLM agent may form a prompt based on the instruction from the task executor, a request made by the user, a history of past interactions within the current task session, results from external APIs, or a combination thereof. The LLM agent may use this prompt to determine an appropriate next action.
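The prompt-formation step can be sketched as assembling whichever of the four listed sources are available. The section headings, the (role, text) history format, and the closing list of candidate actions are HYPOTHETICAL formatting choices; empty sources are simply omitted, matching the "or a combination thereof" language.

```python
def build_prompt(
    instruction: str,
    user_request: str,
    history: list[tuple[str, str]],
    api_results: list[str],
) -> str:
    """Assemble the User-Proxy Agent prompt from the available sources."""
    sections = [f"Current step instruction:\n{instruction}"]
    if user_request:
        sections.append(f"User request:\n{user_request}")
    if history:
        turns = "\n".join(f"{role}: {text}" for role, text in history)
        sections.append(f"Conversation so far (this task session):\n{turns}")
    if api_results:
        sections.append("Tool results:\n" + "\n".join(f"- {r}" for r in api_results))
    # HYPOTHETICAL action menu the LLM is asked to choose from.
    sections.append("Decide the next action: ask the user, call a tool, or report the result.")
    return "\n\n".join(sections)
```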
The system may be configured to output one or more recommendations responsive to the meta request. These recommendations may be based on the processing performed by the various components of the system.
October 2, 2025