US-12221657

Method and system for improving amplicon sequencing based taxonomic resolution of microbial communities

Published: February 11, 2025

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

Obtaining taxonomic resolution with conventional sequencing methods such as Sanger sequencing (longer read lengths) takes a great deal of time, while NGS technologies (shorter read lengths) involve high sequencing costs. In addition, the accuracy and depth of taxonomic classification are also limited. A method and system for improving the accuracy of amplicon sequencing based taxonomic profiling of microbial communities are provided. The proposed strategy relies on obtaining taxonomic abundance profiles of a microbial community from two paired-end sequencing experiments, each of which targets a different pair-wise combination of non-contiguous (or contiguous) V-regions. The two taxonomic profiles are then combined based on the (pre-estimated) accuracies of the individual V-regions (targeted in the experiments) in resolving each of the taxonomic groups under consideration.
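The combination step described above can be illustrated with a minimal Python sketch. It assumes the refined abundance given in claim 1 is the accuracy-weighted form Tixy = ((Wix / Wiy) * Tix + Tiy) / (1 + Wix / Wiy), which is algebraically the same as (Wix * Tix + Wiy * Tiy) / (Wix + Wiy). The function name, the dictionary layout, and the example taxa and numbers are hypothetical and are not taken from the patent.

```python
from typing import Dict


def combine_profiles(
    profile_x: Dict[str, float],
    profile_y: Dict[str, float],
    acc_x: Dict[str, float],
    acc_y: Dict[str, float],
) -> Dict[str, float]:
    """Combine two taxonomic abundance profiles taxon by taxon.

    profile_x / profile_y: abundance of each taxon from experiments x and y.
    acc_x / acc_y: pre-computed classification accuracy of region
    combinations x and y for each taxon (hypothetical lookup tables).
    """
    combined = {}
    for taxon in sorted(set(profile_x) | set(profile_y)):
        t_x = profile_x.get(taxon, 0.0)  # assumption: a missing taxon counts as 0
        t_y = profile_y.get(taxon, 0.0)
        w_x = acc_x[taxon]
        w_y = acc_y[taxon]
        if w_x + w_y <= 0:
            raise ValueError(f"no usable accuracy estimate for {taxon}")
        # Equivalent to ((w_x / w_y) * t_x + t_y) / (1 + w_x / w_y),
        # written so that w_y == 0 does not cause a division by zero.
        combined[taxon] = (w_x * t_x + w_y * t_y) / (w_x + w_y)
    return combined


# Hypothetical example values, for illustration only.
example = combine_profiles(
    {"Bacteroides": 40.0, "Prevotella": 10.0},
    {"Bacteroides": 20.0, "Prevotella": 30.0},
    {"Bacteroides": 0.9, "Prevotella": 0.5},
    {"Bacteroides": 0.3, "Prevotella": 0.5},
)
```

When the two accuracies for a taxon are equal, the result is the plain mean of the two abundances; when one region combination is much more accurate for that taxon, the refined value moves toward that experiment's abundance.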

Patent Claims

4 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for improving accuracy of taxonomic profiling of a microbial community based on amplicon sequencing, the method comprising:
collecting a biological sample from an environment;
obtaining a first subsample and a second subsample from the biological sample;
extracting microbial DNA from the first subsample and the second subsample;
sequencing, the extracted microbial DNA from the first subsample using a sequencer to get first DNA sequence data, wherein the first DNA sequence data comprises of a plurality of pairs of sequence fragments, wherein each pair of the plurality of pairs of sequence fragments is generated through paired-end sequencing of a first amplicon that comprises a first combination of informative regions within the first amplicon, wherein the first combination of informative regions comprises informative regions arranged contiguously or non-contiguously in a phylogenetic marker gene targeted in the first amplicon sequencing, the sequencing of the first combination of informative regions arranging contiguously comprising the steps of: designing primers including a forward primer and a reverse primer against a stretch of the extracted microbial DNA such that the informative regions reside within the stretch and the primers target two contiguous informative regions, wherein the two contiguous informative regions are two adjacent informative regions; generating paired-end reads including a forward read and a reverse read by performing the paired-end sequencing of the first amplicon, wherein the paired-end sequencing is a 250 bp × 2 paired-end sequencing where the stretch of the extracted microbial DNA is sequenced from both ends; and merging the forward read and the reverse read into a single sequence forming a merged read based on an overlap between the forward read and the reverse read constituting a pair, wherein the overlap is found between the forward and the reverse read on sequencing the two adjacent informative regions; and wherein the informative regions contain phylogenetically relevant information;
sequencing, the extracted DNA from the second subsample using the sequencer to get second DNA sequence data, wherein the second DNA sequence data comprises of a plurality of pairs of sequence fragments, wherein each pair of the plurality of pairs of sequence fragments is generated through paired-end sequencing of a second amplicon that comprises a second combination of informative regions within the second amplicon, wherein the second combination of informative regions comprises informative regions arranged non-contiguously in the phylogenetic marker gene targeted in the second amplicon sequencing, the sequencing of the second combination of informative regions arranging non-contiguously comprising the steps of: designing primers including a forward primer and a reverse primer against a stretch of the extracted microbial DNA such that the informative regions reside within the stretch and the primers target two non-contiguous informative regions, wherein the two non-contiguous informative regions are two distantly separated informative regions; generating paired-end reads including a forward read and a reverse read by performing the paired-end sequencing of the second amplicon, wherein the paired-end sequencing is a 250 bp × 2 paired-end sequencing where the stretch of the extracted microbial DNA is sequenced from both ends; concatenating the forward read and the reverse read into a single sequence forming a concatenated read using a string of multiple ambiguous nucleotide characters when the forward read and the reverse read do not overlap, wherein the overlap is not found between the forward read and the reverse read on sequencing the two separated informative regions; wherein utility of targeting pairs of non-contiguously placed informative regions improves taxonomic classification accuracy, wherein the second combination of informative regions are different from the first combination of informative regions and one of the informative regions in the first combination of informative regions and the second combination of informative regions is shared by the first combination of informative regions and the second combination of informative regions, and wherein the first and second amplicon sequencing experiments target the phylogenetic marker gene;
generating, via one or more hardware processors, a first microbial taxonomic abundance profile of the first sequenced subsample by performing a taxonomic classification of phylogenetically relevant information corresponding to the first combination of informative regions, wherein the first combination of informative regions are submitted as query sequences for performing the taxonomic classification, and wherein the first microbial taxonomic abundance profile comprises abundance values corresponding to one or more pair of sequence fragments comprising the first combination of informative regions classified into a plurality of taxonomic groups;
generating, via the one or more hardware processors, a second microbial taxonomic abundance profile of the second sequenced subsample by performing the taxonomic classification of phylogenetically relevant information corresponding to the second combination of informative regions, wherein the second combination of informative regions are submitted as query sequences for performing the taxonomic classification, and wherein the second microbial taxonomic abundance profile comprises abundance values corresponding to one or more pair of sequence fragments comprising the second combination of informative regions classified into the plurality of taxonomic groups;
pre-computing, via the one or more hardware processors, taxonomic classification accuracies for various possible combinations of informative regions for microbes belonging to the plurality of taxonomic groups, wherein the pre-computing is based on marker gene sequences of known taxonomic origin present in existing sequence databases, to generate a computation table; and
combining, via the one or more hardware processors, the first microbial taxonomic abundance profile and the second microbial taxonomic abundance profile of the first and the second sequenced subsample based on the computation table to generate a combined microbial taxonomic abundance profile, wherein combining the first microbial taxonomic abundance profile and the second microbial taxonomic abundance profile utilizes a combinatorial strategy and the combined microbial taxonomic abundance profile has a refined abundance value for each taxonomic group and has improved taxonomic classification accuracy as compared to the first microbial taxonomic abundance profile and the second microbial taxonomic abundance profile obtained individually for the first and the second subsample, targeting the first combination of informative regions and the second combination of informative regions in the phylogenetic marker gene, wherein the combinatorial strategy comprises: obtaining the abundance values of a particular taxonomic group ‘i’ (Tix and Tiy) corresponding to the first and second sequenced subsamples, generated by performing the taxonomic classification utilizing the first combination of informative regions and the second combination of informative regions; providing pre-computed relative accuracies Wix and Wiy in taxonomic classification for the particular taxonomic group ‘i’ using the first combination of informative regions ‘x’ and the second combination of informative regions ‘y’; and calculating the refined abundance value (Tixy) for the particular taxonomic group ‘i’ using the following formula:
$$T_{ixy} = \frac{\left(\frac{W_{ix}}{W_{iy}} \times T_{ix}\right) + T_{iy}}{1 + \frac{W_{ix}}{W_{iy}}}$$
and calculating the refined abundance value for all the taxonomic groups to obtain a more accurate microbial taxonomic abundance profile as compared to the first microbial taxonomic abundance profile and the second microbial taxonomic abundance profile obtained individually for the first and the second subsample.

2. The method of claim 1, wherein the sequencing the extracted microbial DNA from the first subsample and the second subsample is performed on the first amplicon and the second amplicon, wherein the first amplicon and the second amplicon constitute a 16S rRNA gene or portions of the 16S rRNA gene, and wherein the 16S rRNA gene comprises multiple phylogenetically informative regions.

3. The method of claim 1, wherein each informative region of the first combination of informative regions and the second combination of informative regions is a variable region in a 16S rRNA gene amplicon.

4. The method of claim 1, wherein the choice of the phylogenetic marker gene and the first combination of informative regions and the second combination of informative regions within the marker gene selected for amplicon sequencing is based on the pre-computed taxonomic classification accuracies for the various possible combinations of informative regions for microbes belonging to the plurality of taxonomic groups.
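The read-joining steps recited in claim 1 (merging a pair when the forward and reverse reads overlap, otherwise concatenating them with a string of ambiguous nucleotide characters) can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: it assumes exact-match overlaps with no quality scores, it assumes the reverse read is reverse-complemented before joining, and the names `join_read_pair`, `min_overlap`, and `spacer_len` and their default values are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(table)[::-1]


def join_read_pair(
    forward: str,
    reverse: str,
    min_overlap: int = 10,  # hypothetical threshold, not from the patent
    spacer_len: int = 8,    # hypothetical number of 'N' characters
) -> str:
    """Merge an overlapping read pair, or concatenate a non-overlapping one.

    The reverse read is reverse-complemented so that both reads are on the
    same strand. The longest exact suffix/prefix overlap of at least
    min_overlap bases is used to merge; if none exists, the reads are
    joined with a run of ambiguous 'N' characters.
    """
    fwd = forward.upper()
    rev_rc = reverse_complement(reverse)
    longest = min(len(fwd), len(rev_rc))
    for k in range(longest, min_overlap - 1, -1):
        if fwd[-k:] == rev_rc[:k]:
            return fwd + rev_rc[k:]           # merged read (contiguous regions)
    return fwd + "N" * spacer_len + rev_rc    # concatenated read (non-contiguous)


# Hypothetical example: a 29-base amplicon read from both ends.
amplicon = "ACGTTGCAAGGCTTACCGGATCGATTGCA"
merged = join_read_pair(amplicon[:20], reverse_complement(amplicon[8:]))
concatenated = join_read_pair("ACGTACGTAC", reverse_complement("TTGGCCAATT"))
```

In this sketch the first pair shares a 12-base overlap and is merged back into the original amplicon, while the second pair shares no overlap and is joined with eight `N` characters. Production read mergers typically tolerate mismatches and weigh base qualities, which this example leaves out.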

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

C12Q G16B

Patent Metadata

Filing Date

August 9, 2019

Publication Date

February 11, 2025
