US-RE050636-B2

Accurate molecular barcoding

Published October 14, 2025

Assignee not available in USPTO data we have

Inventors not available in USPTO data we have

Technical Abstract

In accordance with some embodiments herein, compositions and methods for accurate barcoding of nucleic acids are described. The compositions and methods can involve a plurality of unique oligonucleotide species comprising unique molecule barcodes. In some embodiments, the molecule barcodes can have a relatively low G content, and can exhibit reduced bias in amplification and analysis.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A reverse transcription reaction composition comprising:

2. The composition of, wherein the constraint on G content is less than 1% of the unique oligonucleotide species comprising the molecule barcode having a G content of 50% or more.

3. The composition of, wherein the constraint on G content is the molecule barcodes of all of the unique oligonucleotide species in the composition collectively have a G content of no more than 12.5%.

4. The composition of, wherein the unique oligonucleotide species are disposed in at least two spatially isolated pools, each pool comprising at least 100 unique oligonucleotides of the unique oligonucleotide species,

5. The composition of, wherein the unique oligonucleotide species of each pool are immobilized on a substrate, so that the sample barcodes but not the molecule barcodes are the same for the oligonucleotide species immobilized on each substrate.

6. The composition of, wherein for at least 95% of the unique oligonucleotide species, any G in the molecule barcode is not adjacent to another G.

7. The composition of, wherein at least 95% of the molecule barcodes of the unique oligonucleotide species comprise the sequence HNHNHNHN, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T.

8. The composition of, wherein each of the unique oligonucleotide species comprises a spacer 3′ of the barcode region and 5′ of the target specific region, said spacer comprising the sequence HHHHHHHH, wherein each “H” is any one of A, C, or T.

9. The composition of, wherein each oligonucleotide species has a length of 24-140 nucleotides.

10. The composition of, wherein the composition comprises at least two oligonucleotides of the same unique oligonucleotide species.

11. The composition of, wherein the uniform region comprises a target-specific region comprising a sequence flanking an immune cell receptor or immunoglobulin variable region coding sequence.

12. The composition of, wherein the immune cell receptor variable region coding sequence is selected from the group consisting of: a T cell receptor variable region coding sequence, a B cell receptor variable region coding sequence, and a combination thereof.

13. The composition of, wherein the molecule barcodes of all of the unique oligonucleotide species in the composition collectively have a G content of 2.5%-10%.

14. The composition of, wherein the molecule barcode is 7-9 nucleotides.

15. The composition of, comprising at least 6,500 unique oligonucleotide species.

16. A method of specifically barcoding cDNA from two or more samples, each sample comprising nucleic acids, the method comprising:

17. The method of, further comprising ascertaining nucleic acid sequences of the strands comprising the oligonucleotides of the unique oligonucleotide species and the sequence complementary to the target.

18. The method of, wherein the constraint on G content is the molecule barcodes of the unique oligonucleotide species collectively having a G content of less than 12.5%.

19. The method of, wherein the molecule barcode is 7-9 nucleotides.

20. The method of, the pool comprises at least 1000 unique oligonucleotide species.

21. The method of, the pool comprises at least 6,500 unique oligonucleotide species.

22. A kit for amplifying barcoded cDNA comprising an immune cell receptor or immunoglobulin variable region coding sequence, comprising:

23. The kit of, wherein the molecule barcode is 7-9 nucleotides.

24. The kit of, wherein composition comprises at least 6,500 unique oligonucleotide species.

25. A composition comprising:

26. The composition of, wherein the constraint on G content is less than 1% of the unique oligonucleotide species comprising the molecule barcode having a G content of 50% or more.

27. The composition of, wherein the constraint on G content is the molecule barcodes of all of the unique oligonucleotide species in the composition collectively have a G content of no more than 12.5%.

28. The composition of, wherein the unique oligonucleotide species are disposed in at least two spatially isolated pools, each pool comprising at least 100 unique oligonucleotides of the unique oligonucleotide species, wherein unique oligonucleotides in the same pool comprise the same sample barcode sequence, and wherein different unique oligonucleotides of the same pool comprise a different molecule barcode sequences.

29. The composition of, wherein the unique oligonucleotide species of each pool are immobilized on a substrate, so that the sample barcodes but not the molecule barcodes are the same for the oligonucleotide species immobilized on each substrate.

30. The composition of, wherein for at least 95% of the unique oligonucleotide species, any G in the molecule barcode is not adjacent to another G.

31. The composition of, wherein at least 95% of the molecule barcodes of the unique oligonucleotide species comprise the sequence HNHNHNHN, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T.

32. The composition of, wherein each of the unique oligonucleotide species comprises a spacer 3′ of the barcode region and 5′ of the target specific region, said spacer comprising the sequence HHHHHHHH, wherein each “H” is any one of A, C, or T.

33. The composition of, wherein each oligonucleotide species has a length of 24-140 nucleotides.

34. The composition of, wherein the composition comprises at least two oligonucleotides of the same unique oligonucleotide species.

35. The composition of, wherein the uniform region comprises a target-specific region comprising a sequence flanking an immune cell receptor or immunoglobulin variable region coding sequence.

36. The composition of, wherein the immune cell receptor variable region coding sequence is selected from the group consisting of: a T cell receptor variable region coding sequence, a B cell receptor variable region coding sequence, and a combination thereof.

37. The composition of, wherein the molecule barcodes of all of the unique oligonucleotide species in the composition collectively have a G content of 2.5%-10%.

38. The composition of, wherein the molecule barcode is 7-9 nucleotides.

39. The composition of, comprising at least 6,500 unique oligonucleotide species.

40. The method of, wherein the constraint on G content is less than 1% of the unique oligonucleotide species comprising the molecule barcode having a G content of 50% or more.

41. The method of, wherein the unique oligonucleotide species of each pool are immobilized on a substrate, so that the sample barcodes but not the molecule barcodes are the same for the oligonucleotide species immobilized on each substrate.

42. The method of, wherein for at least 95% of the unique oligonucleotide species, any G in the molecule barcode is not adjacent to another G.

43. The method of, wherein at least 95% of the molecule barcodes of the unique oligonucleotide species comprise the sequence HNHNHNHN, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T.

44. The method of, wherein each of the unique oligonucleotide species comprises a spacer 3′ of the barcode region and 5′ of the target specific region, said spacer comprising the sequence HHHHHHHH, wherein each “H” is any one of A, C, or T.

45. The method of, wherein each oligonucleotide species has a length of 24-140 nucleotides.

46. The method of, wherein the target-specific region comprises a sequence flanking an immune cell receptor or immunoglobulin variable region coding sequence.

47. The method of, wherein the immune cell receptor variable region coding sequence is selected from the group consisting of: a T cell receptor variable region coding sequence, a B cell receptor variable region coding sequence, and a combination thereof.

48. The kit of, wherein the constraint on G content is the molecule barcodes of the unique oligonucleotide species collectively having a G content of less than 12.5%.

49. The kit of, wherein the constraint on G content is less than 1% of the unique oligonucleotide species comprising the molecule barcode having a G content of 50% or more.

50. The kit of, wherein the unique oligonucleotide species of each pool are immobilized on a substrate, so that the sample barcodes but not the molecule barcodes are the same for the oligonucleotide species immobilized on each substrate.

51. The kit of, wherein for at least 95% of the unique oligonucleotide species, any G in the molecule barcode is not adjacent to another G.

52. The kit of, wherein at least 95% of the molecule barcodes of the unique oligonucleotide species comprise the sequence HNHNHNHN, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T.

53. The kit of, wherein each of the unique oligonucleotide species comprises a spacer 3′ of the barcode region and 5′ of the target specific region, said spacer comprising the sequence HHHHHHHH, wherein each “H” is any one of A, C, or T.

54. The kit of, wherein each oligonucleotide species has a length of 24-140 nucleotides.

Detailed Description

Complete technical specification and implementation details from the patent document.

The present application claims the benefit of U.S. Provisional App. No. 62/330,500 filed May 2, 2016, which is hereby incorporated by reference in its entirety.

The present application is being filed along with a Sequence Listing in electronic format. The Sequence Listing is provided as a file entitled SEQUENCEBDCRI019A.TXT, created and last modified Apr. 21, 2017, which is 32,587 bytes in size, and is replaced by a file entitled BDCRI019R1_SEQLIST.xml, created and last modified Oct. 25, 2022, which is 195,405 bytes in size. The information in the electronic format of the Sequence Listing is incorporated herein by reference in its entirety.

Embodiments herein relate generally to compositions and methods for accurate barcoding of molecules, for example nucleic acid molecules.

In some embodiments, a composition comprising at least 1000 unique oligonucleotide species is provided. Each unique oligonucleotide species can comprise a barcode region comprising a molecule barcode comprising at least 7 nucleotides, in which the unique oligonucleotide species comprise different nucleic acid sequences in their barcode regions, and in which at least one of the following applies: (a) the composition consists essentially of unique oligonucleotide species wherein each molecular barcode has a G content of less than 50%; or (b) the molecule barcodes of all of the unique oligonucleotide species in the composition collectively have a G content of no more than 12.5%. In some embodiments, the barcode region further comprises a sample barcode comprising at least 3 nucleotides. In some embodiments, each unique oligonucleotide species further comprises a uniform region 3′ of the barcode region. In some embodiments, the uniform region further comprises a target-specific region 3′ of the barcode region, the target-specific region comprising at least 10 nucleotides complementary to a target nucleic acid. In some embodiments, the composition consists essentially of unique oligonucleotide species wherein each molecular barcode has a G content of less than 50%. In some embodiments, the molecule barcodes of all of the unique oligonucleotide species in the composition collectively have a G content of no more than 12.5%. In some embodiments, the unique oligonucleotide species are disposed in at least two spatially isolated pools, each pool comprising at least 100 unique oligonucleotides of the unique oligonucleotide species, wherein unique oligonucleotides in the same pool comprise the same sample barcode sequence, and wherein different unique oligonucleotides of the same pool comprise different molecule barcode sequences. In some embodiments, the sample barcode of each unique oligonucleotide species has a G content of 50% or less.
In some embodiments, the barcode region of each unique oligonucleotide species has a G content of 50% or less. In some embodiments, the molecule barcodes of the unique oligonucleotide species collectively have a G content of less than 12.5%. In some embodiments, the barcode regions of the unique oligonucleotide species collectively have a G content of no more than 12.5%. In some embodiments, for at least 95% of the unique oligonucleotide species, any G in the molecule barcode is not adjacent to another G. In some embodiments, the composition consists essentially of unique oligonucleotide species for which any G in the molecule barcode is not adjacent to another G. In some embodiments, at least 95% of the molecule barcodes of the unique oligonucleotide species comprise a sequence totaling at least 6 alternating H's and N's, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T. In some embodiments, at least 95% of the molecule barcodes of the unique oligonucleotide species comprise the sequence HNHNHNHN, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T. In some embodiments, each molecule barcode of the unique oligonucleotide species comprises the sequence HNHNHNHN, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T. In some embodiments, each molecule barcode of the unique oligonucleotide species comprises the sequence HHHHHHHH, wherein each “H” is any one of A, C, or T. In some embodiments, each of the unique oligonucleotide species comprises a spacer 3′ of the barcode region and 5′ of the target specific region, said spacer comprising the sequence HNHNHNHN, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T.
In some embodiments, each of the unique oligonucleotide species comprises a spacer 3′ of the barcode region and 5′ of the target specific region, said spacer comprising the sequence HHHHHHHH, wherein each “H” is any one of A, C, or T. In some embodiments, the target specific region comprises an oligo dT sequence. In some embodiments, for each unique oligonucleotide species, the molecule barcode is 3′ of the sample barcode. In some embodiments, for each unique oligonucleotide species, the sample barcode is 3′ of the molecule barcode. In some embodiments, each oligonucleotide species has a length of at least 24 nucleotides. In some embodiments, each oligonucleotide species has a length of 24-140 nucleotides. In some embodiments, the composition comprises at least 6,500 unique oligonucleotide species. In some embodiments, the composition comprises at least 65,000 unique oligonucleotide species. In some embodiments, the composition comprises at least two oligonucleotides of the same unique oligonucleotide species. In some embodiments, no two oligonucleotides of the composition are of the same unique oligonucleotide species. In some embodiments, the composition comprises at least 48 pools. In some embodiments, the unique oligonucleotide species of each pool are immobilized on a substrate, so that the sample barcodes but not the molecule barcodes are the same for the oligonucleotide species immobilized on each substrate. In some embodiments, the substrate comprises a discrete region of a surface, so that the surface can comprise two or more substrates. In some embodiments, the substrate comprises a bead. In some embodiments, each of the unique oligonucleotide species further comprises an adapter configured to immobilize the unique oligonucleotide on the substrate, wherein said barcode region is 3′ of the adapter. 
In some embodiments, the uniform region comprises a target-specific region comprising a sequence flanking an immune cell receptor or immunoglobulin variable region coding sequence. In some embodiments, the immune cell receptor variable region coding sequence is a T cell receptor variable region coding sequence, a B cell receptor variable region coding sequence, or a combination thereof. In some embodiments, a kit comprising the composition comprising at least 1000 unique oligonucleotide species is provided. The kit can further comprise a primer configured to hybridize on an opposite side of the variable region as the target-specific region and to hybridize to a complementary strand to a strand hybridized by the target-specific region, so that the primer is configured to amplify the variable region in conjunction with the target-specific region. In some embodiments, the primer and target-specific region are configured to amplify a nucleic acid of at least 1 kb and comprising the variable region. In some embodiments, the primer of the kit is part of the composition comprising the unique oligonucleotide species. In some embodiments, the primer of the kit is part of another composition that is separate from the composition comprising the unique oligonucleotide species.
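The species counts and G-content figures in the passage above can be checked by direct enumeration. The following is a minimal sketch, assuming that “H” and “N” expand as defined in the text (H is A, C, or T; N is A, G, C, or T) and that every species is equally represented. The function names `expand` and `collective_g_content` are illustrative and do not come from the patent.

```python
from itertools import product

# Degenerate codes as defined in the text: H excludes G, N allows any base.
CODES = {"H": "ACT", "N": "ACGT"}

def expand(pattern):
    """Enumerate every concrete sequence matching a degenerate pattern."""
    return ["".join(bases) for bases in product(*(CODES[c] for c in pattern))]

def collective_g_content(barcodes):
    """Fraction of G over all bases of all barcodes, each species weighted equally."""
    total_bases = sum(len(b) for b in barcodes)
    return sum(b.count("G") for b in barcodes) / total_bases

hn = expand("HNHNHNHN")  # 3**4 * 4**4 candidate molecule barcodes
hh = expand("HHHHHHHH")  # 3**8 candidate molecule barcodes

print(len(hn), collective_g_content(hn))
print(len(hh), collective_g_content(hh))
```

Under these assumptions, HNHNHNHN gives 20,736 species with a collective G content of exactly 12.5%, and HHHHHHHH gives 6,561 species containing no G. Because every N is flanked by an H, no HNHNHNHN barcode contains two adjacent G's, and only 81 of the 20,736 (about 0.39%) reach a G content of 50%.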

In some embodiments, a method of specifically barcoding a plurality of nucleic acids from two or more samples is provided. Each sample can comprise nucleic acids. The method can comprise contacting each sample with a pool comprising at least 100 unique oligonucleotide species, in which each sample is contacted in spatial isolation from the other samples. Each unique oligonucleotide species can comprise a barcode region comprising a molecule barcode comprising at least 7 nucleotides. The unique polynucleotide species of each pool can comprise the same sample barcode, and comprise different molecule barcodes. In the method, at least one of the following can apply: (a) the unique oligonucleotide species contacted with the sample consist essentially of unique oligonucleotide species wherein each molecular barcode has a G content of less than 50%; or (b) the molecule barcodes of all of the unique oligonucleotide species collectively have a G content of no more than 12.5%. The method can include hybridizing target-specific regions of at least some oligonucleotides of the unique oligonucleotide species to at least some of the nucleic acids of the sample. The method can include extending the hybridized oligonucleotides, thereby producing strands comprising an oligonucleotide of the unique oligonucleotide species and a sequence complementary to the target, wherein for each sample, the strands comprise the same sample barcode and different molecule barcodes, and wherein for different samples, the molecule barcodes are different. In some embodiments, the barcode region further comprises a sample barcode comprising at least 3 nucleotides. In some embodiments, each unique oligonucleotide species further comprises a uniform region 3′ of the barcode region. In some embodiments, the uniform region further comprises a target-specific region 3′ of the barcode region, the target-specific region comprising at least 10 nucleotides complementary to a target nucleic acid.
In some embodiments, the unique oligonucleotide species contacted with the sample consist essentially of unique oligonucleotide species wherein each molecular barcode has a G content of less than 50%. In some embodiments, the molecule barcodes of all of the unique oligonucleotide species collectively have a G content of no more than 12.5%. In some embodiments, the method further comprises ascertaining nucleic acid sequences of the strands comprising the oligonucleotides of the unique oligonucleotide species and the sequence complementary to the target. In some embodiments, the at least 100 unique oligonucleotide species of each pool are immobilized on a substrate, so that the unique oligonucleotide species immobilized on a given substrate comprise the same sample barcode, and different unique oligonucleotide species immobilized on the substrate comprise different molecule barcodes. In some embodiments, each sample barcode has a G content of 50% or less. In some embodiments, the molecule barcodes of the unique oligonucleotide species collectively have a G content of less than 12.5%. In some embodiments, the barcode regions of the unique oligonucleotide species collectively have a G content of no more than 12.5%. In some embodiments, for at least 95% of the unique oligonucleotide species, any G in the molecule barcode is not adjacent to another G. In some embodiments, each pool consists essentially of unique oligonucleotide species for which any G in the molecule barcode is not adjacent to another G. In some embodiments, at least 95% of the molecule barcodes of the unique oligonucleotide species comprise a sequence totaling at least 6 alternating H's and N's, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T. 
In some embodiments, at least 95% of the molecule barcodes of the unique oligonucleotide species comprise the sequence HNHNHNHN, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T. In some embodiments, each molecule barcode of the unique oligonucleotide species comprises the sequence HNHNHNHN, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T. In some embodiments, each molecule barcode of the unique oligonucleotides comprises the sequence HHHHHHHH, wherein each “H” is any one of A, C, or T. In some embodiments, each unique oligonucleotide comprises a spacer 3′ of the barcode region and 5′ of the target specific region, said spacer comprising the sequence HNHNHNHN, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T. In some embodiments, each unique oligonucleotide comprises a spacer 3′ of the barcode region and 5′ of the target specific region, said spacer comprising the sequence HHHHHHHH, wherein each “H” is any one of A, C, or T. In some embodiments, at least one pool comprises at least two oligonucleotides of the same unique oligonucleotide species. In some embodiments, no pool comprises two oligonucleotides of the same unique oligonucleotide species. In some embodiments, the target specific region comprises an oligo dT sequence. In some embodiments, for each unique oligonucleotide species, the molecule barcode is 3′ of the sample barcode. In some embodiments, for each unique oligonucleotide species, the sample barcode is 3′ of the molecule barcode. In some embodiments, each unique oligonucleotide species has a length of at least 24 nucleotides. In some embodiments, each unique oligonucleotide species has a length of 24-140 nucleotides. In some embodiments, each pool comprises at least 6,500 unique oligonucleotide species. In some embodiments, each pool comprises at least 65,000 unique oligonucleotide species.
In some embodiments, at least 48 unique samples are each contacted with a unique pool. In some embodiments, at least 99% of the samples comprise no more than one cell each. In some embodiments, the unique oligonucleotide species of each pool are immobilized on a substrate, so that the sample barcodes but not the molecule barcodes are the same for the unique oligonucleotide species immobilized on each substrate of the plurality. In some embodiments, the substrate comprises a spatially-isolated region of a surface, so that the substrates of different pools comprise the different spatially-isolated regions of the surface. In some embodiments, the substrate comprises a bead. In some embodiments, each of the unique oligonucleotide species further comprises an adapter configured to immobilize the unique oligonucleotide on the substrate, wherein said barcode region is 3′ of the adapter. In some embodiments, the uniform region comprises a target-specific region comprising a sequence flanking a sequence encoding the variable region of an immune cell receptor or immunoglobulin. In some embodiments, the variable region is of a T cell receptor or B cell receptor, or a combination thereof. In some embodiments, the method further comprises contacting the extended strands comprising an oligonucleotide of the unique oligonucleotide species and a sequence complementary to the target with primer configured to hybridize on an opposite side of the variable region as the target-specific region, and to hybridize to a complementary strand to a strand hybridized by the target-specific region. As such, the method can comprise amplifying sequences encoding variable regions of a T cell receptor, B cell receptor, or immunoglobulin. In some embodiments, the method amplifies a sequence of at least 1 kb, which comprises the variable region coding sequence.
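Once the barcoded strands have been sequenced, the sample barcode identifies the pool and the molecule barcode distinguishes individual molecules within it. The sketch below illustrates that bookkeeping only. It assumes a hypothetical read layout (a 3-nucleotide sample barcode followed by an 8-nucleotide molecule barcode) and uses invented toy reads; neither the layout nor the function `count_molecules` is taken from the patent.

```python
from collections import defaultdict

# Hypothetical layout for this sketch: sample barcode first, then molecule barcode.
SAMPLE_LEN = 3    # the text requires at least 3 nucleotides
MOLECULE_LEN = 8  # the text requires at least 7 nucleotides; 8 is chosen here

def count_molecules(reads):
    """Group reads by sample barcode and count distinct molecule barcodes in each."""
    seen = defaultdict(set)
    for read in reads:
        sample = read[:SAMPLE_LEN]
        molecule = read[SAMPLE_LEN:SAMPLE_LEN + MOLECULE_LEN]
        seen[sample].add(molecule)
    return {sample: len(molecules) for sample, molecules in seen.items()}

# Invented toy reads: the first molecule of sample "ACT" appears three times,
# as it would after amplification, but is counted once.
reads = [
    "ACT" + "ATCACTAC" + "TTTTTTTT",
    "ACT" + "ATCACTAC" + "TTTTTTTT",
    "ACT" + "ATCACTAC" + "TTTTTTTT",
    "ACT" + "CATAGACA" + "TTTTTTTT",
    "CAT" + "TACATCTA" + "TTTTTTTT",
]
print(count_molecules(reads))
```

For simplicity the sketch ignores the target sequence; an analysis would likely also group reads by target before counting molecule barcodes.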

In some embodiments, a method of making a composition comprising unique oligonucleotides is provided. The method can comprise providing a plurality of different sample barcodes comprising at least 3 nucleotides each. The method can comprise providing a plurality of different molecule barcodes comprising at least 7 nucleotides each. The method can comprise synthesizing a plurality of unique oligonucleotide species, each unique oligonucleotide species comprising a barcode region comprising a sample barcode and a molecule barcode, in which at least one of the following applies: (a) the plurality of unique oligonucleotide species consists essentially of unique oligonucleotide species wherein each molecular barcode has a G content of less than 50%; or (b) the molecule barcodes of all of the unique oligonucleotide species in the plurality collectively have a G content of no more than 12.5%. The method can comprise disposing the unique oligonucleotide species in spatially-isolated pools, in which each pool comprises multiple unique oligonucleotide species, so that the unique oligonucleotide species of the same pool comprise the same sample barcode sequence, and wherein different unique oligonucleotide species of the same pool comprise different molecule barcode sequences, and in which each pool comprises at least 1000 unique oligonucleotide species. In some embodiments, each unique oligonucleotide species further comprises a uniform region 3′ of the barcode region. In some embodiments, the uniform region further comprises a target-specific region 3′ of the barcode region, the target-specific region comprising at least 10 nucleotides complementary to a target nucleic acid. In some embodiments, the plurality of unique oligonucleotide species consists essentially of unique oligonucleotide species wherein each molecular barcode has a G content of less than 50%.
In some embodiments, the molecule barcodes of all of the unique oligonucleotide species in the plurality collectively have a G content of no more than 12.5%. In some embodiments, each molecule barcode has a G content of 50% or less. In some embodiments, each sample barcode has a G content of 50% or less. In some embodiments, the sample barcodes of the unique oligonucleotide species collectively have a G content of no more than 12.5%. In some embodiments, the molecule barcodes of the unique oligonucleotide species collectively have a G content of less than 12.5%. In some embodiments, the barcode regions of the unique oligonucleotide species collectively have a G content of no more than 12.5%. In some embodiments, for at least 95% of the unique oligonucleotide species, any G in the molecule barcode is not adjacent to another G. In some embodiments, the plurality of the unique oligonucleotide species consists essentially of unique oligonucleotide species for which any G in the molecule barcode is not adjacent to another G. In some embodiments, at least 95% of the molecule barcodes of the unique oligonucleotide species comprise a sequence totaling at least 6 alternating H's and N's, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T. In some embodiments, at least 95% of the molecule barcodes of the unique oligonucleotides comprise the sequence HNHNHNHN, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T. In some embodiments, each molecule barcode of the unique oligonucleotides comprises the sequence HNHNHNHN, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T. In some embodiments, each molecule barcode of the unique oligonucleotides comprises the sequence HHHHHHHH, wherein each “H” is any one of A, C, or T. 
In some embodiments, each unique oligonucleotide species comprises a spacer 3′ of the barcode region and 5′ of the target specific region, said spacer comprising the sequence HNHNHNHN, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T. In some embodiments, each unique oligonucleotide species comprises a spacer 3′ of the barcode region and 5′ of the target specific region, said spacer comprising the sequence HHHHHHHH, wherein each “H” is any one of A, C, or T. In some embodiments, the target specific region comprises an oligo dT sequence. In some embodiments, for each unique oligonucleotide species, the molecule barcode is 3′ of the sample barcode. In some embodiments, for each unique oligonucleotide species, the sample barcode is 3′ of the molecule barcode. In some embodiments, each unique oligonucleotide species has a length of at least 24 nucleotides. In some embodiments, each unique oligonucleotide species has a length of 24-140 nucleotides. In some embodiments, each pool comprises at least 6,500 unique oligonucleotide species. In some embodiments, each pool comprises at least 65,000 unique oligonucleotide species. In some embodiments, at least 48 pools are made. In some embodiments, the method further comprises immobilizing the unique oligonucleotide species of each pool onto a substrate, so that the sample barcodes but not the molecule barcodes are the same for the oligonucleotide species immobilized on each substrate of the plurality. In some embodiments, the substrates comprise discrete regions of a surface. In some embodiments, the substrates comprise beads. In some embodiments, the unique oligonucleotide species are disposed in spatially-isolated pools concurrent with said synthesis. In some embodiments, the unique oligonucleotide species are disposed in spatially-isolated pools after said synthesis.
In some embodiments, the uniform region comprises a target-specific region comprising a sequence flanking a variable region coding sequence of an immune cell receptor or immunoglobulin. In some embodiments, the immune cell receptor variable region coding sequence is a T cell receptor variable region coding sequence, a B cell receptor variable region coding sequence, or a combination thereof. In some embodiments, the kit further comprises a primer configured to hybridize on an opposite side of the variable region coding sequence as the target-specific region, and to hybridize to a complementary strand to a strand hybridized by the target-specific region, so that the primer is configured, in conjunction with the target-specific region, to amplify the variable region coding sequence.
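The pooling step of the method of making can be pictured as pairing each sample barcode with its own set of distinct molecule barcodes. The sketch below is an in silico illustration only and says nothing about oligonucleotide synthesis chemistry. It assumes HNHNHNHN molecule barcodes, places the sample barcode 5′ of the molecule barcode, and gives each pool a disjoint slice of the barcode space taken in enumeration order. All three are choices made for the example; `hnhn_barcodes` and `make_pools` are hypothetical names.

```python
from itertools import islice, product

def hnhn_barcodes():
    """Lazily yield molecule barcodes matching HNHNHNHN (H = A/C/T, N = A/C/G/T)."""
    for bases in product("ACT", "ACGT", repeat=4):
        yield "".join(bases)

def make_pools(sample_barcodes, species_per_pool=1000):
    """Map each sample barcode to a pool of barcode regions sharing that sample barcode."""
    source = hnhn_barcodes()  # shared iterator, so pools receive disjoint barcodes
    pools = {}
    for sample in sample_barcodes:
        molecules = list(islice(source, species_per_pool))
        if len(molecules) < species_per_pool:
            raise ValueError("not enough molecule barcodes for the requested pools")
        # Barcode region for this sketch: sample barcode followed by molecule barcode.
        pools[sample] = [sample + m for m in molecules]
    return pools

pools = make_pools(["ACT", "CAT", "TCA"])
for sample, species in pools.items():
    print(sample, len(species), species[0])
```

Each pool here holds 1000 species, matching the "at least 1000 unique oligonucleotide species" per pool stated above. A design that draws molecule barcodes at random rather than in enumeration order would serve equally well for illustration.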

Some embodiments include an oligonucleotide comprising a barcode region 3′ of an adapter region. The barcode region can comprise a molecule barcode comprising at least 7 nucleotides, in which the molecule barcode has a G content of no more than 50%. In some embodiments, the oligonucleotide further comprises a sample barcode comprising at least 3 nucleotides. In some embodiments, the oligonucleotide further comprises a uniform region 3′ of the barcode region. In some embodiments, the uniform region comprises a target-specific region comprising at least 10 nucleotides complementary to a target nucleic acid. In some embodiments, the oligonucleotide further comprises an adapter region 5′ of the barcode region. In some embodiments, the uniform region comprises a target-specific region comprising a sequence flanking a variable region coding sequence of an immune cell receptor or immunoglobulin. In some embodiments, the immune cell receptor variable region coding sequence is a T cell receptor variable region coding sequence, a B cell receptor variable region coding sequence, or a combination thereof. In some embodiments, the kit further comprises a primer configured to hybridize on an opposite side of the variable region coding sequence as the target-specific region, and to hybridize to a complementary strand to a strand hybridized by the target-specific region, so that the primer is configured to amplify the variable region coding sequence in conjunction with the target-specific region.
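The 5′ to 3′ layout described above (adapter, barcode region, target-specific region) can be modeled as a small record that enforces the stated minimum lengths and the 50% G limit. This is a sketch under assumptions: the class `BarcodedOligo`, the ordering of sample barcode before molecule barcode, and every sequence shown are invented for illustration and are not sequences from the patent or its Sequence Listing.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BarcodedOligo:
    adapter: str           # 5' end; immobilizes the oligonucleotide on a substrate
    sample_barcode: str    # at least 3 nucleotides
    molecule_barcode: str  # at least 7 nucleotides, G content no more than 50%
    target_region: str     # at least 10 nucleotides, e.g. an oligo dT stretch

    def __post_init__(self):
        if len(self.sample_barcode) < 3:
            raise ValueError("sample barcode shorter than 3 nt")
        if len(self.molecule_barcode) < 7:
            raise ValueError("molecule barcode shorter than 7 nt")
        if self.molecule_barcode.count("G") / len(self.molecule_barcode) > 0.5:
            raise ValueError("molecule barcode G content above 50%")
        if len(self.target_region) < 10:
            raise ValueError("target-specific region shorter than 10 nt")

    def sequence(self):
        """Return the 5'->3' sequence: adapter, barcode region, target-specific region."""
        return (self.adapter + self.sample_barcode
                + self.molecule_barcode + self.target_region)

# Invented example sequences; the adapter is a placeholder, not a real adapter.
oligo = BarcodedOligo(
    adapter="ACACTCTTTCCC",
    sample_barcode="ACT",
    molecule_barcode="ATCGCTAC",
    target_region="T" * 18,
)
print(oligo.sequence(), len(oligo.sequence()))
```

The example oligonucleotide is 41 nucleotides long, which falls inside the 24-140 nucleotide range given elsewhere in the description.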

In accordance with some embodiments herein, methods and compositions for accurate barcoding and analysis of nucleic acids are described. In some embodiments, individual nucleic acids of a sample can be associated with a unique barcode (e.g. a “molecule barcode”), so that upon amplification and sequence analysis, individual nucleic acids of a sample can be quantified. Without being limited by any theory, it is contemplated that bias favoring or disfavoring the representation, amplification, or properties of certain kinds of barcode sequences can interfere with quantification and analysis of the individual nucleic acids of a sample (possible sources of bias in some amplification events are schematically illustrated in). Described in accordance with some embodiments herein are configurations and characteristics of unique oligonucleotide species comprising barcodes, in which the unique oligonucleotide species are configured to minimize barcode-related bias, and yield accurate analysis of nucleic acids. Without being limited by any theory, it is contemplated that barcode regions comprising features such as a guanosine (G) content below 50%, and/or no two consecutive “G's” in the barcode region, can minimize bias that could otherwise confound quantification and/or analysis of nucleic acids of a sample. Optionally, the individual nucleic acid molecules of a given sample can also be associated with a “sample barcode”, so that barcode-associated nucleic acids can be subsequently pooled for an efficient batch analysis of nucleic acids from two or more samples, for example by next-generation sequencing.

Nucleic Acids

Various nucleic acids are described in accordance with some embodiments herein. For example, oligonucleotide species, samples, and/or targets can comprise nucleic acids.

As used herein, a “nucleic acid” refers to a polynucleotide sequence, or fragment thereof. A nucleic acid can comprise nucleotides. A nucleic acid can be exogenous or endogenous to a cell. A nucleic acid can exist in a cell-free environment. A nucleic acid can comprise, consist of, or consist essentially of a gene or fragment thereof. A nucleic acid can comprise, consist of, or consist essentially of DNA. A nucleic acid can comprise, consist of, or consist essentially of RNA. A nucleic acid can comprise one or more analogs (e.g. altered backbone, sugar, or nucleobase). Some non-limiting examples of analogs include: 5-bromouracil, peptide nucleic acid, xeno nucleic acid, morpholinos, locked nucleic acids, glycol nucleic acids, threose nucleic acids, dideoxynucleotides, cordycepin, 7-deaza-GTP, fluorophores (e.g. rhodamine or fluorescein linked to the sugar), thiol containing nucleotides, biotin linked nucleotides, fluorescent base analogs, CpG islands, methyl-7-guanosine, methylated nucleotides, inosine, thiouridine, pseudouridine, dihydrouridine, queuosine, and wyosine. “Nucleic acid”, “polynucleotide,” “target polynucleotide”, and “target nucleic acid” can be used interchangeably.

As used herein, “upstream” (and variations of this root term) refers to a position that is relatively 5′ on a nucleic acid (e.g., 5′ in comparison to reference position). As used herein “downstream” (and variations of this root term) refers to a position that is relatively 3′ on a nucleic acid (e.g., 3′ in comparison to reference position). For example, as shown in, the “sample barcode” is 3′ of the “molecule barcode” and is understood to be downstream of the molecule “barcode.” For example, as shown in, the “sample barcode” is 5′ of the “molecule barcode” and is understood to be upstream of the molecule “barcode.”

A nucleic acid can comprise one or more modifications (e.g., a base modification, a backbone modification), to provide the nucleic acid with a new or enhanced feature (e.g., improved stability). A nucleic acid can comprise a nucleic acid affinity tag. A nucleoside can comprise, consist of, or consist essentially of a base-sugar combination. The base portion of the nucleoside can be a heterocyclic base. The two most common classes of such heterocyclic bases are the purines and the pyrimidines. Nucleotides can comprise, consist of, or consist essentially of nucleosides that further include a phosphate group covalently linked to the sugar portion of the nucleoside. For those nucleosides that include a pentofuranosyl sugar, the phosphate group can be linked to the 2′, the 3′, or the 5′ hydroxyl moiety of the sugar. In forming nucleic acids, the phosphate groups can covalently link adjacent nucleosides to one another to form a linear polymeric compound. In turn, the respective ends of this linear polymeric compound can be further joined to form a circular compound; however, linear compounds are generally suitable. In addition, linear compounds may have internal nucleotide base complementarity and may therefore fold in a manner as to produce a fully or partially double-stranded compound. Within nucleic acids, the phosphate groups can commonly be referred to as forming the internucleoside backbone of the nucleic acid. The linkage or backbone of the nucleic acid can be a 3′ to 5′ phosphodiester linkage.

A nucleic acid can comprise a modified backbone and/or modified internucleoside linkages. Modified backbones can include those that retain a phosphorus atom in the backbone and those that do not have a phosphorus atom in the backbone. Suitable modified nucleic acid backbones containing a phosphorus atom therein can include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates such as 3′-alkylene phosphonates, 5′-alkylene phosphonates, chiral phosphonates, phosphinates, phosphoramidates including 3′-amino phosphoramidate and aminoalkylphosphoramidates, phosphorodiamidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, selenophosphates, and boranophosphates having normal 3′-5′ linkages, 2′-5′ linked analogs, and those having inverted polarity wherein one or more internucleotide linkages is a 3′ to 3′, a 5′ to 5′ or a 2′ to 2′ linkage.

A nucleic acid can comprise polynucleotide backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages. These can include those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; riboacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH2 component parts.

A nucleic acid can comprise a nucleic acid mimetic. The term “mimetic” includes, for example, polynucleotides wherein only the furanose ring or both the furanose ring and the internucleotide linkage are replaced with non-furanose groups; replacement of only the furanose ring can also be referred to as being a sugar surrogate. The heterocyclic base moiety or a modified heterocyclic base moiety can be maintained for hybridization with an appropriate target nucleic acid. One such nucleic acid can be a peptide nucleic acid (PNA). In a PNA, the sugar-backbone of a polynucleotide can be replaced with an amide containing backbone, in particular an aminoethylglycine backbone. The nucleotides can be retained and are bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone. The backbone in PNA compounds can comprise two or more linked aminoethylglycine units, which gives PNA an amide containing backbone. The heterocyclic base moieties can be bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone.

A nucleic acid can comprise a morpholino backbone structure. For example, a nucleic acid can comprise a 6-membered morpholino ring in place of a ribose ring. In some of these embodiments, a phosphorodiamidate or other non-phosphodiester internucleoside linkage can replace a phosphodiester linkage.

A nucleic acid can comprise linked morpholino units (i.e. morpholino nucleic acid) having heterocyclic bases attached to the morpholino ring. Linking groups can link the morpholino monomeric units in a morpholino nucleic acid. Non-ionic morpholino-based oligomeric compounds can have fewer undesired interactions with cellular proteins. Morpholino-based polynucleotides can be nonionic mimics of nucleic acids. A variety of compounds within the morpholino class can be joined using different linking groups. A further class of polynucleotide mimetic can be referred to as cyclohexenyl nucleic acids (CeNA). The furanose ring normally present in a nucleic acid molecule can be replaced with a cyclohexenyl ring. CeNA DMT protected phosphoramidite monomers can be prepared and used for oligomeric compound synthesis using phosphoramidite chemistry. The incorporation of CeNA monomers into a nucleic acid chain can increase the stability of a DNA/RNA hybrid. CeNA oligoadenylates can form complexes with nucleic acid complements with similar stability to the native complexes. A further modification can include Locked Nucleic Acids (LNAs) in which the 2′-hydroxyl group is linked to the 4′ carbon atom of the sugar ring, forming a 2′-C,4′-C-oxymethylene linkage and thereby a bicyclic sugar moiety. The linkage can comprise, consist of, or consist essentially of a methylene (—CH2—)n group bridging the 2′ oxygen atom and the 4′ carbon atom, wherein n is 1 or 2. LNA and LNA analogs can display very high duplex thermal stabilities with complementary nucleic acid (Tm=+3 to +10° C.), stability towards 3′-exonucleolytic degradation and good solubility properties.

A nucleic acid can also, in some embodiments, include nucleobase (often referred to simply as “base”) modifications or substitutions. As used herein, “unmodified” or “natural” nucleobases can include the purine bases, (e.g. adenine (A) and guanine (G)), and the pyrimidine bases, (e.g. thymine (T), cytosine (C) and uracil (U)). Modified nucleobases can include other synthetic and natural nucleobases such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl (—C≡C—CH3) uracil and cytosine and other alkynyl derivatives of pyrimidine bases, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo particularly 5-bromo, 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 2-F-adenine, 2-aminoadenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine and 3-deazaguanine and 3-deazaadenine. Modified nucleobases can include tricyclic pyrimidines such as phenoxazine cytidine (1H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), phenothiazine cytidine (1H-pyrimido(5,4-b)(1,4)benzothiazin-2(3H)-one), G-clamps such as a substituted phenoxazine cytidine (e.g. 9-(2-aminoethoxy)-H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), carbazole cytidine (2H-pyrimido(4,5-b)indol-2-one), pyridoindole cytidine (Hpyrido(3′,4,5)pyrrolo[2,3-d]pyrimidin-2-one).

Samples

As used herein, the term “sample” refers to a composition comprising targets. Suitable samples for analysis by the disclosed methods, devices, and systems include, but are not limited to, cells, single cells, tissues, organs, or organisms. In some embodiments, a sample comprises raw or unprocessed samples, for example a whole cell, whole population of cells, or whole tissue. In some embodiments, a sample comprises an isolated cell or cell extract, or a nucleic-acid-containing fraction thereof, for example isolated nucleic acids, or a composition comprising enriched or isolated nucleic acids. In some embodiments, a sample comprises a fixed tissue, cell, or nucleic-acid-containing fraction thereof. In some embodiments, a sample comprises a frozen tissue, cell, or nucleic-acid-containing fraction thereof. In some embodiments, a sample comprises a solution comprising nucleic acids. In some embodiments, a sample comprises nucleic acids in a solid format, for example lyophilized nucleic acids and the like.

Unique Oligonucleotide Species

As used in compositions, methods, and oligonucleotides in accordance with some embodiments herein, a “unique oligonucleotide species” refers to an oligonucleotide, for example DNA or RNA, having a sequence that differs by at least one base from another unique oligonucleotide species. The unique oligonucleotide species of a composition in accordance with some embodiments herein can share certain structural features or formats, but can have different nucleic acid sequences from each other. The unique oligonucleotide species can be single-stranded or double-stranded. A composition can comprise two or more unique oligonucleotide species, for example a diversity of 100, 1000, 6500, or 65,000 unique oligonucleotide species. Optionally, the composition comprising unique oligonucleotide species can also comprise two or more oligonucleotides of the same unique oligonucleotide species. By way of example, a composition can comprise two unique oligonucleotide species: ACTT-X and TCTT-X, in which “X” is a sequence that is the same for both unique oligonucleotide species. It would be possible for the composition to comprise two copies of an oligonucleotide having the sequence ACTT-X, and one copy of an oligonucleotide having the sequence TCTT-X.

The oligonucleotide species of the compositions, methods, and oligonucleotides of some embodiments herein can comprise a barcode region and a uniform region as described herein. The barcode regions can differ between unique oligonucleotide species, so as to provide diversity in a population of unique oligonucleotide species, while the uniform regions remain the same. The barcode region can comprise a molecule index as described herein. The molecule index can be configured to minimize bias, for example by minimizing G content so that no unique oligonucleotide species in a population of unique oligonucleotide species has a molecule index with a G content greater than 50%, and/or so that the sequence “GG” does not appear in the molecule index (e.g. so that there are no two consecutive G's). Optionally, the barcode region comprises a sample index. The sample index can be configured so that the unique oligonucleotides of a given pool can have the same sample index, but different molecule indices. As such, if multiple samples are analyzed, the sample index can indicate which sample each oligonucleotide corresponds to. As such, after unique oligonucleotide species bind to target, the unique oligonucleotide species can be pooled, and sequences can be analyzed. In some embodiments, the sample index is 5′ of the molecule index. In some embodiments, the molecule index is 5′ of the sample index. Optionally, the unique oligonucleotide species comprises an adapter. The adapter can be positioned 5′ of the barcode region. In some embodiments, the adapter is configured to immobilize the unique oligonucleotide species on a substrate.
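By way of illustration only, the two molecule index constraints discussed above (a G content of no more than 50%, and no two consecutive G's) can be sketched as a simple filter. The function names and candidate sequences below are hypothetical and are not taken from the disclosure; the sketch assumes the molecule index is supplied as a plain string of A, C, G, and T.

```python
def g_content(barcode: str) -> float:
    """Return the fraction of positions in the barcode that are G."""
    barcode = barcode.upper()
    return barcode.count("G") / len(barcode)


def passes_g_constraints(barcode: str, max_g_fraction: float = 0.5) -> bool:
    """True if G content is at most max_g_fraction and 'GG' never occurs."""
    barcode = barcode.upper()
    return g_content(barcode) <= max_g_fraction and "GG" not in barcode


# Hypothetical candidate molecule indices (illustrative only).
candidates = ["ACGTACGT", "GGATCCAT", "GAGAGAGA", "GTGGGCGG"]
accepted = [b for b in candidates if passes_g_constraints(b)]
print(accepted)
```

In this sketch, a stricter threshold (for example 12.5%) could be applied by passing a different `max_g_fraction`.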

Without being limited by any theory, it is contemplated that unique oligonucleotide species configured in accordance with some embodiments herein can yield accurate analysis and sequencing results with reduced, minimal, or no bias, for example by minimizing G content in the molecule barcode or barcode region, and/or minimizing G content near the uniform region. By way of example, a reduction in bias can be ascertained by a reduction in noise and/or an increase in sensitivity for detecting the number of different molecules of a target nucleic acid in a sample (see, e.g.). In some embodiments, the reduction in bias can be ascertained as a decrease in noise (see), for example a smaller standard error in the number of target nucleic acid molecules detected in the sample. In some embodiments, the reduction in bias can be ascertained as an increase in sensitivity (see), for example detecting a larger number of the different molecules of a target nucleic acid in a sample (e.g. fewer target nucleic acid molecules are “missed”). As such, in some embodiments, compositions and methods yield quantification of target nucleic acid molecules in a sample with low noise, for example a relative standard error of less than 30%, 25%, 20%, 15%, 10%, 5%, 4%, 3%, 2%, 1%, or 0.01%, including ranges between any two of the listed values. In some embodiments, compositions and methods yield quantification of target nucleic acid molecules in a sample with high sensitivity (measured as a percent of the actual number of different target nucleic acids in a sample that are detected), for example a sensitivity of at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.9%, including ranges between any two of the listed values.
In some embodiments, the methods or compositions described herein reduce bias by increasing sensitivity, decreasing relative standard error, or increasing sensitivity and reducing standard error relative to compositions or methods comprising unique oligonucleotide species in line with.
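As a non-limiting sketch, the noise and sensitivity measures mentioned above could be computed as follows. The disclosure does not give formulas, so this example assumes that relative standard error is the standard error of the mean across replicate measurements divided by the mean, and that sensitivity is the percent of actual molecules that are detected. The replicate counts are hypothetical.

```python
import math
import statistics


def relative_standard_error(counts):
    """Standard error of the mean divided by the mean, as a percentage.

    Assumed definition; uses the sample standard deviation (n - 1).
    """
    mean = statistics.mean(counts)
    sem = statistics.stdev(counts) / math.sqrt(len(counts))
    return 100.0 * sem / mean


def sensitivity(detected, actual):
    """Percent of the actual target molecules that were detected."""
    return 100.0 * detected / actual


# Hypothetical molecule counts from three replicate measurements.
replicate_counts = [96, 100, 104]
rse = relative_standard_error(replicate_counts)
sens = sensitivity(detected=95, actual=100)
print(f"relative standard error: {rse:.2f}%  sensitivity: {sens:.1f}%")
```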

illustrates a conventional oligonucleotide design. It is noted that the molecule index sequence NNNNNNNN (in which each N is A, G, C, or T, and in which any two of the N's can be the same or different from each other) can comprise two or more consecutive G's, and/or can have a G content greater than 50%. For example, a subset of the population of unique oligonucleotide species in line with can be G-rich. Without being limited by any theory, it is contemplated that at least some unique oligonucleotide species in a population based on the configuration of could be favored, leading to bias. Another example of a “baseline” molecule index that can be subject to bias is the sequence ‘BBBBBBV’ (in which B is C, G, or T, and in which V is A, C, or G). Examples of configurations for unique oligonucleotide species in accordance with some embodiments herein are illustrated in. As shown in, addition of either TTT or TTTTT after the molecule index in accordance with some embodiments herein can reduce G-richness along the region (the length of non-G spacers can be variable, and can comprise any non-G nucleotide or nucleotides). In, the sample index and molecule index positions are switched (so that the molecule index is 5′ of the sample index), so that possibly G-rich MIs in at least a subset of unique oligonucleotide species would not be adjacent to potentially G-rich gene specific target regions. In, addition of either TTT or TTTTT to the configuration illustrated in can further minimize potential G-rich regions. Without being limited by any theory, it is noted that sample barcodes frequently can have low G contents (e.g. so as to be “non-G-rich”). It is further noted that the length of non-G spacers can be variable and that the spacers can comprise any non-G nucleotide.
In the configurations of, molecule barcodes comprise ‘HHHHHHHH’ or ‘HNHNHNHN,’ (in which H is A, C, or T, and in which any two H's can be the same or different from each other, and in which any two N's can be the same or different from each other).
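For illustration, the degenerate patterns named above can be expanded into concrete sequences to see how many molecule barcodes each pattern admits and whether the G-related properties hold. This is a minimal sketch; the `IUPAC` table and `expand` helper are hypothetical names, and only the H and N codes defined in the text are used.

```python
from itertools import product

# Degenerate codes as defined in the text: H is A, C, or T; N is any base.
IUPAC = {"H": "ACT", "N": "ACGT"}


def expand(pattern):
    """Yield every concrete sequence matching a pattern of H/N codes."""
    for bases in product(*(IUPAC[code] for code in pattern)):
        yield "".join(bases)


hh = list(expand("HHHHHHHH"))   # no G at any position
hn = list(expand("HNHNHNHN"))   # G allowed only at alternating positions
nn_count = 4 ** 8               # size of the unconstrained NNNNNNNN space

print(len(hh))    # 3**8 = 6561
print(len(hn))    # 3**4 * 4**4 = 20736
print(nn_count)   # 65536

# In HNHNHNHN every N is flanked by H, so 'GG' cannot occur and at most
# half of the positions can be G.
max_g = max(s.count("G") / len(s) for s in hn)
any_gg = any("GG" in s for s in hn)
print(max_g, any_gg)
```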

In some embodiments, each of a plurality of unique oligonucleotide species (e.g., each of the unique oligonucleotide species) in a composition or method has a length of at least 24 nucleotides, for example at least 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, or 140 nucleotides in length, including ranges between any two of the listed values, for example 24-140, 24-135, 24-130, 24-125, 24-120, 24-115, 24-110, 24-105, 24-100, 24-95, 24-90, 24-85, 24-80, 24-75, 24-70, 24-65, 24-60, 24-55, 24-50, 24-45, 24-40, 25-140, 25-135, 25-130, 25-125, 25-120, 25-115, 25-110, 25-105, 25-100, 25-95, 25-90, 25-85, 25-80, 25-75, 25-70, 25-65, 25-60, 25-55, 25-50, 25-45, 25-40, 27-140, 27-135, 27-130, 27-125, 27-120, 27-115, 27-110, 27-105, 27-100, 27-95, 27-90, 27-85, 27-80, 27-75, 27-70, 27-65, 27-60, 27-55, 27-50, 27-45, 27-40, 30-140, 30-135, 30-130, 30-125, 30-120, 30-115, 30-110, 30-105, 30-100, 30-95, 30-90, 30-85, 30-80, 30-75, 30-70, 30-65, 30-60, 30-55, 30-50, 30-45, 30-40, 35-140, 35-135, 35-130, 35-125, 35-120, 35-115, 35-110, 35-105, 35-100, 35-95, 35-90, 35-85, 35-80, 35-75, 35-70, 35-65, 35-60, 35-55, 35-50, 35-45, 35-40, 40-140, 40-135, 40-130, 40-125, 40-120, 40-115, 40-110, 40-105, 40-100, 40-95, 40-90, 40-85, 40-80, 40-75, 40-70, 40-65, 40-60, 40-55, 40-50, or 40-45 nucleotides in length. Optionally, different unique oligonucleotide species in a composition or method have different lengths from each other. Optionally, all of the unique oligonucleotide species in a composition or method have the same length as each other. Optionally, some unique oligonucleotide species in a composition or method are the same length as each other, while some unique oligonucleotide species are different lengths from each other.

In some embodiments, the uniform region comprises, consists of, or consists essentially of a 5′ to 3′ amplification sequence for a target nucleic acid, or class of target nucleic acids (this amplification sequence may also be referred to as a “target-specific” region). For example, if the target nucleic acids comprise mRNAs, the uniform region can comprise oligo dT. For example, if the target nucleic acids comprise variable regions of a T cell receptor, the uniform region can comprise sequences flanking variable regions of a T cell receptor mRNA. In some embodiments, the uniform region comprises at least 10 nucleotides that are complementary to the target nucleic acid, for example at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides complementary to the target, including ranges between any two of the listed values, for example 10-30, 10-29, 10-28, 10-27, 10-26, 10-25, 10-24, 10-23, 10-22, 10-21, 10-20, 11-30, 11-29, 11-28, 11-27, 11-26, 11-25, 11-24, 11-23, 11-22, 11-21, 11-20, 12-30, 12-29, 12-28, 12-27, 12-26, 12-25, 12-24, 12-23, 12-22, 12-21, 12-20, 15-30, 15-29, 15-28, 15-27, 15-26, 15-25, 15-24, 15-23, 15-22, 15-21, 15-20, 20-30, 20-29, 20-28, 20-27, 20-26, 20-25, 20-24, 20-23, 20-22, or 20-21 nucleotides. In some embodiments, the uniform region comprises a target-specific region that hybridizes to a sequence flanking a sequence encoding a variable region of an immune cell receptor, for example a variable region of a T cell receptor, a B cell receptor, or immunoglobulin, for example an antibody. It is noted that because B cell receptors comprise membrane-bound immunoglobulin, target-specific regions specific for immunoglobulin variable region coding sequences are typically suitable for amplifying B cell receptors as well as secreted immunoglobulins (e.g. antibodies). 
Both options are noted herein to clarify that amplification of membrane-bound immunoglobulins (B cell receptors) and also secreted immunoglobulins (antibodies) is contemplated. As used herein, it will be understood that when primers or uniform regions comprise target-specific regions that comprise “flanking sequences” (and variations of this root term, such as “sequences flanking”) of immune cell receptor and/or immunoglobulin variable regions, the target-specific regions will be understood to comprise at least one of (i) sequences that hybridize downstream (3′) of the sequence encoding the variable region and in particular hybridize to the strand of the coding sequence, and thus are configured to produce a strand comprising the reverse complement of the variable region coding sequence upon extension in the 5′ to 3′ direction; or (ii) sequences that hybridize upstream (5′) of the sequence encoding the variable region and in particular hybridize to the strand complementary to that of the coding sequence, and thus are configured to produce a strand comprising the variable region coding sequence upon extension in the 5′ to 3′ direction. Thus, a flanking sequence can be configured for amplification of the variable region coding sequences in conjunction with a suitable priming partner (e.g. a flanking sequence on the other side of the variable region). It will be understood that a flanking sequence does not necessarily need to stop or start exactly where the coding sequence starts or stops, and thus, it is permissible for there to be intervening sequences between a hybridization site of a flanking sequence and the variable region coding sequence itself.
It will be understood that while a flanking sequence will generally hybridize to a sequence outside of the variable region coding sequence so as to amplify a broad range of possible variable region coding sequences, in some embodiments, the variable region “flanking sequence” further comprises some sequence of the variable region itself, for example if a subset of possible variable regions is of interest. However, a “flanking sequence” as used herein does not require a single sequence to flank both sides of the variable region. Rather, it will be understood that when flanking sequences are mentioned in conjunction with compositions, methods, and oligonucleotides of some embodiments herein, 5′ and 3′ sequences comprising suitable primer pairs to amplify the variable region coding sequence are also expressly contemplated.

In some embodiments, a unique oligonucleotide species comprises a barcode region as described herein and also comprises a uniform region comprising a target-specific region comprising a sequence flanking an immune cell receptor and/or immunoglobulin variable region coding sequence. A second oligonucleotide primer can also be provided for the other side of the variable region coding sequence, so as to amplify the variable region sequence in conjunction with the target-specific region of the uniform region. In some embodiments, the uniform region comprises a target-specific region, positioned 3′ of the barcode region, and comprising a sequence flanking an immunoglobulin variable region (and thus flanking a B cell receptor variable region as well as a corresponding antibody variable region), for example flanking the variable region of an immunoglobulin heavy chain locus, flanking the variable region of an immunoglobulin (light chain) kappa locus, or flanking the variable region of an immunoglobulin (light chain) lambda locus. In some embodiments, the uniform region comprises a target-specific region, positioned 3′ of the barcode region, and comprising a sequence flanking at least one of a variable region of a T cell receptor alpha chain, a variable region of a T cell receptor beta chain, a variable region of a T cell receptor gamma chain, or a variable region of a T cell receptor delta chain.

In some embodiments, a kit comprises a composition comprising unique oligonucleotide species as described herein, in which the unique oligonucleotide species each comprise a uniform sequence flanking an immune cell receptor or immunoglobulin variable region coding sequence as described herein. In some embodiments, the kit further comprises an oligonucleotide primer configured to hybridize to the opposite strand and on the other side of the variable region coding sequence compared to the uniform region, and is thus configured to amplify the variable region sequence in conjunction with the target-specific region of the uniform region. In some embodiments, the amplified sequence is at least 1 kb and comprises the variable coding sequence, for example at least 1 kb, 2 kb, 3 kb, 4 kb, or 5 kb, including ranges between any two of the listed values.

It is noted that the nucleic acids encoding variable regions of some immune cell receptors or immunoglobulins can be more than 1 kb long. For example, T cell receptor variable region sequences can comprise a CDR3 coding sequence that ends more than 1 kb away from where the CDR1 coding sequence begins. Without being limited by theory, it is noted that some conventional and next-generation sequencing approaches, for example sequencing by synthesis, are limited to short reads that are typically considerably less than 1 kb. Accordingly, it is contemplated that methods and compositions and kits in accordance with some embodiments herein can be useful for barcoding and analyzing nucleic acids encoding immune cell receptor and/or immunoglobulin variable regions, which otherwise would not be amenable to single-read sequencing of less than 1 kb. Accordingly, in some embodiments, the unique oligonucleotide species comprises a sequence flanking an immune cell receptor or immunoglobulin variable region coding sequence, and is configured to amplify a sequence of at least 1 kb comprising the variable region coding sequence, for example at least 1 kb, 2 kb, 3 kb, 4 kb, or 5 kb, including ranges between any two of the listed values.

Barcode Regions

In accordance with compositions, methods, and oligonucleotides of some embodiments herein, a barcode region comprises a nucleic acid sequence that is useful in identifying a nucleic acid, for example a target nucleic acid from a sample, or an amplicon or reverse-transcript derived from a single target nucleic acid of a sample. For example, two mRNA transcripts from a sample can be reverse-transcribed and barcoded so that nucleic acids corresponding to the first mRNA comprise a first barcode, and nucleic acids corresponding to the second mRNA comprise a second barcode. Upon sequencing (or other analysis), information about the individual mRNAs in the sample, for example copy number can be ascertained even after amplification. However, if a large population of mRNAs is stochastically labeled and some barcodes are represented more favorably (for example due to stability, amplification efficiency, etc.), bias can result, skewing the ability to quantify nucleic acids of a sample. As such, in accordance with some embodiments herein, each unique oligonucleotide species in a population can comprise a unique barcode region. The greater the diversity of barcodes, the greater the diversity of unique oligonucleotide species, and the greater the probability that a particular barcode sequence will be associated with only one target nucleic acid of a sample. The barcode region can be, for example, positioned 5′ of a uniform region on an oligonucleotide species. In some embodiments, a barcode region comprises a molecule barcode as described herein. Optionally, a barcode region comprises a molecule barcode and a sample barcode as described herein. Optionally, a barcode region comprises a molecule barcode 5′ of a sample barcode. 
In some embodiments, methods or compositions comprising unique oligonucleotide species comprising molecule barcodes as described herein reduce bias by increasing sensitivity, decreasing relative standard error, or increasing sensitivity and reducing standard error. As used herein, a “molecule barcode” can also be referred to as a “molecular barcode,” “Molecular Index (MI),” or “Unique Molecular Identifier (UMI).” As used herein, a “sample barcode” can also be referred to as a “Sample Index (SI).”
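The relationship noted above between barcode diversity and the probability that a barcode is associated with only one target nucleic acid can be sketched numerically. This example rests on a simplifying assumption that is not stated in the disclosure: each target molecule receives one barcode drawn uniformly and independently from the available diversity. Under that assumption, the probability that a given molecule shares its barcode with none of the other n - 1 molecules is (1 - 1/d)^(n - 1). The function name is hypothetical.

```python
def prob_barcode_unique(n_molecules: int, diversity: int) -> float:
    """Probability that one molecule's barcode is used by no other molecule.

    Assumes uniform, independent barcode assignment (idealized, bias-free).
    """
    return (1.0 - 1.0 / diversity) ** (n_molecules - 1)


# Illustrative diversities: 3**8, 3**4 * 4**4, and 4**8 eight-mer barcodes.
for d in (6561, 20736, 65536):
    p = prob_barcode_unique(n_molecules=100, diversity=d)
    print(f"diversity {d}: P(unique) = {p:.4f}")
```

Bias toward particular barcodes would make collisions more likely than this idealized estimate suggests.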

A barcode region can comprise a molecule barcode. The molecule barcode can comprise a unique sequence, so that when multiple sample nucleic acids (which can be the same and/or different from each other) are associated one-to-one with molecule barcodes, different sample nucleic acids can be differentiated from each other by the molecule barcodes. As such, even if a sample comprises two nucleic acids having the same sequence, each of these two nucleic acids can be labeled with a different molecule barcode, so that nucleic acids in the population can be quantified, even after amplification. The molecule barcode can comprise a nucleic acid sequence of at least 5 nucleotides, for example at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides, including ranges between any two of the listed values, for example 5-50, 5-45, 5-40, 5-35, 5-30, 5-25, 5-20, 5-15, 5-14, 5-13, 5-12, 5-11, 5-10, 5-9, 5-8, 5-7, 5-6, 6-50, 6-45, 6-40, 6-35, 6-30, 6-25, 6-20, 6-15, 6-14, 6-13, 6-12, 6-11, 6-10, 6-9, 6-8, 6-7, 7-50, 7-45, 7-40, 7-35, 7-30, 7-25, 7-20, 7-15, 7-14, 7-13, 7-12, 7-11, 7-10, 7-9, 7-8, 8-50, 8-45, 8-40, 8-35, 8-30, 8-25, 8-20, 8-15, 8-14, 8-13, 8-12, 8-11, 8-10, 8-9, 9-50, 9-45, 9-40, 9-35, 9-30, 9-25, 9-20, 9-15, 9-14, 9-13, 9-12, 9-11, 9-10, 10-50, 10-45, 10-40, 10-35, 10-30, 10-25, 10-20, 10-15, 10-14, 10-13, 10-12, or 10-11 nucleotides. In some embodiments, the nucleic acid sequence of the molecule barcode comprises a unique sequence, for example, so that each unique oligonucleotide species in a composition comprises a different molecule barcode. In some embodiments, two or more unique oligonucleotide species can comprise the same molecule barcode, but still differ from each other.
For example, if the unique oligonucleotide species include sample barcodes, each unique oligonucleotide species with a particular sample barcode can comprise a different molecule barcode. In some embodiments, a composition comprising unique oligonucleotide species comprises a molecule barcode diversity of at least 1000 different molecule barcodes, and thus at least 1000 unique oligonucleotide species. In some embodiments, a composition comprising unique oligonucleotide species comprises a molecule barcode diversity of at least 6,500 different molecule barcodes, and thus at least 6,500 unique oligonucleotide species. In some embodiments, a composition comprising unique oligonucleotide species comprises a molecule barcode diversity of at least 65,000 different molecule barcodes, and thus at least 65,000 unique oligonucleotide species.
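The quantification idea described above, in which identical nucleic acids are told apart by their molecule barcodes even after amplification, can be sketched as follows. This is a minimal illustration only: the function name `count_molecules`, the read representation as (molecule barcode, target) pairs, and the example sequences are assumptions made for the sketch and are not taken from the specification.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct molecule barcodes observed for each target.

    `reads` is an iterable of (molecule_barcode, target) pairs, one pair per
    sequenced read (a hypothetical representation). Amplification copies of
    the same original molecule share both values, so they collapse into a
    single entry of the per-target set.
    """
    barcodes_by_target = defaultdict(set)
    for barcode, target in reads:
        barcodes_by_target[target].add(barcode)
    return {target: len(barcodes) for target, barcodes in barcodes_by_target.items()}


# Illustrative reads: two copies of one geneA molecule, a second geneA
# molecule with a different barcode, and one geneB molecule.
reads = [
    ("ACTACATA", "geneA"),
    ("ACTACATA", "geneA"),  # amplification duplicate of the read above
    ("TCAACTCA", "geneA"),
    ("CATTACAC", "geneB"),
]
molecule_counts = count_molecules(reads)
```

Counting sets of barcodes rather than reads is what keeps the estimate independent of how many copies amplification produced for each molecule.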

Without being limited by any theory, it is contemplated that a molecule barcode comprising a low G content (e.g., 50% G or less, for example, less than 50% G, 45% G, 40% G, 35% G, 30% G, 25% G, 20% G, 15% G, 12.5% G, 10% G, 7.5% G, 5% G, or 2.5% G) can minimize bias for a composition or pool of unique oligonucleotide species being used to barcode a population of nucleic acids (e.g., minimize bias that would preferentially amplify barcodes comprising a higher G content). It is noted that conventional barcoding approaches typically exhibit a bias in favor of higher G content. For example, the figures illustrate samples of nucleotide usage in conventional molecule barcodes of compositions comprising numerous unique molecule barcodes for ES32 TRAC, ES32 TRBC, and ES32 OligodT. That is, if anything, conventional molecule barcodes and barcode regions, designed without respect to certain guidance provided herein, can comprise a higher G content than would be expected by random chance. In some embodiments, methods or compositions comprising unique oligonucleotide species comprising molecule barcodes as described herein reduce bias by increasing sensitivity, decreasing relative standard error, or increasing sensitivity and reducing standard error.

In some embodiments, all of the molecule barcodes of the unique oligonucleotide species of a composition (or of a composition used in the methods described herein) collectively have a G content of 50% G or less, for example, less than 50% G, 45% G, 40% G, 35% G, 30% G, 25% G, 20% G, 15% G, 12.5% G, 10% G, 7.5% G, 5% G, or 2.5% G, or 0% G, including ranges between any two of the listed values, for example 2.5-50% G, 2.5-45% G, 2.5-40% G, 2.5-35% G, 2.5-30% G, 2.5-25% G, 2.5-20% G, 2.5-15% G, 2.5-10% G, 2.5-7.5% G, 2.5-5% G, 5-50% G, 5-45% G, 5-40% G, 5-35% G, 5-30% G, 5-25% G, 5-20% G, 5-15% G, 5-10% G, 5-7.5% G, 7.5-50% G, 7.5-45% G, 7.5-40% G, 7.5-35% G, 7.5-30% G, 7.5-25% G, 7.5-20% G, 7.5-15% G, 7.5-10% G, 10-50% G, 10-45% G, 10-40% G, 10-35% G, 10-30% G, 10-25% G, 10-20% G, 10-15% G, 10-12.5% G, 12.5-50% G, 12.5-45% G, 12.5-40% G, 12.5-35% G, 12.5-30% G, 12.5-25% G, 12.5-20% G, 12.5-15% G, 15-50% G, 15-45% G, 15-40% G, 15-35% G, 15-30% G, 15-25% G, 15-20% G, 20-50% G, 20-45% G, 20-40% G, 20-35% G, 20-30% G, or 20-25% G. By “all of the molecule barcodes of a composition of unique oligonucleotide species collectively have a G content of . . . ”, it is meant that if the total G content among all of the molecule barcodes in the whole composition was calculated (e.g., a population of at least 10, 50, 100, 200, 500, 1000, 2000, 5000, 6500, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, or 65,000 unique oligonucleotide species), this total G content of the sum total of the barcodes would fall below the recited values or within the recited ranges. While it would still be possible for an individual unique oligonucleotide species to have a molecule barcode with a G content above the indicated value or outside the indicated range, the collective nucleotide content of the unique oligonucleotide species of the composition would be below the indicated value or within the indicated range. 
In some embodiments, all of the molecule barcodes in a composition comprising at least 1000 unique oligonucleotide species collectively have a G content of 50% G or less, for example, less than 50% G, 45% G, 40% G, 35% G, 30% G, 25% G, 20% G, 15% G, 12.5% G, 10% G, 7.5% G, 5% G, or 2.5% G, or 0% G, including ranges between any two of the listed values. In some embodiments, all of the molecule barcodes in a composition comprising at least 6500 unique oligonucleotide species collectively have a G content of 50% G or less, for example, less than 50% G, 45% G, 40% G, 35% G, 30% G, 25% G, 20% G, 15% G, 12.5% G, 10% G, 7.5% G, 5% G, or 2.5% G, or 0% G, including ranges between any two of the listed values. In some embodiments, all of the molecule barcodes in a composition comprising at least 65,000 unique oligonucleotide species collectively have a G content of 50% G or less, for example, less than 50% G, 45% G, 40% G, 35% G, 30% G, 25% G, 20% G, 15% G, 12.5% G, 10% G, 7.5% G, 5% G, or 2.5% G, or 0% G, including ranges between any two of the listed values. In some embodiments, all of the barcode regions of the composition as described herein collectively have a G content of less than 50% as described herein.
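The collective G content described above (total G's across all molecule barcodes divided by total bases across all molecule barcodes) can be illustrated with a short sketch. The function name `collective_g_content` and the four example barcodes are hypothetical and chosen only to show that one G-rich barcode can coexist with a low collective value.

```python
def collective_g_content(barcodes):
    """Return the pooled G fraction over every base of every barcode."""
    total_bases = sum(len(barcode) for barcode in barcodes)
    if total_bases == 0:
        raise ValueError("at least one non-empty barcode is required")
    total_g = sum(barcode.upper().count("G") for barcode in barcodes)
    return total_g / total_bases


# Illustrative pool: the third barcode is individually 62.5% G (5 of 8),
# yet the pool as a whole contains 6 G's in 32 bases.
pool = ["ACTGCATA", "TAACTACA", "GGGTGGAC", "CATTACAC"]
pooled_fraction = collective_g_content(pool)           # 6 / 32 = 0.1875
third_fraction = pool[2].count("G") / len(pool[2])     # 5 / 8 = 0.625
```

In this example the pool satisfies a collective limit of 25% G or 50% G even though one member exceeds 50% G on its own, which is the distinction the paragraph above draws.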

In some embodiments, for a composition comprising unique oligonucleotide species (or such a composition used in a method), the composition consists of or consists essentially of unique oligonucleotide species that each comprise a molecule barcode G content of 50% G or less, for example, less than 50% G, 45% G, 40% G, 35% G, 30% G, 25% G, 20% G, 15% G, 12.5% G, 10% G, 7.5% G, 5% G, or 2.5% G, or 0% G, including ranges between any two of the listed values, for example 2.5-50% G, 2.5-45% G, 2.5-40% G, 2.5-35% G, 2.5-30% G, 2.5-25% G, 2.5-20% G, 2.5-15% G, 2.5-10% G, 2.5-7.5% G, 2.5-5% G, 5-50% G, 5-45% G, 5-40% G, 5-35% G, 5-30% G, 5-25% G, 5-20% G, 5-15% G, 5-10% G, 5-7.5% G, 7.5-50% G, 7.5-45% G, 7.5-40% G, 7.5-35% G, 7.5-30% G, 7.5-25% G, 7.5-20% G, 7.5-15% G, 7.5-10% G, 10-50% G, 10-45% G, 10-40% G, 10-35% G, 10-30% G, 10-25% G, 10-20% G, 10-15% G, 10-12.5% G, 12.5-50% G, 12.5-45% G, 12.5-40% G, 12.5-35% G, 12.5-30% G, 12.5-25% G, 12.5-20% G, 12.5-15% G, 15-50% G, 15-45% G, 15-40% G, 15-35% G, 15-30% G, 15-25% G, 15-20% G, 20-50% G, 20-45% G, 20-40% G, 20-35% G, 20-30% G, or 20-25% G. By “the composition consists of or consists essentially of unique oligonucleotide species that each have a molecule barcode G content less than . . . ”, it is meant that each or essentially each of the unique oligonucleotide species in a composition, population, or pool has a molecule barcode G content below the indicated value, or within the indicated range. That is, for a composition, population, or pool “consisting essentially of” unique oligonucleotide species that each have the indicated G content, it would be possible for an analytically insignificant portion of the unique oligonucleotides in the composition to have molecule barcodes with a G content above the indicated value or outside the recited range. 
For example, the analytically insignificant portion of the unique oligonucleotides can be, or can be no more than, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1%, or less of the unique oligonucleotides in a composition. In some embodiments, less than 1% of the unique oligonucleotide species in a composition comprising at least 1000 unique oligonucleotides comprise a molecule barcode having a G content of greater than 50%, for example less than 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1%, 0.05%, 0.01%, 0.001%, or 0.0001% of the unique oligonucleotide species have a G content of greater than 50%. In some embodiments, less than 1% of the unique oligonucleotide species in a composition comprising at least 6500 unique oligonucleotides comprise a molecule barcode having a G content of greater than 50%, for example less than 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1%, 0.05%, 0.01%, 0.001%, or 0.0001% of the unique oligonucleotide species have a G content of greater than 50%. In some embodiments, less than 1% of the unique oligonucleotide species in a composition comprising at least 65,000 unique oligonucleotides comprise a molecule barcode having a G content of greater than 50%, for example less than 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1%, 0.05%, 0.01%, 0.001%, or 0.0001% of the unique oligonucleotide species have a G content of greater than 50%. In some embodiments, less than 1% of the unique oligonucleotide species in a composition comprising at least 1000 unique oligonucleotides comprise a molecule barcode having a G content of greater than 25%, for example less than 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1%, 0.05%, 0.01%, 0.001%, or 0.0001% of the unique oligonucleotide species have a G content of greater than 25%. 
In some embodiments, less than 1% of the unique oligonucleotide species in a composition of at least 6500 unique oligonucleotides comprise a molecule barcode having a G content of greater than 25%, for example less than 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1%, 0.05%, 0.01%, 0.001%, or 0.0001% of the unique oligonucleotide species have a G content of greater than 25%. In some embodiments, less than 1% of the unique oligonucleotide species in a composition, population, or pool of at least 65,000 unique oligonucleotides comprise a molecule barcode having a G content of greater than 25%, for example less than 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1%, 0.05%, 0.01%, 0.001%, or 0.0001% of the unique oligonucleotide species have a G content of greater than 25%. Optionally, none of the molecule barcodes of the unique oligonucleotide species of the composition collectively have a G content of greater than 50% G, for example, all of the molecule barcodes of the unique oligonucleotide species have a G content less than 50% G, 45% G, 40% G, 35% G, 30% G, 25% G, 20% G, 15% G, 12.5% G, 10% G, 7.5% G, 5% G, or 2.5% G, or 0% G, including ranges between any two of the listed values.
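The per-species constraint above (for example, less than 1% of the unique oligonucleotide species having a molecule barcode with a G content greater than 50%) can be checked with a sketch like the following. The names `fraction_above` and `meets_constraint`, their default thresholds, and the synthetic 1000-member pool are assumptions made for illustration; they are not part of the described compositions.

```python
from itertools import islice, product


def g_fraction(barcode):
    """Fraction of positions in one barcode that are G."""
    return barcode.upper().count("G") / len(barcode)


def fraction_above(barcodes, max_g_fraction):
    """Fraction of barcodes whose own G content exceeds max_g_fraction."""
    over = sum(1 for barcode in barcodes if g_fraction(barcode) > max_g_fraction)
    return over / len(barcodes)


def meets_constraint(barcodes, max_g_fraction=0.5, allowed_fraction=0.01):
    """True if fewer than allowed_fraction of barcodes exceed the G limit."""
    return fraction_above(barcodes, max_g_fraction) < allowed_fraction


# Synthetic pool of 1000 distinct 8-mers: 995 contain no G at all and
# 5 begin with GGGGG (62.5% G each).
low_g = ["".join(bases) for bases in islice(product("ACT", repeat=8), 995)]
high_g = ["GGGGG" + "".join(bases) for bases in islice(product("ACT", repeat=3), 5)]
pool = low_g + high_g

share_over_half = fraction_above(pool, 0.5)   # 5 / 1000
pool_is_acceptable = meets_constraint(pool)   # 0.5% is below the 1% allowance
```

Here the five G-rich members play the role of the "analytically insignificant portion" discussed above; tightening `allowed_fraction` to 0.001 would make the same pool fail the check.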

In some embodiments, the composition as described herein (or such a composition as used in a method) consists of or consists essentially of unique oligonucleotide species that each comprise a barcode region G content of less than 50% as described herein. In some embodiments, for a composition comprising unique oligonucleotide species, the composition consists of or consists essentially of unique oligonucleotide species that each comprise a barcode region G content of 50% G or less, for example, less than 50% G, 45% G, 40% G, 35% G, 30% G, 25% G, 20% G, 15% G, 12.5% G, 10% G, 7.5% G, 5% G, or 2.5% G, or 0% G, including ranges between any two of the listed values, for example 2.5-50% G, 2.5-45% G, 2.5-40% G, 2.5-35% G, 2.5-30% G, 2.5-25% G, 2.5-20% G, 2.5-15% G, 2.5-10% G, 2.5-7.5% G, 2.5-5% G, 5-50% G, 5-45% G, 5-40% G, 5-35% G, 5-30% G, 5-25% G, 5-20% G, 5-15% G, 5-10% G, 5-7.5% G, 7.5-50% G, 7.5-45% G, 7.5-40% G, 7.5-35% G, 7.5-30% G, 7.5-25% G, 7.5-20% G, 7.5-15% G, 7.5-10% G, 10-50% G, 10-45% G, 10-40% G, 10-35% G, 10-30% G, 10-25% G, 10-20% G, 10-15% G, 10-12.5% G, 12.5-50% G, 12.5-45% G, 12.5-40% G, 12.5-35% G, 12.5-30% G, 12.5-25% G, 12.5-20% G, 12.5-15% G, 15-50% G, 15-45% G, 15-40% G, 15-35% G, 15-30% G, 15-25% G, 15-20% G, 20-50% G, 20-45% G, 20-40% G, 20-35% G, 20-30% G, or 20-25% G. By “the composition consists of or consists essentially of unique oligonucleotide species that each have a barcode region G content less than . . . ”, it is meant that each or essentially each of the unique oligonucleotide species in a composition, population, or pool has a barcode region G content below the indicated value, or within the indicated range. 
That is, for a composition, population, or pool “consisting essentially of” unique oligonucleotide species that each have the indicated G content, it would be possible for an analytically insignificant portion of the unique oligonucleotides in the composition to have barcode regions with a G content above the indicated value or outside the recited range. In some embodiments, less than 1% of the unique oligonucleotide species in a composition comprising at least 1000 unique oligonucleotides comprise a barcode region having a G content of greater than 50%, for example less than 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1%, 0.05%, 0.01%, 0.001%, or 0.0001% of the unique oligonucleotide species have a G content greater than 50%, including ranges between any two of the listed values. In some embodiments, less than 1% of the unique oligonucleotide species in a composition comprising at least 6500 unique oligonucleotides comprise a barcode region having a G content of greater than 50%, for example less than 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1%, 0.05%, 0.01%, 0.001%, or 0.0001% of the unique oligonucleotide species have a G content greater than 50%, including ranges between any two of the listed values. In some embodiments, less than 1% of the unique oligonucleotide species in a composition comprising at least 65,000 unique oligonucleotides comprise a barcode region having a G content of greater than 50%, for example less than 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1%, 0.05%, 0.01%, 0.001%, or 0.0001% of the unique oligonucleotide species have a G content greater than 50%, including ranges between any two of the listed values. 
In some embodiments, less than 1% of the unique oligonucleotide species in a composition comprising at least 1000 unique oligonucleotides comprise a barcode region having a G content of greater than 25%, for example less than 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1%, 0.05%, 0.01%, 0.001%, or 0.0001% of the unique oligonucleotide species have a G content greater than 25%, including ranges between any two of the listed values. In some embodiments, less than 1% of the unique oligonucleotide species in a composition of at least 6500 unique oligonucleotides comprise a barcode region having a G content of greater than 25%, for example less than 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1%, 0.05%, 0.01%, 0.001%, or 0.0001% of the unique oligonucleotide species have a G content greater than 25%, including ranges between any two of the listed values. In some embodiments, less than 1% of the unique oligonucleotide species in a composition, population, or pool of at least 65,000 unique oligonucleotides comprise a barcode region having a G content of greater than 25%, for example less than 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1%, 0.05%, 0.01%, 0.001%, or 0.0001% of the unique oligonucleotide species have a G content greater than 25%, including ranges between any two of the listed values. Optionally, none of the barcode regions of the unique oligonucleotide species of the composition collectively have a G content of greater than 50% G, for example, all of the barcode regions of the unique oligonucleotide species have a G content less than 50% G, 45% G, 40% G, 35% G, 30% G, 25% G, 20% G, 15% G, 12.5% G, 10% G, 7.5% G, 5% G, or 2.5% G, or 0% G, including ranges between any two of the listed values.

In some embodiments, the composition (or such a composition as used in a method) consists of or consists essentially of unique oligonucleotide species that each have a molecule barcode comprising the sequence of at least three repeats of the doublet “HN” (in which each “H” is any of A, C, or T, and in which “N” is any of A, G, C, or T), for example at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 repeats, including ranges between any two of the listed values. Examples of multiple repeats of the doublet “HN” include HN, HNHN, HNHNHN, and HNHNHNHN. It is noted that while the formula “HN” describes constraints on the base content, not every H or every N has to be the same or different. For example, if the molecule barcodes of unique oligonucleotide species in a composition comprised HNHNHN, one molecule barcode can comprise the sequence ACTGCA, while another molecule barcode can comprise the sequence TAACTA, while another molecule barcode could comprise the sequence AGACAC. It is noted that any number of repeats of the doublet “HN” would have a G content of no more than 50%. In some embodiments, at least 95% of the unique oligonucleotide species of a composition comprising at least 1000 unique oligonucleotide species comprise molecule barcodes comprising at least three repeats of the doublet “HN,” for example at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 repeats, including ranges between any two of the listed values. In some embodiments, at least 99% of the unique oligonucleotide species of a composition comprising at least 1000 unique oligonucleotide species comprise molecule barcodes comprising at least three repeats of the doublet “HN,” for example at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 repeats, including ranges between any two of the listed values. 
In some embodiments, at least 99.9% of the unique oligonucleotide species of a composition comprising at least 1000 unique oligonucleotide species comprise molecule barcodes comprising at least three repeats of the doublet “HN,” for example at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 repeats, including ranges between any two of the listed values. In some embodiments, at least 95% of the unique oligonucleotide species of a composition comprising at least 6500 unique oligonucleotide species comprise molecule barcodes comprising at least three repeats of the doublet “HN,” for example at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 repeats, including ranges between any two of the listed values. In some embodiments, at least 99% of the unique oligonucleotide species of a composition comprising at least 6500 unique oligonucleotide species comprise molecule barcodes comprising at least three repeats of the doublet “HN,” for example at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 repeats, including ranges between any two of the listed values. In some embodiments, at least 99.9% of the unique oligonucleotide species of a composition comprising at least 6500 unique oligonucleotide species comprise molecule barcodes comprising at least three repeats of the doublet “HN,” for example at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 repeats, including ranges between any two of the listed values. In some embodiments, at least 95% of the unique oligonucleotide species of a composition comprising at least 65,000 unique oligonucleotide species comprise molecule barcodes comprising at least three repeats of the doublet “HN,” for example at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 repeats, including ranges between any two of the listed values. 
In some embodiments, at least 99% of the unique oligonucleotide species of a composition comprising at least 65,000 unique oligonucleotide species comprise molecule barcodes comprising at least three repeats of the doublet “HN,” for example at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 repeats, including ranges between any two of the listed values. In some embodiments, at least 99.9% of the unique oligonucleotide species of a composition comprising at least 65,000 unique oligonucleotide species comprise molecule barcodes comprising at least three repeats of the doublet “HN,” for example at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 repeats, including ranges between any two of the listed values. In some embodiments, the composition consists of or consists essentially of at least 1000, 6500, or 65,000 unique oligonucleotide species that each have a molecule barcode comprising the sequence HNHNHN. In some embodiments, the composition consists of or consists essentially of at least 1000, 6500, or 65,000 unique oligonucleotide species that each have a molecule barcode comprising the sequence HNHNHNHN. In some embodiments, at least 95%, 99%, or 99.9% of the barcode regions of the composition as described herein comprise at least three repeats of the doublet “HN” as described herein. Without being limited by any theory, it is noted that having a relatively large number of available nucleotide sequences for molecule barcodes can be helpful when barcoding a population of target nucleic acids from a sample, for example to increase the diversity of barcodes within a given sequence length along with the probability that each target nucleic acid will be uniquely labeled, while minimizing oligonucleotide species size. 
It is noted that limiting the G content of molecule barcodes and/or barcode regions can limit the diversity of these barcodes and barcode regions by decreasing the number of available nucleotides from which barcodes can be constructed (and the number of available different sequences per length of nucleic acid). As such, having some G's in molecule barcodes or barcode regions in accordance with various embodiments herein can be helpful in increasing diversity, while limiting the G content can be helpful in minimizing bias. It is noted, and has been observed (see Example 2), that sequences comprising repeated “HN” doublets can yield low bias, while providing a compromise between reducing bias and maintaining a relatively large quantity of available nucleotide sequences, so that relatively high diversity can be obtained in a relatively short sequence, while still minimizing bias. In some embodiments, methods or compositions comprising unique oligonucleotide species comprising molecule barcodes comprising repeated “HN” doublets as described herein reduce bias by increasing sensitivity, decreasing relative standard error, or increasing sensitivity and reducing standard error.
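The trade-off between diversity and G content for repeated “HN” doublets can be made concrete with a small enumeration. The sketch below is illustrative only: the helper names `has_hn_repeats` and `enumerate_hn` are hypothetical, and the pooled G content it reports assumes that every permitted sequence is present exactly once, which a real synthesis need not achieve.

```python
import re
from itertools import product


def has_hn_repeats(barcode, min_repeats=3):
    """True if the barcode contains at least min_repeats consecutive HN doublets."""
    pattern = r"(?:[ACT][ACGT]){%d,}" % min_repeats
    return re.search(pattern, barcode.upper()) is not None


def enumerate_hn(repeats):
    """List every sequence made of `repeats` HN doublets (H = A/C/T, N = A/C/G/T)."""
    alphabets = ["ACT", "ACGT"] * repeats
    return ["".join(bases) for bases in product(*alphabets)]


all_hn3 = enumerate_hn(3)                      # every HNHNHN sequence
hn3_count = len(all_hn3)                       # 12 ** 3 = 1728
hn4_count = 12 ** 4                            # HNHNHNHN: 20736 sequences
h8_count = 3 ** 8                              # HHHHHHHH: 6561 sequences
n8_count = 4 ** 8                              # unconstrained 8-mers: 65536

total_g = sum(sequence.count("G") for sequence in all_hn3)
pooled_g = total_g / (hn3_count * 6)           # 1296 / 10368 = 0.125
highest_g = max(sequence.count("G") / len(sequence) for sequence in all_hn3)  # 0.5
```

Each doublet offers 3 × 4 = 12 choices, so an eight-base HNHNHNHN barcode offers more sequences than an eight-base G-free barcode while capping any single barcode at 50% G; under the equal-representation assumption the pool as a whole sits at 12.5% G.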

In some embodiments, the composition (or such a composition as used in a method) comprises, consists of, or consists essentially of unique oligonucleotide species that each comprise a molecule barcode comprising at least six consecutive “H's” (in which each “H” is any of A, C, or T), for example at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 consecutive H's, including ranges between any two of the listed values. It is noted that while the formula “H” describes constraints on the base content, not every H has to be the same (or different). For example, if the molecule barcodes of unique oligonucleotide species in a population each comprised the sequence HHHH, one molecule barcode of a unique oligonucleotide species could comprise ACTA, one molecule barcode of another unique oligonucleotide species could comprise TTAC, and one molecule barcode of another unique oligonucleotide species could comprise ACAT. In some embodiments, a composition comprises, consists of, or consists essentially of at least 1000 unique oligonucleotide species, of which at least 95% comprise a molecule barcode that comprises at least six consecutive H's, for example at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 consecutive H's, including ranges between any two of the listed values. In some embodiments, a composition comprises, consists of, or consists essentially of at least 1000 unique oligonucleotide species, of which at least 99% comprise a molecule barcode that comprises at least six consecutive H's, for example at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 consecutive H's, including ranges between any two of the listed values. 
In some embodiments, a composition comprises, consists of, or consists essentially of at least 1000 unique oligonucleotide species, of which at least 99.9% comprise a molecule barcode that comprises at least six consecutive H's, for example at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 consecutive H's, including ranges between any two of the listed values. In some embodiments, a composition comprises, consists of, or consists essentially of at least 6500 unique oligonucleotide species, of which at least 95% comprise a molecule barcode that comprises at least six consecutive H's, for example at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 consecutive H's, including ranges between any two of the listed values. In some embodiments, a composition comprises, consists of, or consists essentially of at least 6500 unique oligonucleotide species, of which at least 99% comprise a molecule barcode that comprises at least six consecutive H's, for example at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 consecutive H's, including ranges between any two of the listed values. In some embodiments, a composition comprises, consists of, or consists essentially of at least 6500 unique oligonucleotide species, of which at least 99.9% comprise a molecule barcode that comprises at least six consecutive H's, for example at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 consecutive H's, including ranges between any two of the listed values. In some embodiments, a composition comprises, consists of, or consists essentially of at least 65,000 unique oligonucleotide species, of which at least 95% comprise a molecule barcode that comprises at least six consecutive H's, for example at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 consecutive H's, including ranges between any two of the listed values. 
In some embodiments, a composition comprises, consists of, or consists essentially of at least 65,000 unique oligonucleotide species, of which at least 99% comprise a molecule barcode that comprises at least six consecutive H's, for example at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 consecutive H's, including ranges between any two of the listed values. In some embodiments, a composition comprises, consists of, or consists essentially of at least 65,000 unique oligonucleotide species, of which at least 99.9% comprise a molecule barcode that comprises at least six consecutive H's, for example at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 consecutive H's, including ranges between any two of the listed values. In some embodiments, at least 95% of the molecule barcodes of the unique oligonucleotide species comprise a sequence totaling at least 6 alternating H's and N's, for example at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 alternating H's and N's, including ranges between any two of the listed values. In some embodiments, at least 95% of the molecule barcodes of the unique oligonucleotide species comprise the sequence HNHNHNHN, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T. In some embodiments, each molecule barcode of the unique oligonucleotide species comprises the sequence HNHNHNHN, wherein each “H” is any one of A, C, or T, and wherein each “N” is any one of A, G, C, or T. In some embodiments, each molecule barcode of the unique oligonucleotide species comprises the sequence HHHHHHHH, wherein each “H” is any one of A, C, or T.
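A check for the "at least six consecutive H's" condition can be sketched by measuring the longest run of non-G bases in a barcode. The function name `longest_h_run` and the example sequences are hypothetical and serve only to illustrate the condition.

```python
import re


def longest_h_run(barcode):
    """Length of the longest stretch of consecutive A, C, or T bases."""
    runs = re.findall(r"[ACT]+", barcode.upper())
    return max((len(run) for run in runs), default=0)


def has_consecutive_h(barcode, min_run=6):
    """True if the barcode contains at least min_run consecutive H's."""
    return longest_h_run(barcode) >= min_run


run_all_h = longest_h_run("ACTACATA")      # 8: the whole barcode is G-free
run_split = longest_h_run("ACTGACTACA")    # 6: runs of 3 and 6 around one G
run_short = longest_h_run("GAGAGA")        # 1: every H is flanked by G
```

Because the condition is stated as "comprising" a run of H's, the sketch looks for the run anywhere in the barcode rather than requiring the whole barcode to be G-free.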

In some embodiments, at least 95% (e.g., 95%, 96%, 97%, 98%, 99% or more) of the unique oligonucleotide species of the composition as described herein (or such a composition as used in a method) comprise barcode regions comprising at least 6 consecutive H's, for example at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 consecutive H's, including ranges between any two of the listed values as described herein. In some embodiments, at least 99% of the unique oligonucleotide species of the composition as described herein comprise barcode regions comprising at least 6 consecutive H's, for example at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 consecutive H's, including ranges between any two of the listed values as described herein. In some embodiments, at least 99.9% of the unique oligonucleotide species of the composition as described herein comprise barcode regions comprising at least 6 consecutive H's, for example at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 consecutive H's, including ranges between any two of the listed values as described herein.

In some embodiments, the sample barcode of each unique oligonucleotide species has a G content of 50% or less, for example 50% or less, 40% or less, 25% or less, 20% or less, 12.5% or less, 10% or less, or 5% or less, including ranges between any two of the listed values.

In some embodiments, the barcode region of each unique oligonucleotide species has a G content of 50% or less, for example 50% or less, 40% or less, 25% or less, 20% or less, 12.5% or less, 10% or less, or 5% or less, including ranges between any two of the listed values.

In some embodiments, the molecule barcodes of the unique oligonucleotide species collectively have a G content of less than 12.5%, for example less than 12.5%, less than 10%, less than 7.5%, less than 5%, less than 2.5%, or less than 1%, including ranges between any two of the listed values.

In some embodiments, the barcode regions of the unique oligonucleotide species collectively have a G content of less than 12.5%, for example less than 12.5%, less than 10%, less than 7.5%, less than 5%, less than 2.5%, or less than 1%, including ranges between any two of the listed values.

In some embodiments, for at least 95% of the unique oligonucleotide species, any G in the molecule barcode is not adjacent to another G. In some embodiments, for at least 99% of the unique oligonucleotide species, any G in the molecule barcode is not adjacent to another G. In some embodiments, for all or substantially all of the unique oligonucleotide species, any G in the molecule barcode is not adjacent to another G.
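The adjacency condition above reduces to asking whether the dinucleotide GG occurs anywhere in a molecule barcode. A minimal sketch, with hypothetical function names and example barcodes, is:

```python
def has_adjacent_g(barcode):
    """True if any G in the barcode is immediately followed by another G."""
    return "GG" in barcode.upper()


def fraction_without_adjacent_g(barcodes):
    """Fraction of barcodes in which no G is adjacent to another G."""
    passing = sum(1 for barcode in barcodes if not has_adjacent_g(barcode))
    return passing / len(barcodes)


# Illustrative pool: only the third barcode contains adjacent G's.
pool = ["AGAGAGAG", "ACTGCATA", "TGGACATA", "CATTACAC"]
passing_fraction = fraction_without_adjacent_g(pool)   # 3 / 4 = 0.75
```

A barcode made up entirely of “HN” doublets passes this check automatically, since a G can only occupy every second position; the first example barcode, AGAGAGAG, reaches 50% G without any two G's touching.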

Each barcode region can optionally comprise a sample barcode. In accordance with compositions, methods, and oligonucleotides of some embodiments herein, each unique oligonucleotide species in a pool can comprise the same sample barcode, but there can be two or more pools that are each associated with different sample barcodes. As such, all or essentially all of the unique oligonucleotide species in pool #1 can comprise sample barcode #1, and all or essentially all of the unique oligonucleotide species in pool #2 can comprise sample barcode #2. Nucleic acids from a first sample can be associated with the unique oligonucleotide species in pool #1, and nucleic acids from a second sample can be associated with unique oligonucleotide species in pool #2, for example by hybridization and amplification. As such, all or essentially all of the amplified nucleic acids corresponding to the first sample will comprise sample barcode #1 (but can comprise different molecule barcodes), and all or essentially all of the amplified nucleic acids corresponding to the second sample will comprise sample barcode #2. In some embodiments, there are at least 24, 48, 96, or 192 pools.
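The pooling scheme above can be illustrated by sorting barcode regions by sample barcode and then collecting the distinct molecule barcodes seen for each sample. The sketch rests on assumptions that are not taken from the specification: the barcode region is modeled as a sample barcode immediately followed by a molecule barcode, the lengths (4 and 8) and sequences are invented, and the names `split_barcode_region` and `demultiplex` are hypothetical.

```python
def split_barcode_region(region, sample_len, molecule_len):
    """Split a region into (sample barcode, molecule barcode).

    Assumes, for illustration only, that the sample barcode comes first.
    """
    sample = region[:sample_len]
    molecule = region[sample_len:sample_len + molecule_len]
    return sample, molecule


def demultiplex(regions, sample_barcodes, sample_len, molecule_len):
    """Group distinct molecule barcodes by the sample their region belongs to.

    `sample_barcodes` maps each known sample barcode to a sample name;
    regions with an unrecognized sample barcode are ignored.
    """
    by_sample = {name: set() for name in sample_barcodes.values()}
    for region in regions:
        sample, molecule = split_barcode_region(region, sample_len, molecule_len)
        name = sample_barcodes.get(sample)
        if name is not None:
            by_sample[name].add(molecule)
    return by_sample


sample_barcodes = {"ACAT": "sample 1", "TCCA": "sample 2"}
regions = [
    "ACAT" + "ACTACATA",
    "ACAT" + "ACTACATA",   # amplification duplicate within sample 1
    "ACAT" + "TCAACTCA",
    "TCCA" + "ACTACATA",   # same molecule barcode, different sample
    "CCCC" + "ACTACATA",   # unknown sample barcode, ignored
]
molecules_by_sample = demultiplex(regions, sample_barcodes, 4, 8)
```

The fourth region shows why the same molecule barcode can be reused across pools: the sample barcode already separates the two samples, so only molecule barcodes within one pool need to differ.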

The sample barcode can comprise a nucleic acid sequence of at least 3 nucleotides, for example at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides, including ranges between any two of the listed values, for example 3-50, 3-45, 3-40, 3-35, 3-30, 3-25, 3-20, 3-15, 3-14, 3-13, 3-12, 3-11, 3-10, 3-9, 3-8, 3-7, 3-6, 3-5, 3-4, 4-50, 4-45, 4-40, 4-35, 4-30, 4-25, 4-20, 4-15, 4-14, 4-13, 4-12, 4-11, 4-10, 4-9, 4-8, 4-7, 4-6, 4-5, 5-50, 5-45, 5-40, 5-35, 5-30, 5-25, 5-20, 5-15, 5-14, 5-13, 5-12, 5-11, 5-10, 5-9, 5-8, 5-7, 5-6, 6-50, 6-45, 6-40, 6-35, 6-30, 6-25, 6-20, 6-15, 6-14, 6-13, 6-12, 6-11, 6-10, 6-9, 6-8, 6-7, 7-50, 7-45, 7-40, 7-35, 7-30, 7-25, 7-20, 7-15, 7-14, 7-13, 7-12, 7-11, 7-10, 7-9, 7-8, 8-50, 8-45, 8-40, 8-35, 8-30, 8-25, 8-20, 8-15, 8-14, 8-13, 8-12, 8-11, 8-10, 8-9, 9-50, 9-45, 9-40, 9-35, 9-30, 9-25, 9-20, 9-15, 9-14, 9-13, 9-12, 9-11, 9-10, 10-50, 10-45, 10-40, 10-35, 10-30, 10-25, 10-20, 10-15, 10-14, 10-13, 10-12, or 10-11 nucleotides. In some embodiments, the nucleic acid sequence of the sample barcode comprises a unique sequence, for example, so that each unique oligonucleotide species in a population comprises a different molecule barcode.

Without being limited by any theory, it is contemplated that a sample barcode comprising a low G content (e.g., less than 50% G, for example, less than 45% G, 40% G, 35% G, 30% G, 25% G, 20% G, 15% G, 12.5% G, 10% G, 7.5% G, 5% G, or 2.5% G) can be positioned 3′ of a sample barcode that comprises a relatively higher G content and 5′ of a uniform region (e.g. a target-specific sequence or oligo dT sequence), so as to minimize bias by separating the relatively G-rich sample barcode from the uniform region. In some embodiments, the barcode region comprises a sample barcode comprising 50% G content or less, for example, less than 50% G, 45% G, 40% G, 35% G, 30% G, 25% G, 20% G, 15% G, 12.5% G, 10% G, 7.5% G, 5% G, or 2.5% G, or 0% G, including ranges between any two of the listed values, for example 2.5-50% G, 2.5-45% G, 2.5-40% G, 2.5-35% G, 2.5-30% G, 2.5-25% G, 2.5-20% G, 2.5-15% G, 2.5-10% G, 2.5-7.5% G, 2.5-5% G, 5-50% G, 5-45% G, 5-40% G, 5-35% G, 5-30% G, 5-25% G, 5-20% G, 5-15% G, 5-10% G, 5-7.5% G, 7.5-50% G, 7.5-45% G, 7.5-40% G, 7.5-35% G, 7.5-30% G, 7.5-25% G, 7.5-20% G, 7.5-15% G, 7.5-10% G, 10-50% G, 10-45% G, 10-40% G, 10-35% G, 10-30% G, 10-25% G, 10-20% G, 10-15% G, 10-12.5% G, 12.5-50% G, 12.5-45% G, 12.5-40% G, 12.5-35% G, 12.5-30% G, 12.5-25% G, 12.5-20% G, 12.5-15% G, 15-50% G, 15-45% G, 15-40% G, 15-35% G, 15-30% G, 15-25% G, 15-20% G, 20-50% G, 20-45% G, 20-40% G, 20-35% G, 20-30% G, or 20-25% G.

In some embodiments, for a composition comprising unique oligonucleotide species (or such a composition as used in a method), at least 95% of the sample barcodes of the unique oligonucleotides of the composition each have less than 50% G content, for example, less than 45% G, 40% G, 35% G, 30% G, 25% G, 20% G, 15% G, 12.5% G, 10% G, 7.5% G, 5% G, or 2.5% G, or 0% G, including ranges between any two of the listed values, for example 2.5-50% G, 2.5-45% G, 2.5-40% G, 2.5-35% G, 2.5-30% G, 2.5-25% G, 2.5-20% G, 2.5-15% G, 2.5-10% G, 2.5-7.5% G, 2.5-5% G, 5-50% G, 5-45% G, 5-40% G, 5-35% G, 5-30% G, 5-25% G, 5-20% G, 5-15% G, 5-10% G, 5-7.5% G, 7.5-50% G, 7.5-45% G, 7.5-40% G, 7.5-35% G, 7.5-30% G, 7.5-25% G, 7.5-20% G, 7.5-15% G, 7.5-10% G, 10-50% G, 10-45% G, 10-40% G, 10-35% G, 10-30% G, 10-25% G, 10-20% G, 10-15% G, 10-12.5% G, 12.5-50% G, 12.5-45% G, 12.5-40% G, 12.5-35% G, 12.5-30% G, 12.5-25% G, 12.5-20% G, 12.5-15% G, 15-50% G, 15-45% G, 15-40% G, 15-35% G, 15-30% G, 15-25% G, 15-20% G, 20-50% G, 20-45% G, 20-40% G, 20-35% G, 20-30% G, or 20-25% G.
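
By way of non-limiting illustration, a G content criterion of the kind described above (at least 95% of sample barcodes each having less than 50% G) can be checked computationally. The following Python sketch assumes the barcodes are available as plain strings over A, C, G, and T; the function names, default thresholds as parameters, and example sequences are hypothetical and chosen only for this sketch.

```python
def g_content(barcode):
    """Return the fraction of positions in the barcode that are G."""
    if not barcode:
        raise ValueError("barcode must not be empty")
    return barcode.upper().count("G") / len(barcode)


def fraction_below_threshold(barcodes, max_g=0.5):
    """Return the fraction of barcodes whose G content is strictly below max_g."""
    if not barcodes:
        raise ValueError("barcodes must not be empty")
    below = sum(1 for b in barcodes if g_content(b) < max_g)
    return below / len(barcodes)


def meets_g_constraint(barcodes, max_g=0.5, min_fraction=0.95):
    """True if at least min_fraction of the barcodes have G content below max_g."""
    return fraction_below_threshold(barcodes, max_g) >= min_fraction


# Made-up example: 39 barcodes with one G in eight positions (12.5% G)
# and one barcode with four G in eight positions (50% G).
example_barcodes = ["ACTGACTA"] * 39 + ["GGGGACTA"]
passes = meets_g_constraint(example_barcodes)
```

Other thresholds listed above (for example 12.5% G) can be tested in the same way by passing a different `max_g` value.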

Patent Metadata

Filing Date

Unknown

Publication Date

October 14, 2025

Inventors

Unknown
