US-10388404

Using machine-learning to perform linear regression on a DNA-computing platform

Published: August 20, 2019

Assignee: not available in USPTO data

Inventors: not available in USPTO data

Technical Abstract

A method and associated systems for using machine-learning methods to perform linear regression on a DNA-computing platform. One or more processors generate and initialize beta coefficients of a system of linear equations. These initial values are encoded into nucleobase chains that are then padded to a standard length. The chains are allowed to bind with complementary template chains in a DNA-computing reaction, and the resulting DNA molecules are decoded to reveal the relative likelihood of each chain to bind. The initial values of the beta coefficients are weighted proportionally to these likelihoods, and the process is repeated iteratively until the beta coefficients converge to optimal values.

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for using machine-learning on a DNA-computing platform to perform linear regression, the method comprising: a processor of a computer system identifying a first set of equal-length nucleobase sequences, wherein the identifying each sequence comprises selecting an order of nucleobases that each encode two or more numeric values of a set of numeric values, and wherein each value encoded into a first sequence of the first set of sequences represents an estimated value of a beta coefficient comprised by a corresponding first equation of a system of linear equations; the processor further identifying a second set of equal-length nucleobase sequences that are each capable of bonding through a Watson Crick reaction to a corresponding sequence of the first set of equal-length nucleobase sequences; the processor adjusting each value of the set of numeric values as a function of a result of a Watson Crick reaction between a first set of nucleobase chains and a second set of nucleobase chains, wherein each chain of the first set of nucleobase chains comprises nucleobases ordered in a sequence of the first set of nucleobase sequences, and wherein each chain of the second set of nucleobase chains comprises nucleobases ordered in a sequence of the second set of nucleobase sequences; the processor replacing each value of the set of numeric values with its corresponding adjusted value; and the processor repeating the identifying, further identifying, adjusting, and replacing until the adjusted values converge over two or more successive adjustings to a constant set of values.

2. The method of claim 1, further comprising: the processor encoding each value of the first sequence of the first set of sequences, wherein the encoding comprises: the processor translating each value of a beta coefficient comprised by the first equation into a base-3 numeral of a set of base-3 numerals; the processor representing each base-3 numeral of the of set of base-3 numerals as an analogous sequence of nucleobases, wherein each 0, 1, and 2 digit of the each base-3 numeral is uniquely represented as a nucleobase selected from a group comprising adenine, cytosine, guanine, and taurine; the processor identifying the first sequence of the first set of sequences as comprising analogous sequences of nucleobases that together represent base-3 values of all beta coefficients comprised by the first equation; the processor selecting a filler nucleobase from a group comprising adenine, cytosine, guanine, and taurine, wherein the filler nucleobase has not been used to represent a 0, 1, or 2 digit of any base-3 numeral of the set of base-3 numerals; and the processor setting the first sequence to the equal length by padding the first sequence, if necessary, with one or more instances of the filler nucleobase.
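The encoding and padding steps recited above can be illustrated with a minimal Python sketch. It is not taken from the patent: the digit-to-nucleobase mapping (0 to A, 1 to C, 2 to G), the choice of T as the filler, the restriction to non-negative integer coefficients, and the function names are all assumptions made for illustration. The sketch also reads the fourth nucleobase, which the claim text calls "taurine", as thymine (T). It simply concatenates the per-coefficient digit strings and does not address how coefficient boundaries would be recovered on decoding.

```python
# Assumed mapping of base-3 digits to nucleobases; the claim only requires
# that each digit map to a distinct base and that the filler be the unused one.
DIGIT_TO_BASE = {0: "A", 1: "C", 2: "G"}
FILLER = "T"  # assumed filler: the one base not used for a digit


def to_base3(value: int) -> list[int]:
    """Return the base-3 digits of a non-negative integer, most significant first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if value == 0:
        return [0]
    digits = []
    while value:
        value, remainder = divmod(value, 3)
        digits.append(remainder)
    return digits[::-1]


def encode_coefficients(betas: list[int]) -> str:
    """Concatenate the nucleobase encodings of every coefficient of one equation."""
    return "".join(DIGIT_TO_BASE[d] for beta in betas for d in to_base3(beta))


def pad(sequence: str, length: int) -> str:
    """Right-pad a sequence with the filler base up to the common length."""
    if len(sequence) > length:
        raise ValueError("sequence is longer than the target length")
    return sequence + FILLER * (length - len(sequence))


# Two hypothetical equations with two coefficients each.
raw = [encode_coefficients([5, 10]), encode_coefficients([1, 2])]
equal_length = max(len(s) for s in raw)  # longest unpadded sequence
padded = [pad(s, equal_length) for s in raw]
print(padded)  # ['CGCAC', 'CGTTT']
```

Here 5 is 12 in base 3 and 10 is 101, so the first equation encodes to CGCAC, while the second (1 and 2) encodes to CG and receives three filler bases.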

3. The method of claim 2, wherein each sequence of the first set of equal-length nucleobase sequences comprises base-3 encoded representations of all beta coefficients comprised by a corresponding equation of the system of linear equations, and wherein the equal length is the shortest length necessary to store a longest base-3 encoded representation of all beta coefficients comprised by an equation of the system of linear equations.

4. The method of claim 2, wherein the processor sets a template sequence of the second set of sequences to the equal length by padding the template sequence, if necessary, with one or more instances of a nucleobase that forms a base pair with the filler nucleobase.
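As a rough illustration of the template padding described above, the following Python sketch builds a template from an unpadded sequence and pads it with the base that pairs with the filler. The function name `template_for` and the default filler T are hypothetical. The sketch assumes the standard Watson-Crick pairs (A with T, C with G), and it writes the complement position by position, ignoring strand orientation (a real reverse complement would be read in the opposite direction).

```python
# Standard Watson-Crick base pairs.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def template_for(core: str, length: int, filler: str = "T") -> str:
    """Complement each base of `core`, then pad with the filler's partner base."""
    if len(core) > length:
        raise ValueError("core sequence is longer than the target length")
    body = "".join(COMPLEMENT[base] for base in core)
    return body + COMPLEMENT[filler] * (length - len(body))


print(template_for("CGCAC", 8))  # GCGTGAAA
```

With T as the filler, the template is padded with A, so the padded template pairs base for base with the padded sequence CGCACTTT.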

5. The method of claim 1, further comprising: the processor computing a residual sum of squares error rate of a set of error rates for each set of beta coefficients of one equation of the system of linear equations; the processor normalizing the set of error rates such that a sum of all error rates of the set of error rates equals 100%; and the processor directing that the first set of nucleobase chains comprises multiple copies of each sequence of the first set of nucleobase sequences, wherein a relative number of copies of each sequence is a function of the set of error rates.
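The error-rate steps above might be sketched in Python as follows. The claim does not say which function maps error rates to copy counts, so `copies_per_sequence` is purely a hypothetical choice in which candidates with a smaller share of the total error receive more copies. Treating the first coefficient as the intercept, the function names, and the toy data are also assumptions.

```python
def rss(betas, xs, ys):
    """Residual sum of squares; betas[0] is assumed to be the intercept."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
        total += (y - predicted) ** 2
    return total


def normalize(errors):
    """Scale the error rates so that they sum to 1.0 (that is, 100%)."""
    total = sum(errors)
    if total == 0:
        raise ValueError("all error rates are zero; nothing to normalize")
    return [e / total for e in errors]


def copies_per_sequence(shares, total_copies):
    """Hypothetical allocation: weight each candidate by (1 - error share)."""
    if len(shares) < 2:
        raise ValueError("need at least two candidates")
    inverse = [1.0 - s for s in shares]
    scale = sum(inverse)
    return [round(total_copies * w / scale) for w in inverse]


# Toy data generated by y = 1 + 2x, with three candidate coefficient sets.
xs = [(1,), (2,), (3,)]
ys = [3, 5, 7]
candidates = [[1, 2], [0, 2], [1, 1]]

errors = [rss(b, xs, ys) for b in candidates]
shares = normalize(errors)
copies = copies_per_sequence(shares, total_copies=1000)
print(errors)  # [0.0, 3.0, 14.0]
print(copies)  # [500, 412, 88]
```

Other monotone mappings from error share to copy count would fit the claim language equally well; the inverse weighting is only one option.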

6. The method of claim 5, further comprising: the processor directing that a ratio between a sequence of the first set of nucleobase sequences and a corresponding template sequence of the second set of nucleobase sequences be at least 1000:1.

7. The method of claim 1, wherein the adjusting further comprises: the processor assigning a weighting to the beta coefficients of the first equation as a function of a number of DNA molecules formed by the Watson Crick reaction that comprise a nucleobase sequence that represents base-3 encoded values of beta coefficients of the first equation, relative to the total number of DNA molecules formed by the Watson Crick reaction.
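The weighting described above is a relative frequency, which a short Python sketch can make concrete. It assumes the reaction products have already been decoded into a list of sequence strings. The function name `binding_weights` and the sample read-out are hypothetical, and the sketch does not cover how the resulting weights are then used to adjust the coefficients.

```python
from collections import Counter


def binding_weights(observed_sequences):
    """Fraction of all formed molecules that carry each distinct sequence."""
    counts = Counter(observed_sequences)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no molecules were observed")
    return {sequence: n / total for sequence, n in counts.items()}


# Hypothetical read-out: three molecules carry CGCAC and one carries CGTTT.
observed = ["CGCAC", "CGCAC", "CGTTT", "CGCAC"]
weights = binding_weights(observed)
print(weights)  # {'CGCAC': 0.75, 'CGTTT': 0.25}
```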

8. The method of claim 1, further comprising providing at least one support service for at least one of creating, integrating, hosting, maintaining, and deploying computer-readable program code in the computer system, wherein the computer-readable program code in combination with the computer system is configured to implement the identifying, further identifying, adjusting, replacing, and repeating.

9. A computer program product, comprising a computer-readable hardware storage device having a computer-readable program code stored therein, said program code configured to be executed by a processor of a computer system to implement a method for using machine-learning on a DNA-computing platform to perform linear regression, the method comprising: the processor identifying a first set of equal-length nucleobase sequences, wherein the identifying each sequence comprises selecting an order of nucleobases that each encode two or more numeric values of a set of numeric values, and wherein each value encoded into a first sequence of the first set of sequences represents an estimated value of a beta coefficient comprised by a corresponding first equation of a system of linear equations; the processor further identifying a second set of equal-length nucleobase sequences that are each capable of bonding through a Watson Crick reaction to a corresponding sequence of the first set of equal-length nucleobase sequences; the processor adjusting each value of the set of numeric values as a function of a result of a Watson Crick reaction between a first set of nucleobase chains and a second set of nucleobase chains, wherein each chain of the first set of nucleobase chains comprises nucleobases ordered in a sequence of the first set of nucleobase sequences, and wherein each chain of the second set of nucleobase chains comprises nucleobases ordered in a sequence of the second set of nucleobase sequences; the processor replacing each value of the set of numeric values with its corresponding adjusted value; and the processor repeating the identifying, further identifying, adjusting, and replacing until the adjusted values converge over two or more successive adjustings to a constant set of values.

10. The computer program product of claim 9, further comprising: the processor encoding each value of the first sequence of the first set of sequences, wherein the encoding comprises: the processor translating each value of a beta coefficient comprised by the first equation into a base-3 numeral of a set of base-3 numerals; the processor representing each base-3 numeral of the of set of base-3 numerals as an analogous sequence of nucleobases, wherein each 0, 1, and 2 digit of the each base-3 numeral is uniquely represented as a nucleobase selected from a group comprising adenine, cytosine, guanine, and taurine; the processor identifying the first sequence of the first set of sequences as comprising analogous sequences of nucleobases that together represent base-3 values of all beta coefficients comprised by the first equation; the processor selecting a filler nucleobase from a group comprising adenine, cytosine, guanine, and taurine, wherein the filler nucleobase has not been used to represent a 0, 1, or 2 digit of any base-3 numeral of the set of base-3 numerals; and the processor setting the first sequence to the equal length by padding the first sequence, if necessary, with one or more instances of the filler nucleobase.

11. The computer program product of claim 10, wherein each sequence of the first set of equal-length nucleobase sequences comprises base-3 encoded representations of all beta coefficients comprised by a corresponding equation of the system of linear equations, and wherein the equal length is the shortest length necessary to store a longest base-3 encoded representation of all beta coefficients comprised by an equation of the system of linear equations.

12. The computer program product of claim 10, wherein the processor sets a template sequence of the second set of sequences to the equal length by padding the template sequence, if necessary, with one or more instances of a nucleobase that forms a base pair with the filler nucleobase.

13. The computer program product of claim 9, further comprising: the processor computing a residual sum of squares error rate of a set of error rates for each set of beta coefficients of one equation of the system of linear equations; the processor normalizing the set of error rates such that a sum of all error rates of the set of error rates equals 100%; and the processor directing that the first set of nucleobase chains comprises multiple copies of each sequence of the first set of nucleobase sequences, wherein a relative number of copies of each sequence is a function of the set of error rates.

14. The computer program product of claim 9, wherein the adjusting further comprises: the processor assigning a weighting to the beta coefficients of the first equation as a function of a number of DNA molecules formed by the Watson Crick reaction that comprise a nucleobase sequence that represents base-3 encoded values of beta coefficients of the first equation, relative to the total number of DNA molecules formed by the Watson Crick reaction.

15. A computer system comprising a processor, a memory coupled to said processor, and a computer-readable hardware storage device coupled to said processor, said storage device containing program code configured to be run by said processor via the memory to implement a method for using machine-learning on a DNA-computing platform to perform linear regression, the method comprising: the processor identifying a first set of equal-length nucleobase sequences, wherein the identifying each sequence comprises selecting an order of nucleobases that each encode two or more numeric values of a set of numeric values, and wherein each value encoded into a first sequence of the first set of sequences represents an estimated value of a beta coefficient comprised by a corresponding first equation of a system of linear equations; the processor further identifying a second set of equal-length nucleobase sequences that are each capable of bonding through a Watson Crick reaction to a corresponding sequence of the first set of equal-length nucleobase sequences; the processor adjusting each value of the set of numeric values as a function of a result of a Watson Crick reaction between a first set of nucleobase chains and a second set of nucleobase chains, wherein each chain of the first set of nucleobase chains comprises nucleobases ordered in a sequence of the first set of nucleobase sequences, and wherein each chain of the second set of nucleobase chains comprises nucleobases ordered in a sequence of the second set of nucleobase sequences; the processor replacing each value of the set of numeric values with its corresponding adjusted value; and the processor repeating the identifying, further identifying, adjusting, and replacing until the adjusted values converge over two or more successive adjustings to a constant set of values.

16. The computer system of claim 15, further comprising: the processor encoding each value of the first sequence of the first set of sequences, wherein the encoding comprises: the processor translating each value of a beta coefficient comprised by the first equation into a base-3 numeral of a set of base-3 numerals; the processor representing each base-3 numeral of the of set of base-3 numerals as an analogous sequence of nucleobases, wherein each 0, 1, and 2 digit of the each base-3 numeral is uniquely represented as a nucleobase selected from a group comprising adenine, cytosine, guanine, and taurine; the processor identifying the first sequence of the first set of sequences as comprising analogous sequences of nucleobases that together represent base-3 values of all beta coefficients comprised by the first equation; the processor selecting a filler nucleobase from a group comprising adenine, cytosine, guanine, and taurine, wherein the filler nucleobase has not been used to represent a 0, 1, or 2 digit of any base-3 numeral of the set of base-3 numerals; and the processor setting the first sequence to the equal length by padding the first sequence, if necessary, with one or more instances of the filler nucleobase.

17. The computer system of claim 16, wherein each sequence of the first set of equal-length nucleobase sequences comprises base-3 encoded representations of all beta coefficients comprised by a corresponding equation of the system of linear equations, and wherein the equal length is the shortest length necessary to store a longest base-3 encoded representation of all beta coefficients comprised by an equation of the system of linear equations.

18. The computer system of claim 16, wherein the processor sets a template sequence of the second set of sequences to the equal length by padding the template sequence, if necessary, with one or more instances of a nucleobase that forms a base pair with the filler nucleobase.

19. The computer system of claim 15, further comprising: the processor computing a residual sum of squares error rate of a set of error rates for each set of beta coefficients of one equation of the system of linear equations; the processor normalizing the set of error rates such that a sum of all error rates of the set of error rates equals 100%; and the processor directing that the first set of nucleobase chains comprises multiple copies of each sequence of the first set of nucleobase sequences, wherein a relative number of copies of each sequence is a function of the set of error rates.

20. The computer system of claim 15, wherein the adjusting further comprises: the processor assigning a weighting to the beta coefficients of the first equation as a function of a number of DNA molecules formed by the Watson Crick reaction that comprise a nucleobase sequence that represents base-3 encoded values of beta coefficients of the first equation, relative to the total number of DNA molecules formed by the Watson Crick reaction.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G16B G06N

Patent Metadata

Filing Date

October 27, 2015

Publication Date

August 20, 2019
