US-20260024611-A1

Method for Predicting and Screening Interactions Between Lactobacillus Bulgaricus and Streptococcus Thermophilus Two-By-Two

Published: January 22, 2026
Assignee: not available in USPTO data we have
Technical Abstract

A method for predicting and screening symbiotic interactions between Lactobacillus bulgaricus and Streptococcus thermophilus is provided, belonging to the technical field of fermented milk production. In the method, a comprehensive feature vector is generated by combining KEGG features and k-mer feature frequencies of Lactobacillus bulgaricus and Streptococcus thermophilus strains. The top 200 important features are screened from the real labeled samples using the chi-square test, gradient boosting, and variance analysis. Subsequently, pseudo-labeled samples are generated using a GAN, and a machine learning model is constructed from these together with the real labeled samples; the model is configured to predict the interaction effects of strain combinations. Finally, the accuracy of the model predictions is verified through fermentation experiments, and the optimal model is selected. The present disclosure can efficiently predict the potential for symbiotic interaction between strains, thereby improving the efficiency and quality of fermented milk production.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

1

A method for predicting and screening symbiotic interactions between Lactobacillus bulgaricus and Streptococcus thermophilus, comprising the following steps:

step S1, calculating k-mer data of a whole genome of two strains of Lactobacillus bulgaricus and two strains of Streptococcus thermophilus, respectively, calculating respective Σ4^k-dimensional feature vectors according to the k-mer data, and forming a KEGG matrix by calculating a gene copy number of each strain;

step S2, fusing the KEGG features of the four strains by adding copy numbers of overlapping genes and replicating copy numbers of non-overlapping genes, thereby obtaining n features; obtaining m features by accumulating the k-mer feature frequencies of the four strains; and obtaining n+m features by concatenating n features and m features;

step S3, setting a number of real labeled samples to p, and screening a top 200 features in a feature importance ranking list according to three feature selection methods: a chi-square test; gradient boosting; and a variance analysis of n+m features according to the p real labeled samples;

step S4, for p real labeled samples, completing an iterative process of generating false data and discriminating true and false data by alternately working with a generator and a discriminator of GAN, wherein 10×p pseudo labeled samples are generated;

step S5, constructing a machine learning model based on the real labeled sample and the pseudo labeled sample, and then predicting symbiotic interactions between Lactobacillus bulgaricus and Streptococcus thermophilus using the constructed machine learning model; and

step S6, performing fermentation experiments by selecting a plurality of combinations from predicted results, comprehensively evaluating fermentation effect of the strain combination according to the fermentation features, and selecting an optimal model with a highest prediction accuracy by comparing the experimental results with the prediction results of the machine learning model.

2

The method for predicting and screening symbiotic interactions between Lactobacillus bulgaricus and Streptococcus thermophilus according to claim 1, wherein in step S1, k=5-9.

3

The method for predicting and screening symbiotic interactions between Lactobacillus bulgaricus and Streptococcus thermophilus according to claim 1, wherein in step S5, the machine learning model comprises logistic regression, support vector machine, random forest, K-nearest neighbor, and Gaussian naive Bayes.

Detailed Description

Complete technical specification and implementation details from the patent document.

The present disclosure relates to the technical field of food production, specifically to the technical field of fermented milk production, and particularly to a method for predicting and screening symbiotic interactions between Lactobacillus bulgaricus and Streptococcus thermophilus.

Fermented milk is a curd-like product made by fermenting milk (sterilized milk or concentrated milk), with or without milk powder (or skim milk powder), using Lactobacillus bulgaricus and Streptococcus thermophilus. The finished product contains a large number of the corresponding active microorganisms. Fermented milk is characterized by its high nutrient content, including calcium, protein, riboflavin, and other vitamins. It has been proven that fermented milk has the effects of balancing intestinal flora, improving immunity, lowering cholesterol, and delaying aging. An increasing number of people consume fermented milk on a daily basis, which raises more stringent requirements for the quality of fermented milk production.

In the prior art, two strains of Streptococcus thermophilus and two strains of Lactobacillus bulgaricus are randomly combined in biological experiments, phenotypic data such as acid production rate and proteolytic ability are tested, and it is finally determined whether the four strains can interact symbiotically to accelerate fermentation in the fermented milk production process and improve fermentation properties such as viscosity and water holding capacity. This method is time-consuming, labor-intensive, and low in throughput: determining whether a single group of strains interacts may take 3-4 months.

An objective of the present disclosure is to provide a method for predicting and screening interactions between Lactobacillus bulgaricus and Streptococcus thermophilus two-by-two, which can achieve high-throughput and efficient prediction while guaranteeing prediction accuracy.

In order to achieve the above objective, the present disclosure provides a method for predicting and screening interactions between Lactobacillus bulgaricus and Streptococcus thermophilus two-by-two, and the method includes the following steps:

step S1, calculating k-mer data of a whole genome of two strains of Lactobacillus bulgaricus and two strains of Streptococcus thermophilus, respectively, calculating respective Σ4^k-dimensional feature vectors according to the k-mer data, and forming a Kyoto Encyclopedia of Genes and Genomes (KEGG) matrix by calculating a gene copy number of each strain;

step S2, fusing the KEGG features of the four strains according to a principle of adding copy numbers of overlapping genes and replicating copy numbers of non-overlapping genes, thereby obtaining n features; obtaining m features by accumulating the k-mer feature frequencies of the four strains; and obtaining n+m features by concatenating n features and m features;

step S3, setting a number of real labeled samples to p, and screening a top 200 features in a feature importance ranking list by three feature selection methods of a chi-square test, a gradient boosting and a variance analysis on n+m features according to the p real labeled samples;

step S4, for p real labeled samples, completing an iterative process of generating false data and discriminating true and false data by alternately working with a generator and a discriminator of generative adversarial networks (GAN), and finally generating 10×p pseudo labeled samples;

step S5, constructing a machine learning model based on the real labeled sample and the pseudo labeled sample, and then predicting interactions between Lactobacillus bulgaricus and Streptococcus thermophilus based on a two-by-two combination by using the constructed machine learning model; and

step S6, performing fermentation experiments by selecting multiple combinations from predicted results, comprehensively evaluating fermentation effect of the strain combination according to the fermentation features, and selecting an optimal model with a highest prediction accuracy by comparing the experimental results with the prediction results of the machine learning model.

In some embodiments, in step S1, k=5-9.

In some embodiments, in step S5, the machine learning model includes logistic regression (LR), support vector machine (SVM), random forest (RF), K-nearest neighbor (KNN), and Gaussian naive Bayes (GNB).

Therefore, the present disclosure adopts the above-mentioned method for predicting and screening interactions between Lactobacillus bulgaricus and Streptococcus thermophilus two-by-two, and the beneficial technical effects are as follows:

A high-precision prediction model for the interaction between two strains of Lactobacillus bulgaricus and two strains of Streptococcus thermophilus is constructed through in-depth analysis of the genomes of Lactobacillus bulgaricus and Streptococcus thermophilus, combined with a series of operations such as KEGG annotation, k-mer feature extraction, fine feature selection, and GAN data enhancement. This model can efficiently predict, in batches, whether any such four-strain combination can achieve interaction and symbiosis.

In the feature selection process, the feature combinations that have the most significant impact on the interaction are accurately screened out, thus ensuring that the prediction model can focus on the most critical information. Meanwhile, the efficiency of machine learning modeling is further improved with the help of data enhancement technology, which not only improves prediction efficiency and throughput, but also ensures the accuracy of prediction results. The implementation of these optimization measures collectively promotes the potential application of the present disclosure in the dairy fermentation and other related fields.

The technical scheme of the present disclosure is further explained below by embodiments.

Unless otherwise defined, the technical or scientific terms used in the present disclosure shall have the meanings commonly understood by those of ordinary skill in the art to which the present disclosure belongs.

A method for predicting and screening interactions between Lactobacillus bulgaricus and Streptococcus thermophilus two-by-two includes the following steps:

Step S1, feature extraction of Lactobacillus bulgaricus and Streptococcus thermophilus.

The k-mer (k=5-9) data of the whole genome of two strains of Lactobacillus bulgaricus and two strains of Streptococcus thermophilus are calculated, respectively.
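The k-mer calculation of step S1 can be illustrated with a minimal sketch. It assumes that each k-mer feature is a relative frequency over the alphabet A, C, G, T, that windows containing other symbols are ignored, and that the vectors for the individual values of k are concatenated (which is one reading of the "Σ4^k-dimensional" vector). The function names and the toy sequence are hypothetical and are not taken from the disclosure.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(genome, k):
    """Relative frequency of every DNA k-mer, in lexicographic order (4**k values)."""
    genome = genome.upper()
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    # Only windows made of A/C/G/T contribute (assumption: ambiguous bases are skipped).
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total if total else 0.0 for w in vocab]


def kmer_feature_vector(genome, ks=range(5, 10)):
    """Concatenate the frequency vectors for each k in ks (default k = 5..9)."""
    vec = []
    for k in ks:
        vec.extend(kmer_frequencies(genome, k))
    return vec


# Toy sequence and small k values so the result is easy to inspect.
toy_vector = kmer_feature_vector("ACGTACGTTGCA", ks=(1, 2))

# Under the concatenation assumption, k = 5..9 gives this many k-mer features.
full_dimension = sum(4 ** k for k in range(5, 10))
```

Under these assumptions the k = 5-9 vector has 349,184 entries per strain, which is why the later feature screening step matters.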

The respective Σ4^k-dimensional feature vectors are calculated according to the k-mer data, and the gene copy number of each strain is calculated by CENSOR, CNVnator, and other software to form the KEGG matrix.

Step S2, feature combination of two strains of Lactobacillus bulgaricus and two strains of Streptococcus thermophilus.

The KEGG features of the four strains are fused according to the principle of adding copy numbers of overlapping genes and replicating copy numbers of non-overlapping genes, and n features are obtained.
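One way to read this fusion rule is sketched below: copy numbers are summed for KEGG entries shared by several strains and carried over unchanged for entries found in only one strain. The identifiers and copy numbers are made-up placeholders, and the dictionary representation is an assumption rather than the format used in the disclosure.

```python
from collections import Counter


def fuse_kegg(strain_profiles):
    """Fuse per-strain {KEGG id: copy number} maps into one combined map.

    Shared ids have their copy numbers added; ids seen in a single strain
    are copied as they are.
    """
    fused = Counter()
    for profile in strain_profiles:
        fused.update(profile)  # Counter.update adds counts for existing keys
    return dict(sorted(fused.items()))


# Hypothetical profiles for two L. bulgaricus and two S. thermophilus strains.
profiles = [
    {"K00001": 1, "K00002": 2},
    {"K00002": 1},
    {"K00003": 3},
    {"K00001": 1, "K00003": 1},
]
fused_profile = fuse_kegg(profiles)
```

To obtain the same n features for every strain combination, the fused map would still need to be aligned to a fixed list of KEGG identifiers, with 0 for absent entries.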

m features are obtained by accumulating the k-mer feature frequencies of the four strains.
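The accumulation of k-mer frequencies, together with the concatenation that closes step S2, can be sketched as follows. The helper names, the fixed identifier order `ko_order`, and the toy numbers are hypothetical; "accumulating" is read here as an element-wise sum over the four strains.

```python
def accumulate(vectors):
    """Element-wise sum of equally long k-mer frequency vectors."""
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all k-mer vectors must have the same length")
    return [sum(column) for column in zip(*vectors)]


def combination_features(kegg_fused, kmer_vectors, ko_order):
    """Build the n+m feature row for one four-strain combination."""
    kegg_part = [kegg_fused.get(ko, 0) for ko in ko_order]  # n features
    kmer_part = accumulate(kmer_vectors)                    # m features
    return kegg_part + kmer_part                            # n + m features


# Toy example with n = 2 KEGG features and m = 2 k-mer features.
row = combination_features(
    {"K00001": 2},
    [[1, 0], [0, 1], [1, 1], [0, 0]],
    ko_order=["K00001", "K00002"],
)
```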

n+m features are obtained by concatenating the n features and the m features.

Step S3, feature selection is performed according to a small amount of existing labeled data.
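The screening of step S3 (chi-square test, gradient boosting, and variance analysis, keeping the top 200 features) might be sketched with scikit-learn as below. Two points are assumptions, not statements from the disclosure: "variance analysis" is interpreted as the one-way ANOVA F-test (`f_classif`), and the three rankings are merged by mean rank, since the text does not say how they are combined. The data are synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import chi2, f_classif


def ranks(scores):
    """Rank positions (0 = best) for a score vector; NaN scores count as 0."""
    scores = np.nan_to_num(scores, nan=0.0)
    order = np.argsort(-scores)
    positions = np.empty(len(scores), dtype=int)
    positions[order] = np.arange(len(scores))
    return positions


def top_features(X, y, top_n=200, seed=0):
    """Indices of the top_n features by mean rank over three selectors."""
    chi_scores, _ = chi2(X, y)        # needs non-negative features (counts, frequencies)
    f_scores, _ = f_classif(X, y)     # ANOVA F-statistic (assumed meaning of "variance analysis")
    booster = GradientBoostingClassifier(random_state=seed).fit(X, y)
    mean_rank = (ranks(chi_scores) + ranks(f_scores)
                 + ranks(booster.feature_importances_)) / 3.0
    return np.argsort(mean_rank)[:top_n]


# Synthetic demo: 40 samples, 30 non-negative features, label driven by feature 3.
rng = np.random.default_rng(0)
X_demo = rng.random((40, 30))
y_demo = (X_demo[:, 3] > 0.5).astype(int)
selected = top_features(X_demo, y_demo, top_n=5)
```

Mean-rank aggregation is only one option; taking the union or intersection of the three top lists would be equally compatible with the wording of the step.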

If the number of labeled samples is p, the top 200 features are screened from a feature importance ranking list produced by three feature selection methods (the chi-square test, gradient boosting, and variance analysis) applied to the n+m features according to the p labeled samples.

Step S4: data enhancement.
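The alternating generator and discriminator updates of step S4 can be illustrated with a deliberately tiny GAN written in NumPy. This is a toy sketch under strong assumptions: both networks are single linear layers, the losses are the standard GAN cross-entropy losses, and all hyperparameters are arbitrary. The disclosure does not describe its GAN architecture or how labels are attached to the generated rows, so the sketch returns feature rows only.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-np.clip(a, -30.0, 30.0)))


def train_gan(real, n_fake, latent_dim=8, steps=300, lr=0.01, seed=0):
    """Alternate discriminator/generator gradient steps, then sample n_fake rows."""
    rng = np.random.default_rng(seed)
    n, d = real.shape
    g_w = rng.normal(0.0, 0.1, (latent_dim, d))  # generator: x = z @ g_w + g_b
    g_b = np.zeros(d)
    d_w = rng.normal(0.0, 0.1, d)                # discriminator: sigmoid(x @ d_w + d_b)
    d_b = 0.0
    for _ in range(steps):
        # Discriminator step: minimise -log D(real) - log(1 - D(fake)).
        fake = rng.normal(size=(n, latent_dim)) @ g_w + g_b
        p_real = sigmoid(real @ d_w + d_b)
        p_fake = sigmoid(fake @ d_w + d_b)
        d_w -= lr * ((p_real - 1.0) @ real + p_fake @ fake) / n
        d_b -= lr * (np.mean(p_real - 1.0) + np.mean(p_fake))
        # Generator step: minimise -log D(fake) (non-saturating loss).
        z = rng.normal(size=(n, latent_dim))
        fake = z @ g_w + g_b
        p_fake = sigmoid(fake @ d_w + d_b)
        grad_fake = -(1.0 - p_fake)[:, None] * d_w[None, :]
        g_w -= lr * (z.T @ grad_fake) / n
        g_b -= lr * grad_fake.mean(axis=0)
    return rng.normal(size=(n_fake, latent_dim)) @ g_w + g_b


# Synthetic stand-in for p = 6 real samples with 5 selected features.
real_demo = np.random.default_rng(1).random((6, 5))
pseudo = train_gan(real_demo, n_fake=10 * len(real_demo))  # 10 x p generated rows
```

A practical implementation would more likely use a deep learning framework with multi-layer networks; the linear version is kept only so that the alternating update pattern is visible in a few lines.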

For the p labeled samples, the iterative process of generating false data and discriminating true and false data is completed by alternating between the generator and the discriminator of a GAN, and finally 10×p pseudo-labeled samples are generated.

Step S5: Model construction.
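For step S5, the five model families named in the disclosure can be fitted side by side with scikit-learn, as in the sketch below. The hyperparameters are library defaults (apart from `max_iter`), and the training data are synthetic and contain both labels; the disclosure does not specify its settings or how the 0 and 1 labels are distributed over the training samples, so those parts are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def build_models(seed=0):
    """One estimator per model family listed for step S5."""
    return {
        "LR": LogisticRegression(max_iter=1000),
        "SVM": SVC(),
        "RF": RandomForestClassifier(random_state=seed),
        "KNN": KNeighborsClassifier(),
        "GNB": GaussianNB(),
    }


def fit_and_predict(X_train, y_train, X_candidates):
    """Train every model on the same data and predict 0/1 for each candidate."""
    return {
        name: model.fit(X_train, y_train).predict(X_candidates)
        for name, model in build_models().items()
    }


# Synthetic demo: 44 training rows (as if 11 x p with p = 4) and 10 candidates.
rng = np.random.default_rng(0)
X_train = rng.random((44, 6))
y_train = (X_train[:, 0] + X_train[:, 1] > 1.0).astype(int)
X_new = rng.random((10, 6))
predictions = fit_and_predict(X_train, y_train, X_new)
```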

Five machine learning models are constructed using 11×p samples (10p generated samples and p real labeled positive samples) with LR, SVM, RF, KNN and GNB modeling. The models are used to predict the 265,364,100 2:2 combinations composed of 181 strains of Lactobacillus bulgaricus and 181 strains of Streptococcus thermophilus existing in the laboratory, thereby obtaining the model prediction results of all combinations (0 or 1, where 0 denotes no interaction and 1 denotes interaction), and the prediction results are submitted to the laboratory for verification.

Step S6, laboratory verification and determination of optimal model.

Thirty groups are randomly selected from the predictions of step S5 for fermentation experiments, and the 30 strain combinations are then comprehensively evaluated, based on fermentation features such as fermentation time, viscosity, and water holding capacity obtained from the experiments, to determine whether each fermentation label is 0 or 1.
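Selecting the optimal model in step S6 amounts to scoring each model's predictions against the laboratory labels and keeping the most accurate one. The sketch below shows that comparison with made-up labels and placeholder model names; none of the numbers come from the disclosure.

```python
def accuracy(predicted, observed):
    """Fraction of combinations where the prediction matches the lab label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def select_optimal_model(predictions, lab_labels):
    """Return the best model name and the accuracy of every model."""
    scores = {name: accuracy(pred, lab_labels) for name, pred in predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy example: five verified combinations and two hypothetical models.
lab_labels = [1, 0, 1, 1, 0]
toy_predictions = {
    "model_a": [1, 0, 1, 1, 0],
    "model_b": [1, 1, 1, 0, 0],
}
best_model, model_scores = select_optimal_model(toy_predictions, lab_labels)
```

With only 30 verified combinations, accuracy differences between models can be small, so ties or near-ties may need an additional criterion that the disclosure does not discuss.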

The results of laboratory verification are compared with the prediction results of the five machine learning models, and the optimal model is selected, which is the logistic regression model.

Step S7, a set of experiments is performed by using the optimal model obtained in step S6, that is, the interaction between two strains of Lactobacillus bulgaricus (IMAU20360 and IMAU20428) and two strains of Streptococcus thermophilus (IMAU10630 and IMAU40145) is predicted. The prediction result is output in five seconds, and it indicates interaction. This shows that the time consumed in screening a 2:2 starter strain combination according to the present disclosure is much less than the time consumed in the laboratory.

It should be noted that any content not detailed in the present disclosure is prior art and is well known to those skilled in the art.

Therefore, the present disclosure uses the above-mentioned method for predicting and screening interactions between Lactobacillus bulgaricus and Streptococcus thermophilus two-by-two, which can achieve high-throughput and efficient prediction while guaranteeing prediction accuracy.

Finally, it should be noted that the above embodiments are merely used for describing the technical solutions of the present disclosure, rather than limiting the same. Although the present disclosure has been described in detail with reference to the preferred examples, those of ordinary skill in the art should understand that the technical solutions of the present disclosure may still be modified or equivalently replaced. However, these modifications or substitutions should not make the modified technical solutions deviate from the spirit and scope of the technical solutions of the present disclosure.


Patent Metadata

Filing Date

September 30, 2025

Publication Date

January 22, 2026

Inventors

Zhihong SUN
Gaifang DONG
Mei BAI
Jie YU
Jie ZHAO
Weicheng LI
Hao JIN


Cite as: Patentable. “METHOD FOR PREDICTING AND SCREENING INTERACTIONS BETWEEN LACTOBACILLUS BULGARICUS AND STREPTOCOCCUS THERMOPHILUS TWO-BY-TWO” (US-20260024611-A1). https://patentable.app/patents/US-20260024611-A1
