There is provided a non-transitory computer-readable medium storing a calculation program for causing a computer to execute a process. The process includes expanding a discrete variable indicating 1 or 0 to a continuous variable ranging from 0 to 1, and controlling strength of a bias of whether the continuous variable expanded resembles ½, or 0 or 1 during a sampling from a probability distribution of a discrete variable is performed.
Legal claims defining the scope of protection, as filed with the USPTO.
1. A non-transitory computer-readable medium storing a calculation program for causing a computer to execute a process, the process comprising: expanding a discrete variable indicating 1 or 0 to a continuous variable ranging from 0 to 1, and controlling strength of a bias of whether the continuous variable expanded resembles ½, or 0 or 1 during a sampling from a probability distribution of a discrete variable is performed.
2. The non-transitory computer-readable medium according to claim 1, when the probability distribution expanded to the continuous variable is p(w), the strength of the bias is γ, and σ is a sigmoid operation, wherein the sampling is performed by controlling γ using a following equation: [equation not reproduced].
3. The non-transitory computer-readable medium according to claim 2, wherein the γ is defined from small to large in a following equation: [equation not reproduced], and i and j are exchanged during the sampling.
4. The non-transitory computer-readable medium according to claim 1, wherein a cost function obtained by formulation performed by a large-scale language model is used as the probability distribution of the discrete variables, and wherein the process comprises providing the cost function obtained by the sampling to an external solver.
5. A calculation method implemented by a computer, the method comprising: expanding a discrete variable indicating 1 or 0 to a continuous variable ranging from 0 to 1, and controlling strength of a bias of whether the continuous variable expanded resembles ½, or 0 or 1 during a sampling from a probability distribution of a discrete variable is performed.
6. The method according to claim 5, when the probability distribution expanded to the continuous variable is p(w), the strength of the bias is γ, and σ is a sigmoid operation, wherein the sampling is performed by controlling γ using a following equation: [equation not reproduced].
7. The method according to claim 6, wherein the γ is defined from small to large in a following equation: [equation not reproduced], and i and j are exchanged during the sampling.
8. The method according to claim 5, wherein a cost function obtained by formulation performed by a large-scale language model is used as the probability distribution of the discrete variables, and wherein the process comprises providing the cost function obtained by the sampling to an external solver.
9. An information processing device comprising: a memory; and a processor coupled to the memory and the processor configured to execute a process, the process comprising: expanding a discrete variable indicating 1 or 0 to a continuous variable ranging from 0 to 1, and controlling strength of a bias of whether the continuous variable expanded resembles ½, or 0 or 1 during a sampling from a probability distribution of a discrete variable is performed.
10. The information processing device according to claim 9, when the probability distribution expanded to the continuous variable is p(w), the strength of the bias is γ, and σ is a sigmoid operation, wherein the sampling is performed by controlling γ using a following equation: [equation not reproduced].
11. The information processing device according to claim 10, wherein the γ is defined from small to large in a following equation: [equation not reproduced], and i and j are exchanged during the sampling.
12. The information processing device according to claim 9, wherein a cost function obtained by formulation performed by a large-scale language model is used as the probability distribution of the discrete variables, and wherein the process comprises providing the cost function obtained by the sampling to an external solver.
Complete technical specification and implementation details from the patent document.
This application is based upon and claims the benefit of priority of Japanese Patent Application No. 2024-144331 filed on Aug. 26, 2024, the entire contents of which are incorporated herein by reference.
A certain aspect of the present embodiments relates to a non-transitory computer-readable medium, a calculation method, and an information processing device.
Technologies for sampling complex probability distributions have been disclosed (see, for example, Japanese Patent Application Publication No. 2024-513576, U.S. Patent Application Publication No. 2009/0287623, and Japanese Patent Application Publication No. 2021-197189).
According to an aspect of the present disclosure, there is provided a non-transitory computer-readable medium storing a calculation program for causing a computer to execute a process, the process including: expanding a discrete variable indicating 1 or 0 to a continuous variable ranging from 0 to 1, and controlling strength of a bias of whether the continuous variable expanded resembles ½, or 0 or 1 during a sampling from a probability distribution of a discrete variable is performed.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
If the probability distribution is complex, such as having multiple peaks, it may not be possible to achieve appropriate sampling.
The Markov chain Monte Carlo method (MCMC method) has come to be applied to a wide range of statistical problems, such as large-scale numerical calculations of proteins or Bayesian statistics. For example, the Markov chain Monte Carlo method is applied to many-body problems that appear in physics. This is because many-body problems that appear in physics are generally impossible to calculate analytically, and it is necessary to sample the state of the physical system to investigate its properties. The Markov chain Monte Carlo method is also applied to Bayesian statistics for data analysis, where sampling from the posterior distribution is necessary when fitting experimental data to an effective model.
Sampling here means obtaining a specific sample from a probability distribution p(x) that is explicitly given by a formula. FIG. 1A illustrates an example of a probability distribution p(x). Obtaining a sample corresponds to generating random numbers from a given probability distribution.
The Monte Carlo method is a general term for methods of sampling from a probability distribution p(x). In a broad sense, the Monte Carlo method is a general term for a method of performing numerical calculations using random numbers.
The static Monte Carlo method is a general term for a method of performing sampling without using a Markov chain. A Markov chain is a stochastic process in which the current state depends only on the previous state. As illustrated in FIG. 1B, sampling is performed without depending on the previous state in the sampling process. In FIG. 1B, each circle is a sampling point, and they do not have any dependency on each other. With such a static Monte Carlo method, it is difficult to sample a high-dimensional probability distribution.
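As an illustration of static sampling, the following minimal sketch draws independent samples from a small discrete distribution by inverting its cumulative distribution. The function names and the three-state distribution are hypothetical and are not taken from the embodiment.

```python
import random


def sample_discrete(probs, rng):
    """Draw one state index from a discrete distribution by inverting its CDF."""
    u = rng.random()  # uniform random number in [0, 1)
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off


def static_monte_carlo(probs, n_samples, seed=0):
    """Draw independent samples; no sample depends on the previous one."""
    rng = random.Random(seed)
    return [sample_discrete(probs, rng) for _ in range(n_samples)]


def empirical_frequencies(samples, n_states):
    """Return the fraction of samples that fell on each state."""
    counts = [0] * n_states
    for s in samples:
        counts[s] += 1
    return [c / len(samples) for c in counts]
```

With enough samples, the empirical frequencies approach the given probabilities. This direct approach requires enumerating the states, which is one reason it does not scale to high-dimensional distributions.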
The Markov chain Monte Carlo method is a general term for a method of performing sampling using a Markov chain. In the Markov chain Monte Carlo method, sampling is performed depending on the previous state in the sampling process, as illustrated in FIG. 1C. In FIG. 1C, the starting point of the arrow represents the previous sampling point, and the end point of the arrow represents the next sampling point.
In the Markov chain Monte Carlo method, it is possible to transition to a state that is as different as possible from the previous state. By transitioning to a state that is different from the previous state, the autocorrelation of the sample sequence is reduced, and the number of valid samples that can be considered independent increases. Furthermore, the Markov chain Monte Carlo method is capable of transitioning to the entire space of random variables in a realistic amount of time.
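The relation between state changes and autocorrelation can be checked numerically. The sketch below computes the lag-1 autocorrelation of a scalar sample sequence; the function name and the two toy sequences are hypothetical and serve only as an illustration.

```python
def lag1_autocorrelation(chain):
    """Lag-1 autocorrelation of a scalar sample sequence."""
    n = len(chain)
    mean = sum(chain) / n
    dev = [v - mean for v in chain]
    var = sum(d * d for d in dev)
    if var == 0.0:
        return 0.0  # a constant sequence carries no correlation information
    return sum(dev[t] * dev[t + 1] for t in range(n - 1)) / var


# A chain that rarely changes state is strongly positively correlated.
slow_chain = [0, 0, 0, 0, 1, 1, 1, 1]
# A chain that changes state at every step is not.
fast_chain = [0, 1, 0, 1, 0, 1, 0, 1]
```

A value close to 1 means consecutive samples are nearly redundant, so fewer of them can be treated as independent.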
For example, if transitions were only made to states that were not very different from the previous state, the sampling area would be biased, as illustrated in FIG. 2A, which could result in inefficient sampling. In contrast, if transitions were made to a state that was as different as possible from the previous state, the sampling area would not be biased, as illustrated in FIG. 2B, which would improve sampling efficiency.
For a Markov chain to converge to a desired probability distribution, the transition probability K(x′|x) from a state x to a state x′ should satisfy the following two necessary conditions.
(1) Balance condition: ∑_x p(x)K(x′|x)=p(x′) holds for every state x′.
(2) Ergodicity: any state x′ can be reached from any state x with non-zero probability, either directly or through a finite number of transitions that each have non-zero probability.
It is difficult to directly construct a Markov chain that satisfies the balance condition (1) above. Instead, the transition probability is constructed using the detailed balance condition p(x)K(x′|x)=p(x′)K(x|x′), which is a stronger condition: summing both sides over x recovers the balance condition.
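That detailed balance implies balance can be verified on a small example. The following sketch builds the transition matrix of a three-state chain with a uniform proposal and an assumed Metropolis acceptance min(1, p(x′)/p(x)), then measures how far it is from each condition. The target p and all function names are hypothetical.

```python
def metropolis_kernel(p):
    """Return K with K[x][x_new] = K(x_new | x) for a uniform proposal."""
    n = len(p)
    q = 1.0 / (n - 1)  # propose each of the other states with equal probability
    K = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for x_new in range(n):
            if x_new != x:
                # Assumed Metropolis acceptance: min(1, p(x') / p(x)).
                K[x][x_new] = q * min(1.0, p[x_new] / p[x])
        K[x][x] = 1.0 - sum(K[x])  # rejected proposals stay at x
    return K


def detailed_balance_gap(p, K):
    """Largest violation of p(x) K(x'|x) = p(x') K(x|x')."""
    n = len(p)
    return max(abs(p[x] * K[x][y] - p[y] * K[y][x])
               for x in range(n) for y in range(n))


def balance_gap(p, K):
    """Largest violation of sum_x p(x) K(x'|x) = p(x')."""
    n = len(p)
    return max(abs(sum(p[x] * K[x][y] for x in range(n)) - p[y])
               for y in range(n))


p = [0.2, 0.3, 0.5]
K = metropolis_kernel(p)
```

Both gaps are zero up to floating-point round-off for this kernel, which is consistent with detailed balance being the stronger of the two conditions.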
Next, the Metropolis method, which is a specific example of a Markov chain Monte Carlo method that satisfies the detailed balance condition above, will be explained. First, a candidate x′ is generated according to a certain proposal distribution q(x′|x). Next, x′ is selected as the next state with the selection probability A(x′, x) in the following equation (1).
Typically, local transitions are used. For example, a local proposal distribution is used for the proposal distribution q(x′|x). In the case of two values, one dimension of x is selected at random and the value of x in that dimension is inverted. For example, as illustrated in FIG. 3, the fourth “1” from the top of x is inverted to “0” in x′. In this case, if A(x′, x) is small, x′ is likely to be rejected and x is kept as the next state.
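The single-bit-flip transition can be sketched as follows. Because equation (1) is not reproduced in this text, the sketch assumes the standard Metropolis selection probability for a symmetric proposal, A(x′, x) = min(1, p(x′)/p(x)). The function name and its parameters are hypothetical.

```python
import math
import random


def metropolis_bit_flip(log_p, x0, n_steps, seed=0):
    """Metropolis sampling on binary vectors with a single-bit-flip proposal.

    log_p returns the log of an unnormalized probability of a 0/1 vector.
    Assumed acceptance rule: A(x', x) = min(1, p(x') / p(x)).
    """
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    for _ in range(n_steps):
        k = rng.randrange(len(x))      # choose one dimension at random
        x_new = list(x)
        x_new[k] = 1 - x_new[k]        # invert the selected value
        log_ratio = log_p(x_new) - log_p(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = x_new                  # accept the candidate
        samples.append(tuple(x))       # on rejection the old state is repeated
    return samples
```

Consecutive samples differ in at most one position, which is what makes the transition local.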
The problems with local transitions are as follows. First, if a large transition is made from the previous state, most states will not be selected. For example, if all the variables of x in FIG. 3 are inverted, none will be selected. Also, in the case of a discrete probability distribution, it is difficult to use gradient information. Furthermore, for certain problems such as multi-modal distributions, the probability of transition to a certain state may become small, and the transition may not actually be made, leading to erroneous results. For example, FIG. 4 illustrates the Metropolis method applied to a two-dimensional two-component Gaussian distribution. The broken line indicates the first 150 transitions. In the example of FIG. 4, sampling is performed around one peak (the peak in the upper right), but not around the other peak (the peak in the lower left).
Therefore, in the following embodiment, an example in which appropriate sampling can be performed will be described.
First, the principle of this embodiment will be described. In the first embodiment, a continuously relaxed extended ensemble system is defined, and an efficient Markov chain Monte Carlo method is realized by exchange within the ensemble and gradient-based local transition.
First, a probability distribution defined in a discrete state space x∈{0,1}^N is transformed using a real random variable w∈R^N as p(x)→p(σ(w))≡p(w). Here, x is a discrete variable vector represented by 0 and 1, and has N elements. “σ” represents an element-wise sigmoid operation from R^N→R^N, and takes continuous values from 0 to 1.
As an example, the transformation of the following equation (2) for a system with QUBO-format energy can be given. QUBO stands for Quadratic Unconstrained Binary Optimization, which is a format for optimizing a quadratic objective function over binary variables without constraints.
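Since equation (2) is not reproduced in this text, the following is only a sketch of what such a transformation could look like. It assumes a QUBO energy of the form E(x) = Σ_ij Q_ij x_i x_j and relaxes it by substituting x = σ(w). The gradient with respect to w follows from the chain rule. All names are hypothetical.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def qubo_energy(Q, s):
    """Assumed QUBO energy: E(s) = sum_ij Q[i][j] * s[i] * s[j]."""
    n = len(s)
    return sum(Q[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def relaxed_energy(Q, w):
    """Energy evaluated at the relaxed variable sigma(w)."""
    return qubo_energy(Q, [sigmoid(v) for v in w])


def relaxed_energy_grad(Q, w):
    """Gradient of relaxed_energy with respect to w (chain rule)."""
    s = [sigmoid(v) for v in w]
    n = len(w)
    grad = []
    for k in range(n):
        dE_ds = sum((Q[k][j] + Q[j][k]) * s[j] for j in range(n))
        grad.append(dE_ds * s[k] * (1.0 - s[k]))  # d sigma / d w = s * (1 - s)
    return grad
```

When the components of w are large in magnitude, σ(w) is close to a 0/1 vector and the relaxed energy approaches the discrete energy.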
Next, a term is introduced into the probability distribution p(w). The term represents the strength of the bias of whether the continuous variable extended from the discrete variable is similar to ½ or similar to 0 or 1. For example, the probability distribution of the following equation (3) that takes continuous relaxation annealing into account is defined. In the following equation (3), by setting γ to −∞, the probability distribution becomes unimodal and concentrates at σ(w)=1_N/2, where 1_N is the N-dimensional vector whose elements are all 1. By setting γ to +∞, the support concentrates at the end points σ(w)∈{0,1}. Therefore, γ represents the strength of the bias of whether the continuous variable extended from the discrete variable is similar to ½ or similar to 0 or 1. While controlling this γ during the sampling process, sampling of local transitions is repeated.
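Equation (3) is not reproduced in this text, so the exact form of the bias term is unknown here. The sketch below uses one assumed form that reproduces the limiting behavior described above: log p(w; γ) = log p(σ(w)) − γ Σ_i (1 − (2σ(w_i) − 1)²). The penalty is zero when σ(w_i) is 0 or 1 and largest when σ(w_i) = ½. All names are hypothetical.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def bias_penalty(w):
    """Assumed bias term: sum_i (1 - (2*sigma(w_i) - 1)**2)."""
    return sum(1.0 - (2.0 * sigmoid(v) - 1.0) ** 2 for v in w)


def biased_log_density(log_p, w, gamma):
    """Unnormalized log density with the assumed bias term.

    log_p takes the relaxed vector sigma(w) and returns log p(sigma(w)).
    gamma > 0 favors sigma(w) near 0 or 1; gamma < 0 favors sigma(w) near 1/2.
    """
    s = [sigmoid(v) for v in w]
    return log_p(s) - gamma * bias_penalty(w)
```

Under this assumption, a large positive γ makes states near the corners of the unit cube far more probable than the center, and a large negative γ does the opposite.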
Next, the distribution of the following equation (4) is defined. By defining the distribution of the following equation (4), it is possible to define a sequence of γ values ordered from small to large. By executing gradient-based Markov chain Monte Carlo (Langevin, Hamiltonian MCMC) for the distribution of the following equation (4) and exchanging a randomly selected i with j at an appropriate timing, it is possible to move through the state space quickly. Note that in gradient-based Markov chain Monte Carlo, the state transition is determined probabilistically based on the gradient of the probability distribution.
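As one concrete instance of a gradient-based local transition, the sketch below implements a single step of the unadjusted Langevin algorithm, w′ = w + ε∇log p(w) + √(2ε)ξ with ξ drawn from a standard normal distribution. The embodiment names Langevin and Hamiltonian MCMC without giving details, so the step rule, the function name, and the parameters are assumptions.

```python
import math
import random


def langevin_step(w, grad_log_p, step_size, rng):
    """One unadjusted Langevin update of the continuous state w.

    grad_log_p returns the gradient of the log density at w.
    rng must provide gauss(mu, sigma), for example random.Random.
    """
    g = grad_log_p(w)
    noise_scale = math.sqrt(2.0 * step_size)
    return [wi + step_size * gi + noise_scale * rng.gauss(0.0, 1.0)
            for wi, gi in zip(w, g)]


def standard_normal_grad(w):
    """Gradient of log N(0, I), used here only as a toy target."""
    return [-wi for wi in w]


rng = random.Random(0)
state = [3.0, -3.0]
for _ in range(100):
    state = langevin_step(state, standard_normal_grad, 0.05, rng)
```

The drift term moves the state toward regions of higher probability, while the noise term keeps the chain exploring. A Metropolis correction could be added on top of this step, but it is omitted here for brevity.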
The method of exchanging i and j is not particularly limited, but it may be designed to satisfy the detailed balance condition at the time of exchange. For example, a method of exchanging w_i and w_j with the probability of the following equation (5) can be used.
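Equation (5) is not reproduced in this text. The sketch below uses the standard replica-exchange acceptance rule, which satisfies detailed balance, under the assumption that level i has the log density log p(w) − γ_i φ(w) for some bias term φ. In that case the common factor p(w) cancels and the exchange probability is min(1, exp((γ_i − γ_j)(φ(w_i) − φ(w_j)))). All names are hypothetical.

```python
import math
import random


def exchange_probability(gamma_i, gamma_j, phi_i, phi_j):
    """Probability of exchanging the states held at levels i and j.

    phi_i and phi_j are the bias-term values of the states currently
    held at levels i and j (assumed density: log p(w) - gamma * phi(w)).
    """
    log_ratio = (gamma_i - gamma_j) * (phi_i - phi_j)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


def maybe_exchange(states, phis, gammas, i, j, rng):
    """Swap states[i] and states[j] in place; return True if swapped."""
    if rng.random() < exchange_probability(gammas[i], gammas[j], phis[i], phis[j]):
        states[i], states[j] = states[j], states[i]
        phis[i], phis[j] = phis[j], phis[i]
        return True
    return False
```

For example, when the level with the larger γ currently holds the less binary state (the larger φ), the exchange is always accepted, which moves nearly binary states toward the levels that favor them.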
As described above, in this embodiment, when sampling from the discrete probability distribution of a discrete variable, the discrete variable indicating 1 or 0 is expanded to a continuous variable ranging from 0 to 1, and the strength of the bias of whether the expanded continuous variable resembles ½, or 0 or 1 during the sampling process is controlled. By controlling the strength of the bias in this way, gradient-based local transition becomes possible, and appropriate sampling becomes possible. If appropriate sampling can be performed, the computational resources consumed by the computer can be reduced, which improves the operation of the computer. In addition, by exchanging γ_i and γ_j during the sampling process, the state space can be traversed at high speed.
Next, a method of application to mathematical optimization will be explained. Mathematical optimization is an optimization problem formulated as in the following equation (6). In the following equation (6), x is a vector represented by 0 and 1, and has N elements.
For this function, for example, the exponential distribution of the following equation (7) is defined. In the following equation (7), x in p(x;β) represents a variable and β in p(x;β) represents a constant. This exponential distribution converges to a uniform distribution over the optimal solutions in the β→∞ limit.
The probability distribution where β is sufficiently large is sampled by the gradient-based Markov chain Monte Carlo method using continuous relaxation in this embodiment. The sampled x becomes a sample sequence near the minimum value of E(x). In this way, this embodiment can be applied to mathematical optimization.
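The concentration on the minimum of E(x) can be checked by exhaustive enumeration on a tiny problem. Equation (7) is not reproduced in this text, so the sketch assumes the usual form p(x; β) ∝ exp(−βE(x)). The function name and the toy energy are hypothetical.

```python
import itertools
import math


def boltzmann_distribution(energy, n_bits, beta):
    """Exact p(x; beta) over all 0/1 vectors, assuming p ∝ exp(-beta * E(x))."""
    states = list(itertools.product((0, 1), repeat=n_bits))
    e_min = min(energy(x) for x in states)
    # Subtracting e_min avoids overflow and does not change the distribution.
    weights = [math.exp(-beta * (energy(x) - e_min)) for x in states]
    z = sum(weights)
    return {x: wgt / z for x, wgt in zip(states, weights)}


def toy_energy(x):
    """Toy energy with two optimal solutions: exactly one bit set."""
    return (sum(x) - 1) ** 2


uniform = boltzmann_distribution(toy_energy, 2, 0.0)
peaked = boltzmann_distribution(toy_energy, 2, 20.0)
```

At β = 0 every state is equally likely. At β = 20 almost all probability mass sits on the two optimal states (0, 1) and (1, 0), shared equally between them.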
Next, an example of use in an application will be explained. ChatOpt is a chat application that requires no expert knowledge and combines the ability of a large-scale language model to formulate problems from prompts with an external solver. A problem with ChatOpt is that the formulation performed by the large-scale language model often results in a nonlinear cost function. If the cost function is nonlinear, it may become difficult to use existing optimization solvers (linear programming such as Gurobi, quadratic forms such as Digital Annealer, or the like). In contrast, an extended ensemble using continuous relaxation annealing as in this embodiment can handle a cost function of any form, making it easy to connect to a large-scale language model. For example, the cost function obtained by the formulation performed by the large-scale language model is regarded as a probability distribution, and the probability distribution is sampled by the gradient-based Markov chain Monte Carlo method using the continuous relaxation of this embodiment. Even if the probability distribution is nonlinear, it can be formulated by sampling it with the gradient-based Markov chain Monte Carlo method using the continuous relaxation of this embodiment. This makes it possible to provide the probability distribution to an external solver. Note that the large-scale language model is, for example, a trained language model composed of a neural network. The large-scale language model is also, for example, a language model that is trained using a large amount of calculation, data, and parameters, and that takes natural language as input, executes the learned process, and returns a response.
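One way to see why a cost function of any form can be handled is that the relaxed log density only needs the cost to be evaluable at σ(w). The sketch below wraps an arbitrary cost function as a log density, assuming the form −β·cost(σ(w)), and approximates its gradient by central differences. It does not call a language model or an external solver, and all names are hypothetical.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def make_relaxed_log_density(cost, beta):
    """Wrap any cost over [0, 1]^N as log p(w) = -beta * cost(sigma(w))."""
    def log_density(w):
        return -beta * cost([sigmoid(v) for v in w])
    return log_density


def numerical_gradient(f, w, h=1e-5):
    """Central-difference gradient; needs no analytic form of f."""
    grad = []
    for k in range(len(w)):
        up = list(w)
        up[k] += h
        down = list(w)
        down[k] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def nonlinear_cost(s):
    """Toy nonlinear cost standing in for an automatically formulated one."""
    return math.sin(3.0 * s[0]) * s[1] + (s[0] * s[1] - 0.5) ** 4


log_density = make_relaxed_log_density(nonlinear_cost, beta=2.0)
gradient = numerical_gradient(log_density, [0.2, -0.4])
```

In practice an automatic-differentiation library would likely replace the finite differences, but the structure stays the same: the sampler only sees a log density and its gradient.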
Next, a device configuration for realizing the above solution principle will be described. FIG. 5A is a functional block diagram of the overall configuration of an information processing device 100 according to the first embodiment. The information processing device 100 is a server for sampling or the like. As illustrated in FIG. 5A, the information processing device 100 functions as a probability distribution storage 10, a bias generator 20, a sampler 30, an outputter 40, and so on.
For example, when sampling from the probability distribution of a discrete variable, the sampler 30 expands the discrete variable indicating 1 or 0 to a continuous variable ranging from 0 to 1, and during the sampling process, controls the strength of bias of whether the expanded continuous variable resembles ½, or 0 or 1.
Furthermore, when the probability distribution expanded to a continuous variable is p(w), the strength of bias is γ, and σ is a sigmoid operation, the sampler 30 performs sampling by controlling γ using the following equation.
The sampler 30 also defines γ from small to large in the following equation, and exchanges i and j during sampling.
The sampler 30 also uses a cost function obtained by formulation performed by a large-scale language model as a probability distribution of discrete variables, and provides the cost function obtained by sampling to an external solver.
FIG. 5B is a hardware configuration diagram of the information processing device 100. As illustrated in FIG. 5B, the information processing device 100 includes a CPU 101, a RAM 102, a storage device 103, an input device 104, a display device 105, and the like.
The CPU 101 is a central processing unit. The CPU 101 includes one or more cores. The RAM (Random Access Memory) 102 is a volatile memory that temporarily stores the program executed by the CPU 101 and the data processed by the CPU 101. The storage device 103 is a non-volatile storage device. For example, a ROM (Read Only Memory), a solid state drive (SSD) such as a flash memory, or a hard disk driven by a hard disk drive can be used as the storage device 103. The storage device 103 stores a calculation program. The input device 104 is a device for a user to input necessary information, such as a keyboard or a mouse. The display device 105 is a display device that displays the sampling results output by the outputter 40 on a screen. Each part of the information processing device 100 is realized by the CPU 101 executing the calculation program. Note that each part of the information processing device 100 may be hardware such as a dedicated circuit.
FIG. 6 is a flowchart of an example of the operation of the information processing device 100. As illustrated in FIG. 6, the bias generator 20 generates Path γ of the above equation (4) for the probability distribution stored in the probability distribution storage 10 (step S1).
Then, the sampler 30 sets the initial state of the Markov chain Monte Carlo method (step S2). For example, the sampler 30 sets each parameter of the Markov chain Monte Carlo method to a predetermined initial value.
Then, the sampler 30 executes the gradient-based local transition for each γ_i M (≥2) times using the probability distribution of the above equation (3) (step S3).
Then, the sampler 30 executes the exchange of (γ_i, γ_i+1) (step S4).
Then, the sampler 30 judges whether or not the stopping condition is satisfied (step S5). For example, the sampler 30 judges the degree of convergence of the statistics by detecting whether or not the amount of change of the statistics such as the mean, median, variance, and standard deviation of each sampled sampling point becomes equal to or less than a threshold value.
If the judgment in step S5 is “No”, the process is executed again from step S3. If the judgment in step S5 is “Yes”, the outputter 40 outputs the sampling result (step S6). For example, the display device 105 displays the sampling result output by the outputter 40. Then, the execution of the flowchart ends.
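The flow of steps S1 to S6 can be sketched end to end as follows. Equations (3) to (5) are not reproduced in this text, so the sketch assumes the log density −βE(σ(w)) − γΣ_i 4σ(w_i)(1 − σ(w_i)), an unadjusted Langevin step as the local transition, and the standard replica-exchange acceptance rule. All names, default values, and the stopping rule based on the running mean are hypothetical.

```python
import math
import random


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def gamma_path(gamma_min, gamma_max, n_levels):
    """S1: bias strengths ordered from small to large (assumed linear)."""
    step = (gamma_max - gamma_min) / (n_levels - 1)
    return [gamma_min + k * step for k in range(n_levels)]


def penalty(w):
    """Assumed bias term: 0 when sigma(w_i) is 0 or 1, largest at 1/2."""
    return sum(4.0 * sigmoid(v) * (1.0 - sigmoid(v)) for v in w)


def grad_log_density(w, gamma, beta, energy_grad):
    """Gradient of -beta*E(sigma(w)) - gamma*penalty(w) with respect to w."""
    s = [sigmoid(v) for v in w]
    dE = energy_grad(s)
    return [(-beta * dE[k] - gamma * 4.0 * (1.0 - 2.0 * s[k])) * s[k] * (1.0 - s[k])
            for k in range(len(w))]


def run_sampler(energy_grad, n_bits, gammas, beta=1.0, m_local=5,
                step=0.05, tol=5e-3, max_rounds=500, seed=0):
    rng = random.Random(seed)
    # S2: one chain per gamma level, started near sigma(w) = 1/2.
    chains = [[rng.gauss(0.0, 0.1) for _ in range(n_bits)] for _ in gammas]
    running_mean = [0.0] * n_bits
    samples = []
    for round_index in range(1, max_rounds + 1):
        # S3: M gradient-based local transitions at every gamma level.
        for level, gamma in enumerate(gammas):
            w = chains[level]
            for _ in range(m_local):
                g = grad_log_density(w, gamma, beta, energy_grad)
                w = [wi + step * gi + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
                     for wi, gi in zip(w, g)]
            chains[level] = w
        # S4: propose exchanging one randomly chosen adjacent pair.
        i = rng.randrange(len(gammas) - 1)
        log_ratio = ((gammas[i] - gammas[i + 1])
                     * (penalty(chains[i]) - penalty(chains[i + 1])))
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            chains[i], chains[i + 1] = chains[i + 1], chains[i]
        # Record the chain at the largest gamma, rounded to 0/1.
        x = [1 if sigmoid(v) > 0.5 else 0 for v in chains[-1]]
        samples.append(x)
        # S5: stop once the running mean changes by no more than tol.
        new_mean = [m + (xi - m) / round_index for m, xi in zip(running_mean, x)]
        change = max(abs(a - b) for a, b in zip(new_mean, running_mean))
        running_mean = new_mean
        if round_index > 10 and change <= tol:
            break
    return samples  # S6: the sampling result to be output
```

A call such as `run_sampler(energy_grad, 2, gamma_path(-1.0, 3.0, 4), beta=5.0)` returns a list of 0/1 vectors taken from the level with the largest γ, where `energy_grad` is a user-supplied function returning the gradient of the energy with respect to the relaxed variable.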
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
July 21, 2025
June 4, 2026