08D. Gillespie algorithm¶
Mingyang Lu¶
3/11/2024¶
Intro¶
Here, we will discuss how to use the Gillespie algorithm to simulate the stochastic dynamics of a chemical reaction network. First, we set up a vector $\mathbf{S}$ to represent the state of the system, composed of the number of molecules of each species. Each chemical reaction $j$ takes the form
$$ \sum_i^{n_S}{\alpha_{ij}S_i} \rightarrow \sum_i^{n_S}{\beta_{ij}S_i},$$ where $n_S$ is the total number of species.
Second, we define the stoichiometry matrix $\Gamma$, where each element $\Gamma_{ij} = \beta_{ij} - \alpha_{ij}$. The stoichiometry vector $\mathbf{\gamma_j}$ for each reaction $j$ is the $j^{th}$ column vector of $\Gamma$.
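For example, for the birth-death process of Example 1 below ($0 \rightarrow x$ and $x \rightarrow 0$), there is one species and two reactions, so
$$\Gamma = \begin{pmatrix} 1 & -1 \end{pmatrix}, \quad \mathbf{\gamma_1} = (1), \quad \mathbf{\gamma_2} = (-1).$$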
Third, we define the reaction propensity $R_j$ as the product of its specific probability rate (from the law of mass action, $R_j = k_j \Pi_{i=1}^{n_S} S_i^{\alpha_{ij}}$) and its reaction degeneracy. For example, a dimerization reaction $2M \rightarrow C$ has propensity $k_{on} M(M-1)/2$, where the factor $1/2$ accounts for the number of distinct reactant pairs; this form is used in Example 3 below.
Workflow for Gillespie algorithm¶
1. Set the initial state $\mathbf{S}$ and initialize time $t$ to 0.
2. Calculate the reaction propensities $R_j(\mathbf{S})$.
3. Randomly pick a reaction $j$ with probability proportional to $R_j(\mathbf{S})$.
4. Randomly pick the waiting time $\tau$ from an exponential distribution with rate $R_{tot} = \sum_{j}{R_j(\mathbf{S})}$ (see the sketch after this list).
5. Increment the simulation time $t \rightarrow t + \tau$.
6. Update the state $\mathbf{S} \rightarrow \mathbf{S} + \mathbf{\gamma_j}$ to reflect the fact that reaction $j$ has occurred.
7. Return to step 2.
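Steps 3 and 4 together constitute Gillespie's direct method. Below is a minimal sketch of the underlying inverse-transform sampling, equivalent to the np.random.exponential and np.random.choice calls used in the implementation that follows (p_all denotes the vector of propensities and p_sum their sum):
u1, u2 = np.random.random(2)
tau = -np.log(u1) / p_sum                          # waiting time ~ Exp(rate = R_tot)
j = np.searchsorted(np.cumsum(p_all), u2 * p_sum)  # reaction j with probability R_j / R_tot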
The following is a generic implementation of the Gillespie algorithm in Python.
import numpy as np
import matplotlib.pyplot as plt
# A generic Gillespie algorithm for one iteration
def gillespie_step(t, Xs, propensity, stoichiometry, **kwargs):
    p_all = propensity(Xs, **kwargs)  # propensities of all reactions
    p_sum = np.sum(p_all)
    if np.abs(p_sum) < 1e-9:  # if p_sum = 0, the simulation stops
        return {'t': t, 'Xs': Xs, 'if_stop': True}
    dt = np.random.exponential(scale=1/p_sum)  # compute the waiting time from an exponential distribution
    event = np.random.choice(range(len(p_all)), p=p_all/p_sum)  # choose the reaction event
    dx = stoichiometry(event, **kwargs)  # the stoichiometry vector corresponding to the selected reaction
    t_new = t + dt
    Xs_new = Xs + dx
    return {'t': t_new, 'Xs': Xs_new, 'if_stop': False}
# the Gillespie simulation
# Users need to specify:
# (1) the initial state X0 (a vector of n species)
# (2) maximum time (tmax)
# (3) maximum allowed iterations (iter_max)
# (4) other parameters required for the propensity (mostly kinetic rates)
def ssa(propensity, stoichiometry, X0, tmax, iter_max, **kwargs):
    Xs = np.array(X0)
    n = len(Xs)
    t_all = np.zeros(iter_max)
    X_all = np.zeros((iter_max, n))
    t = 0
    iter = 0
    t_all[iter] = t
    X_all[iter, :] = Xs
    while t < tmax and iter < iter_max-1:
        output = gillespie_step(t = t, Xs = Xs, propensity = propensity,
                                stoichiometry = stoichiometry, **kwargs)
        if output['if_stop']:
            break
        iter += 1
        t = output['t']
        Xs = output['Xs']
        t_all[iter] = t
        X_all[iter, :] = Xs
    return np.column_stack((t_all[:iter+1], X_all[:iter+1, :]))  # include the last recorded point
In the following, we provide a few examples of its usage.
1: A constitutively transcribed gene¶
We consider a gene $x$ with transcriptional rate $g$ and degradation rate $k$.
$$ 0 {\stackrel{g}{\rightarrow}} x {\stackrel{k}{\rightarrow}} 0 $$
The corresponding ODE of the system is
$$\frac{dx}{dt} = g - kx$$ The stochastic dynamics follow a typical birth-death process.
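A useful fact for interpreting the simulations below: the stationary distribution of this birth-death process is Poisson with mean $\bar{x} = g/k$, so $SD(x) = \sqrt{\bar{x}}$ and the relative noise $SD/Mean = 1/\sqrt{\bar{x}}$ decreases as the copy number increases.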
# specify propensity functions
def p_func1(Xs, g, k):
    # Xs: current state
    # g, k: parameters
    x = Xs[0]
    return np.array([g, k * x])
# specify stoichiometry matrix
def s_func1(r_ind, g, k):
    # r_ind: the choice of the reaction
    s_matrix = np.array([[1],    # x adds 1
                         [-1]])  # x subtracts 1
    return s_matrix[r_ind]
We first check the stochastic dynamics near the stable steady state in the deterministic scenario. In the following, we select $g = 100$ and $k = 0.1$. The deterministic steady state is $\bar{x} = g/k = 1000$. We set $x(t = 0) = \bar{x} = 1000$.
# Parameters
g = 100
k = 0.1
np.random.seed(101)
X0 = [1000]
tmax = 100
iter_max = 20000
# Run the simulation
results_1 = ssa(propensity = p_func1, stoichiometry = s_func1,
X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k)
# Plot the results
plt.step(results_1[:,0], results_1[:, 1], color = "red", label='x')
plt.plot(results_1[:,0], np.repeat(X0, len(results_1)), color='green', label='Mean x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(800, 1200)
plt.legend()
plt.show()
Now we set $g = 10$ and $k = 0.1$, with $x(t = 0) = \bar{x} = 100$.
# Parameters
g = 10
k = 0.1
np.random.seed(101)
X0 = [100]
tmax = 100
iter_max = 2000
# Run the simulation
results_2 = ssa(propensity = p_func1, stoichiometry = s_func1,
X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k)
# Plot the results
plt.step(results_2[:,0], results_2[:, 1], color = "red", label='x')
plt.plot(results_2[:,0], np.repeat(X0, len(results_2)), color='green', label='Mean x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(50, 150)
plt.legend()
plt.show()
We then set $g = 1$ and $k = 0.1$, with $x(t = 0) = \bar{x} = 10$.
# Parameters
g = 1
k = 0.1
np.random.seed(101)
X0 = [10]
tmax = 100
iter_max = 2000
# Run the simulation
results_3 = ssa(propensity = p_func1, stoichiometry = s_func1,
X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k)
# Plot the results
plt.step(results_3[:,0], results_3[:, 1], color = "red", label='x')
plt.plot(results_3[:,0], np.repeat(X0, len(results_3)), color='green', label='Mean x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(0, 20)
plt.legend()
plt.show()
We now evaluate the mean and standard deviation (SD) of $x$ for each simulation.
$$Mean(x) = \frac{\sum_{i=1}^{N} x(t_i)\Delta t_i}{\sum_{i=1}^{N} \Delta t_i}$$ $$Mean(x^2) = \frac{\sum_{i=1}^{N} x(t_i)^2\Delta t_i}{\sum_{i=1}^{N} \Delta t_i}$$ $$SD(x) = \sqrt{\frac{\sum_{i=1}^{N} [x(t_i)-Mean(x)]^2\Delta t_i}{\sum_{i=1}^{N} \Delta t_i}}$$ Here, $\Delta t_i = t_{i+1} - t_i$ is the waiting time during which the system stays in state $x(t_i)$. But the simplest way is to first compute $Mean(x)$ and $Mean(x^2)$, then obtain $SD(x)$ from
$$SD(x) = \sqrt{Mean(x^2) - Mean(x)^2}$$
Often, we also evaluate the relative error (SD normalized by the mean). Pay attention that it is incorrect to directly apply np.std() and np.mean() to the results matrix, as the waiting times $\Delta t_i$ are not constant. Also, if we only need statistics such as the mean, variance, and standard deviation, we can accumulate $\sum_{i=1}^{N} x(t_i)\Delta t_i$, $\sum_{i=1}^{N} x(t_i)^2\Delta t_i$, and $\sum_{i=1}^{N} \Delta t_i$ during the stochastic simulation, so that the time trajectories (i.e., the whole results matrix) do not need to be stored.
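For illustration, here is a minimal sketch of such on-the-fly accumulation; the function name ssa_stat is hypothetical (it is not used below), and it reuses gillespie_step from above:
def ssa_stat(propensity, stoichiometry, X0, tmax, iter_max, **kwargs):
    Xs = np.array(X0, dtype=float)
    t = 0.0
    sum_dt = 0.0                # accumulates the total elapsed time
    sum_x = np.zeros(len(Xs))   # accumulates x(t_i) * dt_i
    sum_x2 = np.zeros(len(Xs))  # accumulates x(t_i)^2 * dt_i
    for _ in range(iter_max):
        output = gillespie_step(t = t, Xs = Xs, propensity = propensity,
                                stoichiometry = stoichiometry, **kwargs)
        if output['if_stop'] or output['t'] > tmax:
            break
        dt = output['t'] - t    # time spent in the current state Xs
        sum_dt += dt
        sum_x += Xs * dt
        sum_x2 += Xs**2 * dt
        t, Xs = output['t'], output['Xs']
    mean = sum_x / sum_dt
    sd = np.sqrt(sum_x2 / sum_dt - mean**2)
    return {'mean': mean, 'sd': sd, 'error': sd / mean}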
def cal_stat(results_mat):
    # results_mat: 1st column: t, 2nd - (n+1)th columns: n variables for each t
    mat = results_mat[~np.isnan(results_mat).any(axis=1)]
    nt, nx = mat.shape
    dt = mat[1:, 0] - mat[:-1, 0]  # waiting times; the state x(t_i) holds during dt_i
    t_tot = np.sum(dt)
    mean = np.dot(np.transpose(mat[:-1, 1:nx]), dt) / t_tot
    mean2 = np.dot(np.transpose(mat[:-1, 1:nx]**2), dt) / t_tot
    sd = np.sqrt(mean2 - mean**2)
    with np.errstate(divide='ignore', invalid='ignore'):
        error = sd / mean  # relative error; NaN if the mean is 0
    return {'mean': mean, 'sd': sd, 'error': error}
ss_all = np.array([1000, 100, 10])
# Compute the time-weighted statistics for each simulation
stat1 = cal_stat(results_1)
stat2 = cal_stat(results_2)
stat3 = cal_stat(results_3)
x_1 = np.arange(1, 1001, 1)
plt.figure(figsize=(10, 5))
# Plot SD
plt.subplot(1, 2, 1)
plt.plot(ss_all, np.concatenate((stat1['sd'], stat2['sd'], stat3['sd'])), 'o-', color='blue')
plt.plot(x_1, x_1**0.5, '-', color='green', label='x^0.5')
plt.xlabel('Steady state x')
plt.ylabel('SD')
plt.xlim(0, 1000)
plt.ylim(0, 30)
plt.legend()
# Plot SD/Mean
plt.subplot(1, 2, 2)
plt.plot(ss_all, np.concatenate((stat1['error'], stat2['error'], stat3['error'])), 'o-', color='blue')
plt.xlabel('Steady state x')
plt.ylabel('SD/Mean')
plt.xlim(0, 1000)
plt.ylim(0, 0.4)
plt.tight_layout()
plt.show()
Now, let's consider the same three conditions as above, but starting from the zero initial condition. The purpose is to evaluate the relaxation dynamics towards the steady state.
# Parameters
g = 100
k = 0.1
np.random.seed(101)
X0 = [0]
tmax = 100
iter_max = 20000
# Run the simulation
results_1 = ssa(propensity = p_func1, stoichiometry = s_func1,
X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k)
# Plot the results
plt.step(results_1[:,0], results_1[:, 1], color = "red", label='x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(0, 1200)
t_1 = results_1[:, 0]  # time points of the trajectory
analytical_solution = g / k * (1 - np.exp(-k * t_1))  # deterministic solution x(t)
plt.plot(t_1, analytical_solution, '-', color='green', label='Analytical')
plt.legend()
plt.show()
# Parameters
g = 10
k = 0.1
np.random.seed(101)
X0 = [0]
tmax = 100
iter_max = 2000
# Run the simulation
results_2 = ssa(propensity = p_func1, stoichiometry = s_func1,
X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k)
# Plot the results
plt.step(results_2[:,0], results_2[:, 1], color = "red", label='x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(0, 150)
t_2 = results_2[:, 0]  # time points of the trajectory
analytical_solution = g / k * (1 - np.exp(-k * t_2))  # deterministic solution x(t)
plt.plot(t_2, analytical_solution, '-', color='green', label='Analytical')
plt.legend()
plt.show()
# Parameters
g = 1
k = 0.1
np.random.seed(101)
X0 = [0]
tmax = 100
iter_max = 2000
# Run the simulation
results_3 = ssa(propensity = p_func1, stoichiometry = s_func1,
X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k)
# Plot the results
plt.step(results_3[:,0], results_3[:, 1], color = "red", label='x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(0, 15)
t_3 = results_3[:, 0]  # time points of the trajectory
analytical_solution = g / k * (1 - np.exp(-k * t_3))  # deterministic solution x(t)
plt.plot(t_3, analytical_solution, '-', color='green', label='Analytical')
plt.legend()
plt.show()
2: Bursting transcription¶
We consider a gene $x$ transcribed in bursts: each transcription event occurs at rate $g/n$ and produces $n$ molecules at once, and each molecule degrades at rate $k$.
$$ 0 {\stackrel{g/n}{\rightarrow}} nx$$
$$x {\stackrel{k}{\rightarrow}} 0 $$
The corresponding ODE of the system is
$$\frac{dx}{dt} = n \cdot g/n - kx = g - kx$$
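A quick moment calculation (a standard result for a fixed burst size $n$, sketched here for reference) shows that at the steady state the Fano factor is
$$\frac{Var(x)}{Mean(x)} = \frac{n+1}{2},$$
so $SD(x) = \sqrt{\bar{x}(n+1)/2}$. For $n = 1$ this recovers the Poisson result $SD(x) = \sqrt{\bar{x}}$; larger bursts increase the noise at the same mean, as the simulations below demonstrate.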
# Define the propensity function
def p_func2(Xs, g, k, n):
    # Xs: current state
    # g, k, n: parameters
    x = Xs[0]
    return np.array([g / n, k * x])
# Define the stoichiometry function
def s_func2(r_ind, g, k, n):
    # r_ind: the choice of the reaction
    s_matrix = np.array([[n],    # x adds n
                         [-1]])  # x subtracts 1
    return s_matrix[r_ind]
We check the stochastic dynamics near the stable steady state in the deterministic scenario. We start with $g = 16$, $k = 0.1$, and $n = 1$. The deterministic steady state is $\bar{x} = g/k = 160$. We set $x(t = 0) = \bar{x} = 160$.
np.random.seed(91)
g = 16
k = 0.1
n = 1
X0 = np.array([160])
tmax = 1000
iter_max = 50000
results_4 = ssa(propensity = p_func2, stoichiometry = s_func2,
X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k, n = n)
# Plot the results
plt.step(results_4[:,0], results_4[:, 1], color = "red", label='x')
plt.plot(results_4[:,0], np.repeat(X0, len(results_4)), color='green', label='Mean x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(100, 220)
plt.legend()
plt.show()
Now we set $g = 16$, $k = 0.1$, and $n = 2$. The steady state level remains the same, with $x(t = 0) = \bar{x} = 160$.
g = 16
k = 0.1
n = 2
X0 = np.array([160])
tmax = 2000
iter_max = 500000
results_5 = ssa(propensity = p_func2, stoichiometry = s_func2,
X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k, n = n)
# Plot the results
plt.step(results_5[:,0], results_5[:, 1], color = "red", label='x')
plt.plot(results_5[:,0], np.repeat(X0, len(results_5)), color='green', label='Mean x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(100, 220)
plt.legend()
plt.show()
We then set $g = 16$, $k = 0.1$, $n = 4$, with $x(t = 0) = \bar{x} = 160$.
g = 16
k = 0.1
n = 4
X0 = np.array([160])
tmax = 2000
iter_max = 500000
results_6 = ssa(propensity = p_func2, stoichiometry = s_func2,
X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k, n = n)
# Plot the results
plt.step(results_6[:,0], results_6[:, 1], color = "red", label='x')
plt.plot(results_6[:,0], np.repeat(X0, len(results_6)), color='green', label='Mean x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(100, 220)
plt.legend()
plt.show()
We then set $g = 16$, $k = 0.1$, $n = 8$, with $x(t = 0) = \bar{x} = 160$.
g = 16
k = 0.1
n = 8
X0 = np.array([160])
tmax = 2000
iter_max = 500000
results_7 = ssa(propensity = p_func2, stoichiometry = s_func2,
X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k, n = n)
# Plot the results
plt.step(results_7[:,0], results_7[:, 1], color = "red", label='x')
plt.plot(results_7[:,0], np.repeat(X0, len(results_7)), color='green', label='Mean x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(100, 220)
plt.legend()
plt.show()
n_all = np.array([1, 2, 4, 8])
# Compute the time-weighted statistics for each simulation
stat4 = cal_stat(results_4)
stat5 = cal_stat(results_5)
stat6 = cal_stat(results_6)
stat7 = cal_stat(results_7)
plt.figure(figsize=(10, 5))
# Plot Mean
plt.subplot(1, 2, 1)
plt.plot(n_all, np.concatenate((stat4['mean'], stat5['mean'], stat6['mean'], stat7['mean'])), 'o-', color='blue')
plt.xlabel('n')
plt.ylabel('Mean')
plt.xlim(0, 9)
plt.ylim(0, 200)
# Plot SD
plt.subplot(1, 2, 2)
plt.plot(n_all, np.concatenate((stat4['sd'], stat5['sd'], stat6['sd'], stat7['sd'])), 'o-', color='blue')
plt.xlabel('n')
plt.ylabel('SD')
plt.xlim(0, 9)
plt.ylim(0, 30)
plt.tight_layout()
plt.show()
3: Stochastic dynamics of a self-inhibiting gene¶
We consider a model of a self-inhibiting gene. The model is the same as the previous model for a self-activating gene, but with the values of $g_0$ and $g_1$ swapped, so that the transcription rate is lower when the gene is bound.
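Concretely, the five reactions encoded in the propensity and stoichiometry functions below are
$$D \xrightarrow{g_0} D + M, \quad M \xrightarrow{k} 0, \quad D + 2M \xrightarrow{k_{on}} C, \quad C \xrightarrow{k_{off}} D + 2M, \quad C \xrightarrow{g_1} C + M,$$
where $D$ denotes the unbound promoter and $C$ the promoter bound by an $M$ dimer.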
# Define the propensity function
def p_func3(Xs, g0, g1, k, kon, koff):
    d = Xs[0]  # D: unbound promoter (1: unbound; 0: bound)
    c = Xs[1]  # C: bound promoter (1: bound; 0: unbound)
    m = Xs[2]  # M: number of M molecules
    return np.array([g0 * d, k * m, kon * d * m * (m - 1) / 2, koff * c, g1 * c])
# Define the stoichiometry function
def s_func3(r_ind, g0, g1, k, kon, koff):
    # r_ind: the choice of the reaction
    s_matrix = np.array([[0, 0, 1],    # M adds 1
                         [0, 0, -1],   # M subtracts 1
                         [-1, 1, -2],  # C adds 1, D subtracts 1, and M subtracts 2
                         [1, -1, 2],   # C subtracts 1, D adds 1, and M adds 2
                         [0, 0, 1]])   # M adds 1
    return s_matrix[r_ind]
We set $g_0 = 55$, $g_1 = 5$, $k = 0.1$, $k_{on} = 0.002$, $k_{off} = 90$. With this condition, $K = \sqrt{\frac{2k_{off}}{k_{on}}} = 300$. This is the scenario of high copy number and fast binding/unbinding.
For the initial condition, we first consider $D = 1$, $C = 0$, $M = 300$.
np.random.seed(101)
g0 = 55
g1 = 5
k = 0.1
kon = 0.002
koff = 90
X0 = [1, 0, 300]
tmax = 400
iter_max = 50000
# Run the Gillespie simulation
results_8 = ssa(propensity = p_func3, stoichiometry = s_func3,
X0 = X0, tmax = tmax, iter_max = iter_max,
g0=g0, g1=g1, k=k, kon=kon, koff=koff)
# Plotting
plt.figure(figsize=(10, 5))
# Plot M vs Time
plt.subplot(1, 2, 1)
plt.step(results_8[:,0], results_8[:,3], '-', color='red')
plt.xlabel('Time')
plt.ylabel('M')
plt.xlim(0, tmax)
plt.ylim(0, 600)
# Plot D and C vs Time
plt.subplot(1, 2, 2)
plt.step(results_8[:,0], results_8[:,1], '-', color='blue', label='D')
plt.step(results_8[:,0], results_8[:,2], '-', color='orange', label='C')
plt.xlabel('Time')
plt.ylabel('D / C')
plt.xlim(0, tmax)
plt.ylim(0, 1)
plt.legend()
plt.tight_layout()
plt.show()
stat8 = cal_stat(results_8)
print(stat8['mean'])
print(stat8['sd'])
[ 0.60192302 0.39807698 295.76299912] [ 0.48950148 0.48950148 15.18151906]
Now we turn off the binding/unbinding ($k_{on} = k_{off} = 0$) and set $g_0 = 30$, so that the system has a similar steady state level of $M$. This modified system is equivalent to a constitutively expressed gene.
np.random.seed(101)
g0 = 30
g1 = 30
k = 0.1
kon = 0.
koff = 0.
X0 = [1, 0, 300]
tmax = 300
iter_max = 50000
# Run the Gillespie simulation
results_9 = ssa(propensity = p_func3, stoichiometry = s_func3,
X0 = X0, tmax = tmax, iter_max = iter_max,
g0=g0, g1=g1, k=k, kon=kon, koff=koff)
# Plotting
plt.figure(figsize=(10, 5))
# Plot M vs Time
plt.subplot(1, 2, 1)
plt.step(results_9[:,0], results_9[:,3], '-', color='red')
plt.xlabel('Time')
plt.ylabel('M')
plt.xlim(0, tmax)
plt.ylim(0, 600)
# Plot D and C vs Time
plt.subplot(1, 2, 2)
plt.step(results_9[:,0], results_9[:,1], '-', color='blue', label='D')
plt.step(results_9[:,0], results_9[:,2], '-', color='orange', label='C')
plt.xlabel('Time')
plt.ylabel('D / C')
plt.xlim(0, tmax)
plt.ylim(-0.1, 1.1)
plt.legend()
plt.tight_layout()
plt.show()
stat9 = cal_stat(results_9)
print(stat9['mean'])
print(stat9['sd'])
[ 1. 0. 299.72246102] [ 0. 0. 12.16520449]
Compared to the gene with self-inhibition, the noise level ($SD(M)$) is higher here; for a constitutively expressed gene, the stationary distribution of $M$ is Poisson, giving $SD(M) = \sqrt{300} \approx 17.3$. It is well established that self-inhibition reduces gene expression noise.
Now we reduce the copy number of the system with the parameter set $g_0 = 5.5$, $g_1 = 0.5$, $k = 0.1$, $k_{on} = 0.2$, $k_{off} = 90$. With this condition, $K = \sqrt{\frac{2k_{off}}{k_{on}}} = 30$, and we use the initial condition $D = 1$, $C = 0$, $M = 30$.
np.random.seed(101)
g0 = 5.5
g1 = 0.5
k = 0.1
kon = 0.2
koff = 90
X0 = [1, 0, 30]
tmax = 300
iter_max = 50000
# Run the Gillespie simulation
results_10 = ssa(propensity = p_func3, stoichiometry = s_func3,
X0 = X0, tmax = tmax, iter_max = iter_max,
g0=g0, g1=g1, k=k, kon=kon, koff=koff)
# Plotting
plt.figure(figsize=(10, 5))
# Plot M vs Time
plt.subplot(1, 2, 1)
plt.step(results_10[:,0], results_10[:,3], '-', color='red')
plt.xlabel('Time')
plt.ylabel('M')
plt.xlim(0, tmax)
plt.ylim(0, 60)
# Plot D and C vs Time
plt.subplot(1, 2, 2)
plt.step(results_10[:,0], results_10[:,1], '-', color='blue', label='D')
plt.step(results_10[:,0], results_10[:,2], '-', color='orange', label='C')
plt.xlabel('Time')
plt.ylabel('D / C')
plt.xlim(0, tmax)
plt.ylim(-0.1, 1.1)
plt.legend()
plt.tight_layout()
plt.show()
stat10 = cal_stat(results_10)
print(stat10['mean'])
print(stat10['sd'])
[ 0.53671194 0.46328806 30.65252093] [0.49865041 0.49865041 5.04278226]
The stochastic dynamics resemble oscillatory dynamics, a consequence of the self-inhibition feedback combined with the large relative fluctuations at this low copy number.