08D. Gillepsie algorithm¶

Mingyang Lu¶

3/11/2024¶

Intro¶

Here, we will discuss to use Gillepie algorithm to simulate the stochastic dynamics of a chemical reaction network. First, we setup a vector $\mathbf{S}$ to represent the state of the system, composed of the number of molecules of each species. For each chemical reaction $j$,

$$ \sum_i^{n_s}{\alpha_{ij}S_i} \rightarrow \sum_i^{n_s}{\beta_{ij}S_i}$$ where $n_S$ is the total number of species.

Second, we define the stoichiometry matrix $\Gamma$, where each element $\Gamma_{ij} = \beta_{ij} - \alpha_{ij}$. The stoichiometry vector $\mathbf{\gamma_j}$ for each reaction $j$ is the $j^th$ column vector of $\Gamma$.

Third, we define the reaction propensity $R_j$ as the product of its specific probability rate (from the law of mass action, $R_j = k_j \Pi_{i=1}^{n_S} S_i^{\alpha_{ij}}$) and its reaction degeneracy (see reference90283-V)).

Workflow for Gillespie algorithm¶

1. Set the initial state $\mathbf{S}$ and initialize time $t$ to 0.

2. Calculate the reaction propensities $R_j(S)$

3. Randomly pick a reaction j based on $R_j(S)$

4. Randomly pick the waiting time $\tau$ based on $R_{tot} = \sum_{j}{R_j(\mathbf{S})}$

5. Increment the simulation time $t \rightarrow t + \tau$

6. Update the state $\mathbf{S} \rightarrow \mathbf{S} + \mathbf{\gamma_j}$ reflect the fact that reaction j has occurred.

7. Return to step 2.

The following is a generic implementation of the Gillespie algorithm in Python.

In [1]:
import numpy as np
import matplotlib.pyplot as plt

# A generic Gillespie algorithm for one iteration
def gillespie_step(t, Xs, propensity, stoichiometry, **kwargs):
    p_all = propensity(Xs, **kwargs) # propensity
    p_sum = np.sum(p_all)
    if np.abs(p_sum) < 1e-9:    # if p_sum = 0, simulation stops
        return {'t': t, 'Xs': Xs, 'if_stop': True}

    dt = np.random.exponential(scale=1/p_sum)  # compute the waiting wait from an exponential distribution 
    event = np.random.choice(range(len(p_all)), p=p_all/p_sum)  # choose the event
    dx = stoichiometry(event, **kwargs)  #  find the stoichiometry vector corresponding to the selected reaction
    t_new = t + dt
    Xs_new = Xs + dx
    return {'t': t_new, 'Xs': Xs_new, 'if_stop': False}

# the Gillespie simulation
# Users need to specify:
# (1) the initial state X0 (a vector of n species)
# (2) maximum time (tmax)
# (3) maximum allowed iterations (iter_max)
# (4) other parameters required for the propensity (mostly kinetic rates)

def ssa(propensity, stoichiometry, X0, tmax, iter_max, **kwargs):
    Xs = np.array(X0)
    n = len(Xs)
    t_all = np.zeros(iter_max)
    X_all = np.zeros((iter_max, n))

    t = 0
    iter = 0
    t_all[iter] = t
    X_all[iter, :] = Xs
    if_stop = False

    while t < tmax and iter < iter_max-1:
        output = gillespie_step(t = t, Xs = Xs, propensity = propensity, 
                                stoichiometry = stoichiometry, **kwargs)
        if output['if_stop']:
            break
        iter += 1
        t = output['t']
        Xs = output['Xs']
        t_all[iter] = t
        X_all[iter, :] = Xs

    return np.column_stack((t_all[:iter], X_all[:iter, :]))

In the following, we provide a few examples of its usage.

1: A gene constitutively transcribed¶

We consider a gene $x$ with transcriptional rate $g$ and degradation rate $k$.

$$ 0 {\stackrel{g}{\rightarrow}} x {\stackrel{k}{\rightarrow}} 0 $$

The corresponding ODE of the system is

$$\frac{dx}{dt} = g - kx$$ The stochastic dynamics is a typical birth-death process.

In [2]:
# specify propensity functions
def p_func1(Xs, g, k):
    # Xs: current state
    # g, k: parameters
    x = Xs[0]
    return np.array([g, k * x])

# specify stoichiometry matrix
def s_func1(r_ind, g, k):
     # r_ind: the choice of the reaction
    s_matrix = np.array([[1],    # x adds 1
                         [-1]])  # x substracts 1
    return s_matrix[r_ind]

We first check the stochastic dynamics near the stable steady state in the deterministic scenario. The the following, we select $g = 100$ and $k = 0.1$. The deterministic steady state is $\bar{x} = g/k = 1000$. We set $x(t = 0) = \bar{x} = 1000$

In [3]:
# Parameters
g = 100
k = 0.1
np.random.seed(101)
X0 = [1000]
tmax = 100
iter_max = 20000

# Run the simulation
results_1 = ssa(propensity = p_func1, stoichiometry = s_func1, 
                   X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k)

# Plot the results
plt.step(results_1[:,0], results_1[:, 1], color = "red", label='x')
plt.plot(results_1[:,0], np.repeat(X0, len(results_1)), color='green', label='Mean x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(800, 1200)
plt.legend()
plt.show()
No description has been provided for this image

Now if we set $g = 10$ and $k = 0.1$. $x(t = 0) = \bar{x} = 100$.

In [4]:
# Parameters
g = 10
k = 0.1
np.random.seed(101)
X0 = [100]
tmax = 100
iter_max = 2000

# Run the simulation
results_2 = ssa(propensity = p_func1, stoichiometry = s_func1, 
                   X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k)

# Plot the results
plt.step(results_2[:,0], results_2[:, 1], color = "red", label='x')
plt.plot(results_2[:,0], np.repeat(X0, len(results_2)), color='green', label='Mean x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(50, 150)
plt.legend()
plt.show()
No description has been provided for this image

We then set $g = 1$ and $k = 0.1$. $x(t = 0) = \bar{x} = 10$.

In [5]:
# Parameters
g = 1
k = 0.1
np.random.seed(101)
X0 = [10]
tmax = 100
iter_max = 2000

# Run the simulation
results_3 = ssa(propensity = p_func1, stoichiometry = s_func1, 
                   X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k)

# Plot the results
plt.step(results_3[:,0], results_3[:, 1], color = "red", label='x')
plt.plot(results_3[:,0], np.repeat(X0, len(results_3)), color='green', label='Mean x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(0, 20)
plt.legend()
plt.show()
No description has been provided for this image

We now evaluate the mean and standard deviation (SD) of $x$ for each simulation.

$$Mean(x) = \frac{\sum_{i=1}^{N} x(t_i)\Delta t_i}{\sum_{i=1}^{N} \Delta t_i}$$ $$Mean(x^2) = \frac{\sum_{i=1}^{N} x(t_i)^2\Delta t_i}{\sum_{i=1}^{N} \Delta t_i}$$ $$SD(x) = \frac{\sum_{i=1}^{N} [x(t_i)-Mean(x)]^2\Delta t_i}{\sum_{i=1}^{N} \Delta t_i}$$ But the simplest way is to first compute $Mean(x)$ and $Mean(x^2)$, then obtain $SD(x)$ from

$$SD(x) = \sqrt{Mean(x^2) - Mean(x)^2}$$

Often, we also evaluate the relative error (SD normalized by the mean value). Pay attention that it is incorrect to directly apply $sd()$ and $mean()$ functions to the results matrix, as the waiting times $\Delta t_i$ is not a constant. Also, if we only need to compute the statistics for mean, variance, standard deviations, etc., we can do the calculations for $\sum_{i=1}^{N} x(t_i)\Delta t_i$, $\sum_{i=1}^{N} [x(t_i)-Mean(x)]^2\Delta t_i$, and $\sum_{i=1}^{N} \Delta t_i$ during the stochastic simulation. And the time trajectories (i.e., the whole result matrix does not need to be stored).

In [6]:
def cal_stat(results_mat):
     # results_mat: 1st column: t, 2nd - (n+1)th columns: n variables for each t
    mat = results_mat[~np.isnan(results_mat).any(axis=1)]
    nt, nx = mat.shape
    dt = mat[1:, 0] - mat[:-1, 0]
    t_tot = np.sum(dt)
    mean = np.dot(np.transpose(mat[1:, 1:nx]), dt) / t_tot
    mean2 = np.dot(np.transpose(mat[1:, 1:nx]**2), dt) / t_tot
    sd = np.sqrt(mean2 - mean**2)
    error = sd / mean
    return {'mean': mean, 'sd': sd, 'error': error}
In [7]:
ss_all = np.array([1000, 100, 10])

# Example dictionaries for demonstration purposes
stat1 = cal_stat(results_1)
stat2 = cal_stat(results_2)
stat3 = cal_stat(results_3)

x_1 = np.arange(1, 1001, 1)

plt.figure(figsize=(10, 5))

# Plot SD
plt.subplot(1, 2, 1)
plt.plot(ss_all, np.concatenate((stat1['sd'], stat2['sd'], stat3['sd'])), 'o-', color='blue')
plt.plot(x_1, x_1**0.5, '-', color='green', label='5*x^0.5')
plt.xlabel('Steady state x')
plt.ylabel('SD')
plt.xlim(0, 1000)
plt.ylim(0, 30)
plt.legend()

# Plot SD/Mean
plt.subplot(1, 2, 2)
plt.plot(ss_all, np.concatenate((stat1['error'], stat2['error'], stat3['error'])), 'o-', color='blue')
plt.xlabel('Steady state x')
plt.ylabel('SD/Mean')
plt.xlim(0, 1000)
plt.ylim(0, 0.4)

plt.tight_layout()
plt.show()
No description has been provided for this image

Now, let's consider the same systems for the above three conditions but starting from the zero initial condition. The purpose is to evaluate the decaying dynamics towards the steady state.

In [8]:
# Parameters
g = 100
k = 0.1
np.random.seed(101)
X0 = [0]
tmax = 100
iter_max = 20000

# Run the simulation
results_1 = ssa(propensity = p_func1, stoichiometry = s_func1, 
                   X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k)

# Plot the results
plt.step(results_1[:,0], results_1[:, 1], color = "red", label='x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(0, 1200)

x_1 = results_1[:, 0]
analytical_solution = g / k * (1 - np.exp(-k * x_1))
plt.plot(x_1, analytical_solution, '-', color='green', label='Analytical')

plt.legend()
plt.show()
No description has been provided for this image
In [9]:
# Parameters
g = 10
k = 0.1
np.random.seed(101)
X0 = [0]
tmax = 100
iter_max = 2000

# Run the simulation
results_2 = ssa(propensity = p_func1, stoichiometry = s_func1, 
                   X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k)

# Plot the results
plt.step(results_2[:,0], results_2[:, 1], color = "red", label='x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(0, 150)

x_2 = results_2[:, 0]
analytical_solution = g / k * (1 - np.exp(-k * x_2))
plt.plot(x_2, analytical_solution, '-', color='green', label='Analytical')

plt.legend()
plt.show()
No description has been provided for this image
In [10]:
# Parameters
g = 1
k = 0.1
np.random.seed(101)
X0 = [0]
tmax = 100
iter_max = 2000

# Run the simulation
results_3 = ssa(propensity = p_func1, stoichiometry = s_func1, 
                   X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k)

# Plot the results
plt.step(results_3[:,0], results_3[:, 1], color = "red", label='x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(0, 15)

x_3 = results_3[:, 0]
analytical_solution = g / k * (1 - np.exp(-k * x_3))
plt.plot(x_3, analytical_solution, '-', color='green', label='Analytical')

plt.legend()
plt.show()
No description has been provided for this image

2: Bursting transcription¶

We consider a gene $x$ with transcriptional rate $g/n$, but n molecules every reaction, and degradation rate $k$.

$$ 0 {\stackrel{g/n}{\rightarrow}} nx$$

$$x {\stackrel{k}{\rightarrow}} 0 $$

The corresponding ODE of the system is

$$\frac{dx}{dt} = ng/n - kx = g - kx$$

In [11]:
# Define the propensity function
def p_func2(Xs, g, k, n):
    # Xs: current state
    # g, k, n: parameters
    x = Xs[0]
    return np.array([g / n, k * x])

# Define the stoichiometry function
def s_func2(r_ind, g, k, n):
     # r_ind: the choice of the reaction
    s_matrix = np.array([[n],    # x adds n
                         [-1]])  # x substracts 1
    return s_matrix[r_ind]

We check the stochastic dynamics near the stable steady state in the deterministic scenario. We start with $g = 16$, $k = 0.1$, and $n = 1$. The deterministic steady state is $\bar{x} = g/k = 160$. We set $x(t = 0) = \bar{x} = 160$

In [12]:
np.random.seed(91)
g = 16
k = 0.1
n = 1
X0 = np.array([160])
tmax = 1000
iter_max = 50000

results_4 = ssa(propensity = p_func2, stoichiometry = s_func2, 
                   X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k, n = n)

# Plot the results
plt.step(results_4[:,0], results_4[:, 1], color = "red", label='x')
plt.plot(results_4[:,0], np.repeat(X0, len(results_4)), color='green', label='Mean x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(100, 220)

plt.legend()
plt.show()
No description has been provided for this image

Now if we set $g = 16$, $k = 0.1$, and $n = 2$. The steady state level remains the same. $x(t = 0) = \bar{x} = 160$.

In [13]:
g = 16
k = 0.1
n = 2
X0 = np.array([160])
tmax = 2000
iter_max = 500000

results_5 = ssa(propensity = p_func2, stoichiometry = s_func2, 
                   X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k, n = n)

# Plot the results
plt.step(results_5[:,0], results_5[:, 1], color = "red", label='x')
plt.plot(results_5[:,0], np.repeat(X0, len(results_5)), color='green', label='Mean x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(100, 220)

plt.legend()
plt.show()
No description has been provided for this image

We then set $g = 16$, $k = 0.1$, $n = 4$. $x(t = 0) = \bar{x} = 160$.

In [14]:
g = 16
k = 0.1
n = 4
X0 = np.array([160])
tmax = 2000
iter_max = 500000

results_6 = ssa(propensity = p_func2, stoichiometry = s_func2, 
                   X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k, n = n)

# Plot the results
plt.step(results_6[:,0], results_6[:, 1], color = "red", label='x')
plt.plot(results_6[:,0], np.repeat(X0, len(results_6)), color='green', label='Mean x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(100, 220)

plt.legend()
plt.show()
No description has been provided for this image

We then set $g = 16$, $k = 0.1$, $n = 8$. $x(t = 0) = \bar{x} = 160$.

In [15]:
g = 16
k = 0.1
n = 8
X0 = np.array([160])
tmax = 2000
iter_max = 500000

results_7 = ssa(propensity = p_func2, stoichiometry = s_func2, 
                   X0 = X0, tmax = tmax, iter_max = iter_max, g = g, k = k, n = n)

# Plot the results
plt.step(results_7[:,0], results_7[:, 1], color = "red", label='x')
plt.plot(results_7[:,0], np.repeat(X0, len(results_7)), color='green', label='Mean x')
plt.xlabel('Time')
plt.ylabel('x')
plt.xlim(0, tmax)
plt.ylim(100, 220)

plt.legend()
plt.show()
No description has been provided for this image
In [16]:
n_all = np.array([1, 2, 4, 8])

# Example dictionaries for demonstration purposes
stat4 = cal_stat(results_4)
stat5 = cal_stat(results_5)
stat6 = cal_stat(results_6)
stat7 = cal_stat(results_7)

x_1 = np.arange(1, 1001, 1)

plt.figure(figsize=(10, 5))

# Plot SD
plt.subplot(1, 2, 1)
plt.plot(n_all, np.concatenate((stat4['mean'], stat5['mean'], stat6['mean'], stat7['mean'])), 'o-', color='blue')
plt.xlabel('n')
plt.ylabel('Mean')
plt.xlim(0, 9)
plt.ylim(0, 200)

# Plot SD/Mean
plt.subplot(1, 2, 2)
plt.plot(n_all, np.concatenate((stat4['sd'], stat5['sd'], stat6['sd'], stat7['sd'])), 'o-', color='blue')
plt.xlabel('n')
plt.ylabel('SD')
plt.xlim(0, 9)
plt.ylim(0, 30)

plt.tight_layout()
plt.show()
No description has been provided for this image

3: Stochastic dynamics of a self-inhibiting gene¶

We consider a model of a self-inhibiting gene. The model is the same as the previous model for a self-activating gene, but with the opposite $g_0$ and $g_1$.

In [17]:
# Define the propensity function
def p_func3(Xs, g0, g1, k, kon, koff):
    d = Xs[0]  # D (0: bound; 1: unbound)
    c = Xs[1]  # C (0: unbound; 1: bound)
    m = Xs[2]  # M (number of M)
    return np.array([g0 * d, k * m, kon * d * m * (m - 1) / 2, koff * c, g1 * c])

# Define the stoichiometry function
def s_func3(r_ind, g0, g1, k, kon, koff):
    # r_ind: the choice of the reaction
    s_matrix = np.array([[0, 0, 1],    # M adds 1
                         [0, 0, -1],   # M subtracts 1
                         [-1, 1, -2],  # C adds 1, D subtracts 1, and M subtracts 2
                         [1, -1, 2],   # C subtracts 1, D adds 1, and M adds 2
                         [0, 0, 1]])  # M adds 1
    return s_matrix[r_ind]

We set $g_0 = 55$, $g_1 = 5$, $k = 0.1$, $k_{on} = 0.002$, $k_{off} = 90$. With this condition,$K = \sqrt{\frac{2k_{off}}{k_{on}}} = 300$. This is the scenario of high copy number and fast binding/unbinding.

For the initial condition, we first consider $D = 1$, $C = 0$, $M = 300$.

In [18]:
np.random.seed(101)
g0 = 55
g1 = 5
k = 0.1
kon = 0.002
koff = 90
X0 = [1, 0, 300]
tmax = 400
iter_max = 50000

# Run the Gillespie simulation
results_8 = ssa(propensity = p_func3, stoichiometry = s_func3, 
                X0 = X0, tmax = tmax, iter_max = iter_max,
                g0=g0, g1=g1, k=k, kon=kon, koff=koff)

# Plotting
plt.figure(figsize=(10, 5))

# Plot M vs Time
plt.subplot(1, 2, 1)
plt.step(results_8[:,0], results_8[:,3], '-', color='red')
plt.xlabel('Time')
plt.ylabel('M')
plt.xlim(0, tmax)
plt.ylim(0, 600)

# Plot D and C vs Time
plt.subplot(1, 2, 2)
plt.step(results_8[:,0], results_8[:,1], '-', color='blue', label='D')
plt.step(results_8[:,0], results_8[:,2], '-', color='orange', label='C')
plt.xlabel('Time')
plt.ylabel('D / C')
plt.xlim(0, tmax)
plt.ylim(0, 1)
plt.legend()

plt.tight_layout()
plt.show()
No description has been provided for this image
In [19]:
stat8 = cal_stat(results_8)
print(stat8['mean'])
print(stat8['sd'])
[  0.60192302   0.39807698 295.76299912]
[ 0.48950148  0.48950148 15.18151906]

Now if we turn off the binding/unbinding ($k_{on} = k_{off} = 0$), and set $g_0 = 30$ so that the system has a similar steady state of $M$. This modified system is the same as a constitutively expressed gene.

In [20]:
np.random.seed(101)
g0 = 30
g1 = 30
k = 0.1
kon = 0.
koff = 0.
X0 = [1, 0, 300]
tmax = 300
iter_max = 50000

# Run the Gillespie simulation
results_9 = ssa(propensity = p_func3, stoichiometry = s_func3, 
                X0 = X0, tmax = tmax, iter_max = iter_max,
                g0=g0, g1=g1, k=k, kon=kon, koff=koff)

# Plotting
plt.figure(figsize=(10, 5))

# Plot M vs Time
plt.subplot(1, 2, 1)
plt.step(results_9[:,0], results_9[:,3], '-', color='red')
plt.xlabel('Time')
plt.ylabel('M')
plt.xlim(0, tmax)
plt.ylim(0, 600)

# Plot D and C vs Time
plt.subplot(1, 2, 2)
plt.step(results_9[:,0], results_9[:,1], '-', color='blue', label='D')
plt.step(results_9[:,0], results_9[:,2], '-', color='orange', label='C')
plt.xlabel('Time')
plt.ylabel('D / C')
plt.xlim(0, tmax)
plt.ylim(-0.1, 1.1)
plt.legend()

plt.tight_layout()
plt.show()
No description has been provided for this image
In [21]:
stat9 = cal_stat(results_9)
print(stat9['mean'])
print(stat9['sd'])
[  1.           0.         299.72246102]
[ 0.          0.         12.16520449]
/var/folders/jl/4jg34fpx0252dr9hw47z_16m0000gn/T/ipykernel_11127/1105767567.py:10: RuntimeWarning: invalid value encountered in divide
  error = sd / mean

Compared to the gene with self-inhibition, the noise level ($SD(M)$) is higher. It is well established that a self-inhibition reduces gene expression noise.

Now if we reduce the copy number of the system with the parameter set $g_0 = 5.5$, $g_1 = 0.5$, $k = 0.1$, $k_{on} = 0.2$, $k_{off} = 90$. With this condition,$K = \sqrt{\frac{2k_{off}}{k_{on}}} = 30$, and the initial condition $D = 1$, $C = 0$, $M = 30$.

In [22]:
np.random.seed(101)
g0 = 5.5
g1 = 0.5
k = 0.1
kon = 0.2
koff = 90
X0 = [1, 0, 30]
tmax = 300
iter_max = 50000

# Run the Gillespie simulation
results_10 = ssa(propensity = p_func3, stoichiometry = s_func3, 
                X0 = X0, tmax = tmax, iter_max = iter_max,
                g0=g0, g1=g1, k=k, kon=kon, koff=koff)

# Plotting
plt.figure(figsize=(10, 5))

# Plot M vs Time
plt.subplot(1, 2, 1)
plt.step(results_10[:,0], results_10[:,3], '-', color='red')
plt.xlabel('Time')
plt.ylabel('M')
plt.xlim(0, tmax)
plt.ylim(0, 60)

# Plot D and C vs Time
plt.subplot(1, 2, 2)
plt.step(results_10[:,0], results_10[:,1], '-', color='blue', label='D')
plt.step(results_10[:,0], results_10[:,2], '-', color='orange', label='C')
plt.xlabel('Time')
plt.ylabel('D / C')
plt.xlim(0, tmax)
plt.ylim(-0.1, 1.1)
plt.legend()

plt.tight_layout()
plt.show()
No description has been provided for this image
In [23]:
stat10 = cal_stat(results_10)
print(stat10['mean'])
print(stat10['sd'])
[ 0.53671194  0.46328806 30.65252093]
[0.49865041 0.49865041 5.04278226]

The stochastic dynamcis resemble oscillatory dynamics.