BEBI103 Winter 2020

"a Bayesian is one who, vaguely expecting a horse,
and catching a glimpse of a donkey, strongly believes she has seen a mule" ...

A Bayesian Apprach

The following is a continuation of our study of individual Caulobacter Crescentus cell division; the first part can be found here. A much more conscientious analysis can be performed by constructing a hierarchical model that captures the specific framework of the experiment and the livelihood of our caulobacter. The way in which we define the parameters, along with their actual numerical results, will ultimately hold more meaning and make for a slicker, stronger inference.

When we first look at our data, bacterium I undergoes ~20 growth events while bacterium II ~90. Previously, to capture this nuance, we divided our analysis for the two sets of data, and gained a greater understanding for parameters that were very specific to the cells that were being studied. This makes the notion of generalizing these parameters to the entire species require quite the cognitive leap. Comparing the resulting parameters between the cells was awkard as well, given their massive difference in the number of collected growth events. This separation is not so obscene in this experiment since we are truly interested in the qualitative notions of linearity vs. exponentiality, and not so much the actual values of $a_0$ and $k$ themselves. However, we do not have to submit ourselves to such masochism! A bayesian approach will understand the additional layer of two distinct cells, along with the fact that they both inherit more global parameters, and will ultimately help guide our reasoning with more meaningful, more reflective, and more generalizable results.

Hierarchical Model

From before, we are considering the linear and exponential models with initial size $a_0$ and growth factor $k$. Each growth event should have its own $a_o$ along with its own $k$ since each event starts out at a different initial size, and there is perhaps a specific set of processes that work to collectively slow down or accelerate division in that particular growth cycle. We further expect the growth cycles' parameters to be distributed around some location and scale parameter of a parent $a_o$ and $k$ belonging to the cell itself, each bacterium with its own set. We can further then model the set of location and scale parameters belonging to each bacterium to be drawn from the global hyperparameters belonging to the species as a whole. So we have something that looks like this:

The chosen prior distributions, in shape and range, are discussed below $\to$ click click click!

PRIOR DISTRIBUTION FOR $a_0$

\begin{align} ao_{\mathrm{I}i} &\sim \mathrm{Normal}(\mu_{ao\mathrm{I}}, \sigma_{ao\mathrm{I}})\\[0.5em] ao_{\mathrm{II}i} &\sim \mathrm{Normal}(\mu_{ao\mathrm{II}}, \sigma_{ao\mathrm{II}})\\[1em] \mu_{ao\mathrm{I}} &\sim \mathrm{LogNormal}(0.15, 0.15) \\[0.5em] \mu_{ao\mathrm{II}} &\sim \mathrm{LogNormal}(0.15, 0.15) \\[0.5em] \sigma_{ao} &\sim \mathrm{Exponential}(20) \\[0.5em] \end{align}

PRIOR DISTRIBUTION FOR $k$

$$ \begin{align} k_{\mathrm{I} i} &\sim \mathrm{Normal}(\mu_{k\mathrm{I}}, \sigma_{k\mathrm{I}})\\[0.5em] k_{\mathrm{II} i} &\sim \mathrm{Normal}(\mu_{k\mathrm{II}}, \sigma_{k\mathrm{II}})\\[1em] \mu_{k\mathrm{I}} &\sim \mathrm{Normal}(0.0116, 0.0032) \\[0.5em] \mu_{k\mathrm{II}} &\sim \mathrm{Normal}(0.0116, 0.0032) \\[0.5em] \sigma_{k} &\sim \mathrm{Exponential}(1000) \\[0.5em] \end{align} $$

PRIOR DISTRIBUTION FOR $\sigma$

$$ \begin{align} \sigma_{\mathrm{I}i} &\sim \mathrm{Normal}(\mu_{\sigma}, \sigma_{\sigma})\\[0.5em] \sigma_{\mathrm{II}i} &\sim \mathrm{Normal}(\mu_{\sigma}, \sigma_{\sigma})\\[1em] \mu_{\sigma} &\sim \mathrm{Exponential}(100)\\[0.5em] \sigma_{\sigma} &\sim \mathrm{Exponential}(500)\\[0.5em] \end{align} $$

Prior Predictives: any funny funnels?

Now that we have our model, let's sample from it and

check if we can sample informatively from these priors

check if our priors can generate our data

But before we do so, we will noncenter our parameters. The idea here is that our model is sampling parameters from distributions (exponentials) that drop at varying speeds (different betas), so we will likely run into areas of high curvatures in our higher-dimensional parameter space. Noncentering will allow our sampler to walk along smoother surfaces by introducing an intermediate Gaussian. We can later manually rescale and transform these noncentered parameters back, recovering our original parameters of interest. We have the following:

Our will be relatively simple.


        functions {
            real a_theor(real ao, real k) {
                return ao + k*(100);
                /*return ao * exp(k*100); */
            }
        }

        parameters {
            real mu_ao;
            real sig_ao;
            real sig_k;
            real mu_sigma;
            real sig_sigma;
            
            real ao_tilde;
            real k_tilde;
            real mu_k_tilde;
            real sigma_tilde;
            real a_tilde;
        }

        transformed parameters {
            real ao;
            real mu_k;
            real k;
            real sigma;
            real mu;
            real a;
            
            ao    = mu_ao    + ao_tilde * sig_ao;
            mu_k  = 0.0116   + 0.0032  * mu_k_tilde;
            k     = mu_k     + k_tilde * sig_k;
            sigma = mu_sigma + sigma_tilde * sig_sigma;
            mu    = a_theor(ao, k);
            a     = mu       + a_tilde * sigma;
        }

        model {
            ao_tilde     ~ normal(0,1);
            mu_ao        ~ lognormal(0.15,0.15);
            sig_ao       ~ exponential(20);
            
            k_tilde      ~ normal(0,1);
            mu_k_tilde   ~ normal(0,1);
            sig_k        ~ exponential(1500);
            
            sigma_tilde  ~ normal(0,1);
            mu_sigma     ~ exponential(100);
            sig_sigma    ~ exponential(500);
            
            a_tilde      ~ normal(0, 1);
        }

When checking our priors, Stan is consistently drawing from normal distributions (0, 1) and then transforming our parameters. Note that all our tilde terms should be unbounded. They are being drawn from normals that should allow for negative values.

To study funnels in particular, we'll look at each parameter projected along its σ.

Bokeh Plot

No funny funnels! Some nice smooth fans. We are getting some negative values for $k$; it turns out a fraction of 0.002 are. I am willing to accept this as a tradeoff for letting $k$'s sigma be able to reach 0.02 or a tad higher, but both $a_0$ and μ are 100% positive!

DIAGNOSIS

REALITY CHECK

Underlying Analysis

Here's a broad picture of how the analysis will go: if growth is exponential, we will observe a smaller $k$ from the exponential model, because the linear model has to compensate for the final growth areas with a higher $k$. On the other hand, if growth is linear, we will observe higher $k$'s from the exponential model. The predominant values are in the near-linear region, and it will try to fit the initial flatter parts to a more linear function. Try it!

remark: posterior .nc files did not survive the massive post-term file-cleanse of 2020, so interactivity stops here :( plots were pulled from original hw10.1.html file

Posterior Sampling: and so we begin...

These and stan files were used to perform our sampling.


      functions {
          real a_theor_exp(real ao, real k, real t) {
              return ao * exp(k*t);}
          real a_theor_lin(real ao, real k, real t) {
              return ao + k*t;}
      }

      data {
          int len_df1;
          int len_df2;
          int nI_cycles;
          int nII_cycles;
          
          real timesI[len_df1];
          real areasI[len_df1];
          int cyclesI[len_df1];
          
          real timesII[len_df2];
          real areasII[len_df2];
          int cyclesII[len_df2];
      }

      parameters {
          real sigma_tilde;
          real<lower=0, upper=0.06> mu_sigma;
          real<lower=0, upper=0.1> sig_sigma;
          
          real<lower=0, upper=0.3> sig_ao;
          real<lower=0, upper=0.002> sig_k;
          
          real<lower=0> mu_aoI;
          real<lower=0> mu_aoII;
          real mu_kI_tilde;
          real mu_kII_tilde;
          
          real aoI_tilde[nI_cycles];
          real aoII_tilde[nII_cycles];
          real kI_tilde[nI_cycles];
          real kII_tilde[nII_cycles];
      }

      transformed parameters {
          real sigma;
          real mu_kI;
          real mu_kII;
          real aoI[nI_cycles];
          real aoII[nII_cycles];
          real kI[nI_cycles];
          real kII[nII_cycles];
          real muI[len_df1];
          real muII[len_df2];
          
      
          sigma = mu_sigma + sigma_tilde * sig_sigma;
          mu_kI = 0.0116 + 0.0032 * mu_kI_tilde;
          mu_kII = 0.0116 + 0.0032 * mu_kII_tilde;
          
          for (i in 1:nI_cycles) {
              aoI[i] = mu_aoI + aoI_tilde[i] * sig_ao;
              kI[i] = mu_kI + kI_tilde[i] * sig_k;}
          for (i in 1:nII_cycles) {
              aoII[i] = mu_aoII + aoII_tilde[i] * sig_ao;
              kII[i] = mu_kII + kII_tilde[i] * sig_k;}
          
          for (i in 1:len_df1) {
              muI[i] = a_theor_lin(aoI[cyclesI[i]+1],
                                      kI[cyclesI[i]+1],
                                      timesI[i]);}
          for (i in 1:len_df2) {
              muII[i] = a_theor_lin(aoII[cyclesII[i]+1],
                                      kII[cyclesII[i]+1],
                                      timesII[i]);}
      }

      model {
          sigma_tilde ~ normal(0,1);
          mu_sigma ~ exponential(100);
          sig_sigma ~ exponential(500);
          
          sig_ao ~ exponential(20);
          sig_k ~ exponential(1200);
          
          mu_aoI ~ lognormal(0.15,0.15);
          mu_aoII ~ lognormal(0.15,0.15);
          mu_kI_tilde ~ normal(0,1);
          mu_kII_tilde ~ normal(0,1);
          
          for (i in 1:nI_cycles) {
              aoI_tilde[i] ~ normal(0,1);
              kI_tilde[i] ~ normal(0,1);}
          
          for (i in 1:nII_cycles) {
              aoII_tilde[i] ~ normal(0,1);
              kII_tilde[i] ~ normal(0,1);}
          
          areasI ~ normal(muI, sigma);

      generated quantities {
          real areasI_post[len_df1];
          real areasII_post[len_df2];
          
          areasI_post = normal_rng(muI, sigma);
          areasII_post = normal_rng(muII, sigma);
      }


          functions {
              real a_theor_exp(real ao, real k, real t) {
                  return ao * exp(k*t);}
              real a_theor_lin(real ao, real k, real t) {
                  return ao + k*t;}
          }

          data {
              int len_df1;
              int len_df2;
              int nI_cycles;
              int nII_cycles;
              
              real timesI[len_df1];
              real areasI[len_df1];
              int cyclesI[len_df1];
              
              real timesII[len_df2];
              real areasII[len_df2];
              int cyclesII[len_df2];
          }

          parameters {
              real sigma_tilde;
              real<lower=0, upper=0.06> mu_sigma;
              real<lower=0, upper=0.1> sig_sigma;
              
              real<lower=0, upper=0.3> sig_ao;
              real<lower=0, upper=0.002> sig_k;
              
              real<lower=0> mu_aoI;
              real<lower=0> mu_aoII;
              real mu_kI_tilde;
              real mu_kII_tilde;
              
              real aoI_tilde[nI_cycles];
              real aoII_tilde[nII_cycles];
              real kI_tilde[nI_cycles];
              real kII_tilde[nII_cycles];
          }

          transformed parameters {
              real sigma;
              real mu_kI;
              real mu_kII;
              real aoI[nI_cycles];
              real aoII[nII_cycles];
              real kI[nI_cycles];
              real kII[nII_cycles];
              real muI[len_df1];
              real muII[len_df2];
              
          
              sigma = mu_sigma + sigma_tilde * sig_sigma;
              mu_kI = 0.0116 + 0.0032 * mu_kI_tilde;
              mu_kII = 0.0116 + 0.0032 * mu_kII_tilde;
              
              for (i in 1:nI_cycles) {
                  aoI[i] = mu_aoI + aoI_tilde[i] * sig_ao;
                  kI[i] = mu_kI + kI_tilde[i] * sig_k;}
              for (i in 1:nII_cycles) {
                  aoII[i] = mu_aoII + aoII_tilde[i] * sig_ao;
                  kII[i] = mu_kII + kII_tilde[i] * sig_k;}
              
              for (i in 1:len_df1) {
                  muI[i] = a_theor_exp(aoI[cyclesI[i]+1],
                                          kI[cyclesI[i]+1],
                                          timesI[i]);}
              for (i in 1:len_df2) {
                  muII[i] = a_theor_exp(aoII[cyclesII[i]+1],
                                          kII[cyclesII[i]+1],
                                          timesII[i]);}
          }

          model {
              sigma_tilde ~ normal(0,1);
              mu_sigma ~ exponential(100);
              sig_sigma ~ exponential(500);
              
              sig_ao ~ exponential(20);
              sig_k ~ exponential(1200);
              
              mu_aoI ~ lognormal(0.15,0.15);
              mu_aoII ~ lognormal(0.15,0.15);
              mu_kI_tilde ~ normal(0,1);
              mu_kII_tilde ~ normal(0,1);
              
              for (i in 1:nI_cycles) {
                  aoI_tilde[i] ~ normal(0,1);
                  kI_tilde[i] ~ normal(0,1);}
              
              for (i in 1:nII_cycles) {
                  aoII_tilde[i] ~ normal(0,1);
                  kII_tilde[i] ~ normal(0,1);}
              
              areasI ~ normal(muI, sigma);
              
              
          generated quantities {
              real areasI_post[len_df1];
              real areasII_post[len_df2];
              
              areasI_post = normal_rng(muI, sigma);
              areasII_post = normal_rng(muII, sigma);
          }

Here are some diagnostics:

divergences_[1]: 0 for both linear and exponential

rhat_[2]: less than 1.1 for all parameters

tail-ESS_[3]: low (100-400) for centers of $a_0$ and $k$ in both bacteria in the linear model; only parameter in the exponential model that dipped below 400 was the center for $a_0$ of bacteria II. Due to time constraints, we will move on and make sure to thin our samples.

PARCOORDS

TRACE

SIGMA PROJECTIONS

you have potheads, potterheads, and plotterheads...

CENTERS AND SPREADS FOR $a_0$ & $k$:

Combining linear and exponential parameters on the same plot gives us an idea of how consistently the exponential model gives us a smaller spread, meaning that its parameters have to work less hard to account for our data. Although this is just a sliced view of samples from one cycle, this general pattern was observed for all cycles. What's interesting to note is that the growth constant $k$ appears to be bimodal in the linear model. Bimodality within a single growth cycle is... concerning.

Inspired by some starry plots from last term, we can visualize all growth cycles together by plotting both parameters' high posterior densities from 1000 stan samples. Each bar represents one growth cycle, with intrsecting medians of $a_0$ and $k$.

Let's look at our parameters in a more quantitative way. An interesting observation we can make from these ecdf's is that these two bacteria grow differently, in the sense that, they both start division at different parts along the exponential curve. This can be seen by how the distance between the linear and exponential $a_0$ varies between bacteria I and II. If the linear model has to work harder (greater distance), we are in a steeper region of the curve.

Note that in relation to our prior sampling scatterplot above, our posterior parameters have shrunk considerably. We are learning from the data!

LOCATION OF $\mu_\sigma$:

Finally, the last parameter of interest is the center of our scale. Exponential wins!

(1)
Divergences:
MCMC approximates the target distribution (a complicated integral) by instead discretely summing over all states of Markov chains. Strong geometric ergodicity is necessary for fast convergence (fast by our computational standards and resources); however, actual proof of ergodicity is often implausible. Instead, after given our input parameters for how we choose to discretize the sampler, we can look for instances where our chain exhibits symptoms of straying from ergodicity in the form of divergences, where high curvatures in our distribution make for a sticky terrain the chain cannot properly explore.
As divergences serve as a lower bound for ergodicity, some divergences aren't actually instances of ergodic failures. Decreasing the step size (increasing adapt_delta) minimizes this mislabeling. In the case that decreasing step size does not reduce divergences and they remain fairly constant, it is likely the case that the Hamiltonian transition itself is not ergodic in the first place. Ironically, if lowering the step size increases your divergences, this means you are sampling deeper and deeper into the funnel. Something that keeps us from going down the rabbit hole is noncentering! ➰

(2)
r-hat:
After taking a weighted average of variances between B and within W chains, r-hat measures $\sqrt{1+B/W}$, for which ergodic processes goes to 1 as $N \to \infty$. To reflect a state of convergence within each chain, the chains are split in half, the first link then compared to the second. This is effectively a measure of how well the chains have mixed. Intuitively, when chains don't mix well, their between variances will be grossly larger than individual chains, the reason being that all else held equal, each chain is different only in its initialization in different regions of parameter space. With proper mixing, each chain should find its steady state on similar order of magnitudes timescales, and mimic true independent sampling, where the samples aren't able to recognize which chain it came from. The faster the chains mix, the faster their previous dependencies decay. Formally, an r-hat of less than 1.1 is thought to be sufficient, however 1.01 is a stronger threshold. ➰

(3)
ess-tail:
By construction, there will be some lingering anticorrelation within a chain. To estimate what constitutes an effective sample size, ess-bulk studies the center, while ess-tail explores convergence properties of the boundary values (5% and 95% quantiles). This is useful in recognizing how distribution tails of varying magnitudes can affect our sampler. Note that these diagnostic measures are human-designed probes to try and diagnose misbehavior; there are many models which escape detection, and many more that require more careful thought. ➰

A Bayesian Apprach

Hierarchical Model

PRIOR DISTRIBUTION FOR \(a_0\)

Sinusoidal Suspicions

Shearing Transforms

Bounding \(a_0\)

Bounding \(\sigma_{ao}\)

PRIOR DISTRIBUTION FOR \(k\)

PRIOR DISTRIBUTION FOR \(\sigma\)