class: center, middle, inverse, title-slide # PLN models ### Mahendra Mariadassou, INRAE ### Shandong University, Weihai (CN)
Summer School 2021

---
class: middle

# Reminder

.pull-left[
- Interest in .alert[multivariate count data]
- Counts in ecology have .alert[peculiar] features:
  - sparse (.grey[grey cell] = null count)
  - overdispersed
  - not centered around a central location parameter
- Example from Jakuschkin, Fievet, Schwaller, Fort, Robin, and Vacher [Jak+16]

<img src="04_pln_models_files/figure-html/oaks_histo-1.png" width="800px" />
]

.pull-right[
![](04_pln_models_files/figure-html/unnamed-chunk-1-1.png)<!-- -->
]

---
# PLN models

1. Multivariate Normal
1. The Log Normal distribution
1. The Poisson Log Normal distribution
1. Variational Approximation
1. Parameter Estimation

---
class: inverse, center, middle

# Multivariate Normal

---
## Multivariate Normal distribution (MVN)

The .alert[multivariate normal distribution] of a `\(p\)`-dimensional random vector `\(\bY = (Y_1,\ldots, Y_p)\)` is written

$$
\bY \sim \N(\bmu, \bSigma) \quad \text{or sometimes} \quad \N_p(\bmu, \bSigma)
$$

where `\(\bmu\)` is the mean vector of `\(\bY\)`

$$
\bmu = E[\bY] = (E[Y_1], \dots, E[Y_p])
$$

and `\(\bSigma\)` is the `\(p \times p\)` variance matrix

$$
`\begin{equation}
\bSigma = (\Sigma_{jj'})_{j,j'} \; \text{ where } \; \Sigma_{jj'} = Cov(Y_j, Y_{j'})
\end{equation}`
$$

--

The density of `\(\bY\)` is given by

$$
`\begin{align}
P(\bY; \bmu, \bSigma) & = \frac{1}{\sqrt{(2\pi)^p |\bSigma|}} \exp\left( -\frac{(\bY - \bmu)^\top \bSigma^{-1} (\bY - \bmu)}{2} \right)
\end{align}`
$$

and for any `\(\bx \in \Rbb^p\)`, `\(\bx^\top \bY \sim \N(\bx^\top \bmu, \bx^\top \bSigma \bx)\)`

---
## Moments of the MVN (I)

- Let `\(\bOmega = \bSigma^{-1}\)` and note `\(\bfeta_1 = \bSigma^{-1} \bmu\)` and `\(\bfeta_2 = -\bSigma^{-1}/2\)`.
- Note that for any vector `\(\bx\)` and symmetric matrix `\(\bA\)`,
  - `\(\bx^\top \bA \bx = \langle \bx \bx^\top, \bA\rangle\)` where `\(\langle \bA, \bB \rangle = Tr(\bA^\top \bB)\)` is the matrix dot product.

$$
`\begin{align}
P(\bY; \bmu, \bSigma) & = \frac{1}{\sqrt{(2\pi)^p |\bSigma|}} \exp\left( -\frac{(\bY - \bmu)^\top \bSigma^{-1} (\bY - \bmu)}{2} \right) \\
& = \frac{1}{(2\pi)^{p/2}} \exp\left( \bY^\top \bSigma^{-1} \bmu -\frac{1}{2}\bY^\top \bSigma^{-1}\bY - \frac{1}{2}\bmu^\top\bSigma^{-1}\bmu - \frac{1}{2}\log|\bSigma|\right) \\
& = \frac{1}{(2\pi)^{p/2}} \exp\left( \bY^\top \bSigma^{-1} \bmu -\frac{1}{2}\langle \bY \bY^\top, \bSigma^{-1}\rangle - \frac{1}{2}\bmu^\top\bSigma^{-1}\bmu - \frac{1}{2}\log|\bSigma|\right) \\
& = \frac{1}{(2\pi)^{p/2}} \exp\left( \bY^\top \bfeta_1 + \langle \bY \bY^\top, \bfeta_2 \rangle + \frac{1}{4}\bfeta_1^\top\bfeta_2^{-1}\bfeta_1 + \frac{1}{2}\log|-2\bfeta_2|\right) \\
\end{align}`
$$

In particular, the MVN is an exponential family with natural parameters `\((\bfeta_1, \bfeta_2) = (\bOmega\bmu, -\bOmega/2)\)`, sufficient statistics `\((\bY, \bY\bY^\top)\)` and log-normalizer `\(A(\bfeta_1, \bfeta_2) = -\frac{1}{4}\bfeta_1^\top\bfeta_2^{-1}\bfeta_1 - \frac{1}{2}\log|-2\bfeta_2|\)`.
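---
## Moments of the MVN: numerical check

A minimal R sketch to check `\(E[\bY] = \bmu\)` and `\(V[\bY] = \bSigma\)` by simulation (it assumes the MASS package is available; the values of `mu` and `Sigma` are illustrative only):

```r
library(MASS)   # for mvrnorm()

set.seed(42)
mu    <- c(1, -1, 0.5)                          # mean vector
Sigma <- matrix(c(1.0,  0.5,  0.2,
                  0.5,  2.0, -0.3,
                  0.2, -0.3,  1.5), nrow = 3)   # variance matrix

Y <- mvrnorm(n = 1e5, mu = mu, Sigma = Sigma)   # n x p matrix of draws

colMeans(Y)   # close to mu
cov(Y)        # close to Sigma
```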
---
## Moments of the MVN (II)

By definition of the MVN, `\(E[\bY]\)` and `\(V[\bY]\)` are given by:

$$
E[\bY] = \bmu \quad \text{ and } \quad V[\bY] = \bSigma
$$

By definition of the variance matrix,

$$
V[\bY] = E[\bY \bY^\top] - E[\bY]E[\bY]^\top \Rightarrow E[\bY \bY^\top] = \bSigma + \bmu\bmu^\top
$$

--

In addition, for all `\(\bx \in \Rbb^p\)`, the moment generating function of the MVN is given by

$$
E[\exp(\bx^\top \bY)] = \exp\left( \bx^\top \bmu + \frac{1}{2} \bx^\top \bSigma \bx\right)
$$

The last useful equality is that for any vector `\(\bx\)` and symmetric matrix `\(\bS\)`

$$
E[(\bY - \bx)^\top \bS (\bY - \bx)] = (\bmu - \bx)^\top \bS (\bmu - \bx) + Tr(\bSigma \bS)
$$

---
## MGF of the MVN: demo

Based on the MGF of a univariate Gaussian variable

---
class: inverse, center, middle

# Log Normal distribution

---
## Log normal (LN) distribution

A random variable `\(\bY\)` is .alert[multivariate lognormal] with parameters `\((\bmu, \bSigma)\)`, noted `\(\bY \sim \LN(\bmu, \bSigma)\)`, iff

- `\(\bY\)` has positive components
- the vector `\(\bZ = \log(\bY) = (\log(Y_1), \dots, \log(Y_p))\)` is Gaussian with `\(\bZ \sim \N(\bmu, \bSigma)\)`.

--

The density of the multivariate log-normal is quite involved but the density of the univariate log-normal is simple. If `\(Y \sim \LN(\mu, \sigma^2)\)`, the density of `\(Y\)` is given by

$$
f_Y(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left( - \frac{(\ln(x) - \mu)^2}{2\sigma^2} \right)
$$

--

Demo: based on the cumulative distribution function

---
## Moments of the LN

Since `\(\bY \sim e^{\N(\bmu, \bSigma)}\)`, one could be tempted to say `\(E[\bY] = e^{\bmu}\)` and `\(V[\bY] = e^{\bSigma}\)` where the exponential is applied coefficient by coefficient.

--

But this is wrong, due to Jensen's inequality and the convexity of `\(\exp\)`. Fortunately, we can use the MGF of the MVN to compute `\(E[Y_j]\)`, `\(V[Y_j]\)` and `\(Cov(Y_j, Y_{j'})\)`:

--

.pull-left[
$$
`\begin{align}
E[Y_j] & = e^{\mu_j + \Sigma_{jj}/2} \\
V[Y_j] & = e^{2\mu_j + \Sigma_{jj}}(e^{\Sigma_{jj}} - 1) \\
& = E[Y_j]^2 (e^{\Sigma_{jj}} - 1) \\
Cov[Y_j, Y_{j'}] & = e^{\mu_j + \mu_{j'} + \frac{1}{2}(\Sigma_{jj}+\Sigma_{j'j'})}(e^{\Sigma_{jj'}} - 1) \\
& = E[Y_j]E[Y_{j'}] (e^{\Sigma_{jj'}} - 1)
\end{align}`
$$
]

.pull-right[
Note `\(\bZ = \log(\bY) \sim \N(\bmu, \bSigma)\)`. Proof based on the fact that

$$
`\begin{align}
E[Y_j] & = E[e^{Z_j}] \\
E[Y_j^2] & = E[e^{2Z_j}] \\
E[Y_j Y_{j'}] & = E[e^{Z_j + Z_{j'}}] \\
\end{align}`
$$
]

--

In particular `\(Cov[Y_j, Y_{j'}]\)` has the .alert[same sign] as `\(\Sigma_{jj'}\)` and can thus be .alert[positive or negative].

---
class: inverse, center, middle

# Poisson Log Normal distribution

---
## Poisson Log Normal (PLN) distribution

A `\(p\)`-dimensional random variable `\(\bY\)` taking values in `\(\Nbb^p\)` is said to be .alert[PLN] [AH89] with parameters `\((\bmu, \bSigma)\)` iff there exists a `\(p\)`-dimensional random variable `\(\bZ\)` such that

$$
`\begin{align}
\bZ & \sim \LN(\bmu, \bSigma) \\
Y_j | Z_j & \sim \P(Z_j) \; \text{independently for all } j\\
\end{align}`
$$

--

Or equivalently

$$
`\begin{align}
\bZ & \sim \N(0, \bSigma) \\
Y_j | Z_j & \sim \P(e^{\mu_j + Z_j}) \; \text{independently for all } j
\end{align}`
$$

Since the canonical form of the Poisson distribution is `\(\P(e^\theta)\)`, we prefer the .alert[last formulation].
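---
## Simulating from the PLN model

A minimal R sketch (assuming the MASS package; parameter values are illustrative) that samples from the second formulation and checks, empirically, the overdispersion and the sign of the covariances derived on the next slide:

```r
library(MASS)   # for mvrnorm()

set.seed(1)
mu    <- c(1, 2)
Sigma <- matrix(c( 0.5, -0.3,
                  -0.3,  0.8), nrow = 2)         # negative latent covariance

n <- 1e5
Z <- mvrnorm(n, mu = rep(0, 2), Sigma = Sigma)   # latent Gaussian layer
Y <- matrix(rpois(2 * n, lambda = exp(sweep(Z, 2, mu, "+"))), nrow = n)

colMeans(Y)                     # compare with exp(mu + diag(Sigma) / 2)
apply(Y, 2, var) / colMeans(Y)  # > 1: overdispersed w.r.t. the Poisson
cov(Y)[1, 2]                    # negative, same sign as Sigma[1, 2]
```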
---
## Moments of the PLN

`\(\bY\)` is an (infinite) mixture of independent Poisson variables so we can use the laws of .alert[total expectation] and .alert[total covariance] to compute its mean and covariance.

--

.pull-left[
$$
`\begin{align}
E[Y_j] & = E[E[Y_j | Z_j]] \\
& = E[e^{Z_{j}}] = e^{\mu_j + \Sigma_{jj}/2}
\end{align}`
$$

$$
`\begin{align}
V[Y_j] & = E[V[Y_j | Z_j]] + V[E[Y_j | Z_j]] \\
& = E[e^{Z_j}] + V[e^{Z_j}] \\
& = E[Y_j] + E[Y_j]^2 (e^{\Sigma_{jj}} - 1)
\end{align}`
$$

Note that `\(V[Y_j] > E[Y_j]\)` `\(\Rightarrow\)` .alert[overdispersion] compared to a Poisson with the same mean.
]

--

.pull-right[
$$
`\begin{align}
Cov[Y_j, Y_{j'}] & = E[Cov[\underbrace{Y_j, Y_{j'}}_{\text{independent}}| Z_j, Z_{j'}]] \\
& \quad + Cov[E[Y_j | Z_j], E[Y_{j'} | Z_{j'}]] \\
& = Cov[e^{Z_j}, e^{Z_{j'}}] \\
& = E[Y_j]E[Y_{j'}](e^{\Sigma_{jj'}} - 1)
\end{align}`
$$

Note that `\(Cov[Y_j, Y_{j'}]\)` has the .alert[same sign] as `\(\Sigma_{jj'}\)` and can be .alert[negative or positive].
]

---
## Density

.pull-left[
If `\(\bZ\)` is .alert[observed]

$$
`\begin{align}
P(\bY, \bZ; \bmu, \bSigma) & = P(\bY | \bZ) P(\bZ; \bmu, \bSigma) \\
& = \prod_{j=1}^p \underbrace{P(Y_j|Z_j)}_{\text{Poisson}} \underbrace{P(\bZ; \bmu, \bSigma)}_{\text{MVN}} \\
\log P(\bY, \bZ; \bmu, \bSigma) & = \sum_{j=1}^p \log P(Y_j|Z_j) + \log P(\bZ; \bmu, \bSigma) \\
& = \sum_{j=1}^p (Z_jY_j - e^{Z_j} - \log(Y_j!)) \\
& \quad -\frac{1}{2}\log|\bSigma| - \frac{p}{2}\log(2\pi) \\
& \quad -\frac{(\bZ - \bmu)^\top \bSigma^{-1} (\bZ - \bmu)}{2}
\end{align}`
$$
]

--

.pull-right[
If `\(\bZ\)` is .alert[not observed]

$$
`\begin{align}
P(\bY; \bmu, \bSigma) & = \int_{\bZ} P(\bY | \bZ) P(\bZ; \bmu, \bSigma) d\bZ \\
& = \frac{1}{\prod_j Y_j!}\int_{\bZ} e^{\bY^\top \bZ - \sum_{j} e^{Z_j}} P(\bZ; \bmu, \bSigma) d\bZ \\
& = \frac{1}{\prod_j Y_j!} E_{\bZ}[e^{\bY^\top \bZ - \sum_{j} e^{Z_j}}]
\end{align}`
$$

We can compute `\(E_{\bZ}[e^{\bY^\top \bZ}]\)` using the MGF of `\(\bZ\)` but not `\(E_{\bZ}[e^{\bY^\top \bZ - \sum_j e^{Z_j}}]\)`...

$$
\log P(\bY; \bmu, \bSigma) = ?
$$
]

--

.center[
We can compute `\(\ell_{comp}(\bmu, \bSigma)\)` but not `\(\ell_{obs}(\bmu, \bSigma)\)`.
]

---
## Accounting for covariates and offsets

In many cases we have access to covariates `\(\bx\)` and we are interested in the matrix `\(\bTheta\)` whose `\(j\)`-th row contains the effects of the covariates on species `\(j\)`. We can include them in the model by setting

$$
\bmu = \bTheta\bx
$$

--

We may also have access to offsets `\(\bo\)` accounting for differences in observation effort; we can include them by setting

$$
Y_j | Z_j \sim \P(e^{o_j + Z_j})
$$

This changes the complete log-likelihood to

$$
`\begin{align}
\log P(\bY, \bZ; \bTheta, \bSigma) & = \sum_{j=1}^p \left( Y_j (o_j + Z_j) - e^{o_j + Z_j} - \log(Y_j!) \right) \\
& \quad -\frac{1}{2}\log|\bSigma| - \frac{p}{2}\log(2\pi) -\frac{(\bZ - \bTheta\bx)^\top \bSigma^{-1} (\bZ - \bTheta\bx)}{2}
\end{align}`
$$
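---
## Complete log-likelihood in R

A direct R transcription of this complete log-likelihood for a single sample (a sketch: the function and argument names are illustrative, and it assumes the mvtnorm package for the Gaussian density):

```r
library(mvtnorm)   # for dmvnorm()

# Complete log-likelihood log P(Y, Z; Theta, Sigma) for one sample,
# with covariate vector x and offset vector o
loglik_complete <- function(Y, Z, Theta, Sigma, x, o = rep(0, length(Y))) {
  mu <- as.vector(Theta %*% x)                       # latent mean Theta x
  poisson_part  <- sum(dpois(Y, lambda = exp(o + Z), log = TRUE))
  gaussian_part <- dmvnorm(Z, mean = mu, sigma = Sigma, log = TRUE)
  poisson_part + gaussian_part
}

# toy example (values are arbitrary): p = 2 species, d = 2 covariates
Theta <- matrix(c(1, 0.5, 2, -1), nrow = 2)
loglik_complete(Y = c(3, 0), Z = c(0.2, -0.1), Theta = Theta,
                Sigma = diag(2), x = c(1, 0.3), o = c(0, 0))
```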
---
## Maximum Likelihood Estimation

The observed likelihood `\(\ell_{obs}(\bTheta, \bSigma)\)` is .alert[intractable].

--

Remember the .alert[lower bound] from the Mixture of Multinomial section. For any distribution `\(Q(.)\)` over `\(\bZ\)`,

$$
`\begin{align}
\ell_{obs}(\bTheta, \bSigma) & = \log\left( \int_{\bZ} P(\bY, \bZ; \bTheta, \bSigma)d\bZ\right) \\
& = \log\left( \int_{\bZ} \frac{P(\bY, \bZ; \bTheta, \bSigma)}{Q(\bZ)} Q(\bZ)d\bZ\right) \\
& \geq \int_{\bZ} \log\left( \frac{P(\bY, \bZ; \bTheta, \bSigma)}{Q(\bZ)}\right) Q(\bZ)d\bZ \\
& = E_Q\left[ \log P(\bY, \bZ; \bTheta, \bSigma)\right] - E_Q\left[ \log Q(\bZ)\right] \\
& = F(Q, (\bTheta, \bSigma))
\end{align}`
$$

We can maximize `\(\ell_{obs}(\bTheta, \bSigma)\)` by iteratively .alert[updating]

- E-step: `\(Q = \text{arg}\max_Q F(Q, (\bTheta, \bSigma))\)`
- M-step: `\((\bTheta, \bSigma) = \text{arg}\max_{(\bTheta, \bSigma)} F(Q, (\bTheta, \bSigma))\)`

---
## Variational Approximation

In the mixture of MN, we solved the E-step explicitly by setting

$$
`\begin{equation}
Q(\bZ) = P(\bZ|\bY; \bTheta, \bSigma) = \frac{P(\bY, \bZ; \bTheta, \bSigma)}{\int_{\bZ} P(\bY, \bZ; \bTheta, \bSigma) d\bZ}
\end{equation}`
$$

but we can't compute `\(\int_{\bZ} P(\bY, \bZ; \bTheta, \bSigma) d\bZ\)`...

--

Instead of finding the .alert[absolute best] distribution `\(Q\)`, we .alert[restrict] the search to a space of distributions which makes `\(F(Q, (\bTheta, \bSigma))\)` tractable.

--

Assuming we have `\(n\)` samples `\(\bY = (\bY_1, \dots, \bY_n)\)`, we consider `\(Q\)` of the form

$$
Q(\bZ_1, \dots, \bZ_n) = Q_1(\bZ_1) \otimes \dots \otimes Q_n(\bZ_n) = \N(\bm_1, Diag(\bs^2_1)) \otimes \dots \otimes \N(\bm_n, Diag(\bs^2_n))
$$

where `\(\bm_i\)` and `\(\bs^2_i\)` are .alert[variational parameters] for sample `\(i\)` and can be interpreted as `\(\bm_i \simeq E[\bZ_i | \bY_i; \bTheta, \bSigma]\)` and `\(\bs^2_i \simeq V[\bZ_i | \bY_i; \bTheta, \bSigma]\)`

---
## Variational Objective

Note that `\(Q\)` depends only on `\(\bM = (\bm_1, \dots, \bm_n)\)` and `\(\bS = (\bs^2_1, \dots, \bs^2_n)\)`.

$$
`\begin{align}
F(Q, (\bTheta, \bSigma)) & = F((\bM, \bS), (\bTheta, \bSigma)) \\
& = \sum_{i=1}^n E_Q\left[ \log P(\bY_i, \bZ_i; \bTheta, \bSigma)\right] - E_Q\left[ \log Q(\bZ_i)\right] \\
& = \sum_{i=1}^n E_Q\left[ \log\left( P(\bY_i | \bZ_i) P(\bZ_i; \bTheta, \bSigma)\right) \right] - E_Q\left[ \log Q(\bZ_i)\right] \\
& = \sum_{i=1}^n E_Q\left[ \log P(\bY_i | \bZ_i)\right] + E_Q\left[\log P(\bZ_i; \bTheta, \bSigma) \right] - E_Q\left[ \log Q(\bZ_i)\right]
\end{align}`
$$
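The three expectations are computed on the next slides. The key ingredient is `\(E_Q[e^{Z_{ij}}] = e^{m_{ij} + s_{ij}^2/2}\)` (MGF of a Gaussian); a quick Monte Carlo check in R (values are illustrative):

```r
set.seed(7)
m <- 0.3; s <- 0.8          # variational mean and standard deviation

Z <- rnorm(1e6, mean = m, sd = s)
mean(exp(Z))                # Monte Carlo estimate of E[exp(Z)]
exp(m + s^2 / 2)            # closed-form value used on the next slides
```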
---
## `\(E_Q\left[ \log P(\bY_i | \bZ_i)\right]\)`

Remember that the coordinates of `\(\bY_i|\bZ_i\)` are Poisson-distributed.

--

$$
`\begin{align}
\log P(\bY_i | \bZ_i) & = \sum_{j=1}^p Y_{ij}(o_{ij} + Z_{ij}) - e^{o_{ij} + Z_{ij}} + C \\
E_Q\left[ \log P(\bY_i | \bZ_i) \right] & = \sum_{j=1}^p Y_{ij} (o_{ij} + E_Q[Z_{ij}]) - E_Q[e^{o_{ij} + Z_{ij}}] + C\\
& = \sum_{j=1}^p Y_{ij} (o_{ij} + m_{ij}) - e^{o_{ij} + m_{ij} + s_{ij}^2/2} + C \\
& = \bY_i^\top (\bo_i + \bm_i) - 1_p^\top e^{\bo_i + \bm_i + \bs_i^2/2} + C
\end{align}`
$$

---
## `\(E_Q\left[\log P(\bZ_i; \bTheta, \bSigma) \right]\)`

Remember that `\(P(\bZ_i; \bTheta, \bSigma)\)` is a Gaussian density

--

$$
`\begin{align}
\log P(\bZ_i; \bTheta, \bSigma) & = -\frac{1}{2}\left[ \log|\bSigma| + (\bZ_i -\bTheta\bx_i)^\top \bSigma^{-1} (\bZ_i -\bTheta\bx_i) \right] + C\\
E_Q\left[ \log P(\bZ_i; \bTheta, \bSigma) \right] & = -\frac{1}{2}\left( \log|\bSigma| + E_Q\left[(\bZ_i -\bTheta\bx_i)^\top \bSigma^{-1} (\bZ_i -\bTheta\bx_i) \right] \right) + C \\
& = -\frac{1}{2}\left( \log|\bSigma| + (\bm_i -\bTheta\bx_i)^\top \bSigma^{-1} (\bm_i -\bTheta\bx_i) + Tr\left[\bSigma^{-1} Diag(\bs_i^2)\right] \right) + C \\
\end{align}`
$$

---
## `\(E_Q\left[\log Q(\bZ_i) \right]\)`

Remember that `\(Q(\bZ_i)\)` is a Gaussian density with parameters `\(\bm_i\)` and `\(\bs_i^2\)`

--

$$
`\begin{align}
\log Q(\bZ_i) & = -\frac{1}{2}\left[ \log|Diag(\bs_i^2)| + (\bZ_i -\bm_i)^\top Diag(\bs_i^2)^{-1} (\bZ_i -\bm_i) \right] + C\\
E_Q\left[ \log Q(\bZ_i) \right] & = -\frac{1}{2}\left( \log|Diag(\bs_i^2)| + E_Q\left[(\bZ_i -\bm_i)^\top Diag(\bs_i^2)^{-1} (\bZ_i -\bm_i) \right] \right) + C \\
& = -\frac{1}{2}\left( \log|Diag(\bs_i^2)| + p \right) + C \\
& = -\frac{1}{2}\log|Diag(\bs_i^2)| + C \\
\end{align}`
$$

---
class: inverse, center, middle

# Parameter estimation

---
## Variational Objective

Putting all the parts together (up to constants), the variational objective is

$$
`\begin{align}
F(\bM, \bS, \bTheta, \bSigma) = & -\frac{n}{2}\log|\bSigma| - \frac{1}{2}\sum_{i=1}^n \left( (\bm_i - \bTheta\bx_i)^\top \bSigma^{-1} (\bm_i - \bTheta\bx_i) + Tr(\bSigma^{-1} Diag(\bs_i^2)) - \log|Diag(\bs_i^2)| \right) \\
& + \sum_{i=1}^n \left( \bY_i^\top (\bo_i + \bm_i) - 1_p^\top \exp(\bo_i + \bm_i + \bs_i^2/2) \right)
\end{align}`
$$

We can thus compute the partial derivatives of `\(F\)` with respect to `\(\bM, \bS, \bTheta, \bSigma\)`.
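---
## Variational Objective in R

As a sanity check, the objective can be transcribed almost verbatim in R. A sketch (not the PLNmodels implementation; names and storage conventions are illustrative: `Y`, `O`, `M`, `S2` are `\(n \times p\)` matrices, `X` is `\(n \times d\)`, `Theta` is `\(p \times d\)`):

```r
# Variational objective F(M, S, Theta, Sigma), up to constants
elbo <- function(Y, X, O, M, S2, Theta, Sigma) {
  n <- nrow(Y)
  R     <- M - X %*% t(Theta)   # n x p matrix of residuals m_i - Theta x_i
  Omega <- solve(Sigma)         # precision matrix
  A     <- exp(O + M + S2 / 2)  # A_ij = exp(o_ij + m_ij + s_ij^2 / 2)

  gaussian_part <- -n / 2 * determinant(Sigma, logarithm = TRUE)$modulus -
    1 / 2 * (sum((R %*% Omega) * R) +            # sum_i quadratic forms
             sum(sweep(S2, 2, diag(Omega), "*")) -  # sum_i Tr(Omega Diag(s_i^2))
             sum(log(S2)))                          # sum_i log|Diag(s_i^2)|
  poisson_part  <- sum(Y * (O + M) - A)

  as.numeric(gaussian_part + poisson_part)
}
```

In practice this objective is maximized with a gradient-based optimizer, as sketched on the next slides.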
---
## M step

In matrix form, with `\(\bX\)` the `\(n \times d\)` matrix of covariates and `\(\bM\)` the `\(n \times p\)` matrix of variational means, we can write

$$
`\begin{equation}
\frac{\partial F}{\partial \bTheta} = \bSigma^{-1}\left[ \bM^\top\bX - \bTheta \bX^\top\bX \right] \quad \text{and} \quad \frac{\partial F}{\partial \bSigma^{-1}} = \frac{n}{2}\bSigma - \frac{1}{2}\left[ \sum_{i=1}^n (\bm_i - \bTheta\bx_i) (\bm_i - \bTheta\bx_i)^\top + \sum_{i=1}^n Diag(\bs_i^2) \right]
\end{equation}`
$$

--

Setting the derivatives to `\(0\)` leads to the following updates

$$
`\begin{align}
\bTheta^{(t+1)} & = {\bM^{(t)}}^\top \bX (\bX^\top\bX)^{-1} \\
\bSigma^{(t+1)} & = \frac{1}{n}\left[ \sum_{i=1}^n (\bm_i^{(t)} - \bTheta^{(t+1)}\bx_i) (\bm_i^{(t)} - \bTheta^{(t+1)}\bx_i)^\top + \sum_{i=1}^n Diag({\bs_i^2}^{(t)}) \right] \\
\end{align}`
$$

---
## E step

Noting `\(\bA\)` the matrix with coordinates `\(A_{ij} = \exp(o_{ij} + m_{ij} + s_{ij}^2/2)\)` and `\(1/\bs_i^2\)` the coordinate-wise inverse, we have

$$
`\begin{align}
\frac{\partial F}{\partial \bm_i} & = \bSigma^{-1} (\bTheta^{(t+1)}\bx_i - \bm_i) + (\bY_i - \bA_i) \\
\frac{\partial F}{\partial \bs_i^2} & = \frac{1}{2} \left[ \frac{1}{\bs_i^2} - Diag\left({\bSigma^{(t+1)}}^{-1}\right) - \bA_i \right]
\end{align}`
$$

--

There is no explicit solution for setting those derivatives to `\(0\)` so we use .alert[gradient ascent] to find `\(\text{arg}\max_{(\bM, \bS)} F(\bM, \bS, \bTheta^{(t+1)}, \bSigma^{(t+1)})\)`

---
class: middle, center

<img src="ext_figures/graphical_abstract.png" width="100%" />

---
## Take Home Message

- ### The PLN model is .alert[flexible] for modelling multivariate counts (no sign constraint on the covariances)
- ### Parameter estimation is tricky and relies heavily on computational tricks and .alert[variational estimation] (see [CMR21] for details)
- ### Variational estimation is a .alert[powerful tool] to estimate parameters in .alert[intractable models]

---

Aitchison, J. and C. Ho (1989). "The multivariate Poisson-log normal distribution". In: _Biometrika_ 76.4, pp. 643-653.

Chiquet, J., M. Mariadassou, and S. Robin (2021). "The Poisson-lognormal model as a versatile framework for the joint analysis of species abundances". In: _Frontiers in Ecology and Evolution_ 9, p. 188. DOI: [10.3389/fevo.2021.588292](https://doi.org/10.3389%2Ffevo.2021.588292).

Jakuschkin, B., V. Fievet, L. Schwaller, et al. (2016). "Deciphering the pathobiome: intra- and interkingdom interactions involving the pathogen Erysiphe alphitoides". In: _Microbial Ecology_ 72.4, pp. 870-880.