Tests

class: center, middle, inverse, title-slide

.title[
# Tests
]
.author[
### Mahendra Mariadassou, INRAE <br> .small[from original slides by Tristan Mary-Huard]
]
.date[
### Shandong University, Weihai (CN)<br>Summer School 2023
]

---
class: middle, inverse, center

# Warm-up

## Introducing the Fisher distribution

---

## Prerequisites: Fisher distribution

Let `$Z_1$` and `$Z_2$` be two positive random variables such that

- `$Z_1\sim\chi^2(n_1)$`,  
- `$Z_2\sim\chi^2(n_2)$`,  
- `$Z_1\perp Z_2$`.

Define
`$$F = \frac{Z_1/n_1}{Z_2/n_2}$$`
--

Then `$F$` is said to follow a Fisher distribution with degrees of freedom `$n_1$` and `$n_2$`.

This is written
`$$F \sim \F(n_1,n_2)$$`
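
The definition can be checked by simulation. The sketch below is only an illustration: the degrees of freedom `n1 = 5`, `n2 = 20`, the seed and the helper name `sample_fisher` are arbitrary choices, not part of the original slides. It draws `$F$` as a ratio of scaled chi-square variables and compares the empirical mean with the known value `$n_2/(n_2-2)$` (valid for `$n_2>2$`).

```python
import random
import statistics

def sample_fisher(n1, n2, rng):
    """Draw one F(n1, n2) variate as a ratio of scaled chi-square draws."""
    # A chi-square with k degrees of freedom is a Gamma(shape=k/2, scale=2).
    z1 = rng.gammavariate(n1 / 2, 2)
    z2 = rng.gammavariate(n2 / 2, 2)
    return (z1 / n1) / (z2 / n2)

rng = random.Random(2023)   # fixed seed, for reproducibility only
n1, n2 = 5, 20              # illustrative degrees of freedom
draws = [sample_fisher(n1, n2, rng) for _ in range(100_000)]

empirical_mean = statistics.fmean(draws)
theoretical_mean = n2 / (n2 - 2)  # mean of F(n1, n2), valid for n2 > 2

print(f"empirical mean   = {empirical_mean:.3f}")
print(f"theoretical mean = {theoretical_mean:.3f}")
```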
---
  
## Example 1: Infection

Back to the phage infection example. Assume now that the biologist applies a treatment to fight the infection whenever at least 90% of the bacterial colony is infected. What decision rule should be applied?

```
2 2 2 2 3 3 2 4 1 1 
5 1 3 1 1 5 1 1 4 2 
```
  
.blue[Objective]
Perform a test to decide whether the proportion of infected bacteria is higher than `$90\%$` or not.  
  
---
  
## Modeling
  
Let `$X_i$` denote the number of phages obtained for the `$i^{th}$` bacterium.

- One assumes that bacteria are independent:  
`$X_1 \perp X_2 \perp \dots \perp X_{n},\quad\text{with } n= 20$` 
- Measurements are discrete counts, modeled by a Poisson distribution:
`$X_1,\dots, X_{n}  \sim \mathcal{P}(\bullet), \text{ i.i.d.}$`
- The infection level `$\lambda$` is unknown:
`$X_1,\dots, X_{n} \sim \mathcal{P}(\lambda), \text{ i.i.d.}$`

---

## Hypotheses

Hypotheses *must* be formulated such that they *concern a parameter* of the model.  
Here the model is

`$$X_1,..., X_{n} \sim \mathcal{P}(\lambda), \text{ i.i.d}$$`

Consequently the hypotheses should concern `$\lambda$`.

`$$\begin{eqnarray*}
\hspace{-0.8cm}\text{Proportion of infected bacteria } \geq \text{90\%} &\Leftrightarrow& P(X>0)\geq 0.9 \\
&\Leftrightarrow& P(X=0)\leq 0.1 \\
&\Leftrightarrow& e^{-\lambda}\leq 0.1 \\
&\Leftrightarrow& \lambda\geq -\ln(0.1)  \\
&\Leftrightarrow& \lambda\geq \lambda_0 \; (= \ln 10 \approx 2.3026) \\
\end{eqnarray*}$$`

Hence
`$$H_0: \{\lambda\leq \lambda_0\} \quad \text{vs} \quad H_1: \{\lambda > \lambda_0\}$$`
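
The translation from a proportion to a value of `$\lambda$` can be checked numerically. This is a minimal sketch: the helper name `prob_infected` and the two extra values `1.0` and `3.0` are illustrative choices, not taken from the slides.

```python
import math

target_infected = 0.90  # proportion of infected bacteria, from the example

# P(X = 0) = exp(-lambda) for a Poisson variable, so
# P(X > 0) >= 0.9  <=>  exp(-lambda) <= 0.1  <=>  lambda >= -ln(0.1).
lambda_0 = -math.log(1 - target_infected)

def prob_infected(lam):
    """P(X > 0) when X ~ Poisson(lam)."""
    return 1 - math.exp(-lam)

print(f"lambda_0 = {lambda_0:.4f}")
for lam in (1.0, lambda_0, 3.0):
    print(f"lambda = {lam:.4f} -> P(X > 0) = {prob_infected(lam):.4f}")
```

Values of `$\lambda$` below `$\lambda_0$` give an infected proportion below 90%, values above give more than 90%.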

---
  
## Estimation
  
Starting point: derive the ML estimator for the quantity of interest.

`$$\begin{eqnarray*}
  Lik_\lambda(x_1,...,x_n) &=& \prod_{i=1}^n f_\lambda(x_i) \quad\text{(i.i.d. assumption)}\\
  &=& \prod_{i=1}^n \frac{\lambda^{x_i}}{x_i!}e^{-\lambda} \\
  \Rightarrow LLik_\lambda(x_1,...,x_n) &=& \log(\lambda)\sum_{i=1}^{n}x_i - \sum_{i=1}^{n}\log(x_i!) - n\lambda 
\end{eqnarray*}$$`

.blue[Differentiation]
`$$\frac{\partial LLik_\lambda(x_1,...,x_n)}{\partial \lambda}=  \frac{1}{\lambda}\sum_{i=1}^n x_i -n$$`
Setting the derivative to 0, one gets: `$\widehat{\lambda} = \frac{1}{n}\sum_{i=1}^n x_i=\bar{x}$`. 
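
As a sanity check, the closed-form estimator can be compared with a crude numerical maximization of the log-likelihood on the 20 counts of the example. The grid search and the function name `log_likelihood` are illustrative assumptions; only the data and the formula come from the slides.

```python
import math

# Phage counts from the infection example (n = 20).
counts = [2, 2, 2, 2, 3, 3, 2, 4, 1, 1,
          5, 1, 3, 1, 1, 5, 1, 1, 4, 2]

def log_likelihood(lam, xs):
    """Poisson log-likelihood: log(lam) * sum(x) - sum(log(x!)) - n * lam."""
    return (math.log(lam) * sum(xs)
            - sum(math.lgamma(x + 1) for x in xs)   # lgamma(x + 1) = log(x!)
            - len(xs) * lam)

# Closed-form ML estimate: the sample mean.
lambda_hat = sum(counts) / len(counts)

# Crude grid search, only to check the closed form numerically.
grid = [k / 100 for k in range(1, 1001)]  # 0.01, 0.02, ..., 10.00
lambda_grid = max(grid, key=lambda lam: log_likelihood(lam, counts))

print(f"closed form : {lambda_hat:.2f}")
print(f"grid search : {lambda_grid:.2f}")
```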
  
  
---

## Decision rule

Should be of the form
`$$\text{If } \hat{\lambda}\geq s \text{ then reject } H_0$$`
One needs to choose the threshold `$s$`.

.blue[Type I error control]  
Find `$s$` such that `$P_{H_0}(\overline{X}\geq s)\leq \alpha$`  
`$\Rightarrow$` Requires the (possibly approximate) distribution of `$\overline{X}$`.

One has:
`$$\begin{eqnarray*}
n\overline{X} \sim \P(n\lambda) \quad \text{ and/or } \quad  \sqrt{n}\frac{\overline{X}-\lambda}{\sqrt{\lambda}} \overset{approx}{\sim} \N(0,1)
\end{eqnarray*}$$`

.blue[Under `$H_0$`]:  
One has `$\lambda \in [0,\lambda_0]$`, but `$\lambda$` itself remains unknown, so the `$H_0$` distribution of `$\overline{X}$` is not fully known...
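
The two distributional results can be compared on a tail probability `$P_\lambda(\overline{X}\geq s)$`. In the sketch below the values `lam = 2.0` and `s = 2.5` are arbitrary illustrations, not values from the slides, and the normal approximation uses `$Var(X_i)=\lambda$`, i.e. a standard deviation of `$\sqrt{\lambda/n}$` for `$\overline{X}$`.

```python
import math

def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu), by summing the pmf."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)        # pmf at 0
    total = term
    for j in range(1, k + 1):
        term *= mu / j          # pmf(j) = pmf(j - 1) * mu / j
        total += term
    return min(total, 1.0)

def normal_sf(z):
    """P(Z >= z) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, lam, s = 20, 2.0, 2.5   # illustrative values, not taken from the slides

# Exact: n * Xbar ~ Poisson(n * lam), so
# P(Xbar >= s) = P(n * Xbar >= ceil(n * s)).
exact = 1 - poisson_cdf(math.ceil(n * s) - 1, n * lam)

# Approximate: sqrt(n) * (Xbar - lam) / sqrt(lam) is roughly N(0, 1).
approx = normal_sf(math.sqrt(n) * (s - lam) / math.sqrt(lam))

print(f"exact Poisson tail   : {exact:.4f}")
print(f"normal approximation : {approx:.4f}")
```

With `$n=20$` the two values are close but not identical, which is why the exact Poisson distribution is preferable when it is available.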

---

## Worst case scenario

Which value of `$\lambda \in [0,\lambda_0]$` leads to the worst (i.e. the largest) value of `$P_{H_0}(\overline{X}\geq s)$`?

![](06_Tests_files/figure-html/unnamed-chunk-3-1.png)

Worst value: `$\lambda =\lambda_0$`!
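
The same conclusion can be reached numerically by evaluating the rejection probability on a grid of values of `$\lambda$` in `$(0,\lambda_0]$`. This is a sketch under assumptions: the threshold `k = 57` on `$n\overline{X}$` (i.e. `$s = 2.85$`) and the 10-point grid are illustrative choices, and `poisson_sf` is a helper written for this example.

```python
import math

def poisson_sf(k, mu, extra_terms=400):
    """P(Y >= k) for Y ~ Poisson(mu), summing the upper tail directly.

    Truncating after `extra_terms` terms is harmless for the small means
    used here; it is a shortcut of this sketch, not a general routine.
    """
    log_mu = math.log(mu)
    return sum(math.exp(j * log_mu - mu - math.lgamma(j + 1))
               for j in range(k, k + extra_terms))

n = 20
lambda_0 = -math.log(0.1)
k = 57   # illustrative threshold on n * Xbar, i.e. s = k / n = 2.85

# Grid of lambda values in (0, lambda_0].
lambdas = [lambda_0 * i / 10 for i in range(1, 11)]
rejection_probs = [poisson_sf(k, n * lam) for lam in lambdas]

for lam, p in zip(lambdas, rejection_probs):
    print(f"lambda = {lam:.3f} -> P(Xbar >= s) = {p:.3e}")
```

The rejection probability increases with `$\lambda$`, so its maximum over `$H_0$` is attained at the boundary `$\lambda_0$`.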

---

## Back to the decision rule

Considering the worst case scenario,
one looks for `$s$` such that
`$$P_{\lambda_0}\left( \overline{X}\geq s \right)\leq \alpha \Rightarrow P_{\lambda_0} \left( n\overline{X}\geq ns \right)\leq \alpha$$`
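Choose `$ns = q_{n\lambda_0,1-\alpha} \Rightarrow s = \frac{q_{n\lambda_0,1-\alpha}}{n}$`, where `$q_{n\lambda_0,1-\alpha}$` is the quantile of order `$1-\alpha$` of the `$\P(n\lambda_0)$` distribution.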

.blue[Application]
- `$\widehat{\lambda}=$`   2.3
- `$n=$` 20
- `$\alpha= 0.05$`
- `$q_{n\lambda_0,1-\alpha}=$` 57 `$\Rightarrow s=$` 2.85
--

.blue[Conclusion?]

Since `$\widehat{\lambda} = 2.3 < s = 2.85$`, one does not reject the hypothesis that the proportion of infected bacteria is lower than 90\%.
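
The whole application can be reproduced with a few lines of code and compared with the values listed above. The helpers `poisson_cdf` and `poisson_quantile` are written for this sketch (they are not library functions), and the quantile is taken as the smallest integer `$q$` with `$P(Y\leq q)\geq 1-\alpha$`, which is an assumption about the convention used on the slide.

```python
import math

# Phage counts from the infection example.
counts = [2, 2, 2, 2, 3, 3, 2, 4, 1, 1,
          5, 1, 3, 1, 1, 5, 1, 1, 4, 2]
n = len(counts)
alpha = 0.05
lambda_0 = -math.log(0.1)

def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu); returns 0 for k < 0."""
    pmf, cdf = math.exp(-mu), 0.0
    for j in range(k + 1):
        cdf += pmf
        pmf *= mu / (j + 1)   # move from pmf(j) to pmf(j + 1)
    return cdf

def poisson_quantile(p, mu):
    """Smallest integer q such that P(Y <= q) >= p for Y ~ Poisson(mu)."""
    q = 0
    while poisson_cdf(q, mu) < p:
        q += 1
    return q

mu_0 = n * lambda_0                      # mean of n * Xbar at the worst case
q = poisson_quantile(1 - alpha, mu_0)
s = q / n
lambda_hat = sum(counts) / n
reject = lambda_hat >= s

# Levels attained at lambda_0 by the two possible integer rules.
level_at_q = 1 - poisson_cdf(q - 1, mu_0)    # P(n * Xbar >= q)
level_above_q = 1 - poisson_cdf(q, mu_0)     # P(n * Xbar >  q)

print(f"q = {q}, s = {s:.2f}, lambda_hat = {lambda_hat:.2f}")
print(f"reject H0: {reject}")
print(f"P(n*Xbar >= q) = {level_at_q:.4f}")
print(f"P(n*Xbar >  q) = {level_above_q:.4f}")
```

Since the Poisson distribution is discrete, the level attained by a given threshold generally differs from `$\alpha$`; the last two lines print it for the rule `$n\overline{X}\geq q$` and for the stricter rule `$n\overline{X}> q$` so that the two can be compared. The decision on these data is the same under both rules.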

---

## Power of a test procedure

Recall that the test procedure is designed such that:

- T1E is controlled at a given level,
- T2E is minimal given that T1E is controlled.

T2E is minimal `$\Rightarrow$` `$P_{H_1}( \text{accept } H_0)$` is minimal `$\Rightarrow$` `$P_{H_1}(\text{reject } H_0)$` is maximal.

`$P_{H_1}(\text{reject } H_0)$` is the ability to reject `$H_0$` when it is false.
This key quantity is called the __power__ of the test procedure.
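
For the infection example, the power can be computed for a few values of `$\lambda$` under `$H_1$`. This is a sketch under assumptions: the rejection rule `$n\overline{X}\geq 57$` is taken from the application above, the alternative values of `$\lambda$` are arbitrary, and `poisson_sf` and `power` are helpers written for this illustration.

```python
import math

def poisson_sf(k, mu, extra_terms=400):
    """P(Y >= k) for Y ~ Poisson(mu), summing the upper tail directly.

    The truncation after `extra_terms` terms is a shortcut that is
    adequate for the small means used in this sketch.
    """
    log_mu = math.log(mu)
    return sum(math.exp(j * log_mu - mu - math.lgamma(j + 1))
               for j in range(k, k + extra_terms))

n = 20
k = 57   # rejection threshold on n * Xbar, from the application (s = 2.85)

def power(lam):
    """P_lam(reject H0) = P(n * Xbar >= k) with n * Xbar ~ Poisson(n * lam)."""
    return poisson_sf(k, n * lam)

alternatives = (2.5, 3.0, 3.5, 4.0)   # illustrative values of lambda under H1
powers = [power(lam) for lam in alternatives]

for lam, p in zip(alternatives, powers):
    print(f"lambda = {lam:.1f} -> power = {p:.3f}")
```

The power increases as `$\lambda$` moves away from `$\lambda_0$`: alternatives far from the boundary are detected much more easily than alternatives close to it.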