Tests

class: center, middle, inverse, title-slide

.title[
# Tests
]
.author[
### Mahendra Mariadassou, INRAE <br> .small[from original slides by Tristan Mary-Huard]
]
.date[
### Shandong University, Weihai (CN)<br>Summer School 2024
]

---

---

## Main Idea

Some research directions can be formulated as .blue[questions] with a yes / no answer:

- Is the expression of gene HER2 large in breast cancer ?
- Is a new variety of tomato more resistant to mildew than the previous one ?
- Is food cheaper in China than in France ?

You collect .alert[noisy data] about the problem and want to give an *evidence based* .blue[answer] to the question

---

## When should you use a test ?

* You have a .alert[yes/no] question

* You have .alert[data] related to the question

* The data can be .alert[modeled] as the result of some .alert[random variable]

* The question can be framed in terms of .alert[parameters] of the distribution

.pull-left[
You can either:
- **reject** the null hypothesis: observed patterns are unexpected under your assumptions
- **retain** the null hypothesis: observed patterns are expected under your assumptions
  - .red[**Does not imply that your assumptions are true**]
]

.pull-right[
<img src="imgs/one-does-not-simply-accept-the-null-hypothesis.jpg" width="300px" style="display: block; margin: auto;" />
]

---
## Four elements of a test

.pull-left-70[
- **Data:** 
    * `$y_1, ..., y_n$` realisation of r.v. `$Y_1, ..., Y_n$`
- A statistical **model**:

* distribution of  `$Y_1,..., Y_n$` depending on some parameters `$\theta$`

- An **assumption**:

* A statement about `$\theta$`. 
    * This is the so called `$H_0$` hypothesis that you usually want to reject, `$H_1$` being the alternative

- A **decision rule**

* If `$T=f(X_1,..., X_n)$` is a test statistic
  	* `$R$`  is subset of values for `$T$` that is unlikely if `$H_0$` is true
]

---
## An example with babies

- **Data**: biological sex of children born in 2020 in France (from [INED and INSEE](https://www.ined.fr/fr/tout-savoir-population/chiffres/france/naissance-fecondite/naissances-sexe/))
  - 356 389 👦 / 340 275 👧

- **Model**: 
  - The sex of a newborn is `$\sim \mathcal{B}(p)$`, a Bernoulli of parameter `$p$` (where 1 stands for girls and 0 for boys)
  - Sex of all children are i.i.d. (.alert[modeling assumption])
  
- **Null Hypothesis**
  - There is an equal chance of being born a boy or a girl `$\Rightarrow p = 1/2$`
  
- **Decision rule**
  - Test statistic: .alert[frequency] `$\hat{p}$` of girls among children born in 2020
  - Reject `$H_0$` if `$\hat{p}$` is **too far** from `$1/2$` 
  
--

.center[.alert[The hard work consists in quantifying what is *too far*]]

---
## Decision errors

The decision is based on .alert[noisy data] and so is .alert[error-prone]

.footnote[Image from [whatilearned](https://whatilearned.fandom.com/wiki/Hypothesis_Testing)]

---
### Type I/ Type II error for pregnancy tests

---

### A few remarks

- Strong .alert[asymetry] between `$H_0$` and `$H_1$`

- You want to accumulate sufficient evidence .alert[against] `$H_0$`.

- Analogy with trials: `$H_0$` is on trial, do you have enough proof to convict it?

.center[
**Necessary trade-off** between type I and type II errors:
]
--

.pull-left-70[
- .alert[more stringent] criteria : 
  - less wrongful convictions (type I errors) 
  - more criminals released (type II errors, lack of power)
<br>
<br>
<br>
]

.pull-right-30[
<img src="imgs/im-on-steroids-really-prove-it.jpg" width="150px" />
]

.pull-left-70[
- .alert[less stringent] criteria: 
  - less criminals released (type II errors)
  - more wrongful convictions (type I error) 
]

.pull-right-30[
<img src="imgs/sakazuki_guilty.jpeg" width="150px" />
]

---

## Construction of a test

.pull-left-70[
- Start with an .alert[estimator] for the parameter of intercept:
    - Frequencies `$F$` of girls among newborn
]

.pull-left-70[    
- Derive the .alert[distribution] of `$F$` under `$H_0$`
    - For `$n = 356 389 + 340 275$`, using the gaussian approximation, we have 
    `$$F \sim \mathcal{N}(1/2, 1/4n) = \mathcal{N}(0.5, 3.6 \times 10^{-7})$$`
]

.pull-right-30[
<img src="06_Tests_files/figure-html/unnamed-chunk-7-1.png" width="700px" />
]

.pull-left-70[
- Calibrate the .alert[region of unlikely values] under the null
    - With probability 0.95, `$|F - 1/2| \leq 1.96 \times \sqrt{3.6\times 10^{-7}} = 0.0012$`
    - `$F \notin [0.4988, 0.5012]$` with probability at most 5%
]

.pull-left-70[
- Compare with the .alert[actual estimation]
    - Here `$f = 340 275 / (356 389 + 340 275) = 0.488$`
    - Unlikely if `$p= 1/2$` `$\Rightarrow$` reject `$H_0$`
    
]

---

## About the p-value

**Idea** Go one step further than **reject / retain** and assess the .alert[significance] of the observed statistic

.pull-left-50[
- `$|F - 1/2|$` .alert[unlikely] to be higher than `$0.0012$`

- `$|f - 1/2|$` .alert[actually] observed to be `$0.012$`
]

.pull-right-50[
<img src="06_Tests_files/figure-html/unnamed-chunk-8-1.png" width="700px" />
]

.pull-left-50[
- How likely are we to observe such deviations if `$p=1/2$`? 
]

.pull-left-50[
$$
`\begin{align}
p &= P_{H_0}(|F - 1/2| > |f - 1/2|) \\
& = P_{H_0}(|F - 1/2| > 0.012) \\
& = P_{H_0}(F < 0.488) + P_{H_0}(F > 0.512) \\
& \simeq 5.5 \times 10^{-89}
\end{align}`
$$
- Not only unlikely but .alert[insanely] unlikely !!!
]

---
## 谢谢山东大学威海校区 🌊⛱️☀️🥟🍜