class: center, middle, inverse, title-slide .title[ # Sampling ] .author[ ### Mahendra Mariadassou, INRAE
.small[from original slides by Tristan Mary-Huard] ] .date[ ### Shandong University, Weihai (CN)
Summer School 2022 ] --- --- class: middle, center, inverse # Technical results ### Mean and variance of a sum --- ## Refresher Recall the following definitions/properties: `$$\begin{align} E[Y] &= \sum_{k=1}^K y_k P(Y = y_k) \\ V[Y] &= E\left[ \left(Y-E[Y]\right)^2\right]\\ &= E[Y^2]-E[Y]^2\\ Cov(X,Y) &= E\left[ (X-E[X])(Y-E[Y]) \right]\\ &= E[XY]-E[X]E[Y]\\ \end{align}$$` -- What about: - `\(f(Y)\)` for a general numeric function `\(f: \mathbb{R} \to \mathbb{R}\)` - `\(aY + b\)` (when `\(a, b \in \mathbb{R}\)`) - `\(f(X, Y)\)` for a general numeric function `\(f: \mathbb{R}^2 \to \mathbb{R}\)` - `\(X + Y\)`, `\(XY\)` --- ## Expectation of `\(f(Y)\)` Falling back to the definition, we have $$ E[f(Y)] = \sum_{k=1}^K f(y_k) P(Y = y_k) $$ In particular, the expectation of `\(aY + b\)` can be expressed simply as -- $$ E[aY + b] = aE[Y] + b $$ .blue[Proof:] -- --- ## Expectation of `\(f(Y_1, Y_2)\)` Falling back to the definition, we have `$$\begin{align} E[f(Y_1, Y_2)] & = \sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} f(y_{k_1}, y_{k_2}) P(Y_1 = y_{k_1}, Y_2 = y_{k_2}) \end{align}$$` In particular, the expectation of `\(X + Y\)` can be expressed simply: -- $$ E[Y_1 + Y_2] = E[Y_1] + E[Y_2] $$ -- Proof: --- ## Another formula for the variance Using the previous result, we have an alternative form for the variance: $$ V[X] = E\left[ (X - E[X])^2 \right] = E[X^2] - E[X]^2 $$ -- Proof: `$$\begin{align} V[X] & = E\left[ (X - E[X])^2 \right] \\ & = E\left[ X^2 - 2XE[X] + E[X]^2 \right] \\ & = E[X^2] - 2E\left[XE[X]\right] + E[X]^2 \\ & = E[X^2] - 2E[X]E\left[X\right] + E[X]^2 \\ & = E[X^2] - 2E[X]^2+ E[X]^2 \\ & = E[X^2] - E[X]^2 \end{align}$$` --- ### Expectation of `\(Y_1 Y_2\)` - The .blue[expectation of a sum] of random variables is the .blue[sum of the expectations] - The .blue[expectation of a product] is -- - no simple formula in general (but important exceptions) - it depends on the covariance between `\(Y_1\)` and `\(Y_2\)` -- <img src="" style="display: block; margin: auto;" /> In each case, `\(E[Y_1] = 0\)` and `\(E[Y_2] = 0\)` but we have .question[quizz1] .pull-left[ - .blue[A:] `\(E[Y_1 Y_2] = \dots\)` - .blue[B:] `\(E[Y_1 Y_2] = \dots\)` - .blue[C:] `\(E[Y_1 Y_2] = \dots\)` ] -- .pull-right[ - `\(E[Y_1 Y_2] = -0.48\)` - `\(E[Y_1 Y_2] = -0.01\)` - `\(E[Y_1 Y_2] = 0.47\)` ] --- ## Special case of independent variables If `\(Y_1\)` and `\(Y_2\)` are .alert[independant], noted `\(Y_1 \perp Y_2\)`, the result is simpler... -- `$$\begin{align} E[Y_1 Y_2] &= \sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} y_{k_1} \times y_{k_2} \underbrace{P(Y_1 = y_{k_1}, Y_2 = y_{k_2})}_{= P(Y_1 = y_{k_1}) \times P(Y_2 = y_{k_2})} \\ & = \sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} y_{k_1} \times y_{k_2} P(Y_1 = y_{k_1}) \times P(Y_2 = y_{k_2}) \\ & = \sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} y_{k_1}P(Y_1 = y_{k_1}) \times y_{k_2}P(Y_2 = y_{k_2}) \\ & = \left( \sum_{k_1=1}^{K_1} y_{k_1}P(Y_1 = y_{k_1}) \right) \times \left( \sum_{k_2=1}^{K_2} y_{k_2}P(Y_2 = y_{k_2}) \right) \\ & = E[Y_1] \times E[Y_2] \end{align}$$` -- - In particular `\(Cov(Y_1, Y_2) = E[Y_1 Y_2] - E[Y_1]E[Y_2] = 0\)` - More generally, `\(E[g_1(Y_1) \times g_2(Y_2)] = E[g_1(Y_1)] \times E[g_2(Y_2)]\)` --- ## Variance of a sum Using the previous result, we have $$ V(Y_1 + Y_2) = \dots $$ -- `$$\begin{align} V(Y_1 + Y_2) & = E\left[ (Y_1 + Y_2)^2 \right] - E\left[ Y_1 + Y_2 \right]^2 \\ & = E[Y_1^2] + 2E[Y_1 Y_2] + E[Y_2^2] - \left( E[Y_1]^2 + 2E[Y_1]E[Y_2] + E[Y_2]^2 \right) \\ & = \underbrace{E[Y_1^2] - E[Y_1]^2}_{=V(Y_1)} + 2 \underbrace{E[Y_1 Y_2] - E[Y_1]E[Y_2]}_{=Cov(Y_1, Y_2)} + \underbrace{E[Y_2^2] - E[Y_2]^2}_{=V(Y_2)} \\ & = V(Y_1) + 2Cov(Y_1, Y_2) + V(Y_2) \end{align}$$` -- In particular, if `\(Y_1 \perp Y_2\)`, then `\(V(Y_1 + Y_2) = V(Y_1) + V(Y_2)\)`. --- ## Sum of independent random variables If `\(Y_1, \dots, Y_n\)` are .blue[i.i.d.], the .blue[sum] `\(S_n = Y_1 + \dots + Y_n\)` has very simple mean and variance: -- .blue[Mean] $$ E[S_n] = E[Y_1 + \dots + Y_n] = E[Y_1] + \dots + E[Y_n] = nE[Y_1] $$ -- .remark[Remark:] The first equality is .alert[always] true and the second requires identical means. -- .blue[Variance] $$ V(S_n) = V[Y_1 + \dots + Y_n] = V[Y_1] + \dots + V[Y_n] = nV[Y_1] $$ -- .remark[Remark:] The equality requires .blue[independence] and the second requires identical variances. --- ## Application to the Binomial distribution Remember that a binomial variable `\(Y \sim \mathcal{B}(K, p)\)` is the sum of `\(K\)` independent Bernoulli trials: $$ X = Y_1 + \dots + Y_K \quad \text{where the } Y_i \text{ are i.i.d. with } Y_i \sim \mathcal{B}(p) $$ Using the previous results, it results that `$$\begin{align} E[X] & = K E[Y_1] = Kp \\ V[X] & = K V[Y_1] = Kp(1 -p) \end{align}$$` --- ## Application to the Poisson distribution We mentioned before that, if `\(X \sim \P(\lambda)\)`, then `\(V[X] = \lambda\)`. Remember that `\(E[X] = \lambda\)`. `$$\begin{align} V(X) & = E[X^2] - E[X]^2 \\ &= E[X(X-1) + E[X] - E[X]^2 \\ & = E[X(X-1)] + E[X] - E[X]^2 \\ & = \lambda^2 + \lambda - \lambda^2 \\ & = \lambda \end{align}$$` --- class: middle, inverse, center # A first try at estimation ## Finite population --- ### The Circle dataset .pull-left[ - Population of `\(n = 100\)` circles - Indexed by `\(i = 1, \dots, n\)` - Each characterized by its diameter `\(D_i\)` ] .pull-right[ <img src="" style="display: block; margin: auto;" /> ] --- ## List of diameters
.blue[Goal]: Provide an **estimate** of the mean circle diameter `\(E[D]=\mu_D\)`. --- ## Obtaining an estimate Requires three steps: .pull-left[ - .blue[Step 1] Collect some data, i.e. a **sample** - .blue[Step 2] Apply some **estimator** to the sample - .blue[Step 3] Get the estimate: ] -- .pull-right[ - `\(D_1,...,D_n\)` collected *at random* - `\(\overline{D}= \frac{1}{n} \sum_{i=1}^{n}D_i\)` - `\(\widehat{\mu}_D= \frac{1}{n} \sum_{i=1}^{n}d_i\)` ] -- .remark[The estimator is a random variable, the estimate is a numeric value] --- ## Remarks and questions .blue[One remark...] - .alert[ALWAYS] distinguish between the true expectation `\(\mu_D\)` and its estimate `\(\widehat{\mu}_D\)`. .blue[... and three questions] - Any (implicit) **assumption** about the way data were **collected** ? - We defined an intuitive estimator: is it a good one ? How can we **assess** its **performance** ? - If intuition fails to provide an estimator (or a good one), can we think of a **systematic strategy** to obtain a *good* estimator ? --- ## Several ways to "sample at random" In the Circle example, one can - Select the circles "by eye", - Use the droppin' pen technics, - Select identifiers at random, - Any other meaningful strategy .def[Definition] A **S**imple **R**andom **S**ampling is such that all samples of size `\(n\)` have the same probability `\({n \choose N}^{-1}\)` to be drawn. -- .def[Property]: In a SRS, each individual has the same probability `\(n/N\)` to be selected. --- ## Applying SRS to the Circle dataset Here are the estimates obtained from 32 SRS trials, with `\(n=10\)`: `$$\displaystyle{\frac{1}{10}\sum_{i=1}^{10}d_i}=$$` ``` Estimate 01: 5.3076 Estimate 02: 5.8976 Estimate 03: 3.8780 Estimate 04: 4.1243 Estimate 05: 5.0101 Estimate 06: 4.5968 Estimate 07: 5.5176 Estimate 08: 4.2012 Estimate 09: 6.0778 Estimate 10: 6.3875 Estimate 11: 6.4796 Estimate 12: 6.4881 Estimate 13: 4.3832 Estimate 14: 5.9892 Estimate 15: 5.3621 Estimate 16: 5.3960 Estimate 17: 5.3632 Estimate 18: 4.2552 Estimate 19: 6.8789 Estimate 20: 4.7823 Estimate 21: 5.6886 Estimate 22: 5.2215 Estimate 23: 6.9647 Estimate 24: 6.7541 Estimate 25: 5.4322 Estimate 26: 4.9458 Estimate 27: 4.6955 Estimate 28: 5.4238 Estimate 29: 6.0931 Estimate 30: 3.8522 Estimate 31: 4.9817 Estimate 32: 5.2376 ``` -- .blue[*True mean:* 5.427] -- .remark[Remark:] One cannot evaluate the performance of an estimator through an estimate since the estimate depends both on the estimator **and** the sample. .center[ Performance has to be evaluated **on average** over **all** possible samples ] --- ## Performance evaluation Estimator `\(\overline{D}\)` is a random variable (which depends on the sample). One can define: - the bias `\(B(\overline{D}) = E[\overline{D}]-\mu_D\)` - the variance `\(V(\overline{D})\)`. .blue[Mean Square Error] One defines `$$MSE(\overline{D}) = E[(\overline{D}-\mu_D)^2].$$` .remark[Remark] How is `\(MSE(\overline{D})\)` related to bias and variance ? -- `$$MSE(\overline{D}) = B^2(\overline{D}) + V(\overline{D})$$` -- .blue[Why is all of this useful at all ?] .question[quizz2] --- class: middle, inverse, center # SRS in finite population --- ## General framework Consider a population `\(\mathcal{P}\)` of size `\(N\)` and note `\(y_i\)` the value of variable `\(Y\)` measured on individual `\(i\)`. Define `$$\begin{eqnarray*} E[Y] &=& \frac{1}{N}\sum_{i=1}^{N}y_i =\mu\\ \text{and } V[Y] &=& \frac{1}{N}\sum_{i=1}^{N}(y_i-\mu)^2 = \sigma^2 \end{eqnarray*}$$` -- Assume `\(\mu\)` is estimated by applying the empirical mean to sample `\(S\)`: `$$\begin{equation} \overline{Y} = \frac{1}{n}\sum_{i\in S}Y_i = \frac{1}{n}\sum_{i=1}^{N}\varepsilon_i y_i \quad \text{where} \quad \varepsilon_i = \begin{cases} 1 & \text{if } i\in S \\ 0 & \text{otherwise} \end{cases} \end{equation}$$` -- .remark[Remark] `\(n\)` (sample) is generally much smaller than `\(N\)` (population) !! --- ## Theoretical properties of `\(\overline{Y}\)` Consider a SRS, i.e. `\(P(S)={n \choose N}^{-1}\)`, then `$$\begin{eqnarray*} E[\overline{Y}] &=& \mu \\ V[\overline{Y}] &=& \frac{1}{n}\left(1-\frac{n}{N}\right)\frac{N}{N-1}\sigma^2 =\frac{1}{n}\left(1-\frac{n}{N}\right)(\sigma^*)^2 \end{eqnarray*}$$` where `\((\sigma^*)^2= \frac{N}{N-1}\sigma^2\)` is the corrected population variance (more on that later) .question[quizz3] .blue[Proof:] Next slide -- Application to the circle dataset: `\(N=100\)`, `\(n=10\)`, `\(\sigma^2\)`=7.779 `\(\Rightarrow V[\overline{Y}] =\frac{1}{10}\times\frac{100}{99}\times\left(1-\frac{10}{100}\right)\times\)` 7.779 = 0.707 --- ## Proof --- ## Stratified sampling Assume that population `\(\mathcal{P}\)` has `\(H\)` strata `\(\mathcal{P}_1,...,\mathcal{P}_H\)`. .pull-left[ Define - `\(N_h\)`: size of stratum `\(\mathcal{P}_h\)`, `\(h=1,...,H\)`, - `\(N=\sum_h N_h\)`: population size, - `\(\mu_h=\frac{1}{N_h}\sum_{i\in\mathcal{P}_h}y_i\)`: mean of stratum `\(h\)`, - `\(\sigma_h^*=\frac{1}{N_h-1}\sum_{i\in\mathcal{P}_h}(y_i-\mu_h)^2\)`: corrected variance of stratum `\(h\)`. ] -- .pull-right[ .blue[Stratified sampling] Perform .alert[one SRS per stratum] where - `\(n_h\)`: number of sampled individuals per stratum, - `\(n=\sum_h n_h\)`: total sample size, - `\(\overline{Y}_h=\frac{1}{n_h}\sum_{i\in\mathcal{P}_h}Y_i\)`: empirical mean of stratum `\(h\)`. ] -- One then estimates the population mean `\(\mu\)` using `$$\overline{Y}_{St} = \sum_{h=1}^{H}\frac{N_h}{N}\overline{Y}_h$$` --- ## Properties of stratified sampling Consider a stratified sampling strategy. Then `$$\begin{eqnarray*} E[\overline{Y}_{St}] &=& \mu \\ V[\overline{Y}_{St}] &=& \sum_{h=1}^H \left(\frac{N_h}{N}\right)^2\times \frac{1}{n_h}\left(1-\frac{n_h}{N_h}\right) \times (\sigma_h^*)^2 \\ V[\overline{Y}_{St}] &\leq& V[\overline{Y}] \end{eqnarray*}$$` .blue[Proof]: based on the properties of SRS (equalities) and on the decomposition of the population variance + assuming `\(\frac{N_h-1}{N_h}\approx1\)` (inequality). --- ## Proof --- class: middle, inverse, center # IID sampling in infinite populations --- ## Sampling in an infinite population Consider a population `\(\mathcal{P}\)` where variable `\(Y\)` has distribution `\(\mathcal{L}\)`. Define $$E[Y] = \mu, \quad V[Y] =\sigma^2 $$ Assume `\(\mu\)` is estimated by applying the empirical mean to sample `\(Y_1,...,Y_n\)`, where `$$Y_i \sim \mathcal{L}, \text{ i.i.d.}$$` Then $$E\left[\overline{Y}\right] = \mu,\quad V\left[\overline{Y}\right] = \frac{\sigma^2}{n} $$ .question[quizz4] .blue[Proof:] Your turn! --- class: middle, center, inverse # Exercises --- ## Exercise 1: Bulb lifetime A bulb manufacturer claims that 90% of the produced bulbs have a lifetime of 500 hours or higher. An investigator bought 12 bulbs and obtained the following lifetimes: ``` 536 490 510 572 564 537 549 535 549 528 569 533 ``` Assuming bulb lifetimes have a Poisson distribution: - What is the average lifetime of the produced bulbs? - Is the manufacturer honest ? -- .blue[Answers] `\(\widehat{E[L]} =\widehat{\lambda}=\)` 539.333, `\(\quad \widehat{P}_{\widehat{\lambda}}(L\geq 500)=\)` 0.954 --- ## Exercise 2: Exponential distribution A a positive real valued random variable `\(X\)` has a exponential distribution `\(\mathcal{E}(\lambda)\)` with parameter `\(\lambda\)`, noted `\(X\sim\mathcal{E}(\lambda)\)`, if $$f_X(x) = \lambda e^{-\lambda x}, \ x\geq0, \ \lambda\geq0. $$ - Check that `\(\int_0^\infty f(t)dt = 1\)` (i.e. `\(f\)` is a density function). - Compute `\(E[X]\)`, where `\(X\sim\mathcal{E}(\lambda)\)`. - Assuming `\(X_1,...X_n\sim\mathcal{E}(\lambda)\)` i.i.d., suggest an estimator for `\(\lambda\)`. --- ## Summary .blue[3 quantities to distinguish] - `\(\mu=E[X]\)`: .alert[true] population mean. - `\(\overline{Y}=\frac{1}{n}\sum_{i=1}^nY_i\)`: estimator. - `\(\widehat{\mu}=\frac{1}{n}\sum_{i=1}^ny_i\)`: estimate. Remember that estimator `\(\neq\)` estimate -- .blue[Sampling] - In the following, we will always assume either SRS (finite pop.) or i.i.d. sampling (infinite pop.). -- .blue[Performance] - Different estimators can be compared through their MSE. - Keep in mind that `\(MSE(T) = B^2(T) + V(T)\)` -- .blue[What's next ?] - What if intuition fails to provide an estimator (or a good one)? - `\(\Rightarrow\)` Need for a *systematic strategy* to obtain a .alert[satisfactory] estimator!