Data Analysis Statistics An Introduction to Statistical Inference and Data Analysis


was drawn from a normal distribution, then she needs to be able to recognize
normally distributed data. For this reason, the samples studied in this
chapter were generated under carefully controlled conditions, by computer
simulation. This allows us to investigate how samples drawn from specified
distributions should behave, thereby providing a standard against which to
compare experimental data for which the true distribution can never be
known. Fortunately, S-Plus provides several convenient functions for
simulating random sampling.
Example 2 Consider the experiment of tossing a fair die n = 20 times.
We can simulate this experiment as follows:
> SampleSpace <- 1:6
> sample(x=SampleSpace,size=20,replace=T)
[1] 1 6 3 2 2 3 5 3 6 4 3 2 5 3 2 2 3 2 4 2
Example 3 Consider the experiment of drawing a sample of size n = 5
from Normal(2, 3). We can simulate this experiment as follows:
> rnorm(5,mean=2,sd=sqrt(3))
[1] 1.3274812 0.5901923 2.5881013 1.2222812 3.4748139
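The call above passes sd=sqrt(3), which indicates that Normal(2, 3) is parameterized by mean 2 and variance 3. A minimal sketch of how one might check this by simulation follows; it is written in the S syntax shared by R and S-Plus (it was not tested in S-Plus), and the seed and the sample size of 100000 are arbitrary choices made for illustration.

```r
# Assumption: Normal(2, 3) means mean 2 and variance 3, hence sd = sqrt(3).
set.seed(1)                                  # arbitrary seed, for reproducibility only
z <- rnorm(100000, mean = 2, sd = sqrt(3))   # a large simulated sample

sample.mean <- mean(z)   # should be close to the population mean, 2
sample.var  <- var(z)    # should be close to the population variance, 3

sample.mean
sample.var
```

With a sample this large, both quantities should land near their population values, although the exact digits depend on the random number generator.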
7.1 The Plug-In Principle
We will employ a general methodology for relating samples to populations.
In Chapters 2–6 we developed a formidable apparatus for studying
populations (probability distributions). We would like to exploit this
apparatus fully. Given a sample, we will pretend that the sample is a finite
population (discrete probability distribution) and then we will use methods
for studying finite populations to learn about the sample. This approach is
sometimes called the Plug-In Principle.
The Plug-In Principle employs a fundamental construction:
Definition 7.1 Let $x = (x_1, \ldots, x_n)$ be a sample. The empirical
probability distribution associated with $x$, denoted $\hat{P}_n$, is the
discrete probability distribution defined by assigning probability $1/n$ to
each $\{x_i\}$.
Notice that, if a sample contains several copies of the same numerical value,
then each copy is assigned probability 1/n. This is illustrated in the following
example.
Example 2 (continued) A fair die is rolled n = 20 times, resulting
in the sample
x = {1, 6, 3, 2, 2, 3, 5, 3, 6, 4, 3, 2, 5, 3, 2, 2, 3, 2, 4, 2}. (7.1)
The empirical distribution $\hat{P}_{20}$ is the discrete distribution that
assigns the following probabilities:

  $x_i$    $\#\{x_i\}$    $\hat{P}_{20}(\{x_i\})$
    1          1              0.05
    2          7              0.35
    3          6              0.30
    4          2              0.10
    5          2              0.10
    6          2              0.10

Notice that, although the true probabilities are $P(\{x_i\}) = 1/6$, the
empirical probabilities range from .05 to .35. The fact that $\hat{P}_{20}$
differs from $P$ is an example of sampling variation. Statistical inference
is concerned with determining what the empirical distribution (the sample)
tells us about the true distribution (the population).
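The empirical distribution of sample (7.1) can be computed directly from the data. The following is a minimal sketch in the S syntax shared by R and S-Plus; the variable names counts and p.hat are introduced here for illustration and do not come from the text.

```r
# The sample (7.1) from Example 2.
x <- c(1, 6, 3, 2, 2, 3, 5, 3, 6, 4, 3, 2, 5, 3, 2, 2, 3, 2, 4, 2)
n <- length(x)

counts <- table(x)      # number of copies of each distinct value
p.hat  <- counts / n    # each copy contributes probability 1/n

p.hat
```

The printed probabilities should agree with the table for Example 2, and they sum to 1 because the empirical distribution is a genuine probability distribution.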
The empirical distribution, $\hat{P}_n$, is an appealing way to approximate the
actual probability distribution, $P$, from which the sample was drawn. Notice
that the empirical probability of any event $A$ is just
$$\hat{P}_n(A) = \#\{x_i \in A\} \cdot \frac{1}{n},$$
the observed frequency with which $A$ occurs in the sample. By the Law of
Averages, this quantity tends to the true probability of $A$ as the size of the
sample increases. Thus, the theory of probability provides a mathematical
justification for approximating $P$ with $\hat{P}_n$ when $P$ is unknown.
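The convergence of observed frequencies can be illustrated by simulation. The sketch below is written in the S syntax shared by R and S-Plus and makes several assumptions that are not in the text: the event A is taken to be "the roll is even" (so its true probability under a fair die is 1/2), and the seed and the sample sizes are arbitrary.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
SampleSpace <- 1:6
sizes <- c(20, 200, 2000, 20000)     # illustrative sample sizes

# p.hat.A[k] holds #{x_i in A} * (1/n) for the k-th sample size,
# where A is the (hypothetical) event that the roll is even.
p.hat.A <- numeric(length(sizes))
for (k in 1:length(sizes)) {
  x <- sample(x = SampleSpace, size = sizes[k], replace = T)
  p.hat.A[k] <- sum(x %% 2 == 0) / sizes[k]
}

cbind(n = sizes, p.hat.A = p.hat.A)
```

The empirical probabilities for the larger samples should typically sit closer to 1/2 than those for n = 20, although any single run can deviate.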
Because the empirical distribution is an authentic probability distribution,
all of the methods that we developed for studying (discrete) distributions
are available for studying samples. For example,
Definition 7.2 The empirical cdf, usually denoted $\hat{F}_n$, is the cdf
associated with $\hat{P}_n$, i.e.
$$\hat{F}_n(y) = \hat{P}_n(X \le y) = \frac{\#\{x_i \le y\}}{n}.$$
The empirical cdf of sample (7.1) is graphed in Figure 7.1.
[Figure 7.1: An Empirical CDF. Horizontal axis: y, from -2 to 9; vertical axis: F(y), from 0.0 to 1.0.]
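Definition 7.2 translates almost literally into code. The following is a minimal sketch in the S syntax shared by R and S-Plus; F.hat is a hypothetical helper name chosen here, and it evaluates the empirical cdf at a single number y.

```r
# The sample (7.1) from Example 2.
x <- c(1, 6, 3, 2, 2, 3, 5, 3, 6, 4, 3, 2, 5, 3, 2, 2, 3, 2, 4, 2)

# Hypothetical helper: F.hat(y, x) = #{x_i <= y} / n for a single number y.
F.hat <- function(y, x) sum(x <= y) / length(x)

F.hat(0, x)      # 0:   no observation is <= 0
F.hat(2, x)      # 0.4: one 1 and seven 2s, (1 + 7) / 20
F.hat(3.5, x)    # 0.7: the six 3s are added, 14 / 20
F.hat(6, x)      # 1:   every observation is <= 6
```

The function is a step function that jumps at each distinct observed value, by the empirical probability of that value. R users may prefer the built-in ecdf function, which returns such a step function directly.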
7.2 Plug-In Estimates of Mean and Variance
Population quantities defined by expected values are easily estimated by the
plug-in principle. For example, suppose that $X_1, \ldots, X_n \sim P$ and that we
observe a sample $x = \{x_1, \ldots, x_n\}$. Let $\mu = EX_i$ denote the population
mean. Then
Definition 7.3 The plug-in estimate of $\mu$, denoted $\hat{\mu}_n$, is the mean
of the empirical distribution:
$$\hat{\mu}_n = \sum_{i=1}^{n} x_i \cdot \frac{1}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}_n.$$
This quantity is called the sample mean.
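As a worked instance, the plug-in estimate for sample (7.1) can be computed in two equivalent ways: by weighting every observation by 1/n, or by weighting each distinct value by its empirical probability. The sketch below uses the S syntax shared by R and S-Plus; the names mu.hat and mu.hat.grouped are illustrative, not from the text.

```r
# The sample (7.1) from Example 2.
x <- c(1, 6, 3, 2, 2, 3, 5, 3, 6, 4, 3, 2, 5, 3, 2, 2, 3, 2, 4, 2)
n <- length(x)

# Mean of the empirical distribution: each observation carries probability 1/n.
mu.hat <- sum(x * (1 / n))

# The same mean, computed from the distinct values and their empirical probabilities.
p.hat  <- table(x) / n
values <- as.numeric(names(p.hat))
mu.hat.grouped <- sum(values * as.numeric(p.hat))

mu.hat            # 3.15, since the observations sum to 63
mu.hat.grouped    # the same value
mean(x)           # the built-in sample mean agrees
```

The estimate 3.15 differs from the population mean of a fair die, 3.5, which is another instance of sampling variation.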
