Introduction
Population vs Sample
The population includes all objects of interest whereas the sample is only a portion of the
population. Parameters are associated with populations and statistics with samples. Parameters
are usually denoted using Greek letters (mu, sigma) while statistics are usually denoted using
Roman letters (x, s).
There are several reasons why we don't work with populations. They are usually large, and it is
often impossible to get data for every object we're studying. Sampling does not usually occur
without cost, and the more items surveyed, the larger the cost.
We compute statistics and use them to estimate parameters. The computation is the first part of
the statistics course (Descriptive Statistics) and the estimation is the second part (Inferential
Statistics).
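The parameter/statistic distinction can be sketched in a few lines of Python. This is only an illustration: the population of 10,000 simulated heights, the sample size of 50, and the variable names are all assumptions made for the example, not part of the course material.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated heights in centimetres.
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameters describe the whole population (Greek letters).
mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population standard deviation

# Statistics describe a sample drawn from the population (Roman letters).
sample = random.sample(population, 50)
x_bar = statistics.fmean(sample)        # sample mean, an estimate of mu
s = statistics.stdev(sample)            # sample standard deviation, an estimate of sigma

print(f"parameters: mu={mu:.2f}, sigma={sigma:.2f}")
print(f"statistics: x_bar={x_bar:.2f}, s={s:.2f}")
```

In practice the population values are not available, which is exactly why x_bar and s are computed from a sample and used as estimates.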
Discrete vs Continuous
Discrete variables are usually obtained by counting. There is a finite or countable number of
possible values with discrete data. You can't have 2.63 people in the room.
Continuous variables are usually obtained by measuring. Length, weight, and time are all
examples of continuous variables. Since continuous variables are real numbers, we usually
round them. This implies a boundary depending on the number of decimal places. For example:
64 is really anything 63.5 <= x < 64.5. Likewise, if there are two decimal places, then 64.03 is
really anything 64.025 <= x < 64.035. Boundaries always have one more decimal place than the
data and end in a 5.
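The boundary rule can be written as a small helper. The sketch below is a hypothetical function (the name boundaries is an assumption) that takes the rounded value as text, so that the number of decimal places is known, and uses Python's decimal module to avoid floating-point noise.

```python
from decimal import Decimal

def boundaries(value):
    """Return (lower, upper) boundaries for a rounded value given as a string."""
    d = Decimal(value)
    # The exponent is minus the number of decimal places: 0 for "64", -2 for "64.03".
    places_exponent = d.as_tuple().exponent
    # Half of one unit in the last recorded decimal place: 0.5 or 0.005 in the cases above.
    half_unit = Decimal(1).scaleb(places_exponent) / 2
    return d - half_unit, d + half_unit

print(boundaries("64"))     # lower <= x < upper for a whole-number reading
print(boundaries("64.03"))  # same rule with two decimal places
```

Each boundary has one more decimal place than the data and ends in 5, matching the rule stated above.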
Levels of Measurement
There are four levels of measurement: Nominal, Ordinal, Interval, and Ratio. These go from
lowest level to highest level. Data is classified according to the highest level which it fits. Each
additional level adds something the previous level didn't have.
- Nominal is the lowest level. Only names are meaningful here.
- Ordinal adds an order to the names.
- Interval adds meaningful differences, but there is no true zero (starting point).
- Ratio adds a true zero so that ratios are meaningful.
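The difference between the interval and ratio levels can be checked numerically. Temperature is used here as an assumed illustration (it is not an example from these notes): Celsius and Fahrenheit are interval scales with arbitrary zeros, while Kelvin has a true zero.

```python
def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def c_to_k(c):
    """Convert degrees Celsius to kelvins."""
    return c + 273.15

low_c, high_c = 10.0, 20.0

# Ratios on interval scales depend on where the arbitrary zero sits,
# so "20 degrees is twice as hot as 10 degrees" is not a meaningful claim.
ratio_c = high_c / low_c                   # 2.0 in Celsius
ratio_f = c_to_f(high_c) / c_to_f(low_c)   # 68 / 50 in Fahrenheit
# Kelvin has a true zero, so this ratio is the physically meaningful one.
ratio_k = c_to_k(high_c) / c_to_k(low_c)

print(ratio_c, ratio_f, ratio_k)
```

The three ratios disagree, which is why ratios are only interpreted at the ratio level, while differences remain meaningful at the interval level.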
Types of Sampling
There are five types of sampling: Random, Systematic, Convenience, Cluster, and Stratified.
- Random sampling is analogous to putting everyone's name into a hat and drawing out several
names. Each element in the population has an equal chance of being selected. While this is the
preferred way of sampling, it is often difficult to do. It requires that a complete list of every
element in the population be obtained. Computer generated lists are often used with random
sampling. You can generate random numbers using the TI-82 calculator.
- Systematic sampling is easier to do than random sampling. In systematic sampling, the list of
elements is "counted off". That is, every kth element is taken. This is similar to lining
everyone up and numbering off "1,2,3,4; 1,2,3,4; etc". When done numbering, all people
numbered 4 would be used.
- Convenience sampling is very easy to do, but it's probably the worst technique to use. In
convenience sampling, readily available data is used; for example, the surveyor questions the
first people he or she runs into.
- Cluster sampling is accomplished by dividing the population into groups, usually
geographically. These groups are called clusters or blocks. The clusters are randomly
selected, and every element in the selected clusters is used.
- Stratified sampling also divides the population into groups called strata. However, this time it
is by some characteristic, not geographically. For instance, the population might be separated
into males and females. A sample is taken from each of these strata using either random,
systematic, or convenience sampling.
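The five sampling types above can be sketched as short Python functions. This is a minimal illustration under stated assumptions: the function names, the population of 20 numbered people, and the cluster and stratum labels are all hypothetical, and the stratified version uses random sampling within each stratum (the notes also allow systematic or convenience sampling there).

```python
import random

def random_sample(population, n, rng):
    """Draw n names out of the hat: every element is equally likely to be chosen."""
    return rng.sample(population, n)

def systematic_sample(population, k):
    """Count off 1..k repeatedly and keep everyone numbered k."""
    return population[k - 1::k]

def convenience_sample(population, n):
    """Take the first n elements encountered."""
    return population[:n]

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and use every element in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]

def stratified_sample(strata, n_per_stratum, rng):
    """Sample from every stratum (here, randomly within each one)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

rng = random.Random(42)            # seeded generator so the sketch is repeatable
people = list(range(1, 21))        # hypothetical population: people numbered 1..20

# Hypothetical geographic clusters and characteristic-based strata.
clusters = {"north": people[0:5], "south": people[5:10],
            "east": people[10:15], "west": people[15:20]}
strata = {"odd": [p for p in people if p % 2 == 1],
          "even": [p for p in people if p % 2 == 0]}

srs = random_sample(people, 5, rng)
systematic = systematic_sample(people, 4)
convenience = convenience_sample(people, 5)
clustered = cluster_sample(clusters, 2, rng)
stratified = stratified_sample(strata, 2, rng)

print(systematic)  # everyone numbered 4 when counting off in fours
```

Note the structural difference between the last two: cluster sampling uses all elements of a few randomly chosen groups, while stratified sampling uses a few elements from every group.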
James Jones