Frequency Distributions & Graphs

Definitions

Raw Data: Data collected in original form.
Frequency: The number of times a certain value or class of values occurs.
Frequency Distribution (Table): The organization of raw data in table form with classes and frequencies.
Categorical Frequency Distribution: A frequency distribution in which the data is only nominal or ordinal.
Ungrouped Frequency Distribution: A frequency distribution of numerical data. The raw data is not grouped.
Grouped Frequency Distribution: A frequency distribution where several numbers are grouped into one class.
Class Limits: Separate one class in a grouped frequency distribution from another. The limits could actually appear in the data and have gaps between the upper limit of one class and the lower limit of the next.
Class Boundaries: Separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting 0.5 units from the lower class limit and the upper class boundary is found by adding 0.5 units to the upper class limit.
Class Width: The difference between the upper and lower boundaries of any class. The class width is also the difference between the lower limits of two consecutive classes or the upper limits of two consecutive classes. It is not the difference between the upper and lower limits of the same class.
Class Mark (Midpoint): The number in the middle of the class. It is found by adding the upper and lower limits and dividing by two. It can also be found by adding the upper and lower boundaries and dividing by two.
Cumulative Frequency: The number of values less than the upper class boundary for the current class. This is a running total of the frequencies.
Relative Frequency: The frequency divided by the total frequency. This gives the proportion of values falling in that class; multiplying by 100 gives the percent.
Cumulative Relative Frequency (Relative Cumulative Frequency): The running total of the relative frequencies or the cumulative frequency divided by the total frequency. Gives the percent of the values which are less than the upper class boundary.
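The class and frequency definitions above can be tied together with a small worked example. This is only a sketch: the data values and the three classes are made up for illustration, and the 0.5 adjustment assumes the raw data are whole numbers.

```python
from itertools import accumulate

# Hypothetical raw data (whole numbers) and hand-picked class limits.
data = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31, 33, 38]
limits = [(10, 19), (20, 29), (30, 39)]  # (lower limit, upper limit)

# Frequency: how many values fall in each class.
freqs = [sum(lo <= x <= hi for x in data) for lo, hi in limits]

# Class boundaries: 0.5 below the lower limit, 0.5 above the upper limit.
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in limits]

# Class width: upper boundary minus lower boundary (10 here, not 19 - 10 = 9).
widths = [ub - lb for lb, ub in boundaries]

# Class mark: midpoint of the limits (the boundaries give the same value).
marks = [(lo + hi) / 2 for lo, hi in limits]

# Cumulative frequency: running total of the frequencies.
cum_freqs = list(accumulate(freqs))

# Relative and cumulative relative frequencies.
total = sum(freqs)
rel_freqs = [f / total for f in freqs]
cum_rel_freqs = [c / total for c in cum_freqs]

for (lo, hi), f, c, r in zip(limits, freqs, cum_freqs, rel_freqs):
    print(f"{lo}-{hi}: frequency={f}, cumulative={c}, relative={r:.3f}")
```

The last cumulative frequency equals the total frequency (12), and the last cumulative relative frequency equals 1.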
Histogram: A graph which displays the data by using vertical bars of various heights to represent frequencies. The horizontal axis can be either the class boundaries, the class marks, or the class limits.
Frequency Polygon: A line graph. The frequency is placed along the vertical axis and the class midpoints are placed along the horizontal axis. These points are connected with lines.
Ogive: A frequency polygon of the cumulative frequency or the relative cumulative frequency. The vertical axis is the cumulative frequency or relative cumulative frequency. The horizontal axis is the class boundaries. The graph always starts at zero at the lowest class boundary and will end up at the total frequency (for a cumulative frequency) or 1.00 (for a relative cumulative frequency).
Pareto Chart: A bar graph for qualitative data with the bars arranged according to frequency.
Pie Chart: Graphical depiction of data as slices of a pie. The frequency determines the size of the slice. The number of degrees in any slice is the relative frequency times 360 degrees.
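As a rough illustration of the Pareto and pie chart definitions, the sketch below counts some made-up categorical responses, orders the categories from most to least frequent (the usual Pareto ordering), and converts each relative frequency to degrees. The category labels are hypothetical.

```python
from collections import Counter

# Hypothetical qualitative data.
responses = ["A", "B", "A", "C", "A", "B", "A", "C"]
counts = Counter(responses)
total = sum(counts.values())

# Pareto chart order: categories sorted from most to least frequent.
pareto_order = counts.most_common()

# Pie chart: degrees in a slice = relative frequency * 360.
degrees = {category: freq / total * 360 for category, freq in counts.items()}

print(pareto_order)
print(degrees)
```

The slice sizes always add up to 360 degrees, which is a quick check on the arithmetic.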
Pictograph: A graph that uses pictures to represent data.
Stem and Leaf Plot: A data plot which uses part of the data value as the stem and the rest of the data value (the leaf) to form groups or classes. This is very useful for sorting data quickly.
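A stem and leaf plot can be sketched in a few lines. The example assumes two-digit whole numbers, so the tens digit is the stem and the ones digit is the leaf; the data values are invented.

```python
from collections import defaultdict

# Hypothetical two-digit data values.
data = [23, 31, 25, 47, 31, 38, 29, 42]

stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)  # tens digit is the stem, ones digit the leaf
    stems[stem].append(leaf)

lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
print("\n".join(lines))
```

Reading the leaves from left to right, top to bottom, gives the data in sorted order.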
Dotplot: A number line in which one dot is placed above a value on the number line for each occurrence of that value. That is, one dot means the value occurred once, three dots mean the value occurred three times, etc.
Scatter Plot (Diagram): A plot of the paired data where the independent variable is placed along the horizontal axis and the dependent variable is placed along the vertical axis. Basically, this is just plotting a bunch of ordered pairs.
Statistic: Characteristic or measure obtained from a sample.
Parameter: Characteristic or measure obtained from a population.
Mean: Sum of all the values divided by the number of values. This can either be a population mean (denoted by mu) or a sample mean (denoted by x bar).
Median: The midpoint of the data after being ranked (sorted in ascending order). There are as many numbers below the median as above the median.
Mode: The most frequent number. There may not be a mode; there may be one mode; or there may be many modes.
Skewed Distribution: The majority of the values lie together on one side with a very few values (the tail) to the other side. In a positively skewed distribution, the tail is to the right and the mean is larger than the median. In a negatively skewed distribution, the tail is to the left and the mean is smaller than the median.
Symmetric Distribution: The data values are evenly distributed on both sides of the mean. In a symmetric distribution, the mean equals the median.
Weighted Mean: The mean when each value is multiplied by its weight and summed. This sum is divided by the total of the weights.
Midrange: The mean of the highest and lowest values. (Max + Min) / 2
Range: The difference between the highest and lowest values. Max - Min
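The measures of center and spread defined so far can be computed with Python's standard `statistics` module. This is a minimal sketch on made-up numbers; the weights in the weighted mean are also hypothetical.

```python
import statistics

# Hypothetical data set.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)        # sum of values / number of values
median = statistics.median(data)    # middle of the ranked data
modes = statistics.multimode(data)  # list of the most frequent value(s)
midrange = (max(data) + min(data)) / 2
data_range = max(data) - min(data)

# Weighted mean: sum of value * weight, divided by the total of the weights.
values = [90, 80, 70]
weights = [3, 2, 1]
weighted_mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)

print(mean, median, modes, midrange, data_range, round(weighted_mean, 2))
```

Here the mean (5) is larger than the median (4), which matches the description of a positively skewed distribution: the value 10 forms a tail to the right.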
Variation: The sum of the squares of the deviations from the mean.
Population Variance: The average of the squares of the distances from the population mean. It is the sum of the squares of the deviations from the mean divided by the population size. The units on the variance are the units of the population squared.
Sample Variance: Unbiased estimator of a population variance. Instead of dividing by the population size, the sum of the squares of the deviations from the sample mean is divided by one less than the sample size. The units on the variance are the units of the population squared.
Standard Deviation: The square root of the variance. The population standard deviation is the square root of the population variance and the sample standard deviation is the square root of the sample variance. The sample standard deviation is not the unbiased estimator for the population standard deviation. The units on the standard deviation are the same as the units of the population/sample.
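The difference between the population and sample formulas is easiest to see side by side. The sketch below uses one made-up data set for both, purely to show the arithmetic; in practice the data would be either a population or a sample, not both.

```python
import math

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Variation: sum of the squares of the deviations from the mean.
variation = sum((x - mean) ** 2 for x in data)

# Population variance divides by n; sample variance divides by n - 1.
population_variance = variation / n
sample_variance = variation / (n - 1)

# Standard deviation: square root of the matching variance.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(variation, population_variance, population_sd)
print(round(sample_variance, 4), round(sample_sd, 4))
```

Dividing by n - 1 instead of n always makes the sample variance slightly larger than the population formula would give for the same numbers.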
Coefficient of Variation: Standard deviation divided by the mean, expressed as a percentage. We won't work with the Coefficient of Variation in this course.
Chebyshev's Theorem: The proportion of the values that fall within k standard deviations of the mean is at least 1-1/k^2 where k > 1. Chebyshev's theorem can be applied to any distribution regardless of its shape.
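The bound in Chebyshev's theorem is simple to evaluate. The helper below is a hypothetical function written for illustration; it only computes the guaranteed minimum proportion, not the actual proportion in any data set.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%} of the values")
```

For k = 2 the bound is 75%, noticeably weaker than the roughly 95% that applies when the distribution is known to be bell-shaped.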
Empirical or Normal Rule: Only valid when a distribution is bell-shaped (normal). Approximately 68% lies within 1 standard deviation of the mean; 95% within 2 standard deviations; and 99.7% within 3 standard deviations of the mean.
Standard Score or Z-Score: The value obtained by subtracting the mean and dividing by the standard deviation. When all values are transformed to their standard scores, the new mean (for Z) will be zero and the standard deviation will be one.
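The claim that standard scores have mean zero and standard deviation one can be checked numerically. This sketch treats a made-up data set as a population, so it uses the population standard deviation.

```python
import statistics

# Hypothetical data set, treated as a population.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)

# Z-score: subtract the mean, then divide by the standard deviation.
z_scores = [(x - mu) / sigma for x in data]

print(z_scores)
print(statistics.fmean(z_scores), statistics.pstdev(z_scores))
```

A z-score of 2.0 for the value 9 says that it lies two standard deviations above the mean.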
Percentile: The percent of the population which lies below that value. The data must be ranked to find percentiles.
Quartile: Either the 25th, 50th, or 75th percentiles. The 50th percentile is also called the median.
Decile: Either the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, or 90th percentiles.
Lower Hinge: The median of the lower half of the numbers (up to and including the median). The lower hinge is the first Quartile unless the remainder when dividing the sample size by four is 3.
Upper Hinge: The median of the upper half of the numbers (including the median). The upper hinge is the 3rd Quartile unless the remainder when dividing the sample size by four is 3.
Box and Whiskers Plot (Box Plot): A graphical representation of the minimum value, lower hinge, median, upper hinge, and maximum. Some textbooks, and the TI-82 calculator, define the five values as the minimum, first Quartile, median, third Quartile, and maximum.
Five Number Summary: Minimum value, 1st quartile, median, 3rd quartile, and maximum. Some textbooks use the hinges instead of the quartiles.
InterQuartile Range (IQR): The difference between the 3rd and 1st Quartiles.
Outlier: An extremely high or low value when compared to the rest of the values.
Mild Outliers: Values which lie between 1.5 and 3.0 times the InterQuartile Range below the 1st Quartile or above the 3rd Quartile. Note, some texts use hinges instead of Quartiles.
Extreme Outliers: Values which lie more than 3.0 times the InterQuartile Range below the 1st Quartile or above the 3rd Quartile. Note, some texts use hinges instead of Quartiles.
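The hinge and outlier definitions can be combined into a short sketch. It assumes the hinge convention described above (each half includes the median when the number of values is odd) and uses the hinges in place of the quartiles, as some texts do. The function names and data are hypothetical.

```python
import statistics


def hinges(values):
    """Return (lower hinge, upper hinge); halves include the median when n is odd."""
    ranked = sorted(values)
    half = (len(ranked) + 1) // 2
    return statistics.median(ranked[:half]), statistics.median(ranked[-half:])


def classify(x, lower, upper):
    """Label x as 'none', 'mild', or 'extreme' using the 1.5 and 3.0 rules."""
    spread = upper - lower              # hinge-based stand-in for the IQR
    distance = max(lower - x, x - upper)  # how far x lies outside the box
    if distance > 3.0 * spread:
        return "extreme"
    if distance > 1.5 * spread:
        return "mild"
    return "none"


# Hypothetical data set.
data = [1, 3, 4, 5, 6, 7, 8, 16, 30]
lower, upper = hinges(data)
summary = (min(data), lower, statistics.median(data), upper, max(data))
labels = {x: classify(x, lower, upper) for x in data}

print(summary)
print({x: label for x, label in labels.items() if label != "none"})
```

With these numbers the hinges are 4 and 8, so values more than 6 units outside the box are mild outliers (16) and values more than 12 units outside are extreme outliers (30).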

James Jones