Descriptive Statistics
- Mean
- average of all observations
- mean = (sum of all observations)/(sample size)
- Median
- if sample size is odd
- median = ((n+1)/2)th largest value
- if sample size is even
- median = average of the (n/2)th and ((n/2)+1)th largest values
- Mode
- the most commonly occurring value
- in decreasing order of resistance to outliers, mode > median > mean
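The definitions above can be checked with a minimal Python sketch. The sample values and the helper name `summarize` are made up for illustration. Replacing one observation with an extreme outlier pulls the mean upward while the median and mode stay put.

```python
import statistics

def summarize(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )

# Hypothetical sample of seven observations (odd sample size)
sample = [2, 3, 3, 4, 5, 6, 7]
# Same sample with the largest value replaced by an extreme outlier
with_outlier = [2, 3, 3, 4, 5, 6, 70]

base = summarize(sample)
shifted = summarize(with_outlier)

print("without outlier (mean, median, mode):", base)
print("with outlier    (mean, median, mode):", shifted)

# Even sample size: the median is the average of the two middle values
even_median = statistics.median([1, 2, 3, 4])
print("median of [1, 2, 3, 4]:", even_median)
```

Only the mean changes between the two summaries, which matches the ordering mode > median > mean for resistance to outliers.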
Types of Distributions
- Normal
- aka Gaussian, bell-shaped
- for continuous variables
- mean = median = mode
- Bi-modal
- distribution has 2 humps (each being a relative mode)
- if symmetrical, mean = median
- Skewed
- negative skew
- asymmetrical with tail trailing off to left
- mean < median < mode
- median somewhat resistant to skew
- mode very resistant to skew
- positive skew
- asymmetrical with tail trailing off to right
- mean > median > mode
- Other
- non-continuous variable types have their own distributions
- e.g., binary, categorical, ordinal, binomial, and count variables
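The effect of skew on the three measures of center can be illustrated with a small Python sketch. The scores below are hypothetical, chosen so that the tail trails off to the left. Mirroring the same data moves the tail to the right and reverses the ordering.

```python
import statistics

def center(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )

# Hypothetical scores with a long tail to the left (negative skew)
neg_skew = [1, 4, 6, 7, 8, 8, 9, 9, 9, 9]
# Mirror image of the same data: long tail to the right (positive skew)
pos_skew = [10 - x for x in neg_skew]

neg_mean, neg_median, neg_mode = center(neg_skew)
pos_mean, pos_median, pos_mode = center(pos_skew)

print("negative skew:", neg_mean, neg_median, neg_mode)
print("positive skew:", pos_mean, pos_median, pos_mode)
```

In both cases the mean is pulled furthest toward the tail, the median less so, and the mode stays at the peak.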
Characteristics of the Normal Distribution
- For continuous variables
- Defined entirely by 2 parameters
- mean (µ)
- standard deviation (σ)
- A fixed percentage of all observations will always fall within a given number of standard deviations of the mean
- +/- 1 standard deviation = 68%
- +/- 2 standard deviations = 95%
- +/- 3 standard deviations = 99.7%
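The coverage percentages can be reproduced from the normal cumulative distribution function. This is a sketch using `NormalDist` from the Python standard library. The helper name `share_within` and the example parameters (mean 100, standard deviation 15) are assumptions for illustration.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"+/- {k} SD: {share_within(k):.1%}")

# The share does not depend on the particular mean or standard deviation
same_share = share_within(2, mu=100.0, sigma=15.0)
print(f"+/- 2 SD with mean 100, SD 15: {same_share:.1%}")
```

The shares come out to roughly 68%, 95%, and 99.7% for 1, 2, and 3 standard deviations, whatever values of µ and σ are used.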
Regression to the Mean
- Phenomenon in which sample points that were initially extreme tend to move closer to the mean on later measurements
- Most points fall near the average; therefore, extreme points are often a result of “luck” (e.g., a student performs particularly poorly on an exam but normally performs at the average level)
- Has significance for study design
- e.g., patients with high blood pressure may improve after taking an experimental anti-hypertensive, but that improvement on the next measurement may be due to regression to the mean rather than the treatment
- the solution is to compare a control group with an experimental group.
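A small simulation can make the blood pressure example concrete. This is a sketch under made-up assumptions: each simulated patient has a stable true pressure, each reading adds random measurement noise, and no treatment is given. All numbers (130, 10, 10, the cutoff of 150) are hypothetical.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical values in mmHg: population mean, spread of true pressures, measurement noise
TRUE_MEAN, TRUE_SD, NOISE_SD = 130.0, 10.0, 10.0

def measure(true_value):
    """One noisy reading of a patient's true blood pressure."""
    return true_value + rng.gauss(0.0, NOISE_SD)

patients = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(10_000)]
first = [measure(p) for p in patients]
second = [measure(p) for p in patients]  # no treatment between readings

# Select only patients whose first reading was extreme
selected = [i for i, reading in enumerate(first) if reading > 150.0]
first_mean = statistics.mean(first[i] for i in selected)
second_mean = statistics.mean(second[i] for i in selected)

print(f"selected patients: {len(selected)}")
print(f"mean first reading:  {first_mean:.1f}")
print(f"mean second reading: {second_mean:.1f}")
```

The selected group's second reading averages lower than its first even though nothing was done to these patients, which is why an untreated control group is needed to separate a treatment effect from regression to the mean.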
Measures of Variability
- Standard deviation
- a statistical measure that demonstrates how close together or spread apart the data is
- if data is closer together, the standard deviation will be smaller (and vice versa)
- often designated by σ
- equation
- square root[(sum of the squared differences between each data point and the mean)/n]
- Standard error
- a statistical measure that estimates how far the sample mean is likely to be from the true population mean
- helps determine confidence intervals
- equation
- standard deviation/square root of n
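The two equations can be written out as a short Python sketch. The function names and the data are hypothetical. The standard deviation here divides by n, following the equation in these notes (many texts divide by n - 1 when working from a sample). The interval at the end uses mean +/- 1.96 standard errors, a common normal-approximation assumption that is not part of these notes.

```python
import math

def standard_deviation(values):
    """Square root of the mean squared difference from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def standard_error(values):
    """Standard deviation divided by the square root of the sample size."""
    return standard_deviation(values) / math.sqrt(len(values))

# Hypothetical sample of eight observations
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
sd = standard_deviation(data)
se = standard_error(data)

# Approximate 95% confidence interval for the mean (assumes a normal approximation)
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se

print(f"mean = {mean}, SD = {sd}, SE = {se:.4f}")
print(f"approximate 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

Because the standard error divides by the square root of n, a larger sample with the same spread gives a smaller standard error and a narrower confidence interval.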