Statistical Distribution

Descriptive Statistics

Mean
- average of all observations
- mean = (sum of all observations)/(sample size)
Median
- the middle value of all observations
- if sample size is odd
  - median = ((n+1)/2)th largest value
- if the sample size is even
  - median = the average of the (n/2)th and ((n/2)+1)th largest values
Mode
- the most commonly occurring value
- if several values tie for most commonly occurring, each of them is a mode
In decreasing order of resistance to outliers: mode > median > mean
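
The definitions above can be tried out with Python's standard `statistics` module. This is a minimal sketch: the sample values and the `summarize` helper are made up for illustration. Adding one extreme value moves the mean a lot, the median a little, and the modes not at all.

```python
from statistics import mean, median, multimode

observations = [2, 3, 3, 4, 5, 5, 6]   # hypothetical sample, n = 7 (odd)
with_outlier = observations + [100]    # same sample plus one extreme value, n = 8 (even)

def summarize(values):
    """Return (mean, median, list of modes) for a list of numbers."""
    return mean(values), median(values), multimode(values)

print(summarize(observations))   # mean 4, median 4, modes 3 and 5
print(summarize(with_outlier))   # mean 16, median 4.5, modes 3 and 5
```

`multimode` is used instead of `mode` because this sample has two most common values, so it has two modes.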

Types of Distributions

Normal
- aka Gaussian, bell-shaped
- for continuous variables
- mean = median = mode
Bi-modal
- distribution has 2 humps (each being a relative mode)
- if symmetrical, mean = median
Skewed
- positive skew
  - asymmetrical with tail trailing off to right
  - typically, mean > median > mode
- negative skew
  - asymmetrical with tail trailing off to left
  - typically, mean < median < mode
- mean very sensitive to skew
- median somewhat resistant to skew
- mode very resistant to skew
Other
- non-continuous variable types have their own distributions
  - e.g., binary, categorical, ordinal, binomial, and count variables
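
The ordering of mean, median, and mode under skew can be checked on a small example. The two samples below are invented for illustration; the second is simply the mirror image of the first, so its tail points the other way.

```python
from statistics import mean, median, mode

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]  # hypothetical, tail trails off to the right
left_skewed = [20 - x for x in right_skewed]     # mirror image, tail trails off to the left

for name, data in [("positive skew", right_skewed), ("negative skew", left_skewed)]:
    # Print the three measures of central tendency side by side.
    print(f"{name}: mean={mean(data)}, median={median(data)}, mode={mode(data)}")
```

For the right-skewed sample the mean is pulled toward the long tail (5 versus a median of 3 and a mode of 2); the mirrored sample shows the reverse ordering.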

Characteristics of the Normal Distribution

For continuous variables
Defined entirely by 2 parameters
- mean (µ)
- standard deviation (σ)
A fixed percentage of all observations falls within a given number of standard deviations of the mean (values are approximate)
- +/- 1 standard deviation = 68%
- +/- 2 standard deviations = 95%
- +/- 3 standard deviations = 99.7%
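
These percentages follow from the normal distribution itself and do not depend on µ or σ. As a rough check, the fraction of a normal distribution within k standard deviations of the mean equals erf(k/√2); the helper name `fraction_within` below is made up for this sketch.

```python
from math import erf, sqrt

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"+/- {k} SD: {fraction_within(k):.4f}")
```

The computed fractions are close to 0.68, 0.95, and 0.997, which is where the rounded figures in the list come from.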

Regression to the Mean

Phenomenon in which sample points that were initially extreme tend to fall closer to the mean on later measurements
Most points fall near the average; therefore, extreme points are often partly a result of "luck" (e.g., a student performs particularly poorly on one exam but normally performs at the average level)
Has significance for study design
- e.g., patients with high blood pressure may improve after taking an experimental anti-hypertensive, but the improvement on the next measurement may be due to regression to the mean rather than the treatment
- the solution is to compare a control group with an experimental group: both regress to the mean, so any remaining difference can be attributed to the treatment
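
A small simulation can make the blood pressure example concrete. This sketch assumes a simple made-up model: each patient has a stable true blood pressure, every reading adds independent random noise, and nobody is treated. All numbers (130, 10, the top 10% cutoff) are arbitrary choices for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical model: stable true value per patient plus independent noise per reading.
true_bp = [rng.gauss(130, 10) for _ in range(10_000)]
first = [t + rng.gauss(0, 10) for t in true_bp]
second = [t + rng.gauss(0, 10) for t in true_bp]

# Select the patients whose first reading was extreme (roughly the top 10%).
cutoff = sorted(first)[int(0.9 * len(first))]
selected = [i for i, reading in enumerate(first) if reading >= cutoff]

first_mean = mean(first[i] for i in selected)
second_mean = mean(second[i] for i in selected)
print(f"selected group, first reading:  {first_mean:.1f}")
print(f"selected group, second reading: {second_mean:.1f}")
```

Under these assumptions the selected group's second reading is lower than its first even though no treatment was given, yet still above the overall average of 130, because part of the first reading was true elevation and part was luck.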

Measures of Variability

Standard deviation
- a statistical measure that describes how tightly clustered or spread out the data are around the mean
  - if the data are closer together, the standard deviation will be smaller (and vice versa)
- often designated by σ
- equation
  - square root[(sum of the squared differences between each data point and the mean)/n]
Standard error
- a statistical measure that estimates how far the sample mean is likely to be from the true population mean
  - helps determine confidence intervals
- equation
  - standard deviation/square root of n
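
The two equations above can be written out directly. This is a minimal sketch with invented data and hypothetical helper names (`population_sd`, `standard_error`); it divides by n as in the notes, whereas many textbooks and libraries divide by n - 1 when estimating the standard deviation from a sample (for example, `statistics.stdev`).

```python
from math import sqrt
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, n = 8

def population_sd(values):
    """Square root of the mean squared difference from the mean (divides by n)."""
    m = mean(values)
    return sqrt(sum((x - m) ** 2 for x in values) / len(values))

def standard_error(values):
    """Standard deviation divided by the square root of the sample size."""
    return population_sd(values) / sqrt(len(values))

print(population_sd(data))    # 2.0, matches statistics.pstdev(data)
print(standard_error(data))   # 2 / sqrt(8)
```

Because the standard error shrinks with the square root of n, quadrupling the sample size halves it when the standard deviation stays the same.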