Statistical Distribution

Descriptive Statistics

  • Mean
    • average of all observations
    • mean = (sum of all observations)/(sample size)
  • Median 
    • the middle value when all observations are sorted
    • if sample size is odd
      • median = ((n+1)/2)th largest value
    • if the sample size is even
      • median = the average of the (n/2)th and ((n/2)+1)th largest values
  • Mode
    • the most commonly occurring value
    • if several values tie for most commonly occurring, each of them is a mode
  • in decreasing order of resistance to outliers, mode > median > mean 
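
The definitions above can be checked with a minimal Python sketch using the standard library's statistics module. The sample values and the helper name describe are illustrative assumptions, not part of the notes. Adding one extreme value shifts the mean a lot, the median a little, and the mode not at all.

```python
from statistics import mean, median, multimode

def describe(values):
    """Return (mean, median, list of modes) for a sample."""
    return mean(values), median(values), multimode(values)

# Hypothetical sample, n = 7 (odd): the median is the 4th sorted value.
scores = [1, 2, 3, 3, 5, 6, 8]

# Same sample plus one extreme value, n = 8 (even): the median is the
# average of the 4th and 5th sorted values.
with_outlier = scores + [100]

print(describe(scores))        # mean 4, median 3, mode 3
print(describe(with_outlier))  # mean 16, median 4, mode 3

# When values tie for most common, every tied value is reported as a mode.
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```

multimode is used instead of mode so that ties are reported rather than silently resolved to a single value.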

Types of Distributions

  • Normal
    • aka Gaussian, bell-shaped
    • for continuous variables
    • mean = median = mode
  • Bi-modal
    • distribution has 2 humps (each being a relative mode)
    • if symmetrical, mean = median
  • Skewed  
    • positive skew
      • asymmetrical with tail trailing off to right
      • mean > median > mode  
    • negative skew
      • asymmetrical with tail trailing off to left
      • mean < median < mode
    • mean very sensitive to skew 
    • median somewhat resistant to skew
    • mode very resistant to skew
  • Other
    • non-continuous variable types have their own distributions 
      • e.g., binary, categorical, ordinal, binomial, and count variables
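
The ordering of mean, median, and mode under skew can be illustrated with a short Python sketch. The two samples are made up for illustration: the second is the mirror image of the first, so its tail trails to the left instead of the right.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed sample: most values are small,
# with a long tail to the right.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

# Mirror image of the same sample: a long tail to the left.
left_skewed = [20 - x for x in right_skewed]

for name, data in (("positive skew", right_skewed),
                   ("negative skew", left_skewed)):
    print(f"{name}: mean={mean(data)}, median={median(data)}, mode={mode(data)}")

# positive skew: mean (5) > median (3) > mode (2)
# negative skew: mean (15) < median (17) < mode (18)
```

The mean is pulled furthest toward the tail, the median less so, and the mode stays at the peak. This ordering is typical for unimodal data like these samples, not a guarantee for every skewed dataset.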

Characteristics of the Normal Distribution

  • For continuous variables
  • Defined entirely by 2 parameters
    • mean (µ)
    • standard deviation (σ)
  • A fixed percentage of all observations falls within a given number of standard deviations of the mean
    • +/- 1 standard deviation = approximately 68%
    • +/- 2 standard deviations = approximately 95%
    • +/- 3 standard deviations = approximately 99.7%
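
The percentages above can be reproduced from the normal cumulative distribution function. The sketch below uses statistics.NormalDist from the Python standard library. The helper name coverage and the example parameters are assumptions chosen for illustration.

```python
from statistics import NormalDist

def coverage(k, mu=0.0, sigma=1.0):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"+/- {k} SD: {coverage(k):.1%}")
# +/- 1 SD: 68.3%
# +/- 2 SD: 95.4%
# +/- 3 SD: 99.7%

# The result does not depend on the particular mean and standard deviation;
# mu=100, sigma=15 is an arbitrary example.
print(f"{coverage(2, mu=100, sigma=15):.1%}")  # 95.4%
```

Because the distribution is defined entirely by µ and σ, the same coverage holds for any normal distribution.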

Regression to the Mean

  • Phenomenon in which sample points which were initially extreme often become closer to the mean in future measurements
  • Most points fall near the average; therefore, extreme points are often partly a result of “luck” (e.g., a student performs particularly poorly on one exam but normally performs at the average level)
  • Has significance for study design
    • e.g., patients with high blood pressure may improve after taking an experimental anti-hypertensive, but that improvement on the next measurement may be due to regression to the mean rather than the treatment
    • the solution is to compare a control group and an experimental group, since regression to the mean affects both groups equally
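
Regression to the mean can be seen in a small simulation with no treatment at all. The sketch below is a toy model under stated assumptions: each subject has a stable true level plus independent measurement noise at each visit. The function name simulate_retest and all numeric parameters are hypothetical.

```python
import random
from statistics import mean

def simulate_retest(n=10_000, true_mean=120.0, between_sd=10.0,
                    noise_sd=10.0, seed=0):
    """Measure n untreated subjects twice and return the average first and
    second measurement of those in the top 10% on the first measurement."""
    rng = random.Random(seed)
    # Each subject's stable underlying level.
    true_levels = [rng.gauss(true_mean, between_sd) for _ in range(n)]
    # Two measurements = true level + independent "luck" each time.
    first = [t + rng.gauss(0, noise_sd) for t in true_levels]
    second = [t + rng.gauss(0, noise_sd) for t in true_levels]
    # Select subjects who looked extreme on the first measurement.
    cutoff = sorted(first)[int(0.9 * n)]
    extreme = [i for i in range(n) if first[i] >= cutoff]
    return (mean(first[i] for i in extreme),
            mean(second[i] for i in extreme))

first_avg, second_avg = simulate_retest()
print(f"first measurement:  {first_avg:.1f}")
print(f"second measurement: {second_avg:.1f}")
```

The selected group's second average is lower than its first even though nothing was done to the subjects. An untreated control group selected the same way would show this same drop, which is why it serves as the baseline for judging a treatment.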

Measures of Variability

  • Standard deviation
    • a statistical measure that demonstrates how close together or spread apart the data is 
      • if data is closer together, the standard deviation will be smaller (and vice versa)
    • often designated by σ
    • equation
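      • square root[(sum of the squared differences between each data point and the mean)/n]
      • dividing by n gives the population standard deviation; an estimate from a sample usually divides by n - 1 instead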
  • Standard error
    • a statistical measure that estimates how far the sample mean is likely to be from the true population mean
      • helps determine confidence intervals
    • equation
      • standard deviation/square root of n
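
Both equations can be worked through in a few lines of Python. The data set and function names below are illustrative assumptions. The standard error here uses the sample standard deviation (statistics.stdev, which divides by n - 1). That is a common convention, but it is an assumption on top of the formula as written in the notes.

```python
import math
from statistics import pstdev, stdev

def population_sd(values):
    """Square root of the mean squared difference from the mean (divides by n)."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((x - m) ** 2 for x in values) / n)

def standard_error(values):
    """Sample standard deviation divided by the square root of n."""
    return stdev(values) / math.sqrt(len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample with mean 5

print(population_sd(data))             # 2.0
print(pstdev(data))                    # 2.0, library equivalent
print(round(standard_error(data), 3))  # 0.756
```

Because the standard error shrinks with the square root of n, quadrupling the sample size only halves it.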