Ranjan Marathe Blog

TIL: Weighted vs Simple Averages

💡 TIL

A simple average of hospital mortality rates can mislead when the hospitals differ widely in size. Weighting each rate by its patient count gives the pooled rate, in which every patient counts once and no small site is overrepresented.

🎯 Quick Example: Hospital Mortality Rate Analysis

# Hospital Death Rates
# Format: Deaths/Total_Patients = Rate (Weight = Total_Patients)
City_Hospital_A:    15/1000 = 1.5%  (weight: 1000) # Each patient counts equally
City_Hospital_B:    25/1500 = 1.7%  (weight: 1500) # Larger sample = more influence
Rural_Clinic_A:     2/50   = 4.0%   (weight: 50)   # Smaller sample = less influence
Rural_Clinic_B:     1/30   = 3.3%   (weight: 30)   # Smallest sample = least influence

# Why using patient counts as weights makes sense:
# 1. Each patient contributes equally to the overall average
# 2. Automatically adjusts for sample size differences
# 3. Gives proper statistical weight to larger, more reliable samples

Simple Average:   (1.5% + 1.7% + 4.0% + 3.3%) / 4 = 2.63%
Weighted Average: (1.5×1000 + 1.7×1500 + 4.0×50 + 3.3×30) / (1000+1500+50+30) = 1.69%
# Note: 1.69% comes from the rounded rates above; with exact rates the result is 43/2580 = 1.67%
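
The same calculation can be sketched in Python. This is a minimal sketch, not the post's original code: the `sites` dictionary and the two function names are hypothetical, and the rates are computed from the raw deaths and patient counts rather than from the rounded percentages.

```python
# Hypothetical structure: site name -> (deaths, total patients), taken from the example above.
sites = {
    "City_Hospital_A": (15, 1000),
    "City_Hospital_B": (25, 1500),
    "Rural_Clinic_A": (2, 50),
    "Rural_Clinic_B": (1, 30),
}


def simple_average(sites):
    """Unweighted mean of the per-site rates: every site counts equally."""
    rates = [deaths / patients for deaths, patients in sites.values()]
    return sum(rates) / len(rates)


def weighted_average(sites):
    """Patient-weighted mean of the per-site rates: every patient counts equally."""
    total_patients = sum(patients for _, patients in sites.values())
    weighted_sum = sum(
        (deaths / patients) * patients for deaths, patients in sites.values()
    )
    return weighted_sum / total_patients


print(f"Simple average:   {simple_average(sites):.4f}")
print(f"Weighted average: {weighted_average(sites):.4f}")
```

Because this works from unrounded rates, the weighted result is 43/2580 (about 1.67%) rather than the 1.69% obtained from the rounded percentages.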

🔑 Key Points

Using patient counts as weights ensures each patient contributes equally
Larger samples naturally get more weight, aligning with statistical principles
Patient counts act as a proxy for how reliable each site's rate is
By the Law of Large Numbers, a rate observed in a larger sample tends to sit closer to the underlying rate

📊 Formula and Explanation

Weighted_Avg = Σ(Rate × Count) / Σ(Count)

This is equivalent to:
Total_Deaths / Total_Patients = (15+25+2+1) / (1000+1500+50+30) = 43/2580 ≈ 1.67%
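
The equivalence holds because each rate is itself deaths divided by patients, so the patient counts cancel inside the sum. As a short derivation, writing d_i for deaths, n_i for patients, and r_i = d_i / n_i for the rate at site i (symbols introduced here for illustration, not used in the original post):

```latex
\[
\frac{\sum_i r_i \, n_i}{\sum_i n_i}
= \frac{\sum_i \frac{d_i}{n_i} \, n_i}{\sum_i n_i}
= \frac{\sum_i d_i}{\sum_i n_i}
= \frac{15 + 25 + 2 + 1}{1000 + 1500 + 50 + 30}
= \frac{43}{2580} \approx 1.67\%
\]
```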

💭 Why Count-Based Weights Matter

Statistical Validity:
- Larger samples provide more reliable estimates
- Each individual observation has equal influence
Intuitive Results:
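- The weighted average (43/2580 ≈ 1.67%; 1.69% when computed from the rounded rates) is the observed death rate across all 2580 patients
- The simple average (2.63%) overrepresents the small clinics: their 80 patients supply half of the four rates being averaged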
Practical Applications:
- Population studies
- Quality metrics
- Resource allocation decisions

📈 Sample Size Impact

Small Sample (Rural):  80 patients → Higher variance
Large Sample (City): 2500 patients → More stable rates
Total Coverage:     2580 patients → Weighted rate more reliable
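
One way to put rough numbers on "higher variance" is the binomial standard error of a rate, sqrt(p(1 - p) / n). The sketch below is an illustration under an assumption the post does not state, namely that each patient outcome is an independent draw with the same probability within a group. The `standard_error` helper and the `groups` dictionary are hypothetical names; the counts are the pooled rural (2+1 deaths, 50+30 patients) and city (15+25 deaths, 1000+1500 patients) figures from the example.

```python
import math


def standard_error(deaths, patients):
    """Binomial standard error of an observed rate: sqrt(p * (1 - p) / n).

    Assumes independent patient outcomes with a common probability,
    which is a simplification for real hospital data.
    """
    p = deaths / patients
    return math.sqrt(p * (1 - p) / patients)


# Hypothetical grouping: label -> (deaths, total patients)
groups = {
    "Rural (2 clinics)": (3, 80),
    "City (2 hospitals)": (40, 2500),
    "All sites": (43, 2580),
}

for name, (deaths, patients) in groups.items():
    rate = deaths / patients
    se = standard_error(deaths, patients)
    print(f"{name}: rate {rate:.4f}, standard error {se:.4f}")
```

Under that assumption, the rural standard error (about 0.021) is roughly eight times the city one (about 0.0025), which is why the rural rates swing so much more from one extra death.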