TIL: Law of Large Numbers
💡 TIL
The Law of Large Numbers states that as the sample size increases, the mean of independent, identically distributed observations converges to the true population mean. This fundamental principle explains why larger datasets generally provide more reliable statistical estimates.
🎯 Quick Example: Coin Flip Probability
# Coin flips at different sample sizes
# (the heads counts below are illustrative values, not generated by this snippet)
small_sample = 10 # Highly variable
medium_sample = 1000 # More stable
large_sample = 10000 # Very stable
# Results (Heads Percentage)
# Small Sample (10 flips):
heads = 7
print(f"10 flips: {(heads/small_sample)*100:.1f}% heads") # 70.0% (far from 50%)
# Medium Sample (1000 flips):
heads = 487
print(f"1000 flips: {(heads/medium_sample)*100:.1f}% heads") # 48.7% (closer to 50%)
# Large Sample (10000 flips):
heads = 5024
print(f"10000 flips: {(heads/large_sample)*100:.1f}% heads") # 50.2% (very close to 50%)
# Expected probability: 50%
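The numbers above are fixed for illustration. A minimal sketch of how one might actually simulate the flips is shown here; the helper name `heads_fraction` and the seed value are assumptions made for this example, and the exact percentages will differ from the ones quoted above.

```python
import random


def heads_fraction(n_flips, rng):
    """Return the fraction of heads in n_flips simulated fair coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


rng = random.Random(42)  # arbitrary fixed seed so the run is repeatable
for n in (10, 1000, 10000):
    print(f"{n} flips: {heads_fraction(n, rng) * 100:.1f}% heads")
```

Changing the seed changes the individual percentages, but the larger samples should stay much closer to 50% than the 10-flip sample typically does.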
🔑 Key Points
- Sample Size Effect:
  - Small samples: High variance, unreliable estimates
  - Large samples: Lower variance, more reliable estimates
  - Very large samples: Converge toward true probability
- Real-world Applications:
  - Insurance risk assessment
  - Quality control in manufacturing
  - Medical trial outcomes
  - Survey accuracy estimation
📊 Formula
P(|X̄ₙ - μ| < ε) → 1 as n → ∞
Where:
- X̄ₙ is the sample mean
- μ is the true population mean
- ε is any positive number, however small
- n is the sample size
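To make the formula concrete, the probability P(|X̄ₙ - μ| < ε) can be estimated empirically by repeating an experiment many times. The sketch below assumes fair coin flips (so μ = 0.5) and ε = 0.05; the function name `prob_within` and the trial counts are choices made for this illustration only.

```python
import random


def prob_within(n, eps, trials, rng, p=0.5):
    """Estimate P(|sample mean - p| < eps) for n flips of a coin with P(heads) = p."""
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < p for _ in range(n)) / n
        if abs(mean - p) < eps:
            hits += 1
    return hits / trials


rng = random.Random(0)  # arbitrary seed for a repeatable run
for n in (10, 100, 1000):
    estimate = prob_within(n, eps=0.05, trials=1000, rng=rng)
    print(f"n = {n:>4}: estimated P(|mean - 0.5| < 0.05) = {estimate:.3f}")
```

The estimated probability should climb toward 1 as n grows, which is what the limit statement describes.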
💭 Why It Matters
- Research Design:
  - Helps determine appropriate sample sizes
  - Guides statistical power calculations
  - Informs how precise confidence intervals can be
- Business Applications:
  - A/B testing sample size requirements
  - Customer satisfaction survey design
  - Risk assessment in financial models
📈 Visual Example: Convergence
# Convergence of dice rolls to expected value (3.5)
sample_sizes = [10, 100, 1000, 10000]
results = {
    10: 3.9,      # High variance
    100: 3.58,    # Getting closer
    1000: 3.52,   # Very close
    10000: 3.51,  # Almost exact
}
# Expected value of a fair die: (1+2+3+4+5+6)/6 = 3.5
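The dictionary above holds illustrative values. As a rough sketch of how such numbers could be produced, the snippet below tracks the running mean of one sequence of simulated die rolls; the seed and variable names are assumptions, and the means it prints will not match the values above exactly.

```python
import random

rng = random.Random(7)  # arbitrary seed for a repeatable run
checkpoints = (10, 100, 1000, 10000)

total = 0
running_means = {}
for i in range(1, max(checkpoints) + 1):
    total += rng.randint(1, 6)  # one roll of a fair six-sided die
    if i in checkpoints:
        running_means[i] = total / i

for n, mean in running_means.items():
    print(f"{n:>5} rolls: mean = {mean:.3f} (error {abs(mean - 3.5):.3f})")
```

Because each checkpoint reuses the earlier rolls, this shows a single path converging rather than four independent experiments.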
🏥 Healthcare Example
Consider mortality rates in hospitals:
# Small rural hospital
cases = 50
deaths = 2
mortality_rate = deaths / cases * 100  # 4.0%, but high variance
# Large city hospital
cases = 5000
deaths = 175
mortality_rate = deaths / cases * 100  # 3.5%, a more reliable estimate
# Key insight: the larger hospital's rate is more likely
# to reflect the true mortality rate for similar cases
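One way to quantify "more reliable" is to put an interval around each rate. The sketch below uses the Wilson score interval, which is a technique chosen for this illustration (it behaves better than the plain normal approximation when counts are small); the function name `wilson_interval` is hypothetical, and the inputs are the case and death counts from the example above.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    p_hat = events / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


for name, deaths, cases in [("Small rural", 2, 50), ("Large city", 175, 5000)]:
    low, high = wilson_interval(deaths, cases)
    print(f"{name}: {deaths / cases:.1%} (approx. 95% interval {low:.1%} to {high:.1%})")
```

The interval for the small hospital comes out many times wider than the one for the large hospital, even though the two point estimates are similar.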
📚 Best Practices
- Sample Size Planning:
  - Calculate minimum required sample sizes
  - Account for expected effect sizes
  - Consider practical constraints
- Interpretation Guidelines:
  - Always report sample sizes
  - Include confidence intervals
  - Acknowledge limitations of small samples
- Common Pitfalls to Avoid:
  - Overconfidence in small sample results
  - Ignoring population variability
  - Neglecting practical significance
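For the "calculate minimum required sample sizes" item, a common back-of-the-envelope approach for estimating a proportion is n ≥ z² · p(1 − p) / E², where E is the desired margin of error. The sketch below assumes the normal approximation, a 95% level (z = 1.96), and the worst case p = 0.5; the function name `min_sample_size` is hypothetical.

```python
import math


def min_sample_size(margin, p=0.5, z=1.96):
    """Approximate minimum n so that z * sqrt(p * (1 - p) / n) <= margin."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)


for margin in (0.05, 0.03, 0.01):
    print(f"±{margin:.0%} margin: n >= {min_sample_size(margin)}")
```

Halving the margin of error roughly quadruples the required sample size, which is why very tight estimates get expensive quickly.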
🔍 Related Concepts
- Central Limit Theorem
- Standard Error of the Mean
- Confidence Intervals
- Statistical Power
📝 Note
While larger samples are generally better, practical constraints often limit sample sizes. The key is finding the balance between statistical reliability and resource constraints.