Statistics for Data Science
This is a work in progress.
Written for readers who want to learn statistics for data science.
Mean Median Mode
Mean = Average = Arithmetic Mean = Sum of all numbers / number of numbers
The mean is a measure of central tendency: it indicates a typical value for the data set.
Data set = {4, 3, 1, 6, 1, 7}
Mean = 22/6 = 3 2/3 ≈ 3.667
Median is the middle number. Order all the numbers in the set and find the middle one. For an even number of values, take the mean of the two middle numbers.
Sorted data set = {1, 1, 3, 4, 6, 7}, so Median = mean of the two middle numbers 3 & 4 = (3+4)/2 = 3.5
Mode is the most frequently occurring number in the data set.
Mode = 1
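As a quick check of the worked example above, here is a minimal sketch using Python's standard `statistics` module. The variable names (`data`, `data_mean`, and so on) are just illustrative choices, not part of the original example.

```python
from statistics import mean, median, mode

# Data set from the example above
data = [4, 3, 1, 6, 1, 7]

data_mean = mean(data)      # sum of all numbers / number of numbers = 22 / 6
data_median = median(data)  # mean of the two middle values of the sorted data
data_mode = mode(data)      # most frequently occurring value

print("Mean:", round(data_mean, 3))
print("Median:", data_median)
print("Mode:", data_mode)
```

Note that `median` sorts the data internally, so the list does not need to be ordered first.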
Arithmetic Mean, Geometric Mean and Harmonic Mean
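One way to compare the three means: the arithmetic mean of n numbers x₁, x₂, ..., xₙ is the number a that satisfies
x₁ + x₂ + x₃ + ... + xₙ = a + a + a + ... + a (n terms), so a = (x₁ + x₂ + ... + xₙ) / n
The geometric mean (for positive numbers) is the number b that satisfies
x₁ * x₂ * x₃ * ... * xₙ = b * b * b * ... * b (n terms), so b = (x₁ * x₂ * ... * xₙ)^(1/n)
There is also a harmonic mean (again for positive numbers), which is the number h that satisfies
1/x₁ + 1/x₂ + 1/x₃ + ... + 1/xₙ = 1/h + 1/h + ... + 1/h (n terms), so h = n / (1/x₁ + 1/x₂ + ... + 1/xₙ)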
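The three definitions above can be sketched in Python by solving each equation for the unknown mean. This is only an illustration: the sample list `values` and the helper function names are hypothetical, and the geometric and harmonic versions assume all numbers are positive. The standard library's `statistics.geometric_mean` and `statistics.harmonic_mean` (Python 3.8+) are used as a cross-check.

```python
import math
from statistics import geometric_mean, harmonic_mean

def arithmetic_mean(xs):
    # a solves x1 + ... + xn = a + ... + a (n terms), so a = sum / n
    return sum(xs) / len(xs)

def geometric_mean_by_definition(xs):
    # b solves x1 * ... * xn = b * ... * b (n terms), so b = product ** (1/n)
    # Assumes every x is positive.
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean_by_definition(xs):
    # h solves 1/x1 + ... + 1/xn = 1/h + ... + 1/h (n terms),
    # so h = n / (sum of reciprocals). Assumes every x is positive.
    return len(xs) / sum(1 / x for x in xs)

values = [1, 2, 4]  # hypothetical example data

a = arithmetic_mean(values)
b = geometric_mean_by_definition(values)
h = harmonic_mean_by_definition(values)

print("Arithmetic mean:", a)
print("Geometric mean:", b)
print("Harmonic mean:", h)

# Cross-check against the standard library implementations
print(math.isclose(b, geometric_mean(values)))
print(math.isclose(h, harmonic_mean(values)))
```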
Pearson Correlation
The strength of a linear association indicates how closely the points in a scatterplot fit a straight line. If the points lie exactly on a straight line, the linear association is perfect; the closer the points are to a line, the stronger the association. If there is no linear pattern at all, we say there is no linear association. To measure the strength of a linear association we use Pearson’s correlation coefficient (r), which has the following properties:
- No linear association: r = 0
- Perfect positive linear association: r = +1
- Perfect negative linear association: r = −1
Pearson’s correlation coefficient (r) measures the strength of a linear association (−1 ≤ r ≤ +1).
- Positive correlation (0 to +1) – Both quantities tend to increase or decrease together.
- Zero or no correlation (0) – No linear association between the quantities.
- Negative correlation (−1 to 0) – As one quantity increases, the other tends to decrease.
Note: High correlation between two variables does not imply that one causes the other.
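To make the range of r concrete, here is a minimal sketch that computes Pearson's r from its usual textbook definition: the sum of products of deviations from the means, divided by the square root of the product of the summed squared deviations. The function name `pearson_r` and the small data sets are hypothetical, chosen only so the three cases (r = +1, r = −1, r = 0) are easy to verify by hand.

```python
import math

def pearson_r(xs, ys):
    # Pearson's correlation coefficient from the textbook definition.
    # Assumes xs and ys have the same length and neither is constant
    # (a constant list would make the denominator zero).
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

x = [1, 2, 3, 4, 5]

y_up = [2 * v + 1 for v in x]      # points exactly on a rising line
y_down = [10 - 3 * v for v in x]   # points exactly on a falling line
y_curve = [(v - 3) ** 2 for v in x]  # symmetric curve, no linear trend

r_pos = pearson_r(x, y_up)
r_neg = pearson_r(x, y_down)
r_none = pearson_r(x, y_curve)

print("Rising line:", r_pos)
print("Falling line:", r_neg)
print("Symmetric curve:", r_none)
```

The last case is worth a second look: the y values are completely determined by x, yet r is zero, because r only measures how well a straight line fits the points.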