Statistics for Data Science

 

This is a WIP 

Written for readers who want to learn statistics for data science.


Mean Median Mode


Mean = Average = Arithmetic Mean = Sum of all numbers / Number of numbers
The mean indicates a typical value and is a measure of central tendency. It is not necessarily the middle number; that is the median.

Data set = {4, 3, 1, 6, 1, 7}

Mean = 22/6 = 3 2/3 ≈ 3.667

Median is the middle number. Sort the numbers in the set and pick the middle one. If the set has an even number of values, take the mean of the two middle values.

Sorted data set = {1, 1, 3, 4, 6, 7}

Median = mean of the two middle numbers 3 and 4 = (3 + 4)/2 = 3.5

Mode is the most frequently occurring number in the data set.

Mode = 1
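
The worked example above can be checked with a few lines of Python. This is a minimal sketch that uses the standard library's statistics module on the same data set; the variable names are illustrative only.

```python
import statistics

# Same data set as in the worked example above
data = [4, 3, 1, 6, 1, 7]

mean = statistics.mean(data)      # sum of the values divided by how many there are
median = statistics.median(data)  # averages the two middle values when the count is even
mode = statistics.mode(data)      # most frequently occurring value

print(f"mean = {mean:.3f}, median = {median}, mode = {mode}")
```

The three results should agree with the hand calculation: a mean of about 3.667, a median of 3.5, and a mode of 1.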

Arithmetic Mean, Geometric Mean and Harmonic Mean


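Think about it this way. For n numbers x₁, x₂, ..., xₙ, the arithmetic mean is the number a that satisfies
x₁ + x₂ + ... + xₙ = a + a + ... + a   (n copies of a)
so a = (x₁ + x₂ + ... + xₙ) / n.

The geometric mean is the number b that satisfies
x₁ * x₂ * ... * xₙ = b * b * ... * b   (n copies of b)
so b is the n-th root of the product. It is normally used only for positive numbers.

There is also a harmonic mean, which is the number h that satisfies

1/x₁ + 1/x₂ + ... + 1/xₙ = 1/h + 1/h + ... + 1/h   (n copies of 1/h)
so h = n / (1/x₁ + 1/x₂ + ... + 1/xₙ).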
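
To make the three definitions concrete, here is a small Python sketch that solves each defining equation for its mean. The function names and the sample values [1, 2, 4] are assumptions chosen for illustration, and all inputs are assumed to be positive.

```python
import math

def arithmetic_mean(xs):
    # Solve x1 + ... + xn = n * a for a
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # Solve x1 * ... * xn = b ** n for b (assumes every value is positive)
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    # Solve 1/x1 + ... + 1/xn = n / h for h (assumes no value is zero)
    return len(xs) / sum(1 / x for x in xs)

values = [1, 2, 4]  # illustrative sample, not from the text
print("arithmetic:", arithmetic_mean(values))
print("geometric: ", geometric_mean(values))
print("harmonic:  ", harmonic_mean(values))
```

For positive numbers the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean. The three are equal only when all the numbers are the same.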


Pearson Correlation

The strength of a linear association indicates how closely the points in a scatterplot fit a straight line. If the points lie exactly on a straight line, the linear association is perfect; the more tightly they cluster around a line, the stronger the association. If there is no linear pattern at all, we say there is no linear association. To measure the strength of a linear association we use Pearson’s correlation coefficient (r), which has the following properties:

No linear association        Perfect positive linear association        Perfect negative linear association

r = 0                        r = +1                                     r = −1




Pearson’s correlation coefficient (r) measures the strength of a linear association (−1 ≤ r ≤ +1). 
  • Positive correlation (0 < r ≤ +1) – The quantities tend to increase or decrease together. 
  • Zero or no correlation (r = 0) – No linear association between the quantities; a nonlinear relationship may still exist. 
  • Negative correlation (−1 ≤ r < 0) – As one quantity increases, the other tends to decrease.
Note: High correlation between two variables does not imply that one causes the other.
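
The properties above can be explored numerically. The sketch below computes r from its standard definition: the sum of the products of the deviations from the means, divided by the square root of the product of the summed squared deviations. The function name pearson_r and the sample data are assumptions made for illustration; in practice a library routine would normally be used instead.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (denominator terms)
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(ss_x * ss_y)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))   # points on a rising line
print(pearson_r(xs, [10, 8, 6, 4, 2]))   # points on a falling line
# A symmetric U shape: y depends on x, but not linearly
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
```

The last call illustrates why r = 0 only rules out a linear association: y is completely determined by x there, yet the correlation coefficient is zero.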


