Think stats: chi square test
in digital analytics
Chi-Square Test for independence FTW!!11one
Pawel Kapuscinski
pawel@databall.co
@aliendeg
Chi-square test use cases
Is gender a factor in color preference of a car?
Comparing the number of sales from the test experience vs the control
experience (A/B test or A/B/n)
Comparing the number of sales of each product before and after a change in
strategy
Is country a factor in pricing plan preference?
Is weather a factor in sales of different products?
Implementing the chi square test
1. Identify the two variables of interest from the data table
2. State hypothesis
3. Compute marginal sums (row and column totals)
4. Build contingency table
5. Compute the observed chi-square value
6. Compare the observed value to the critical value
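Steps 3 to 5 can be written out as formulas. This is the standard Pearson chi-square notation rather than anything shown on the slides, and the symbols are introduced here for illustration: O_ij is the observed count in row i and column j, R_i and C_j are the row and column totals, N is the grand total, and r and c are the numbers of rows and columns.

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (r - 1)(c - 1)
```

E_ij is the count expected in each cell if the two variables were independent, and df is the degrees of freedom used to look up the critical value.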
IMPORTANT: Requirements for chi squared test
The variables under study are each categorical. If sample data are displayed in a
Hypothesis testing steps
1. State null (H0) and alternative (H1) hypotheses
2. Choose level of significance
3. Find critical values
4. Find test statistic
5. Draw your conclusion
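The five steps above can be sketched in R on a small A/B test. The counts below are hypothetical and chosen only for illustration, and the 0.05 significance level is an assumption, not a value from the slides.

```r
# Hypothetical A/B test counts (illustration only, not from the slides)
ab <- matrix(c(100, 900,
               150, 850),
             nrow = 2, byrow = TRUE,
             dimnames = list(variant = c("control", "test"),
                             outcome = c("sale", "no_sale")))

# 1. H0: outcome is independent of variant; H1: it is not
# 2. Choose the level of significance (assumed here)
alpha <- 0.05

# 3. Critical value for df = (rows - 1) * (cols - 1)
deg_free <- (nrow(ab) - 1) * (ncol(ab) - 1)
critical <- qchisq(1 - alpha, df = deg_free)

# 4. Test statistic (correct = FALSE turns off the Yates continuity
#    correction that R applies to 2x2 tables by default)
statistic <- as.numeric(chisq.test(ab, correct = FALSE)$statistic)

# 5. Conclusion: reject H0 when the statistic exceeds the critical value
reject_h0 <- statistic > critical
```

Comparing the statistic to the critical value is equivalent to comparing the p-value to alpha; the critical-value form is used here because it mirrors the steps listed above.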
Chi squared distribution plots
Dataset - pricing plans sold across world
Sold plans
Country   Professional  Team  Business  Enterprise
USA               1220   790       500         190
UK                 950   590       200         120
Germany            880   420       320          70
Sweden             340   260       130          60
Belgium            290   190       110          80
Poland             910   290       190          40
Spain              250   320       220          50
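One way to enter this table in R is as a named matrix, which keeps the country and plan labels attached to the counts. This is a minimal sketch; the object name `sales` is an assumption made for illustration.

```r
# Plan sales by country, copied from the dataset table
sales <- matrix(
  c(1220, 790, 500, 190,   # USA
     950, 590, 200, 120,   # UK
     880, 420, 320,  70,   # Germany
     340, 260, 130,  60,   # Sweden
     290, 190, 110,  80,   # Belgium
     910, 290, 190,  40,   # Poland
     250, 320, 220,  50),  # Spain
  ncol = 4, byrow = TRUE,
  dimnames = list(
    country = c("USA", "UK", "Germany", "Sweden", "Belgium", "Poland", "Spain"),
    plan    = c("Professional", "Team", "Business", "Enterprise")))

# Quick sanity checks on shape and a single cell
dim(sales)
sales["Poland", "Team"]
```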
Hypothesis
H0: Number of sales of each pricing plan is independent of country
H1: Number of sales of each pricing plan depends on country
Finding test statistics (manually, Excel and R)
Find critical value
(https://www.ma.utexas.edu/users/davis/375/popecol/tables/chisq.html)
Compute marginal sums
Summing rows and columns
Build contingency table
Compute the observed chi-square value
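The manual route listed above can be sketched in R as follows. The counts are those of the pricing-plan dataset; the 0.05 significance level and all object names are assumptions made for illustration.

```r
# Plan sales by country (rows: USA, UK, Germany, Sweden, Belgium, Poland, Spain)
sales <- matrix(
  c(1220, 790, 500, 190,
     950, 590, 200, 120,
     880, 420, 320,  70,
     340, 260, 130,  60,
     290, 190, 110,  80,
     910, 290, 190,  40,
     250, 320, 220,  50),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    country = c("USA", "UK", "Germany", "Sweden", "Belgium", "Poland", "Spain"),
    plan    = c("Professional", "Team", "Business", "Enterprise")))

# Find critical value, at an assumed significance level of 0.05
alpha    <- 0.05
deg_free <- (nrow(sales) - 1) * (ncol(sales) - 1)   # (7 - 1) * (4 - 1) = 18
critical <- qchisq(1 - alpha, df = deg_free)        # about 28.87

# Compute marginal sums: row totals, column totals, grand total
row_totals  <- rowSums(sales)
col_totals  <- colSums(sales)
grand_total <- sum(sales)

# Build the table of counts expected under independence
expected <- outer(row_totals, col_totals) / grand_total

# Compute the observed chi-square value and compare it to the critical value
chi_sq    <- sum((sales - expected)^2 / expected)
reject_h0 <- chi_sq > critical
```

As a cross-check, `chisq.test(sales)$statistic` should agree with `chi_sq`, since R applies no continuity correction to tables larger than 2x2.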
Finding test statistics - results
R code
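# One row per country. The counts appear to be the dataset's figures
# scaled down by roughly 8, in the same country order (USA ... Spain).
df <- data.frame(Prof       = c(152, 118, 110, 42, 36, 113, 31),
                 Team       = c(98, 73, 52, 32, 23, 36, 40),
                 Business   = c(62, 25, 40, 16, 13, 23, 27),
                 Enterprise = c(23, 15, 8, 7, 10, 6, 6))
# Pearson's chi-squared test of independence
chisq.test(df)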
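The object returned by `chisq.test()` can be stored and inspected piece by piece, which helps when moving from the printed output to a decision. This is a minimal sketch using the same counts as the R code on this slide; the 0.05 threshold is an assumption.

```r
df <- data.frame(Prof       = c(152, 118, 110, 42, 36, 113, 31),
                 Team       = c(98, 73, 52, 32, 23, 36, 40),
                 Business   = c(62, 25, 40, 16, 13, 23, 27),
                 Enterprise = c(23, 15, 8, 7, 10, 6, 6))

result <- chisq.test(df)

result$statistic   # observed chi-square value (X-squared)
result$parameter   # degrees of freedom
result$p.value     # p-value of the test
result$expected    # expected counts under independence

# Number of cells with an expected count below 5; R warns that the
# chi-squared approximation may be incorrect when there are any
small_cells <- sum(result$expected < 5)

# Decision at an assumed significance level of 0.05
reject_h0 <- result$p.value < 0.05
```

Checking `result$expected` is worthwhile because the test is an approximation that relies on reasonably large expected counts in every cell.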
Drawing conclusion
We can reject the null hypothesis (H0) in
favor of H1: the number of sales of each
pricing plan depends on country.
Learn more
http://stats.stackexchange.com
www.analyticsvidhya.com
www.dartistics.com
Measure Slack - http://join.measure.chat
Assignment / homework
Transactions
Channel          mobile  desktop  tablet
Direct          3490028   538101  526095
Paid Search     1229227   214050  210811
Organic Search   862144   401720  193064
Referral         228352   129927   39693
Affiliates        38669    31947   12523
Email             35681    14284    6615
Social             9013     5196    2070
Display             231      171      47
(Other)              58       82      36
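A possible starting point for the homework is to enter the transactions table as a named matrix and run the same test as before. This is only a sketch: the object names are assumptions, and interpreting the output is left as the exercise.

```r
# Transactions by channel (rows) and device (columns), from the table above
transactions <- matrix(
  c(3490028, 538101, 526095,   # Direct
    1229227, 214050, 210811,   # Paid Search
     862144, 401720, 193064,   # Organic Search
     228352, 129927,  39693,   # Referral
      38669,  31947,  12523,   # Affiliates
      35681,  14284,   6615,   # Email
       9013,   5196,   2070,   # Social
        231,    171,     47,   # Display
         58,     82,     36),  # (Other)
  ncol = 3, byrow = TRUE,
  dimnames = list(
    channel = c("Direct", "Paid Search", "Organic Search", "Referral",
                "Affiliates", "Email", "Social", "Display", "(Other)"),
    device  = c("mobile", "desktop", "tablet")))

# H0: device is independent of channel; H1: it is not
result <- chisq.test(transactions)
result
```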
Questions?

Editor's Notes

  1. rejection region