Statistician Answers Stats Questions From Twitter

Jeffrey Rosenthal, a professor of statistics at the University of Toronto, answers the internet's burning questions about statistics. What are the most common statistical errors? Why do polls get it so wrong? What's the worst casino game in terms of odds? How does probability work in roulette? Jeffrey answers all these questions and much more!

Released on 02/21/2022

Transcript

Hello, I'm Jeffrey Rosenthal.

I'm a Professor of Statistics

at the University of Toronto.

And this is Stats Support.

[upbeat music]

Question from Kingdweeb,

Why do statisticians get so worked up over probability?

Every event is just 50/50.

It either happens or it doesn't.

This is something I've heard before, this idea that,

well if it can either happen or not, it must be 50/50.

Sometimes that's referred to by philosophers

as the principle of indifference

meaning that anything that could happen

they must all have the same probability.

The thing is, it's just not true.

When I go home today from the studio

I might get killed by a bolt of lightning,

or I might not get killed by a bolt of lightning.

But I'm pretty sure there's not a 50% chance

I'm gonna get killed by a bolt of lightning.

Okay next, we have a question from Whatthefuss who says,

Why is statistics important in life?

Really, we're awash in all kinds of different data.

So anything from the spread of disease

or crime statistics, or studies of a medical treatment

or financial data or public opinion polls,

there's so many facts and figures and statistics out there.

The science of statistics

is a way to try to sort through it.

So if you don't have any statistical knowledge

or understanding or perspective,

then you're likely to just say, well

this must be true because my friend said it,

or this must be true because I heard it on the news

or I just kind of think it must be true.

But if you have statistics,

you can try to analyze all the facts

and figures that are out there

and try to see what are the real trends,

what's really happening versus what things really

aren't the way people think they are.

Next, we have a question from Lawrenceitv, who says,

Question for statisticians.

Why did the polls get it so wrong, explanations please?

Yeah, so public opinion polling, especially

when it's predicting elections is a very high profile thing

but also a hard thing to do.

And usually people notice the mistakes more

than the corrections.

So, a lot of public polling for elections

has actually been quite accurate

and it's predicted things quite well

but there have been some high profile misses, for example

the US presidential elections of 2016 and 2020.

Now, even in those cases,

typically the polls' prediction compared

to the actual results was usually only off

by about 4 or 5%,

which isn't such a huge amount considering

how hard it is to figure out what's gonna happen.

But it's still a big enough error

that if the election's close, it can make a big difference.

So why is that?

Well, election polls, of course they don't ask everybody

how they're gonna vote.

They just ask a sample, usually a few thousand people

and then try to figure out

what maybe a hundred million people are going to do.

So, that is a challenge.

The good news is if the polling is done randomly,

that is we're equally likely to pick every person

with the same probability.

Then we have good statistics to allow us to figure out

how accurate we're gonna be,

what will be the so-called margin of error?

How close we'll usually be to the true answer.

And actually that works pretty well

but what makes it especially hard for the pollsters

is that it's hard to get a random sample.

And the main reason

is because most people don't wanna talk to pollsters.

Polling companies don't necessarily like to talk about it,

but their response rates are usually less than 10%.

And that can lead to a lot of biases

because maybe people who support a certain candidate

are a little bit more likely to agree to talk

to the pollsters than people who support another candidate.

And any little response bias

like that can have a huge impact on the results.
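
The margin-of-error idea above can be sketched in a few lines. This is a minimal illustration, not any pollster's actual method: it assumes a simple random sample and uses the standard normal approximation for a proportion. The sample size and both response rates are made-up numbers chosen only to show the effect.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)


def observed_share(true_share: float, rate_a: float, rate_b: float) -> float:
    """Expected poll share for candidate A when supporters of A and B
    agree to answer the pollster at different rates."""
    responders_a = true_share * rate_a
    responders_b = (1 - true_share) * rate_b
    return responders_a / (responders_a + responders_b)


# Hypothetical poll: 2,000 respondents, race truly tied at 50/50.
moe = margin_of_error(p=0.5, n=2000)
print(f"sampling margin of error: +/- {moe:.1%}")

# Hypothetical response rates: 8% of A's supporters answer, 7% of B's.
biased = observed_share(true_share=0.50, rate_a=0.08, rate_b=0.07)
print(f"true support for A: 50.0%, expected poll result: {biased:.1%}")
```

With these made-up rates, a one-point gap in willingness to respond shifts the poll by more than the sampling margin of error, which is the kind of response bias described above.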

Question from, CmonMattTHINK,

What are some common statistical errors

and how can we learn to spot them and if possible,

correct them in others and our own work?

One of the biggest things is people don't think

about what I like to call the out of how many principle.

And that's this idea that when something happens

that's striking, people will compute the probability

of it happening in that exact way to that exact person,

but not look at the chance that it will happen

in some way to somebody.

There was a woman

in England who had two sons who each died in infancy.

There is something, as you probably know

called SIDS, or sudden infant death syndrome.

So maybe just two times she got really, really unlucky

and her baby stopped breathing, or maybe she was a murderer.

And she had actually, she'd actually suffocated them

and she was arrested and charged.

And at her trial, they said,

Oh it's so unlikely that there'd be two SIDS cases

in the same family that we can rule that out.

She must have actually tried to kill them.

And that's an interesting example

where if you just look at the probability,

given two kids in one family,

what's the chance they're both gonna die of SIDS?

Of course, it is very unlikely.

But then if you say out of all the millions of families

in the United Kingdom or in the whole world

what's the chance that somewhere there's a family

where two kids both died of SIDS?

Extremely likely.

And it seems like that was the case with her.

There was actually no other evidence

that she had actually tried to kill these kids.

She was just extremely unlucky.

And yet, she was convicted, she was jailed.

She spent several years in jail

before there was enough of an outcry.

And eventually on the second appeal,

the case was overturned.
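
The "out of how many" principle can be made concrete with a small sketch. The per-family probability and the number of families below are hypothetical placeholders, not figures from the trial, and the calculation assumes families are independent of one another.

```python
def chance_somewhere(p_single: float, n_families: int) -> float:
    """Probability that an event with per-family probability p_single
    happens in at least one of n_families independent families."""
    return 1 - (1 - p_single) ** n_families


# Hypothetical numbers: a one-in-a-million event per family,
# looked for across five million families.
p_one_family = 1e-6
p_some_family = chance_somewhere(p_one_family, 5_000_000)

print(f"chance for one specific family: {p_one_family:.6%}")
print(f"chance for at least one family somewhere: {p_some_family:.1%}")
```

The same event that is wildly unlikely for one named family becomes close to certain once you ask whether it happens to anyone at all.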

Question from Josh Levs says,

What's more likely than winning the lottery?

The short answer is everything,

that is to say if you're talking about winning

a lottery jackpot for one of the big lotteries,

like Mega Millions or Powerball,

then the chance of winning that jackpot

with a single ticket is one chance

in a couple of hundred million, depending on which lottery.

So, just incredibly unlikely.

So compared to that, almost anything you can think of,

being killed by a bolt of lightning

or the next person you meet will one day be the president

of the United States

or any crazy thing you can come up with.

We can estimate the odds for all of them

and they're all more likely

than the chance you're gonna win the Powerball lottery.

And in fact, one that I like to use as an example

is if you drive to the store to buy your lottery ticket,

you're way more likely to be killed in a car crash

on your way to the store than you are to win the jackpot.
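
To put a number on "one chance in a couple of hundred million", here is a sketch that counts the possible tickets. It assumes the US Powerball format of 5 white balls from 69 plus 1 red ball from 26, taken here to be the format in use around the time of this video; other lotteries and other years use different ranges, so treat the constants as assumptions.

```python
from math import comb

# Assumed format: choose 5 of 69 white balls (order ignored) and 1 of 26 red balls.
WHITE_BALLS, WHITE_PICKS, RED_BALLS = 69, 5, 26

jackpot_combinations = comb(WHITE_BALLS, WHITE_PICKS) * RED_BALLS
p_jackpot = 1 / jackpot_combinations

print(f"possible tickets: {jackpot_combinations:,}")
print(f"chance a single ticket wins the jackpot: {p_jackpot:.2e}")
```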

Next, we have a question from SmollyMall.

I'm just patiently waiting for people to realize

that all statistics are skewed because the data is skewed

in so many ways that I can't even list them all.

So not a big fan of statistics, maybe, but that's true.

That's a good point that all data

is gonna have some things that are wrong with it.

Maybe it was biased.

Maybe it wasn't measured correctly.

Maybe it only shows part of the story

but I don't think that means we should just forget

about it and just forget about statistics and data.

I think what it means is we have to think carefully

when we get data, we have to say,

how is this data collected?

Is it an accurate reflection of the truth?

In what ways is it gonna be biased or misleading?

And then we can still draw inferences from it.

But it's true that we have to be careful.

We have a question from John Friedberg says,

About to play what must be the absolute worst casino game

in terms of player odds, any guesses?

Well, it's an interesting question.

There's different casinos with different games

but one of the games, which to my surprise

is one of the most popular

and also has one of the worst odds against you

is the video lottery terminals.

So people love them, but they usually have

at least a 5% and maybe 10% or even 15% house edge.

So, they're really not the best game.

Now, there are some casino games which have odds

which are much better for the players.

So for example, of the pure chance games, the game Craps

where you repeatedly roll a pair of dice,

kind of like these, you have a 49.2929% chance of winning.
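
The 49.2929% figure can be reproduced exactly. The sketch below assumes the standard "pass line" bet, which the transcript does not name: you win at once on a first roll of 7 or 11, lose at once on 2, 3, or 12, and otherwise keep rolling until you either repeat your first total (win) or roll a 7 (lose).

```python
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely rolls give each total.
ways = {}
for die_a, die_b in product(range(1, 7), repeat=2):
    ways[die_a + die_b] = ways.get(die_a + die_b, 0) + 1


def p(total: int) -> Fraction:
    """Probability of rolling this total with two fair dice."""
    return Fraction(ways[total], 36)


# Immediate win on the first roll.
p_win = p(7) + p(11)

# Otherwise a "point" is set; from then on only the point and 7 matter,
# so the chance of hitting the point first is p(point) / (p(point) + p(7)).
for point in (4, 5, 6, 8, 9, 10):
    p_win += p(point) * p(point) / (p(point) + p(7))

print(f"chance of winning: {p_win} = {float(p_win):.4%}")
```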

Next, we have a question from ShavaKadzi,

Are murder rates skyrocketing

or the media doesn't have much to report,

so they are focusing more on that?

Yeah, it's a good question.

So, murder rates have generally been coming down

a little bit in the last couple of decades.

But in the last few years,

there's been a little bit of an uptick.

So they're now a little bit higher

than they were a few years ago

but they're still quite a bit lower

than they were a decade or two ago.

Also I've noticed for example

politicians and police spokespeople and so on,

they all will at times say, oh

crime rates are way up for their own reasons.

They have reasons for wanting that to be said,

even though, maybe it's not actually true.

So it's just one more reason

that if you wanna know what's happening

with something like rates of crime,

well don't listen to what a few people are saying.

Look at the actual statistics

and then you can see the truth.

Next, we have a question from Brentaclan, says,

How does probability work in the roulettes?

So that's a good question.

Roulettes are fairly simple.

So the standard American Roulette Wheel

has 38 of those little wedge slots.

And two of them are green.

There's the zero and the double zero.

And then the others are divided

into 18 red and 18 black.

The person at the casino spins the wheel.

And presumably it's equally likely

to come up any of those 38 different wedges.

So what it means is if you bet on, for example, red,

well 18 out of the 38 wedges are red.

So you have an 18 out of 38 chance of getting red

which is a little bit less than 50%.

And that's why, if you bet on red

there's an even-money payout, but on average

you're gonna lose a little bit more money than you win.

You can also sometimes bet on different things

like all the even numbers or something like that.

But whichever bet you do, it works out to the same thing.

There's a slight edge in favor of the casino.

And that's why if you play Roulette,

over a long period of time, it's gonna be more

and more sure that you're gonna lose more money

than you win.
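
The roulette reasoning above can be checked with a short sketch. It uses the 38-slot American wheel described here; the 35-to-1 payout for a single-number bet is the usual casino rule but is not stated in the transcript, so treat it as an assumption. The last function estimates how the chance of being ahead shrinks as you keep betting on red.

```python
from fractions import Fraction
from math import comb

SLOTS = 38
P_RED = Fraction(18, SLOTS)


def expected_value(p_win: Fraction, payout: int) -> Fraction:
    """Average profit per 1 unit bet: win `payout` with p_win, else lose 1."""
    return p_win * payout - (1 - p_win)


def prob_ahead(n_bets: int) -> float:
    """Chance of having more wins than losses after n even-money bets on red."""
    p = float(P_RED)
    return sum(
        comb(n_bets, wins) * p**wins * (1 - p) ** (n_bets - wins)
        for wins in range(n_bets // 2 + 1, n_bets + 1)
    )


print("red, even money:      ", expected_value(P_RED, 1))
print("single number, 35 to 1:", expected_value(Fraction(1, SLOTS), 35))

for n in (10, 100, 1000):
    print(f"chance of being ahead after {n} bets on red: {prob_ahead(n):.3f}")
```

Both bets come out to the same average loss per unit staked, and the chance of being ahead keeps falling as the number of bets grows.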

A question from 6Latin6Lover6,

Who makes betting odds, is it an algorithm?

So it's a really interesting problem

for the bookies or the people who are making these odds.

Now, the goal is pretty easy to understand

'cause if you're a bookie, what you want is pretty

much to have the same amount of betting on both sides.

So that in the end, you don't really care

if the horse wins or not

or you don't really care if the team wins or not

'cause either way you're gonna make money,

'cause you're gonna get your cut.

Whereas if everybody bet on one side and then they all won

then you could lose a lot of money.

But on the other hand

how they do that is kind of a challenge.

And usually, they're updating their odds as they go.

And if they see, oh, everybody's betting

on this one team, gee, we better change the odds

so that the next bettors

are more likely to bet on the other side.

And I'm not a bookie, but my impression

is that in the old days, it used to be done just kind of

by their judgment or experienced people

looking things over and tweaking things.

Whereas now there's so much online gambling

that a lot of it is automated and they have algorithms

which I think are not simple, based

on how everybody's betting and trying to adjust things.

But the goal is pretty easy to understand,

trying to balance out those bets.

Question from Zenodotus.

What is stochastic process, really?

Well, I'm glad you asked.

So, stochastic is just another word for random.

So, it means random processes

or things that proceed randomly in time.

And the simplest example is actually one

I sometimes like to illustrate

with my students using a stuffed frog.

So I'll do that here.

And we imagine we have a frog,

which every second randomly decides

either to move one step this way

or to move one step this way.

And once it does, then the next second,

it again decides randomly to move one step this way

or one step this way.

And yet, it's actually really interesting

for mathematicians to study this.

What's the chance that the frog will eventually return

to where it started? Turns out it's 100%.

It's certain. It might take a really long time

but eventually it's gonna return to where it started.

And in fact, eventually,

it's gonna be a million steps that way.

And eventually it's gonna be a billion steps that way,

it's gonna go to every single place.

Eventually, if you wait long enough with probability one,

we can prove that.
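
The frog's walk is easy to simulate. The sketch below is only an illustration of the claim, not a proof: it runs many walks with a fixed random seed and reports what fraction have come back to the start within a given number of steps. The number of walks and the step limits are arbitrary choices.

```python
import random


def first_return_time(rng: random.Random, max_steps: int):
    """Steps until a +1/-1 random walk first returns to 0,
    or None if it has not returned within max_steps."""
    position = 0
    for step in range(1, max_steps + 1):
        position += rng.choice((-1, 1))
        if position == 0:
            return step
    return None


rng = random.Random(0)  # fixed seed so the run is repeatable
times = [first_return_time(rng, 10_000) for _ in range(2_000)]


def fraction_returned(horizon: int) -> float:
    """Fraction of the simulated walks that returned within `horizon` steps."""
    return sum(t is not None and t <= horizon for t in times) / len(times)


for horizon in (10, 100, 1_000, 10_000):
    print(f"returned within {horizon:>6} steps: {fraction_returned(horizon):.1%}")
```

The fraction creeps toward 100% as the horizon grows, which matches the statement that the return is certain but may take a very long time.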

Next, we have a question from Anacelx, says,

What does it mean to be statistically significant?

So, statistically significant is saying probably

it wasn't just chance.

That this is enough of an effect that we can pretty much,

you can never do it for sure, but you can pretty much say

it's probably not due to chance alone.

Probably this actually shows something real.

There was really a difference

or there was really an increase

or something really happened.

It wasn't just the random luck.

So, the basic idea is pretty simple.

It sometimes gets lost in the details,

but when you notice something that happens,

maybe, oh this classroom did better

on the test than this other classroom.

Then as statisticians, the fundamental question

you're always asking is, does that mean something real?

Like, oh, maybe the teaching was better in this class,

or maybe people in that class are smarter.

Or was it just random luck?

So, you'd never expect any two results

to be exactly the same.

There's always gonna be some differences.

Okay, next question from John Elworthy.

Can someone please help with this?

What are the odds of having three generations

of family members being born on the same day?

First was born on January 10th, 1943,

the second, same day, 1994

the third, same day in 2022.

It's actually a good example

of the sort of question that there's different ways

of looking at the probability.

So, if you just say there's three people,

what are the chances they'll all have been born

on the same day?

Well, that's pretty straightforward.

So you can think,

well the first one could be born on any day,

doesn't really matter.

Then the second one has roughly one chance

in 365 of being born on that same day.

And then the third one has roughly one chance

in 365 of being born again on that same day.

So, it's one chance in 365 times 365

which works out to a little less

than one chance in a hundred thousand, I think.

So, it's quite unlikely.

One way I like to look at these kinds of questions

is this is sort of out of how many different ways

that this could have happened.

So even in this one family,

probably there's a lot of other people

in each of those generations.

And if any three of them had matched up their birthdays,

then the same tweet could have been written.

So right away, the chance is a lot bigger

'cause there's lots of different combinations

which all could have led to the same conclusion.

It's not incredible that it happens,

but it's still pretty cool when it does happen to you.
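
Both ways of looking at the birthday question can be sketched as follows. The first number is the exact figure for three specific people, ignoring leap years. The second is a rough approximation for a whole family: the generation sizes are hypothetical, and the calculation treats every trio as independent, which is not strictly true.

```python
from fractions import Fraction

DAYS = 365  # ignores February 29 and seasonal birth patterns

# Three specific people: the first birthday is free, the other two must match it.
p_specific_trio = Fraction(1, DAYS) ** 2


def approx_chance_any_trio(n_gen1: int, n_gen2: int, n_gen3: int) -> float:
    """Rough chance that some trio (one person per generation) shares a
    birthday, treating the trios as if they were independent."""
    trios = n_gen1 * n_gen2 * n_gen3
    return 1 - (1 - float(p_specific_trio)) ** trios


# Hypothetical family: 4 grandparents' generation, 6 parents', 10 children's.
p_family = approx_chance_any_trio(4, 6, 10)

print(f"three specific people: 1 in {p_specific_trio.denominator:,}")
print(f"some trio in the hypothetical family: about 1 in {1 / p_family:,.0f}")
```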

From AjaoSeyi, says,

How best can a statistician explain P value

to a non statistician?

Yeah, so that's a good question.

The basic idea of a P value is the idea

of what is the probability that the thing you just observed

would've happened just by pure chance

if there was no true effect?

If we look at, let's say, we have some people

with a disease and we give them a new treatment,

and then a certain number of them get better.

Do we say, oh well,

that means the new treatment really helped?

Well, no, 'cause some of them would've gotten better

even without this new treatment.

Maybe more of them got better

than you'd expect on average from the new treatment.

Yeah, but how much more?

And the P value question would be, what's the probability

if we hadn't given any treatment that that same number

or more of the people would still have gotten better?

And if that P value is pretty high,

maybe there was a 40% chance

that they would've gotten better even without the treatment,

we haven't really proved anything.

And the typical standard is that if the P value

is less than 5% or less than one chance in 20,

then we say, okay it's pretty unlikely

that they all would've gotten better

if it hadn't been for this new treatment.

So, this provides some evidence

that the new treatment is helping.

But if the P value's larger, it doesn't.
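
Here is a minimal sketch of that P value calculation. The trial size and the recovery rate without treatment are hypothetical, and the model assumes each patient recovers independently with the same probability, so the count of recoveries follows a binomial distribution.

```python
from math import comb


def p_value_at_least(k: int, n: int, p_null: float) -> float:
    """Chance that k or more of n patients get better purely by chance,
    if each one recovers with probability p_null and there is no true effect."""
    return sum(
        comb(n, i) * p_null**i * (1 - p_null) ** (n - i)
        for i in range(k, n + 1)
    )


# Hypothetical trial: 20 patients, and 30% would recover with no treatment.
N_PATIENTS, BASE_RATE = 20, 0.30

p_seven = p_value_at_least(7, N_PATIENTS, BASE_RATE)
p_eleven = p_value_at_least(11, N_PATIENTS, BASE_RATE)

print(f" 7 of 20 got better: P value = {p_seven:.3f}")
print(f"11 of 20 got better: P value = {p_eleven:.3f}")
```

Under these assumptions, seven recoveries gives a P value near 40%, so it proves little, while eleven recoveries falls below the usual 5% threshold.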

Okay, so next a question from King Mbuso says,

Statistically, what are the chances?

And right, and this is a display of draw results.

And I believe this was

from the South Africa Powerball Lottery

back in December of 2020.

And what happened was a little surprising.

So of the main numbers

there were five numbers chosen in a row,

five, six, seven, eight, nine

and then the bonus Powerball number chosen was a 10.

So we had six numbers all in a row for the draw,

seemed very surprising.

So you could say, what are the chances of that happening?

Well, the rules of the South African Powerball then,

were you choose five numbers between one and 50

and then a bonus number between one and 20.

So you could say how many different ways

could you get them all in a row like that?

Well, the first five numbers would have to be five numbers

in a row, starting with something

from one, two, three up to 15, really.

So that's only 15 ways.

And then the Powerball number would have

to be the next one.

So there's a very small number.

And then when you divide that by the total number

of different ways you could have chosen those five balls

plus the one bonus thing, there's many more of those.

So when you divide it, you get that there's a little less

than one chance in 2 million that such a sequence like that

would've come up.
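
The counting argument above can be written out directly. This sketch uses the rules as stated here (five main numbers from 1 to 50, one bonus number from 1 to 20) and counts only draws where the bonus number is the one right after the five consecutive main numbers.

```python
from math import comb

MAIN_MAX, MAIN_PICKS, BONUS_MAX = 50, 5, 20

# Every possible draw: an unordered set of 5 main numbers plus 1 bonus number.
total_draws = comb(MAIN_MAX, MAIN_PICKS) * BONUS_MAX

# Runs start, start+1, ..., start+4 with bonus start+5, which must be at most 20.
consecutive_draws = sum(
    1
    for start in range(1, MAIN_MAX - MAIN_PICKS + 2)
    if start + MAIN_PICKS <= BONUS_MAX
)

one_in = total_draws / consecutive_draws

print(f"possible draws: {total_draws:,}")
print(f"six-in-a-row draws: {consecutive_draws}")
print(f"chance: about 1 in {one_in:,.0f}")
```

Under these assumptions the result is roughly one chance in 2.8 million, in line with "less than one chance in 2 million".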

Question from Chris Masterson.

Is it statistically less likely

to be in a plane crash if you've already been in one?

Well, no. And of course the answer is no.

And if you think about it, how could it be?

How could this new plane know, wait a minute.

There's somebody on here who was on another crash.

So I better not crash this time.

That's just not the way science works.

It's not the way airplanes work.

It's not the way pilots work

but a lot of people will think that.

And the reason people think that

is because it's very unlikely any one person

is gonna be on two different planes that crash, right?

That's really bad luck, but once you've already been on one

that was very unlucky, but now it doesn't have any effect

on the probability of the next plane.

They are what we call statistically independent events.

So, neither one affects the probability of the other.

So a question from Tetraform says,

Hey, what is the most statistically improbable thing

to happen to you?

Well, when I was in my early teens,

my family went on a trip to Disney World, Florida.

And in the middle of it all,

we looked up and we saw my father's cousin, Phil.

And he lived in Connecticut at the time.

And we lived in Toronto, Canada

and we had no idea he was gonna be there.

I said, What are the odds

that out of all of the hundreds of millions of people

in the United States and all the people

that visited Disney World,

that my dad's cousin would be there?

It's a good example that on the one hand,

if you just say what's the chance

that one guy would be my dad's cousin Phil,

it's incredibly unlikely, but as with a lot of things

if you take the bigger picture, you can say,

well my dad's cousin, Phil, isn't the only person

we would've been so surprised to see.

What about my dad's other cousins or my mom's cousins,

or my cousins or my piano teacher or my friend from school,

there's probably a few hundred people

that we would've been really surprised to see.

And then you say, well, we were at Disney World

for a couple of days and we went on lots of different rides

and so on.

And we probably saw thousands of people.

And just one of them was my dad's cousin, Phil,

the other ones were other people.

So, it's actually not so unlikely.

And I ended up computing there's about one chance in 200

or so, about half of 1% that if you go on a trip

to Disney World and spend a couple of days there,

on all the rides, that you run into somebody that you know.

So it's not so incredible,

even though it sure was a surprise at the time.

Okay, so I think that's all the questions for today

and I hope you learned something

and I hope I'll see you again.
