8
$\begingroup$

Consider an example as follows.

I am running a mobile app that allows users to apply for a loan on the app.

Say a guy signs in to my app on his phone to apply for a loan.

Call events:

A = a person has a smartphone

B = default

First of all, I assign P(B)=0.8 for this guy without any information (just to be conservative).

Assume that in my country P(A) = 0.5, i.e. only 50% of the population has a smartphone.

Assume P(A|B) = 1, i.e. when I look in my database, everyone who has failed to pay back so far has a smartphone. That is obvious, because users need a smartphone to install my app.

So apply Bayes:

P(B|A) = P(A|B) * P (B) / P(A) = 1 * 0.8 / 0.5 = 1.6

Two problems here indeed:

1) P(B|A) > 1. I know that more than one thread on StackExchange has discussed this problem in theory and proved that P(B|A) <= 1 in all cases, but I could not find why my inference is wrong.

2) According to my Bayesian inference, adding one more bit of information, such as "this guy has a smartphone", increases his probability of default. Intuitively, though, it brings no information at all, because I already know that all my customers have smartphones. How to explain that?

$\endgroup$
2
  • $\begingroup$ As the top answer below explains (2nd paragraph), you have to use the same population in the denominator when using relative frequency as probability. $\endgroup$
    – Jeff Y
    Commented Nov 4, 2019 at 17:45
  • $\begingroup$ In particular, you need something like P(A|B) = 1 * (number of people in my database) / (number of people in my country). $\endgroup$
    – Jeff Y
    Commented Nov 4, 2019 at 17:52

3 Answers

24
$\begingroup$

On the face of it, your assumptions are inconsistent: you think more people will default than have smartphones, but you also think all defaulters have smartphones.

Part of the problem is that some of your assumptions are for users of your app and some are for the whole population, and you treat them as if they were for the same group.

If instead you just consider users of your app, you might have $P(B)=0.8$ and $P(A)=1$ and $P(A \mid B)=1$. This will now give you $P(B \mid A) = P(A \mid B)\,P(B) / P(A) = 1 \times 0.8 / 1 = 0.8$, and there are no problems there apart from the lack of value in considering $A$, since all users of your app have smartphones.
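As a rough sketch of the point above, the Python snippet below applies Bayes' rule and rejects inputs that cannot all describe the same population. The function name `bayes_posterior` and the choice to raise an error are my own assumptions for illustration, not something from the question.

```python
def bayes_posterior(p_a_given_b, p_b, p_a):
    """Return P(B|A) = P(A|B) * P(B) / P(A) for one consistent population.

    Hypothetical helper: raises ValueError when the inputs imply
    P(A and B) > P(A), which is impossible within a single population.
    """
    if p_a <= 0:
        raise ValueError("P(A) must be positive to condition on A")
    joint = p_a_given_b * p_b  # P(A and B)
    if joint > p_a:
        raise ValueError(
            f"inconsistent inputs: P(A and B) = {joint} exceeds P(A) = {p_a}"
        )
    return joint / p_a


# App users only: every user has a smartphone, so P(A) = 1.
print(bayes_posterior(p_a_given_b=1.0, p_b=0.8, p_a=1.0))

# Mixing populations: P(A) = 0.5 is for the whole country, P(B) for app users.
try:
    bayes_posterior(p_a_given_b=1.0, p_b=0.8, p_a=0.5)
except ValueError as err:
    print("rejected:", err)
```

The check `joint > p_a` is just the statement that the event "A and B" is contained in A, so its probability cannot exceed $P(A)$.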

$\endgroup$
12
$\begingroup$

You have an inconsistent set of assumptions.

It is like saying that 80% of the world's population like soccer and that 100% of the people who like soccer also like tennis, which implies that at least 80% of the population like tennis. But then you say that only 50% of the population like tennis!

In order to derive $P(A)$, you could first specify $P(A\vert B^c)$ and then calculate $P(A) = P(A\vert B)P(B) + P(A\vert B^c)(1-P(B))$. Or you could derive $P(A)$ directly from other reasoning, as long as it is consistent with your previous assumptions.
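To make the formula concrete, here is a minimal Python sketch of the law of total probability with the numbers from the question. The values tried for $P(A\vert B^c)$ are made up for illustration, since the question does not give one.

```python
def total_probability(p_a_given_b, p_b, p_a_given_not_b):
    """P(A) = P(A|B) P(B) + P(A|B^c) (1 - P(B))."""
    return p_a_given_b * p_b + p_a_given_not_b * (1.0 - p_b)


p_b = 0.8          # P(B) from the question
p_a_given_b = 1.0  # P(A|B) from the question

# Hypothetical values for P(A|B^c); the question never specifies it.
for p_a_given_not_b in (0.0, 0.5, 1.0):
    p_a = total_probability(p_a_given_b, p_b, p_a_given_not_b)
    p_b_given_a = p_a_given_b * p_b / p_a
    print(f"P(A|B^c)={p_a_given_not_b}: P(A)={p_a:.3f}, P(B|A)={p_b_given_a:.3f}")
```

Even with $P(A\vert B^c) = 0$, the smallest value it can take, $P(A)$ comes out as 0.8, so $P(A) = 0.5$ cannot coexist with $P(B) = 0.8$ and $P(A\vert B) = 1$.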

$\endgroup$
2
$\begingroup$

You've written:

P(B|A) = P(A|B) * P (B) / P(A) = 1 * 0.8 / 0.5 = 1.6

by which you mean:

P(default|smartphone) = P(smartphone|default) * P (default) / P(smartphone) = 1.6

which seems wrong, and indeed is. The problem here is you have forgotten that there is an implicit condition in some of these probabilities, namely that the person actually has a loan (or equivalently, that some of the probabilities are for different populations). So in fact the numbers you have used are:

P(smartphone|default,loan) * P(default|loan) / P(smartphone)

which leads to a nonsense answer because P(smartphone) isn't matched on the "has a loan" condition (it's the probability that any random person has a phone). For Bayes' rule to work here you would need to use the probability that someone has a smartphone given they have a loan. Since you note that "users need a smartphone to install my app", this will of course be 1, leading to the (correct, but not very useful) result that:

P(default|loan,smartphone) = P(smartphone|default,loan) * P(default|loan) / P(smartphone|loan) = 1 * 0.8 / 1 = 0.8

i.e. you learn nothing to refine your prior from the fact that the user has a smartphone. That is intuitively obvious: they need a smartphone to even have the chance to default, so all defaulting borrowers AND non-defaulting borrowers have smartphones (the only people who might not have smartphones are those who don't have loans).
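The "you learn nothing" point can be checked with a small Python sketch. It restricts everything to borrowers and expands the denominator over defaulters and non-defaulters. The function name, and the 0.5 likelihood in the last call, are hypothetical and only there for contrast.

```python
def posterior_default(prior, lik_if_default, lik_if_repaid):
    """P(default | evidence) among borrowers, via Bayes' rule.

    lik_if_default: P(evidence | default, loan)
    lik_if_repaid:  P(evidence | no default, loan)
    """
    evidence = lik_if_default * prior + lik_if_repaid * (1.0 - prior)
    return lik_if_default * prior / evidence


# Every borrower has a smartphone, whether or not they default,
# so both likelihoods are 1 and the posterior equals the prior.
for prior in (0.1, 0.5, 0.8):
    print(prior, posterior_default(prior, 1.0, 1.0))

# Hypothetical contrast: evidence that only half of the repaying
# borrowers show does move the estimate away from the prior.
print(posterior_default(0.8, 1.0, 0.5))
```

Evidence only shifts the prior when it is more likely under one outcome than the other; here the likelihood ratio is 1, so nothing changes.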

As an aside, we could do some rough analysis like:

P(default|smartphone) = P(smartphone|default) * P (default) / P(smartphone)
                      = P(loan) * P(default|loan) / P(smartphone)
                      = P(default) / P(smartphone)

This suggests that the probability that a random person who has a smartphone defaults on a loan is higher than the probability that a random person from the general population does. That makes sense, since the fraction of the general population who don't have a smartphone can't get a loan and so can't possibly default. But again this is hardly informative.
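A toy head count makes the different populations explicit. In the Python sketch below, only P(smartphone) = 0.5 and P(default|loan) = 0.8 come from the question; the population size and the number of borrowers are made-up numbers for illustration.

```python
from fractions import Fraction

population = 1000        # hypothetical country size
smartphone_owners = 500  # matches P(smartphone) = 0.5 from the question
borrowers = 100          # hypothetical; every borrower owns a smartphone
defaulters = 80          # matches P(default | loan) = 0.8 from the question

p_default = Fraction(defaulters, population)
p_smartphone = Fraction(smartphone_owners, population)
p_default_given_loan = Fraction(defaulters, borrowers)
p_default_given_smartphone = Fraction(defaulters, smartphone_owners)

# Every defaulter has a smartphone, so P(smartphone | default) = 1 and
# Bayes' rule reduces to P(default) / P(smartphone).
assert p_default_given_smartphone == p_default / p_smartphone

print("P(default)              =", p_default)
print("P(default | smartphone) =", p_default_given_smartphone)
print("P(default | loan)       =", p_default_given_loan)
```

With these counts the three probabilities are all different, and none exceeds 1, because each one is a ratio of counts taken within a single, clearly stated population.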

$\endgroup$
