ISI-BUDS 2025
Think 💭 - Pair 👫🏽 - Share 💬
What have you so far heard about Bayesian Statistics?
Think 💭 - Pair 👫🏽 - Share 💬
“In 1985 only about 10% of JASA [Journal of the American Statistical Association] Applications and Case Studies articles used Bayesian methods.
In 2022 plus half of 2023, that percentage has changed. It is now 49%.”
Jeff Witmer, To Bayes or Not to Bayes – Is There Any Question? Talk at Joint Statistical Meetings, 2023
The notes for this lecture are derived from Chapter 1 of the Bayes Rules! book
How can we live if we don’t change?
Beyoncé
Bayesian: a probability measures the relative plausibility of an event.
Frequentist: a probability measures the long-run relative frequency of a repeatable event. In fact, “frequentists” are so named because of their interpretation of probability as a long-run relative frequency.
Consider two claims. (1) Zuofu claims that he can predict the outcome of a coin flip. To test his claim, you flip a fair coin 10 times and he correctly predicts all 10! (2) Kavya claims that she can distinguish between natural and artificial sweeteners. To test her claim, you give her 10 sweetener samples and she correctly identifies each! Which statement do you agree with?
You’re more confident in Kavya’s claim than Zuofu’s claim.
The evidence supporting Zuofu’s claim is just as strong as the evidence supporting Kavya’s claim.
Suppose that during a recent doctor’s visit, you tested positive for a very rare disease. If you only get to ask the doctor one question, which would it be?
| Taken a college science class | Afterlife: Yes | Afterlife: No | Total |
|---|---|---|---|
| Yes | 2702 | 634 | 3336 |
| No | 3722 | 837 | 4559 |
| Total | 6424 | 1471 | 7895 |
Data from General Social Survey
\(P(\text{belief in afterlife})\) = ? \(P(\text{belief in afterlife and taken a college science class})\) = ?
\(P(\text{belief in afterlife given taken a college science class})\) = ?
Calculate these probabilities and write them using correct notation. Use \(A\) for belief in afterlife and \(B\) for college science class.
| Taken a college science class | Afterlife: Yes | Afterlife: No | Total |
|---|---|---|---|
| Yes | 2702 | 634 | 3336 |
| No | 3722 | 837 | 4559 |
| Total | 6424 | 1471 | 7895 |
Data from General Social Survey
\(P(\text{belief in afterlife})\) = ?
\(P(A) = \frac{6424}{7895}\)
\(P(A)\) represents a marginal probability. So do \(P(B)\), \(P(A^c)\), and \(P(B^c)\). To calculate these probabilities we only need the values in the margins of the contingency table, hence the name.
| Taken a college science class | Afterlife: Yes | Afterlife: No | Total |
|---|---|---|---|
| Yes | 2702 | 634 | 3336 |
| No | 3722 | 837 | 4559 |
| Total | 6424 | 1471 | 7895 |
Data from General Social Survey
\(P(\text{belief in afterlife and taken a college science class})\) = ? \(P(A \text{ and } B) = P(A \cap B) = \frac{2702}{7895}\)
\(P(A \cap B)\) represents a joint probability. So do \(P(A^c \cap B)\), \(P(A\cap B^c)\) and \(P(A^c\cap B^c)\).
Note that \(P(A\cap B) = P(B\cap A)\). Order does not matter.
| Taken a college science class | Afterlife: Yes | Afterlife: No | Total |
|---|---|---|---|
| Yes | 2702 | 634 | 3336 |
| No | 3722 | 837 | 4559 |
| Total | 6424 | 1471 | 7895 |
Data from General Social Survey
\(P(\text{belief in afterlife given taken a college science class})\) = ? \(P(A \text{ given } B) = P(A | B) = \frac{2702}{3336}\)
\(P(A|B)\) represents a conditional probability. So do \(P(A^c|B)\), \(P(A | B^c)\), and \(P(A^c|B^c)\). To calculate these probabilities we focus only on the row or column of the given information. In effect, we reduce the sample space to that given event.
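The three kinds of probabilities above can be checked directly from the contingency table counts. A minimal Python sketch (variable names are my own; the counts are from the General Social Survey table above), using exact fractions to match the slide notation:

```python
from fractions import Fraction

# Cell counts: rows = taken a college science class (B / B^c),
# columns = belief in afterlife (A / A^c).
counts = {
    ("B", "A"): 2702, ("B", "Ac"): 634,
    ("Bc", "A"): 3722, ("Bc", "Ac"): 837,
}
total = sum(counts.values())  # grand total: 7895

# Marginal P(A): column margin over the grand total.
p_A = Fraction(counts[("B", "A")] + counts[("Bc", "A")], total)   # = 6424/7895

# Joint P(A ∩ B): a single cell over the grand total.
p_A_and_B = Fraction(counts[("B", "A")], total)                   # = 2702/7895

# Conditional P(A | B): restrict the sample space to the B row.
row_B = counts[("B", "A")] + counts[("B", "Ac")]                  # = 3336
p_A_given_B = Fraction(counts[("B", "A")], row_B)                 # = 2702/3336

print(float(p_A), float(p_A_and_B), float(p_A_given_B))
```

Note how the denominator is the only thing that changes between the joint and conditional probabilities: conditioning swaps the grand total for the row total.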
\(P(\text{attending every class | getting an A}) \neq\) \(P(\text{getting an A | attending every class})\)
The order matters!
\(P(A^c)\) is called the complement of event \(A\) and represents the probability of selecting someone who does not believe in an afterlife.
The notes for this lecture are derived from Section 2.1 of the Bayes Rules! book
Priya, a data science student, notices that her college’s email server is using a faulty spam filter. Taking matters into her own hands, Priya decides to build her own spam filter. As a first step, she manually examines all emails she received during the previous month and determines that 40% of these were spam.
Let \(B\) represent the event that an email is spam.
\(P(B) = 0.40\)
If Priya were to act on this prior alone, what should she do with incoming emails?
Since most email is non-spam, sort all emails into the inbox.
This filter would certainly solve the problem of losing non-spam email in the spam folder, but at the cost of making a mess in Priya’s inbox.
Priya realizes that some emails are written in all capital letters (“all caps”) and decides to look at some data. In her one-month email collection, 20% of spam but only 5% of non-spam emails used all caps.
Let \(A\) denote the event that an email is written in all caps. Using this notation:
\(P(A|B) = 0.20\)
\(P(A|B^c) = 0.05\)
Which of the following best describes your posterior understanding of whether the email is spam?
event | \(B\) | \(B^c\) | Total |
---|---|---|---|
probability | 0.4 | 0.6 | 1 |
Looking at the conditional probabilities
\(P(A|B) = 0.20\)
\(P(A|B^c) = 0.05\)
we can conclude that all caps is more common among spam emails than non-spam emails. Thus, the email is more likely to be spam.
Consider the likelihoods \(L(\cdot|A)\):
\(L(B|A) := P(A|B)\) and \(L(B^c|A) := P(A|B^c)\)
When \(B\) is known, the conditional probability function \(P(\cdot | B)\) allows us to compare the probabilities of an unknown event, \(A\) or \(A^c\), occurring with \(B\):
\[P(A|B) \; \text{ vs } \; P(A^c|B) \; .\]
When \(A\) is known, the likelihood function \(L( \cdot | A) := P(A | \cdot)\) allows us to compare the likelihoods of different unknown scenarios, \(B\) or \(B^c\), producing data \(A\):
\[L(B|A) \; \text{ vs } \; L(B^c|A) \; .\]
Thus the likelihood function provides the tool we need to evaluate the relative compatibility of events \(B\) or \(B^c\) with data \(A\).
\(P(B|A) = \frac{P(A\cap B)}{P(A)}\)
\(P(B|A) = \frac{P(B)P(A|B)}{P(A)}\)
\(P(B|A) = \frac{P(B)L(B|A)}{P(A)}\)
Recall the Law of Total Probability:
\(P(A) = P(A\cap B) + P(A\cap B^c)\)
\(P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)\)
\(P(B|A) = \frac{P(B)L(B|A)}{P(A|B) P(B)+P(A|B^c) P(B^c)}\)
\(P(B) = 0.40\)
\(P(A|B) = 0.20\)
\(P(A|B^c) = 0.05\)
\(P(B|A) = \frac{0.40 \cdot 0.20}{(0.20 \cdot 0.40) + (0.05 \cdot 0.60)} = \frac{0.08}{0.11} \approx 0.727\)
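The plug-in above can be reproduced in a few lines of Python (variable names are my own), with the Law of Total Probability supplying the denominator:

```python
# Priya's spam filter quantities.
p_B = 0.40            # prior P(B): an email is spam
p_A_given_B = 0.20    # P(all caps | spam)     = L(B | A)
p_A_given_Bc = 0.05   # P(all caps | non-spam) = L(B^c | A)

# Law of Total Probability: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
p_A = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)

# Bayes' Rule: P(B|A) = P(B) L(B|A) / P(A)
p_B_given_A = p_B * p_A_given_B / p_A

print(round(p_A, 2), round(p_B_given_A, 3))  # 0.11 0.727
```

So observing all caps raises the probability that the email is spam from the prior 0.40 to roughly 0.73.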
event | \(B\) | \(B^c\) | Total |
---|---|---|---|
prior probability | 0.4 | 0.6 | 1 |
posterior probability | 0.727 | 0.273 | 1 |
event | \(B\) | \(B^c\) | Total |
---|---|---|---|
prior probability | 0.4 | 0.6 | 1 |
likelihood | 0.20 | 0.05 | 0.25 |
posterior probability | 0.727 | 0.273 | 1 |
\[P(B |A) = \frac{P(B)L(B|A)}{P(A)}\]
\[\text{posterior} = \frac{\text{prior}\cdot\text{likelihood}}{\text{marginal probability}}\]
\[\text{posterior} = \frac{\text{prior}\cdot\text{likelihood}}{\text{normalizing constant}}\]
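Reading the formula as "posterior \(\propto\) prior \(\times\) likelihood" suggests a small computational pattern: multiply prior by likelihood for every scenario, then divide by the sum of those products, which is exactly the normalizing constant \(P(A)\). A sketch in Python (dictionary keys and names are my own):

```python
# Scenarios: B = spam, Bc = non-spam; data A = all caps observed.
prior = {"B": 0.40, "Bc": 0.60}
likelihood = {"B": 0.20, "Bc": 0.05}   # L(. | A) = P(A | .)

# Unnormalized posterior: prior * likelihood for each scenario.
unnormalized = {h: prior[h] * likelihood[h] for h in prior}

# Normalizing constant = P(A), by the Law of Total Probability.
normalizing_constant = sum(unnormalized.values())

posterior = {h: u / normalizing_constant for h, u in unnormalized.items()}
print({h: round(p, 3) for h, p in posterior.items()})  # {'B': 0.727, 'Bc': 0.273}
```

Because the normalizing constant is the same for every scenario, it only rescales the products so the posterior probabilities sum to 1; the prior-times-likelihood products already determine their relative sizes.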