Saturday, February 2, 2013

AT LEAST ONE... NOT ALL...?



I often see students who have trouble understanding questions that involve “at least one” or “not all”. The first time this really becomes an issue is when independence is introduced. Here is how I explain the concept graphically:

Let’s assume four independent events:
P(Andy graduates within 4 years) = P(A) = 60%
P(Beth falls asleep in class) = P(B) = 20%
P(Chris will have sunshine during his wedding this summer) = P(C) = 70%
P(Derrick will beat me in racquetball tonight) = P(D) = 10%

Questions that often cause trouble are:
“What is the chance that not all of these events happen?”
which is the same as
“What is the chance that at most 3 events happen?”

or

“What is the chance that at least one of these events happens?”

When my students come up with all kinds of crazy ways to answer this, I show them the following graph:


By virtue of independence, P(ALL) = P(A)*P(B)*P(C)*P(D) = 0.6*0.2*0.7*0.1 = 0.84%
and since ALL and NOT ALL are complements (not opposites!), P(NOT ALL) = 100% - P(ALL) = 99.16%
Similarly, P(NONE) = P(A̅)*P(B̅)*P(C̅)*P(D̅) = (1-0.6)*(1-0.2)*(1-0.7)*(1-0.1) = 8.64%
and since NONE and AT LEAST ONE are complements, P(AT LEAST ONE) = 100% - P(NONE) = 91.36% 
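The same arithmetic can be checked with a few lines of Python. This is a minimal sketch; the dictionary `p` just holds the four probabilities from above. Independence is what justifies multiplying, and each complement is a single subtraction:

```python
from math import prod

# P(A), P(B), P(C), P(D) from the example (assumed independent).
p = {"A": 0.6, "B": 0.2, "C": 0.7, "D": 0.1}

# Independence lets us multiply the individual probabilities.
p_all = prod(p.values())                  # P(A)*P(B)*P(C)*P(D)
p_none = prod(1 - q for q in p.values())  # P(not A)*P(not B)*P(not C)*P(not D)

# Each question is answered by one complement.
p_not_all = 1 - p_all
p_at_least_one = 1 - p_none

print(f"P(ALL)          = {p_all:.2%}")
print(f"P(NOT ALL)      = {p_not_all:.2%}")
print(f"P(NONE)         = {p_none:.2%}")
print(f"P(AT LEAST ONE) = {p_at_least_one:.2%}")
```

It prints the same four percentages as the hand calculation.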

If students don’t believe that the rule of complements, P(A) = 1 - P(A̅), is useful, it is time to show them a four-event Venn diagram:
The first one just shows the four events A, B, C and D.
The second shows, for each of the 16 combinations of events happening and not happening, how many of the four events actually happen.
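The second diagram can also be reproduced as a list. This Python sketch (the variable names are my own) walks through all 2^4 = 16 on/off patterns of A, B, C and D, records how many events happen in each region, and multiplies the matching probabilities. That multiplication is valid only because the events are assumed independent:

```python
from itertools import product
from math import prod

p = {"A": 0.6, "B": 0.2, "C": 0.7, "D": 0.1}
events = list(p)

# Each region of the four-event Venn diagram is one happen/not-happen pattern.
regions = []
for pattern in product([True, False], repeat=len(events)):
    k = sum(pattern)  # number of events happening in this region (0..4)
    prob = prod(p[e] if happens else 1 - p[e] for e, happens in zip(events, pattern))
    regions.append((pattern, k, prob))

# Probability that exactly k events happen, for k = 0..4.
by_count = [0.0] * (len(events) + 1)
for _, k, prob in regions:
    by_count[k] += prob

for pattern, k, prob in regions:
    label = "".join(e if h else "-" for e, h in zip(events, pattern))
    print(f"{label}  {k}  {prob:.4%}")
for k, prob in enumerate(by_count):
    print(f"exactly {k}: {prob:.2%}")
```

Each of the 16 lines corresponds to one region of the diagram, and the "exactly k" totals add up to 100%.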


Let’s go back to P(ALL), which means A, B, C and D must all happen; that area is indicated with the “4” in the right diagram. Then P(NOT ALL) is everything else: all 15 other combinations of events happening and not happening (the areas marked 0, 1, 2 or 3). Since these are mutually exclusive outcomes, nothing prevents us from computing their probabilities one by one and adding them up, but if asked for P(NOT ALL), it is much easier to acknowledge that
P(NOT ALL) = 100%-P(ALL)

The same logic holds for P(AT LEAST ONE). At least one event happening (so 1, 2, 3 or 4 events) covers everything except the area outside all four ellipses, so again there would be 15 different combinations of events happening and not happening to account for. Instead, it is much more time-efficient to compute

P(AT LEAST ONE) = 100%-P(NONE)
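To convince skeptical students that the long way and the shortcut agree, this Python sketch adds up the 15 mutually exclusive regions for NOT ALL and for AT LEAST ONE, then compares each sum with the single complement calculation. `region_prob` is a hypothetical helper name, and the sketch assumes the four events are independent, as above:

```python
from itertools import product
from math import isclose, prod

p = [0.6, 0.2, 0.7, 0.1]  # P(A), P(B), P(C), P(D)


def region_prob(pattern):
    """Probability of one happen/not-happen pattern of independent events."""
    return prod(q if happens else 1 - q for q, happens in zip(p, pattern))


patterns = list(product([True, False], repeat=len(p)))

# The long way: add up the 15 mutually exclusive regions.
not_all_long = sum(region_prob(s) for s in patterns if not all(s))
at_least_one_long = sum(region_prob(s) for s in patterns if any(s))

# The short way: one product and one complement each.
not_all_short = 1 - prod(p)
at_least_one_short = 1 - prod(1 - q for q in p)

assert isclose(not_all_long, not_all_short)
assert isclose(at_least_one_long, at_least_one_short)
print(f"P(NOT ALL) = {not_all_short:.2%}, P(AT LEAST ONE) = {at_least_one_short:.2%}")
```

Both routes give the same answer; the complement simply needs one term instead of fifteen.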

Monday, January 28, 2013

SCALES OF DATA AND PERMISSIBLE DESCRIPTIVE STATISTICS


The table below shows different descriptive measures and the scales on which they are permissible. Binary variables are technically on the nominal scale, but they allow for some descriptive measures that other nominal-scaled variables don’t.


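| Measure | Binary | Nominal | Ordinal | Interval | Ratio |
|---|---|---|---|---|---|
| Percentiles | NO | NO | YES | YES | YES |
| Mean | YES[1] | NO | NO[2] | YES | YES |
| Median | NO | NO | YES | YES | YES |
| Mode | YES[3] | YES | YES | YES[4] | YES[4] |
| Minimum and Maximum | NO | NO | YES | YES | YES |
| Range | NO | NO | NO | YES | YES |
| Standard Deviation | YES | NO | NO | YES | YES |
| Variance | YES | NO | NO[2] | YES | YES |
| Interquartile Range | NO | NO | NO | YES | YES |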




[1] For a variable coded 0/1, the mean is the proportion of 1s
[2] But often done for Likert scales and in the absence of higher-quality data
[3] It is the category held by the absolute majority
[4] But useless if there are too many different values
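Footnotes [1] and [3] can be checked with Python's standard `statistics` module on a small, made-up 0/1 variable (the data below are invented for illustration). The mean comes out as the proportion of 1s, the mode is the majority category, and the population variance equals p(1 - p), which is one way to see why variance and standard deviation still carry meaning for binary data:

```python
from statistics import mean, mode, pstdev, pvariance

# Made-up binary variable coded 0/1 (1 = "yes"), purely for illustration.
answers = [1, 0, 1, 1, 0, 1, 0, 1]

p = mean(answers)         # [1] mean of a 0/1 variable = proportion of 1s
majority = mode(answers)  # [3] mode = category held by the majority
var = pvariance(answers)  # population variance
sd = pstdev(answers)      # population standard deviation

print(f"proportion of 1s: {p}")  # 5 of 8
print(f"mode: {majority}")
print(f"variance: {var} = p(1 - p) = {p * (1 - p)}")
print(f"standard deviation: {sd:.4f}")
```

Note that `pvariance` divides by n; the sample variance (`statistics.variance`) divides by n - 1 and would differ slightly from p(1 - p).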

Tuesday, November 27, 2012

A TOTALLY CRAZY STATS PROBLEM


I am a participant in a football betting pool where every participant picks 10 football games (NCAA or NFL) each week. This Saturday I picked 8 of 10 college football games correctly (I hardly ever pick NFL games), which made me wonder what my chances were of winning this week’s betting pool. We also have a tie-breaker in place (the total points scored in the Monday night game), but since only the over/under and not the standard deviation of points scored is known, I was not able to solve the tie-breaker scenarios. I also assumed that each team has a 50% chance of covering the spread (all spreads are set to x ½ points, so there are no pushes).

The table below shows the standings on Saturday night, when all NCAA games were over. Fourteen of my fellow pool participants (named Alf to Nick) still had a chance to catch up with or overtake me (10 minus max wins is the number of games they already had wrong), and there were 13 NFL games still to be played this weekend.

What is the chance that I win outright (meaning no one else got 8 correct picks)?
What is the chance that I ended up in a tie for first place (meaning at least one other player has 8 correct picks and no one has more than 8 correct picks)?
What is the chance that I was guaranteed a top-three finish?
What is the lowest place I could have finished after the tie-breaker is decided (assuming I could finish last in the tie-breaker)?
Hint: The chance that Alf wins outright is 15.94%. If you cannot match this number, you likely have it all wrong!

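| Name | Max wins | Wins so far | AZ at ATL | CLE at DAL | PHI at WAS | GB at DET | TB at CAR | JAC at HOU | CIN at KC | NYJ at STL | IND at NE | SD at DEN | CHI at SF | NOR at OAK | BAL at PIT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Volker | 8 | 8 |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Alf | 10 | 5 |  |  |  | GB |  |  | CIN |  | IND | DEN |  | NOR |  |
| Bert | 9 | 3 | ATL |  | WAS | GB | TB |  |  |  | IND |  |  |  | BAL |
| Chris | 10 | 1 |  |  | WAS | GB | TB |  | CIN | NYJ | NE |  | SF | NOR | BAL |
| Doe | 10 | 1 |  | DAL | WAS | GB | TB | HOU | CIN | STL | IND | DEN |  |  |  |
| Eric | 10 | 0 | AZ | CLE | PHI | DET |  | JAC | KC | NYJ | NE | SD |  |  | BAL |
| Fred | 10 | 0 | ATL |  | WAS | GB | TB |  | CIN | NYJ |  | DEN | SF | NOR | BAL |
| Gino | 10 | 0 | AZ | CLE | PHI | GB | TB |  | CIN |  | IND | SD | CHI | NOR |  |
| Herb | 8 | 4 | AZ |  |  |  | CAR | HOU |  |  |  |  | SF |  |  |
| Ina | 8 | 2 |  |  | WAS | GB | TB |  | CIN | STL |  |  |  | NOR |  |
| Jeff | 8 | 6 | ATL |  |  |  |  | HOU |  |  |  |  |  |  |  |
| Kyle | 8 | 5 |  |  | WAS |  |  |  |  |  | IND |  |  |  | BAL |
| Liam | 8 | 4 |  | DAL |  |  |  |  |  |  | IND | DEN |  | NOR |  |
| Matt | 8 | 4 |  |  | WAS |  |  |  |  | STL |  | DEN |  |  | BAL |
| Nick | 8 | 2 |  |  | WAS | GB |  |  | CIN |  |  | DEN |  | NOR | BAL |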

The link to the pdf is here
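One brute-force way to attack these questions, sketched in Python under the post's own assumption that every NFL game is an independent 50/50 coin flip against the spread: with only 13 games left there are 2^13 = 8,192 equally likely outcomes, few enough to enumerate exactly. The picks are transcribed from the table above; the names `GAMES`, `PLAYERS`, `all_outcomes` and `analyze` are hypothetical. "Guaranteed a top-three finish" is read here as "at most two other players end with at least my score", and the lowest place assumes I lose every tie-breaker; both readings are my assumptions, and the tie-breaker itself is not modeled.

```python
from fractions import Fraction
from itertools import product

# The 13 remaining NFL games as (away, home), in the order of the table columns.
GAMES = [("AZ", "ATL"), ("CLE", "DAL"), ("PHI", "WAS"), ("GB", "DET"),
         ("TB", "CAR"), ("JAC", "HOU"), ("CIN", "KC"), ("NYJ", "STL"),
         ("IND", "NE"), ("SD", "DEN"), ("CHI", "SF"), ("NOR", "OAK"),
         ("BAL", "PIT")]

# name -> (wins so far, teams picked to cover), transcribed from the table.
PLAYERS = {
    "Volker": (8, set()),
    "Alf": (5, {"GB", "CIN", "IND", "DEN", "NOR"}),
    "Bert": (3, {"ATL", "WAS", "GB", "TB", "IND", "BAL"}),
    "Chris": (1, {"WAS", "GB", "TB", "CIN", "NYJ", "NE", "SF", "NOR", "BAL"}),
    "Doe": (1, {"DAL", "WAS", "GB", "TB", "HOU", "CIN", "STL", "IND", "DEN"}),
    "Eric": (0, {"AZ", "CLE", "PHI", "DET", "JAC", "KC", "NYJ", "NE", "SD", "BAL"}),
    "Fred": (0, {"ATL", "WAS", "GB", "TB", "CIN", "NYJ", "DEN", "SF", "NOR", "BAL"}),
    "Gino": (0, {"AZ", "CLE", "PHI", "GB", "TB", "CIN", "IND", "SD", "CHI", "NOR"}),
    "Herb": (4, {"AZ", "CAR", "HOU", "SF"}),
    "Ina": (2, {"WAS", "GB", "TB", "CIN", "STL", "NOR"}),
    "Jeff": (6, {"ATL", "HOU"}),
    "Kyle": (5, {"WAS", "IND", "BAL"}),
    "Liam": (4, {"DAL", "IND", "DEN", "NOR"}),
    "Matt": (4, {"WAS", "STL", "DEN", "BAL"}),
    "Nick": (2, {"WAS", "GB", "CIN", "DEN", "NOR", "BAL"}),
}
ME = "Volker"


def all_outcomes():
    """Yield every player's final score for each of the 2**13 equally likely outcomes."""
    for covering in product(*GAMES):  # one covering team per game
        covered = set(covering)
        yield {name: so_far + len(picks & covered)
               for name, (so_far, picks) in PLAYERS.items()}


def analyze(me=ME):
    total = tied_first = top3_guaranteed = 0
    worst_place = 1
    outright = dict.fromkeys(PLAYERS, 0)  # unique top score, no tie-breaker needed
    for scores in all_outcomes():
        total += 1
        best = max(scores.values())
        leaders = [name for name, s in scores.items() if s == best]
        if len(leaders) == 1:
            outright[leaders[0]] += 1
        elif me in leaders:
            tied_first += 1
        # Assumption: I lose every tie-breaker, so anyone with >= my score ranks ahead.
        ahead = sum(1 for name, s in scores.items() if name != me and s >= scores[me])
        if ahead <= 2:
            top3_guaranteed += 1
        worst_place = max(worst_place, ahead + 1)
    return {
        "outright": {name: Fraction(n, total) for name, n in outright.items()},
        "tied_first": Fraction(tied_first, total),
        "top3_guaranteed": Fraction(top3_guaranteed, total),
        "worst_place": worst_place,
    }


result = analyze()
print(f"Alf wins outright:     {float(result['outright']['Alf']):.2%}")
print(f"I win outright:        {float(result['outright'][ME]):.2%}")
print(f"I tie for first:       {float(result['tied_first']):.2%}")
print(f"Top three guaranteed:  {float(result['top3_guaranteed']):.2%}")
print(f"Lowest possible place: {result['worst_place']}")
```

Enumerating every outcome gets the correlations between players right for free (many of them picked the same teams), which is what makes a shortcut formula hard here. As a check on the transcription, the sketch reproduces the hint: Alf wins outright in 1,306 of the 8,192 outcomes, or 15.94%.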