Volker's Teaching Business Statistics Blog

Saturday, February 2, 2013

AT LEAST ONE... NOT ALL...?

I have often seen students struggle with questions that involve “at least one” or “not all”. The first time this really becomes an issue is when independence is introduced. Here is how I explain the concept graphically:

Let’s assume four independent events:

P(Andy graduates within 4 years) = P(A) = 60%

P(Beth falls asleep in class) = P(B) = 20%

P(Chris will have sunshine during his wedding this summer) = P(C) = 70%

P(Derrick will beat me in racquetball tonight) = P(D) = 10%

Questions that often cause trouble are:

“What is the chance that not all of these events happen?”

which is the same as

“What is the chance that at most 3 of these events happen?”

and

“What is the chance that at least one of these events happens?”

When my students come up with all kinds of crazy ways to answer these questions, I show them the following graph:

By virtue of independence, P(ALL) = P(A)*P(B)*P(C)*P(D) = 0.6*0.2*0.7*0.1 = 0.0084 = 0.84%

and since ALL and NOT ALL are complements (not opposites!), P(NOT ALL) = 100% - P(ALL) = 99.16%

Similarly, P(NONE) = P(A̅)*P(B̅)*P(C̅)*P(D̅) = (1-0.6)*(1-0.2)*(1-0.7)*(1-0.1) = 0.0864 = 8.64%

and since NONE and AT LEAST ONE are complements, P(AT LEAST ONE) = 100% - P(NONE) = 91.36%
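For readers who like to check the arithmetic, here is a minimal Python sketch of the same calculation. The dictionary `p` and the variable names are my own labels for illustration; the probabilities are the four from the example, and independence is assumed exactly as above.

```python
from math import prod

# Probabilities of the four independent events from the example
p = {"A": 0.60, "B": 0.20, "C": 0.70, "D": 0.10}

# Independence lets us multiply the individual probabilities
p_all = prod(p.values())                    # all four events happen
p_none = prod(1 - q for q in p.values())    # none of the events happens

# Rule of complements
p_not_all = 1 - p_all
p_at_least_one = 1 - p_none

print(f"P(ALL)          = {p_all:.2%}")
print(f"P(NOT ALL)      = {p_not_all:.2%}")
print(f"P(NONE)         = {p_none:.2%}")
print(f"P(AT LEAST ONE) = {p_at_least_one:.2%}")
```

The four printed values should match the percentages worked out by hand above.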

If students don’t believe that the rule of complements, P(A) = 1 - P(A̅), is useful, it is time to show them a four-event Venn diagram:

The first one just shows the four events: A, B, C, D.

The second shows, for each of the 16 combinations of events happening and not happening, how many events actually happen.

Let’s go back to P(ALL), which means A, B, C and D must all happen; that area is indicated with the “4” in the right diagram. Then P(NOT ALL) is everything else: the other 15 combinations of events happening and not happening (the areas indicated with a 0, 1, 2 or 3). Since they are all mutually exclusive outcomes, nothing prevents us from computing their respective probabilities and adding them up, but if asked for P(NOT ALL), it is much easier to acknowledge that

P(NOT ALL) = 100%-P(ALL)

The same logic holds for P(AT LEAST ONE). At least one event happening (so 1, 2, 3 or 4) is everything but the outside of the ellipses, so again there would be 15 different combinations of events happening and not happening to account for. Instead, it is much more time-efficient to compute

P(AT LEAST ONE) = 100%-P(NONE)
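To convince a skeptical student that the shortcut and the long way agree, the 16 areas of the Venn diagram can be enumerated by brute force. The following is only a sketch under the same independence assumption; the names `probs` and `dist` are mine and purely illustrative.

```python
from itertools import product

probs = [0.60, 0.20, 0.70, 0.10]  # P(A), P(B), P(C), P(D)

# dist[k] = probability that exactly k of the four events happen
dist = [0.0] * (len(probs) + 1)

# Walk through all 2**4 = 16 combinations of happening / not happening
for outcome in product([True, False], repeat=len(probs)):
    weight = 1.0
    for happens, p in zip(outcome, probs):
        weight *= p if happens else 1 - p
    dist[sum(outcome)] += weight   # sum(outcome) counts the events that happen

# The long way: add up the 15 mutually exclusive combinations
p_not_all_long = sum(dist[:4])       # 0, 1, 2 or 3 events happen
p_at_least_one_long = sum(dist[1:])  # 1, 2, 3 or 4 events happen

print(f"P(NOT ALL)      = {p_not_all_long:.2%}")
print(f"P(AT LEAST ONE) = {p_at_least_one_long:.2%}")
```

Both results should agree with 100% - P(ALL) and 100% - P(NONE), which is the whole point: the complement needs one product, the long way needs fifteen.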

Monday, January 28, 2013

SCALES OF DATA AND PERMISSIBLE DESCRIPTIVE STATISTICS

The table below shows different descriptive measures and the scales for which they are permissible. Binary variables are technically on the nominal scale, but they allow for some descriptive measures that other nominally scaled variables don’t.

	Binary	Nominal	Ordinal	Interval	Ratio
Percentiles	NO	NO	YES	YES	YES
Mean	YES[1]	NO	NO[2]	YES	YES
Median	NO	NO	YES	YES	YES
Mode	YES[3]	YES	YES	YES[4]	YES[4]
Minimum and Maximum	NO	NO	YES	YES	YES
Range	NO	NO	NO	YES	YES
Standard Deviation	YES	NO	NO	YES	YES
Variance	YES	NO	NO[2]	YES	YES
Interquartile Range	NO	NO	NO	YES	YES

[1] It is the proportion

[2] But often done for Likert scales and in the absence of higher quality data

[3] It is the absolute majority

[4] But useless if there are too many different values
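Footnotes [1] and [3] are easy to demonstrate with a few lines of Python. The data below are made up purely for illustration (a 0/1 "passed" indicator and a 1-to-5 Likert-style item); only the standard library `statistics` module is used.

```python
from statistics import mean, median, mode, pvariance

# Hypothetical binary variable: 1 = passed, 0 = failed
passed = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

proportion = passed.count(1) / len(passed)
binary_mean = mean(passed)       # footnote [1]: the mean is the proportion
binary_var = pvariance(passed)   # population variance, equals p * (1 - p)
binary_mode = mode(passed)       # footnote [3]: the majority category

# Hypothetical ordinal variable coded 1 (worst) to 5 (best)
likert = [1, 2, 2, 3, 3, 3, 4, 5, 5]
likert_median = median(likert)   # permissible on the ordinal scale
likert_mode = mode(likert)

print(proportion, binary_mean, binary_var, binary_mode)
print(likert_median, likert_mode)
```

Note that `pvariance` is the population variance; the sample variance (`statistics.variance`) divides by n - 1 and would not equal p(1 - p) exactly.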

Tuesday, November 27, 2012

A TOTALLY CRAZY STATS PROBLEM

I am a participant in a football betting pool where every participant picks 10 football games (NCAA or NFL) each week. This Saturday I picked 8 of 10 college football games correctly (I hardly ever pick NFL games), which made me wonder what my chances were of winning this week’s betting pool. We also have a tie breaker in place (the total points scored in the Monday night game), but since only the over/under and not the standard deviation of points scored is known, I was not able to solve tie breaker scenarios. I also assumed that each team has a 50% chance of covering the spread (all spreads are set to x½ points, so there are no pushes).

The table below shows the standings on Saturday night, when all NCAA games were over. 14 of my fellow pool participants (named Alf to Nick) still had a chance to catch up with or overtake me (10 minus “max wins” is the number of games they had already gotten wrong), and there were 13 NFL games to be played that weekend.

What is the chance that I win outright (meaning no one else ends up with 8 or more correct picks)?

What is the chance that I ended up in a tie for first place (meaning at least one other player has 8 correct picks and no one has more than 8 correct picks)?

What is the chance I was guaranteed a top three finish?

What is the lowest place I could have finished after the tie breaker is decided (assuming I could finish last in the tie breaker)?

Hint: The chance that Alf wins outright is 15.94%. If you cannot match this number, you likely have it all wrong!

[Table of standings. Header labels, in the order they appear: Name, max wins, CLE, PHI, JAC, CIN, NYJ, IND, CHI, NOR, BAL, wins far, ATL, DAL, WAS, DET, CAR, HOU, STL, DEN, OAK, PIT. Team entries per row:]

Name	Teams listed
Volker	(none)
Alf	CIN, IND, DEN, NOR
Bert	ATL, WAS, IND, BAL
Chris	WAS, CIN, NYJ, NOR, BAL
Doe	DAL, WAS, HOU, CIN, STL, IND, DEN
Eric	CLE, PHI, DET, JAC, NYJ, BAL
Fred	ATL, WAS, CIN, NYJ, DEN, NOR, BAL
Gino	CLE, PHI, CIN, IND, CHI, NOR
Herb	CAR, HOU
Ina	WAS, CIN, STL, NOR
Jeff	ATL, HOU
Kyle	WAS, IND, BAL
Liam	DAL, IND, DEN, NOR
Matt	WAS, STL, DEN, BAL
Nick	WAS, CIN, DEN, NOR, BAL

The link to the pdf is here
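One possible way to attack the first two questions is brute force: with 13 games at 50/50 there are only 2^13 = 8,192 equally likely result patterns, so a computer can simply walk through all of them. The sketch below shows the idea on a tiny made-up pool, not on the real standings. The rivals "X" and "Y", their win counts, their pending picks, the 0/1 coding of which side covers, and `N_GAMES = 3` are all hypothetical; only my score of 8 and the 50% assumption come from the post.

```python
from fractions import Fraction
from itertools import product

MY_WINS = 8   # my final score; I have no games pending
N_GAMES = 3   # hypothetical number of games still to be played

# Hypothetical rivals: (wins so far, {game index: side picked (0 or 1)})
rivals = {
    "X": (6, {0: 1, 1: 0}),   # needs both pending picks to reach 8
    "Y": (7, {1: 1, 2: 0}),   # can still reach 9
}

outright = tie = 0
# Every game is a fair coin, so all 2**N_GAMES patterns are equally likely
for result in product([0, 1], repeat=N_GAMES):
    finals = [
        wins + sum(result[game] == side for game, side in picks.items())
        for wins, picks in rivals.values()
    ]
    best_rival = max(finals)
    if best_rival < MY_WINS:
        outright += 1          # nobody reaches 8
    elif best_rival == MY_WINS:
        tie += 1               # someone matches 8, nobody beats it

total = 2 ** N_GAMES
p_outright = Fraction(outright, total)
p_tie = Fraction(tie, total)
print(f"P(win outright) = {p_outright}, P(tie for first) = {p_tie}")
```

Enumerating game results, rather than multiplying per-player probabilities, matters here because players who picked the same games are not independent of each other.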