Category Archives: math and statistics puzzles

Generating Irrational Probabilities

Over at Jan Nordgreen’s excellent Thnik Again! blog, frequent commenter Richard Sabey posed the question “I want to throw a fair coin and use the result to make some decision with probability 1/sqrt(2). How?”

This was initially a stumper; not a surprise since Richard is a pretty formidable mathematician who has more number theory in his left pinky than I do in my whole head.  Nevertheless, I chewed on this problem for a few days and set it aside until Jan decided to egg folks on by repeating the question.  While my solution (and Richard’s–he had one ready, of course) are posted at the link, I want to explain the Irrational Probability Algorithm here a bit more clearly.

The general problem is to generate a sequence of Bernoulli (binary, yes/no) events such that the long run probability (or proportion) of a YES event is some irrational number in the interval (0, 1), using only repeated tosses of a fair coin, which has a rational probability for heads or tails of 1/2.  The difficulty is that no irrational number can be expressed as ratio (that’s what a mathematician means by “irrational”), so there is no counting technique that will do the job with frequencies of heads or tails.

The solution comes from recognizing this inapplicability of counting methods.  Instead, the solution takes advantage of the fact that, while irrational numbers cannot be precisely expressed as ratios, they can be approximated by ratios to any desired degree of precision (“any” means as far as you want to go, a billion decimal places is certainly feasible).  So, express the irrational target proportion as a binary fraction (the binary point followed by a sequence of zeroes and ones) to the desired precision, and use it as the probability generator. Richard’s algorithm does this implicitly, using the most significant digit of a floating point representation that is repeatedly trimmed and doubled, mine uses the binary expansion explicitly as a series of digits.

Here are our two implementations:

Algorithm S:

 x <- irrational 
do { 
    x <- 2x 
    t <- cointoss() 
    if (t == TAILS) { 
       if (x < 1) 
          return(NOEVENT) 
       else 
          x <- x-1 
    } else { 
      if (x >=1) 
          return(EVENT) 
    } enddo 

Algorithm A:

 
digits <- binaryExpansion(irrational) 
digitpointer <- 0 
repeat 
    digitpointer <- digitpointer + 1 
    t <- cointoss() 
until (t NOT= digits[digitpointer] ) 
if (t = HEADS) 
   return(NOEVENT) 
else 
    return(EVENT) 

Algorithm S is limited by the number of digits available in the floating point representation of the irrational number, Algorithm A has a similar limitation in the number of digits in the (explicit) binary representation.  However, this isn’t as big a deal as it might first appear…

How fast does this thing run?  The basic loop in both algorithms is essentially checking to see if the sequence of coin tosses match the binary expansion; if so, it continues indefinitely.  The probability of a match at each step is just 1/2, so this is just a geometric series, with an expected length of 2.  The probability of requiring more than d steps is 2-d.  E.g. the chance of requiring more than 10 steps is less than 1 in a million (210 ≈ 103)

This WILL appear on a future exam

This WILL appear on a future exam

RTFQ.

Tip from Flowing Data.

Love the message, hate the graphic

Meg McLain tells a great story about the relative risk of being killed by terrorists in the US.  Unfortunately, she comes up with this baffling graphic which appears to use the sort of number scales beloved of President Obama’s budget speechwriters:

Sure, there’s a scale problem, when the multipliers range from 6 to 17,600, but generations of scientists and engineers have handled that with a logarithmic scale:

This still doesn’t give the compressed range that MM’s chart shows.  Aha!  Perhaps she’s using the little-known log-log scale (beloved by statisticians who deal with generalized linear models)–let’s see:

Pretty close.  But how would any reasonable person expect a layperson to understand this exotic measurement scale?

Tip from the Knowledge Problem.  And from Thnik Again!

Update (28 September).  In her comments section, Dr Kiesling admonishes me not to “go all Tufte” and tosses out the phrase “mathematically pedant.”    How flattering!  I bet LK was the kind of girl who slugged guys in junior high to get their attention.

The Dutch Book, made simple

Briggsy gives a dead-simple explanation for spotting and profiting from a Dutch Book set of odds.  Even an undergrad can do it!

A fair 3-way choice using coin tosses

I’d like to make a fair and random choice among 3 alternatives, but the only randomizing device I have available is a coin to toss.  Worse yet, I suspect the coin may be biased.  What to do?

If you gamble, know when to stop

It’s explained in gory detail here.

Tip from Thnik Again!

Optimal Grilling

The Rapid Steak Algorithm.  OK, so it’s really an introductory article about operations research, which we at UTSA, in our infinite wisdom, call management science, to keep the engineers from getting their grubby paws on it.

Writer Sanjay Saigal (is that some kind of Houston name?) promises more of this good stuff.

Tip from the Instapundit.

Not quite ready for prime time

This sounds like a great innovation, using Raman spectroscopy to detect melanoma.  But let’s look a little deeper at the preliminary results.

First, the developers claim that in 274 known cases of melanoma, the device detected all 274 of them, which gives and estimated test sensitivity of 100%, which is patent bullshit; nothing in this life is guaranteed perfect.  A reasonable Bayesian shrinkage estimator might be 275/276 = 0.9964, which is still pretty good.  Second, the developers conveniently omitted any statistics on specificity (how well cases of NO melanoma are correctly identified); I’m sure this was simple oversight*.  Third, and finally, there’s no mention of the prevalence of melanoma in the general population.  Without all three of these numbers, we have no way of evaluating the utility of the device.

But what the heck, let’s do a back of the envelope** calculation.  We have the estimate for sensitivity, and we’ll be optimistic and assume that the other 726 subjects were all melanoma-free and tested negative, so we’ll get a shrinkage estimate of 727/728=0.9986  (which should make it a pretty good test).  Then look up the prevalence of melanoma (also called incidence rate) to get 1/5074 = 0.000197.  Then plug into Bayes’ Theorem to get the posterior predictive value:

PPV = (sensitivity x prevalence) / (sensitivity x prevalence + (1-specificity)x(1-prevalence)) = 0.1251

So if you test positive with this gizmo, you have about a 12.5% chance ( 1 in 8 ) of having melanoma.  I think we need more test data before buying any stock.

Tip from the Instapundit, who should calm down.

* I’m sure there’s an Easter Bunny, too.

** OK, back of the spreadsheet

Update (8 February).  Jan Nordgren has the perfect quote to describe this situation.  I wish I had used it for my title.

Thnik again! again

Jan Nordgreen’s Thnik Again! blog is back on the air.  Get your Thinking Caps ready.

Get logical!

Everything you wanted to know about logical fallacies in one handy chart.

Tip from the Corner.