Reports of its death are greatly exaggerated

The ability of statistics to accurately represent the world is declining. In its wake, a new age of big data controlled by private companies is taking over – and putting democracy in peril.

begins William Davies’s tale of woe in the Guardian.  Unfortunately, he confuses credible statistics with modern state-istics*, and seems impervious to the idea that Joe Sixpack has wised up to the fact that there are “lies, damned lies, and statistics,” and that most of these are peddled by the Leviathan State and its corporate cronies.  Usually to Joe’s detriment.

Statistics in industry and scientific research is doing quite well, thank you.  The Big Data movement is still immature and riddled with snake-oil salesmen; it will eventually spot them, possibly by applying its methodologies reflexively.

Tip from that same O’Reilly Newsletter.  Finally, I got on a sucker list that’s interesting!

*Where did you think the word came from?

Multiple Comparisons, Made Easy

Adrian Colyer at the morning paper takes a stab at explaining the problem with p-values and multiple comparisons.  He shoots!  He scores!  The crowd* goes wild!


Tip from an O’Reilly Daily Newsletter, which I found languishing in Clutter purgatory.

*OK, the crowd of two or three statistics lecturers who struggle to explain the multiple comparison problem.
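
For anyone who wants the arithmetic behind the problem, here is a minimal Python sketch. It is not taken from Colyer's post; it assumes the tests are independent, and the function names are my own. It shows how quickly the chance of at least one false positive grows with the number of comparisons, and what per-test threshold the Bonferroni correction would use instead.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests,
    each run at significance level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance level that keeps the family-wise rate at or below alpha."""
    return alpha / m


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        fwer = familywise_error_rate(0.05, m)
        per_test = bonferroni_threshold(0.05, m)
        print(f"m={m:3d}  FWER={fwer:.3f}  Bonferroni per-test alpha={per_test:.4f}")
```

With 20 comparisons at the usual 0.05 level, the chance of at least one spurious "discovery" is already about 64%.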

When Bayesian Statistics Broke into History

Wonderful article here about the Mosteller and Wallace analysis of the twelve Federalist Papers, the ones of disputed authorship–was it Madison or Hamilton who wrote them?  With a nice, easy-to-understand explanation of the Bayesian methodology they used.

Aaron Burr ensured that Hamilton took the secret to his grave.

Tip from Real Clear Life.

An end run around an impossible integral

Ever-insightful polymath John Cook shows how to integrate the Gaussian PDF, in less time than it takes to make breakfast.  The trick?  Coordinate transformations and the Jacobian are your friends.
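
For reference, the standard polar-coordinate argument runs as follows. This is my own sketch of the classic derivation, not a transcription of Cook's post, and the symbol I is my shorthand for the unnormalized integral.

```latex
\begin{align*}
I &= \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \\
I^2 &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy \\
    &= \int_{0}^{2\pi}\int_{0}^{\infty} e^{-r^2/2}\,r\,dr\,d\theta
       \qquad (x = r\cos\theta,\ y = r\sin\theta,\ \text{Jacobian } r) \\
    &= 2\pi\,\Bigl[-e^{-r^2/2}\Bigr]_{0}^{\infty} = 2\pi \\
I &= \sqrt{2\pi}
\end{align*}
```

So the standard normal density, e^{-x^2/2} divided by the square root of 2π, integrates to 1. The Jacobian supplies exactly the factor of r that makes the inner integral elementary.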


A suitably-embellished version of Cook’s post will appear in my lecture notes in the Spring semester.  Thanks, J.C.


Rating a Published Clinical Trial…

…can be done in 10 minutes or less, using the Jadad score.  There’s a full explanation in the original paper, but suffice it to say, it’s pretty easy to identify sketchy studies using this method.  Aaron Carroll, writing in the New York Times, shows how this affects the credibility of nutrition research.  For those who want to try this at home, here’s the scorecard from the paper:

  1. Was the study described as randomized? (YES/NO)
  2. Was the study described as double blind? (YES/NO)
  3. Was there a description of withdrawals and dropouts? (YES/NO)

Give 1 point for each YES, and 0 points for each NO, with no partial credit.  Then assess these:

  • For question 1, GIVE 1 additional point if the method to generate the sequence of randomization was described and it was appropriate (table of random numbers, computer generated, etc.).  Otherwise, DEDUCT 1 point if the method to generate the sequence of randomization was described and it was inappropriate (patients were allocated alternately, or according to date of birth, hospital number, etc.).
  • For question 2, GIVE 1 additional point if the method of double blinding was described and it was appropriate (identical placebo, active placebo, dummy, etc.).  Otherwise, DEDUCT 1 point if the study was described as double blind but the method of blinding was inappropriate (e.g., comparison of tablet vs. injection with no double dummy).
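
If you would rather let the computer do the tallying, the scorecard above can be sketched in a few lines of Python. The function name, argument names, and the three method labels are my own inventions, not from the paper. I am also assuming that the bonus or deduction for a question only applies when that question was answered YES.

```python
from typing import Literal

# Hypothetical labels for how the paper describes its method.
Method = Literal["appropriate", "inappropriate", "not_described"]

# +1 for an appropriate method, -1 for an inappropriate one, 0 if not described.
ADJUSTMENT = {"appropriate": 1, "inappropriate": -1, "not_described": 0}


def jadad_score(
    randomized: bool,
    double_blind: bool,
    withdrawals_described: bool,
    randomization_method: Method = "not_described",
    blinding_method: Method = "not_described",
) -> int:
    """Tally the scorecard: 1 point per YES, plus the method adjustments."""
    score = 0
    if randomized:
        # Assumption: the adjustment only counts for studies described as randomized.
        score += 1 + ADJUSTMENT[randomization_method]
    if double_blind:
        # Assumption: likewise, only for studies described as double blind.
        score += 1 + ADJUSTMENT[blinding_method]
    if withdrawals_described:
        score += 1
    return score


if __name__ == "__main__":
    # Randomized with a computer-generated sequence, double blind with an
    # identical placebo, and dropouts described: the best case.
    print(jadad_score(True, True, True, "appropriate", "appropriate"))  # prints 5
    # "Randomized" by alternating patients, nothing else reported.
    print(jadad_score(True, False, False, "inappropriate"))  # prints 0
```

Under these assumptions the score runs from 0 to 5.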

Hey, it’s not perfect, but then neither is the APGAR score, and look where that’s gotten us.

Tip from Andrew Gelman’s often contrarian Statistical Modeling, Causal Inference, and Social Science blog.


R Tutorial: Correlation

Fisher’s iris dataset is the basis for this extended example in the calculation and visualization of correlations.  The ggpairs() function gives an impressive coded scatterplot matrix.  And an old friend makes a last-minute cameo appearance.


Update:  Dirk Eddelbuettel just released tint 0.0.3 (tint is not Tufte) with some nifty examples.  I wanted to try it out, so I’ve updated the example using tint and added two margin plots to illustrate the Simpson’s Paradox situation.  Tip from R Bloggers.
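
For readers who have not met Simpson’s Paradox, here is a toy illustration in Python. The numbers are made up for the purpose and are NOT the iris measurements, and the tutorial itself is in R; the function and variable names are mine. Within each group the correlation is perfectly positive, yet pooling the two groups flips the sign.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Synthetic data: two groups, each with a rising trend, but the second
# group sits lower and further to the right than the first.
group_a_x, group_a_y = [1, 2, 3, 4], [10, 11, 12, 13]
group_b_x, group_b_y = [6, 7, 8, 9], [3, 4, 5, 6]

r_a = pearson(group_a_x, group_a_y)
r_b = pearson(group_b_x, group_b_y)
r_pooled = pearson(group_a_x + group_b_x, group_a_y + group_b_y)

if __name__ == "__main__":
    print(f"group A: r = {r_a:+.2f}")
    print(f"group B: r = {r_b:+.2f}")
    print(f"pooled:  r = {r_pooled:+.2f}")
```

The pooled coefficient comes out negative even though both groups trend upward, which is why it pays to color the scatterplot by group before trusting a single correlation.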