When all you have is a hammer…

…everything looks like a nail.

Daniel Lakens, the 20% Statistician, takes a rare but easy shot at statisticians and null hypothesis significance testing.

Our statistics education turns a blind eye to training people how to ask a good question. After a brief explanation of what a mean is, and a pit-stop at the normal distribution, we jump through as many tests as we can fit in the number of weeks we are teaching. We are training students to perform tests, but not to ask questions.

He defines

…the Statisticians’ Fallacy: Statisticians who tell you ‘what you really want to know’, instead of explaining how to ask one specific kind of question from your data.

My favorite example is the two-tailed test of the difference between two means, which can provide evidence that the two are different, but not that they are (nearly) the same. My runners-up are goodness-of-fit tests, which can reject a poor fit but never confirm a good one. Sometimes I feel like I’m selling the researcher’s version of snake oil, rather than teaching sound data analysis and interpretation.
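To make the distinction concrete, here is a minimal Python sketch contrasting an ordinary two-sided test with an equivalence test (two one-sided tests, or TOST), which asks whether two means differ by less than a chosen margin. It is a sketch under stated assumptions: it uses a large-sample normal (z) approximation rather than the t distribution, the equivalence margin is a hypothetical value the analyst would have to justify, and the data are synthetic.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def _diff_and_se(x, y):
    """Difference in sample means and its unpooled (Welch-style) standard error."""
    d = mean(x) - mean(y)
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return d, se


def two_sided_p(x, y):
    """z-approximate p-value for H0: the two means are equal."""
    d, se = _diff_and_se(x, y)
    return 2 * (1 - NormalDist().cdf(abs(d / se)))


def tost_p(x, y, margin):
    """z-approximate TOST p-value for H0: |difference in means| >= margin.

    A small value is evidence that the means differ by LESS than `margin`.
    """
    d, se = _diff_and_se(x, y)
    p_lower = 1 - NormalDist().cdf((d + margin) / se)  # tests H0: d <= -margin
    p_upper = NormalDist().cdf((d - margin) / se)      # tests H0: d >= +margin
    return max(p_lower, p_upper)


if __name__ == "__main__":
    # Synthetic data: two samples of 200 whose means differ by only 0.02
    x = [10.0 + 0.1 * (i % 5) for i in range(200)]
    y = [10.02 + 0.1 * (i % 5) for i in range(200)]
    print("two-sided p:", round(two_sided_p(x, y), 3))
    print("TOST p (margin 0.1):", tost_p(x, y, 0.1))
```

On these samples the two-sided test is not significant, which by itself says nothing about sameness; the TOST with a margin of 0.1 is highly significant, which is positive evidence that the means lie within 0.1 of each other.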

Lakens closes with an excellent addendum: a reference to David Hand’s Deconstructing Statistical Questions, which goes into much more detail.


Seven Pillars

Wisdom hath built her house, she hath hewn out her seven pillars.  –Proverbs 9:1

I just finished Stephen Stigler’s The Seven Pillars of Statistical Wisdom, and I’m daunted (and embarrassed that I waited so long to read it). Stigler offers a structure and taxonomy for statistical thinking* that conveys the “big picture” of statistics.


Quite a difference from the descriptives-to-inference-to-models approach that most textbook authors follow.  This is making me rethink how I approach my introductory courses, especially those for statistics majors.  I’m starting with a baby step: adding the (inexpensive, paperbound) book as a required reading in my statistical research methods class.

*The seven pillars: aggregation, information, likelihood, intercomparison, regression, design, and residual (and that’s just the table of contents!)

Houston, We Have a Solution


Longtime South Texas residents swear by the H-E-B grocery chain for value, selection, quality, and always being well stocked. These guys are supply-chain ninjas: we see groceries, they see a logistics network. And they always step up in emergencies; their response in Houston may be their finest hour to date.

Tip from American Digest.

Stanford Invents AI Gaydar, Flubs Write-Up

Yilun Wang and Michal Kosinski, researchers at Stanford’s Graduate School of Business, have developed a neural-net classifier that purportedly detects sexual orientation from facial photographs (of Caucasians).
The authors report an avalanche of experimental results, and claim the classifier can “correctly distinguish between gay and straight men 81% of the time, and 74% for women.” OK, that’s the sensitivity of the gadget. What about specificity, i.e., how well does it correctly identify folks who are not gay? Without that second number (and an estimate of prevalence), it’s not possible to estimate how often this thing’s verdicts are wrong, whether the verdict is “gay” or “straight.” That matters a great deal if some of the more Orwellian applications mentioned by the authors come to pass.
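To see why specificity and prevalence matter, here is a minimal Python sketch of Bayes’ theorem for a binary classifier. Only the 81% figure comes from the post (read as sensitivity, as above); the specificity and prevalence values below are hypothetical placeholders chosen for illustration, not estimates from the paper.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary classifier via Bayes' theorem.

    PPV: probability that a positive ("gay") verdict is correct.
    NPV: probability that a negative ("straight") verdict is correct.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical inputs: specificity and prevalence are assumptions, not data
ppv, npv = predictive_values(sensitivity=0.81, specificity=0.81, prevalence=0.05)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.18, NPV = 0.99
```

Under these made-up inputs, fewer than one in five people flagged as gay would actually be gay, even though the classifier catches 81% of gay men.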
I give the authors a “C,” for incomplete work.
Update: Dan Simpson, writing at Andrew Gelman’s blog, offers a rambling but fascinating takedown of this “research,” from both the scientific and MSM points of view. Based on the statistical problems alone, I’m changing the grade to a “D-.”

R Tutorial: the non-linear equation solver

Need a numerical solution to simultaneous non-linear equations?  The nleqslv package is just what you’re looking for!  The coding required is minimal; just define the equations you want solved in a function, set some initial values, and let ‘er rip.

Here’s an example that uses the method of moments to estimate the parameters of a beta-binomial distribution.
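In R, nleqslv does the heavy lifting. For readers without R at hand, here is a minimal Python sketch of the same idea using only the standard library: set the beta-binomial mean and variance equal to the sample mean and variance, then solve the two non-linear equations with a hand-rolled Newton iteration and a finite-difference Jacobian. This is not nleqslv or its API; the function names are hypothetical, and it assumes the counts are overdispersed relative to a plain binomial (otherwise no positive solution exists).

```python
from statistics import mean, pvariance


def beta_binomial_moments(alpha, beta, n):
    """Mean and variance of a beta-binomial(n, alpha, beta) distribution."""
    s = alpha + beta
    mu = n * alpha / s
    var = n * alpha * beta * (s + n) / (s ** 2 * (s + 1))
    return mu, var


def newton_solve_2d(f, x0, tol=1e-10, max_iter=100, h=1e-7):
    """Solve f(x) = [0, 0] for a positive 2-vector x by Newton's method.

    The Jacobian is approximated by forward differences, and each step is
    halved until both components stay positive (alpha and beta must be > 0).
    """
    x = list(x0)
    for _ in range(max_iter):
        fx = f(x)
        if max(abs(v) for v in fx) < tol:
            return x
        # Forward-difference Jacobian: J[i][j] approximates d f_i / d x_j
        J = [[0.0, 0.0], [0.0, 0.0]]
        for j in range(2):
            step = h * max(1.0, abs(x[j]))
            xp = list(x)
            xp[j] += step
            fp = f(xp)
            for i in range(2):
                J[i][j] = (fp[i] - fx[i]) / step
        # Solve J @ dx = -fx with Cramer's rule
        det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
        dx0 = (-fx[0] * J[1][1] + fx[1] * J[0][1]) / det
        dx1 = (-fx[1] * J[0][0] + fx[0] * J[1][0]) / det
        # Damp the step so both parameters remain positive
        t = 1.0
        while x[0] + t * dx0 <= 0 or x[1] + t * dx1 <= 0:
            t /= 2
        x = [x[0] + t * dx0, x[1] + t * dx1]
    raise RuntimeError("Newton iteration did not converge")


def fit_beta_binomial_mom(counts, n, start=(1.0, 1.0)):
    """Method-of-moments estimates (alpha, beta) from counts out of n trials."""
    m1 = mean(counts)
    s2 = pvariance(counts)  # second central moment (divisor = number of counts)

    def equations(x):
        mu, var = beta_binomial_moments(x[0], x[1], n)
        return [mu - m1, var - s2]

    return newton_solve_2d(equations, start)


# Toy counts out of n = 10, chosen so the exact answer is alpha = 2, beta = 3
alpha_hat, beta_hat = fit_beta_binomial_mom([1, 4, 7], n=10)
print(round(alpha_hat, 6), round(beta_hat, 6))
```

The toy counts have sample mean 4 and variance 6, and with n = 10 the moment equations are satisfied exactly by alpha = 2, beta = 3, which the solver should recover from the starting values (1, 1). As with nleqslv, the only problem-specific code is the function holding the equations and the initial values.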

Reports of its death are greatly exaggerated

The ability of statistics to accurately represent the world is declining. In its wake, a new age of big data controlled by private companies is taking over – and putting democracy in peril.

begins William Davies’s tale of woe in the Guardian. Unfortunately, he confuses credible statistics with modern state-istics* and seems impervious to the idea that Joe Sixpack has wised up to the fact that there are “lies, damned lies, and statistics,” most of them peddled by the Leviathan State and its corporate cronies, usually to Joe’s detriment.

Statistics in industry and scientific research is doing quite well, thank you. The Big Data movement is still immature and riddled with snake-oil salesmen; it will eventually weed them out, possibly by turning its own methods on itself.

Tip from that same O’Reilly Newsletter.  Finally, I got on a sucker list that’s interesting!

*Where did you think the word came from?

Update:  Briggsy holds much the same opinion as I do, but expresses it more eloquently.

Multiple Comparisons, Made Easy

Adrian Colyer, at the morning paper, takes a stab at explaining the problem with p-values and multiple comparisons. He shoots! He scores! The crowd* goes wild!
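The core arithmetic of the multiple comparison problem fits in a few lines. This minimal Python sketch assumes m independent tests, each run at level alpha with every null hypothesis true, and computes the chance of at least one false positive; it then applies the Bonferroni correction, which is only the simplest of several available corrections.

```python
def familywise_error_rate(alpha, m):
    """P(at least one false positive) among m independent tests at level alpha,
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m


def bonferroni_level(alpha, m):
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / m


m = 20
print(round(familywise_error_rate(0.05, m), 2))                       # 0.64
print(round(familywise_error_rate(bonferroni_level(0.05, m), m), 3))  # 0.049
```

With 20 tests at the 5% level, the chance of at least one spurious “discovery” is about 64%; dividing alpha by 20 brings it back just under 5%.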


Tip from an O’Reilly Daily Newsletter, which I found languishing in Clutter purgatory.

*OK, the crowd of two or three statistics lecturers who struggle to explain the multiple comparison problem.