Stanford Invents AI Gaydar, Flubs Write-Up

Yilun Wang and Michal Kosinski, researchers at Stanford’s Graduate School of Business, have developed a neural-net classifier that purportedly detects sexual orientation (in Caucasians). The authors report an avalanche of experimental results, and claim the classifier can “correctly distinguish between gay and straight men 81% of the time, and 74% for women.” OK, that’s the sensitivity of the gadget. What about specificity, i.e., how well does it correctly identify folks who are not-so-gay? Without that second number (as well as an estimate of prevalence), it’s not possible to estimate the false positive and false negative rates for this thing. Very …
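To see why the second number and the prevalence matter, here is a small Python sketch of the Bayes' rule arithmetic. The 81% sensitivity is the figure quoted above; the 81% specificity and the 5% prevalence are placeholder assumptions picked purely for illustration, not numbers from the paper.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary classifier via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # P(condition | positive call)
    npv = true_neg / (true_neg + false_neg)   # P(no condition | negative call)
    return ppv, npv


# Sensitivity is the quoted figure; specificity and prevalence are ASSUMED.
ppv, npv = predictive_values(sensitivity=0.81, specificity=0.81, prevalence=0.05)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Under those assumed inputs, fewer than one in five positive calls would be correct. Change the assumed specificity or prevalence and the answer moves a lot, which is the point of asking for them.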

R Tutorial: the non-linear equation solver

Need a numerical solution to simultaneous non-linear equations? The nleqslv package is just what you’re looking for! The coding required is minimal; just define the equations you want solved in a function, set some initial values, and let ’er rip. Here’s an example that uses the method of moments to estimate the parameters of a beta-binomial distribution.
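The recipe described above (write the equations as a function, pick starting values, hand both to a solver) can be sketched in Python. This is not the nleqslv call from the tutorial: the solver below is a bare-bones hand-rolled Newton iteration with a finite-difference Jacobian, and the sample moments are hypothetical values chosen so the answer is easy to check.

```python
def beta_binomial_moment_equations(params, n, mean, var):
    """Residuals: theoretical minus sample mean and variance of a beta-binomial."""
    a, b = params
    s = a + b
    return [
        n * a / s - mean,
        n * a * b * (s + n) / (s * s * (s + 1)) - var,
    ]


def solve_2x2_newton(f, x0, tol=1e-10, max_iter=50, h=1e-7):
    """Solve f(x) = 0 for two unknowns (illustrative helper, no safeguards)."""
    x = list(x0)
    for _ in range(max_iter):
        fx = f(x)
        if max(abs(v) for v in fx) < tol:
            return x
        # Forward-difference Jacobian, one column per unknown.
        cols = []
        for j in range(2):
            xh = list(x)
            xh[j] += h
            fh = f(xh)
            cols.append([(fh[i] - fx[i]) / h for i in range(2)])
        j11, j21 = cols[0]
        j12, j22 = cols[1]
        det = j11 * j22 - j12 * j21
        if det == 0:
            raise ZeroDivisionError("singular Jacobian")
        # Newton step: solve J d = -f(x) with Cramer's rule.
        d1 = (-fx[0] * j22 + fx[1] * j12) / det
        d2 = (-fx[1] * j11 + fx[0] * j21) / det
        x = [x[0] + d1, x[1] + d2]
    raise RuntimeError("Newton iteration did not converge")


# Hypothetical sample moments for n = 10 trials per observation.
n, sample_mean, sample_var = 10, 4.0, 6.0
a_hat, b_hat = solve_2x2_newton(
    lambda p: beta_binomial_moment_equations(p, n, sample_mean, sample_var),
    [1.0, 1.0],
)
print(a_hat, b_hat)
```

The moments 4 and 6 are exactly those of a beta-binomial with shape parameters 2 and 3 when n = 10, so the estimates should land there. A production solver such as nleqslv adds the line searches and convergence diagnostics this sketch leaves out.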

Reports of its death are greatly exaggerated

“The ability of statistics to accurately represent the world is declining. In its wake, a new age of big data controlled by private companies is taking over – and putting democracy in peril,” begins William Davies’s tale of woe in the Guardian. Unfortunately, he confuses credible statistics with modern state-istics*, and seems impervious to the idea that Joe Sixpack has wised up to the fact that there are “lies, damned lies, and statistics,” and that most of these are peddled by the Leviathan State and its corporate cronies. Usually to Joe’s detriment. Statistics in industry and scientific research is doing …

Multiple Comparisons, Made Easy

Adrian Colyer at the morning paper takes a stab at explaining the problem with p-values and multiple comparisons. He shoots! He scores! The crowd* goes wild! Tip from an O’Reilly Daily Newsletter, which I found languishing in Clutter purgatory. *OK, the crowd of two or three statistics lecturers who struggle to explain the multiple comparison problem.
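For anyone who wants the multiple comparison problem in numbers, here is a minimal Python sketch. It assumes the tests are independent and that every null hypothesis is true, and it uses the Bonferroni correction as one common remedy; neither the scenario nor the choice of correction is taken from Colyer's post.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent true-null tests."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test significance level that keeps the familywise rate at or below alpha."""
    return alpha / m


alpha, m = 0.05, 20  # illustrative values
uncorrected = familywise_error_rate(alpha, m)
corrected = familywise_error_rate(bonferroni_threshold(alpha, m), m)
print(f"uncorrected: {uncorrected:.3f}, Bonferroni-corrected: {corrected:.3f}")
```

With 20 tests at the usual 0.05 level, the chance of at least one spurious "significant" result is well over half; tightening each test to 0.05/20 pulls it back under 0.05.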

Rating a Published Clinical Trial…

…can be done in 10 minutes or less, using the Jadad score. There’s a full explanation in the original paper, but suffice it to say, it’s pretty easy to identify sketchy studies using this method. Aaron Carroll, writing in the New York Times, shows how this affects the credibility of nutrition research. For those who want to try this at home, here’s the scorecard from the paper:

Was the study described as randomized? (YES/NO)
Was the study described as double blind? (YES/NO)
Was there a description of withdrawals and dropouts? (YES/NO)

Give 1 point for each YES, and 0 points …

R Tutorial: Correlation

Fisher’s iris dataset is the basis for this extended example in the calculation and visualization of correlations. The ggpairs() function gives an impressive coded scatterplot matrix. And an old friend makes a last-minute cameo appearance. Update: Dirk Eddelbuettel just released tint 0.0.3 (tint is not Tufte) with some nifty examples. I wanted to try it out, so I’ve updated the example using tint and added two margin plots to illustrate the Simpson’s Paradox situation. Tip from R Bloggers.
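The Simpson's Paradox situation mentioned above is easy to reproduce with a few lines of Python. The numbers below are made-up toy values, not iris measurements, and the `pearson_r` helper is written out by hand rather than taken from the tutorial; the point is only that the correlation can be positive within each group and negative once the groups are pooled.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Toy data (hypothetical): two groups, each with a rising trend.
group_a = ([1, 2, 3], [10, 11, 12])
group_b = ([6, 7, 8], [1, 2, 3])

r_a = pearson_r(*group_a)
r_b = pearson_r(*group_b)
r_pooled = pearson_r(group_a[0] + group_b[0], group_a[1] + group_b[1])
print(f"group A: {r_a:.2f}, group B: {r_b:.2f}, pooled: {r_pooled:.2f}")
```

Within each group the trend goes up, yet the pooled correlation is negative because the second group sits lower and further right than the first. Grouping the plot by category, as a coded scatterplot matrix does, is what makes this visible.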