Reproducible research

Great article in the NY Times about Keith Baggerly’s push for open data and reproducible analysis of results. Curiously enough, one of my students hit upon a tiny example of the problem this semester:

…I decided to run my own descriptive statistics on their data sets to make sure their reports were all represented in the same way (and thus, I could compare them to my own results). Good thing I did! It turns out that some of the means reported in the articles were incorrect due to an error on the part of the researcher during data entry. It seems that they used – instead of 0 when a group did not have, say, an infant or a juvenile male in a group, which resulted in the means of infant or juvenile male being figured based on a number less than N. This is a problem because they were reported as the means of N number of groups and the overall … ratio became inflated as the result.
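To make the arithmetic concrete, here is a minimal sketch in Python. The counts are made up for illustration and are not the study's data; `None` stands in for the dash the researchers entered, and both function names are hypothetical.

```python
from statistics import mean

# Hypothetical counts of juvenile males in N = 5 groups (illustrative only).
# None marks a group where a dash was entered instead of 0.
raw_counts = [2, None, 1, None, 3]

def mean_skipping_blanks(values):
    """Average only the entries that are present, so the divisor is less than N."""
    present = [v for v in values if v is not None]
    return mean(present)

def mean_blanks_as_zero(values):
    """Treat each blank as a true zero, so the divisor is N."""
    filled = [0 if v is None else v for v in values]
    return mean(filled)

inflated = mean_skipping_blanks(raw_counts)  # sum 6 divided by 3 groups
correct = mean_blanks_as_zero(raw_counts)    # sum 6 divided by 5 groups
print(inflated, correct)
```

The total is the same in both cases; only the divisor changes, which is why the reported means (and any ratio built on them) come out too high when blanks are silently dropped.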

Science is all about doing reproducible experiments, but I fear many researchers lose track of that principle.

Baggerly is a big proponent of using tools like Sweave, which embeds the R code for an analysis in the report itself, to make analysis transparent.

Tip from R Bloggers.


