Nature article on releasing source code

Nature just published a 4-page perspective article on the importance of releasing the source code for programs used in scientific research. The authors emphasize that results which depend on computation need to be reproducible.

Labrigger has already covered releasing source code here, including the sympathetic CRAPL. That license acknowledges that the code offered is not pretty, and it offers no promise of support (in fact, quite the opposite is promised). These are among the key concerns about releasing code. Perhaps the software relies on expensive, proprietary hardware or software. Perhaps the code is buggy as hell, thread hostile, devoid of error handling, and relies on core dumps as the main data output interface. Or perhaps the source code is poorly commented, or not commented at all, and all variable names are single characters. Perhaps the code is written in Whitespace with inline Brainfuck.

The authors spend a long time explaining that there is no substitute for releasing the source code. That is, pseudocode, mathematical descriptions, and natural-language descriptions are never enough. Of course they’re right in principle, but I sometimes appreciate those alternative descriptions, so I wouldn’t want a source code release to replace them. For example, I don’t want to have to sift through someone’s crap code just to find out how they performed a specific bootstrap analysis. The description in the methods section should be sufficiently clear and detailed that I can code it up myself.
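To make that concrete, here is a rough sketch of what "coding it up myself" might look like if a methods section said something like "95% confidence intervals on the mean were computed by percentile bootstrap with 2000 resamples". That sentence is a made-up example, not taken from the Nature article, and the function name, defaults, and data below are purely illustrative assumptions.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic.

    Hypothetical helper for illustration only. The seed is fixed so that
    the result is reproducible from run to run.
    """
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, recompute the statistic each time, then sort.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    # Read the interval bounds off the sorted bootstrap distribution.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up measurements, just to exercise the function.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9]
low, high = bootstrap_ci(sample)
print(f"mean = {statistics.mean(sample):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Even for something this small, the one-sentence description leaves choices open: which percentile convention, what seed, what resample size. That is presumably the authors' point about why the description alone doesn't pin down the result.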

One point the authors make is that errors can be detected when the entire source code is released. Even commercial programs sometimes have bugs that change results. For example, GraphPad had a rather unfortunate bug that resulted in data groups being flagged as significantly different when they weren’t.