Posts tagged with openness

Figshare is an online database for storing figures, data, and papers. It’s been around for a while, but recently it’s really been taking off.

–You can post images, datasets, or entire paper PDFs.

–It’s citable, which could make it a useful home for negative data.

–It’s free.

–If you want to keep it private, the limit is 1 GB. If you leave it public, there is no storage limit.

Post-publication peer review has yet to really take off, but Labrigger hopes it will. One of the newest sites is PubPeer.

PubPeer does it right, allowing for anonymity. This is important in order to obtain candid opinions from the scientific community. The bland and boring reviews at F1000 show what comes out when anonymity is not allowed.

An additional benefit is greater participation: compare the content on Wikipedia and Scholarpedia. The latter has non-anonymous authorship, and although its articles are high quality, their number is very small compared to Wikipedia. Anonymity lowers the threshold for making casual additions to online resources.

PubPeer is not a free-for-all, however: it tries to keep the comments constructive by allowing only authors of papers to create accounts and comment.

A similar effort, The Third Reviewer, seems to have been abandoned. I get the impression that the PubPeer platform is a bit better automated than The Third Reviewer, so perhaps it has more staying power.

One concern is that anonymity will mean the comments will be dominated by cranks and disgruntled colleagues, and might even devolve to the level of YouTube comments. However, as evidence to the contrary, the comments on the short-lived Third Reviewer site were all fairly constructive and generally positive, even when pointing out technical flaws or other issues (examples 1, 2, and 3).

Last Monday, Labrigger covered HelioScan, a LabVIEW-based, two-photon laser scanning microscopy software suite.

Marcel van ‘t Hoff (left) and Dominik Langer (right) are the two main developers of HelioScan. They were kind enough to answer some interview questions for Labrigger.

LR: Are there special considerations you had to make when designing HelioScan? What measures did you take in LabVIEW to support the level of modularity?
The main design considerations were to design for future flexibility and for multiple developers to easily collaborate without interfering. The idea was then to assemble the software at run-time from components that are dynamically loaded based on configuration settings. LabVIEW classes are the ideal candidates for such components:

First, they naturally bundle data and corresponding VIs into separate entities.

Second, LabVIEW allows classes to be loaded and instantiated at run time based on the class file path (which can be specified in a configuration file).

Third, using class inheritance and polymorphism, we can substitute one component for another, as long as the other classes interacting with them refer to them in terms of the same abstract base class. For example, a particular ScanHead component can handle any incoming Trajectory component as long as it is a subclass of a particular expected abstract base class.

Fourth, by means of aggregation combined with the above principles, we can build up highly complex objects at run time.
An example: HelioScan may read from its main configuration file to load a particular ImagingMode. During its initialization, the loaded ImagingMode object reads its own configuration file, stating that – among other components – a particular ScanHead component is to be loaded. Once loaded, the ScanHead object reads a configuration guiding it to load a bunch of Scanner components of a particular type, etc.

A couple of other developers have already implemented individual components, demonstrating that the distributed, multi-developer scenario works.
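To make the run-time assembly idea concrete outside LabVIEW, here is a minimal Python sketch. It is an analogy, not HelioScan code: a name-based registry (the hypothetical `REGISTRY`, filled via `__init_subclass__`) stands in for LabVIEW's loading of classes by file path, nested config dicts stand in for the per-component configuration files, and `Component`, `Trajectory`, `LineScan`, `DiagonalScan`, and `ScanHead` are illustrative names only.

```python
from abc import ABC, abstractmethod

# Hypothetical stand-in for "load a class by its file path": every subclass
# of Component registers itself under its class name when it is defined.
REGISTRY = {}


class Component(ABC):
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        REGISTRY[cls.__name__] = cls


class Trajectory(Component):
    """Abstract base class that ScanHead depends on (never a concrete type)."""

    @abstractmethod
    def points(self):
        """Return the (x, y) mirror positions for one frame."""


class LineScan(Trajectory):
    def __init__(self, n):
        self.n = n

    def points(self):
        return [(i, 0) for i in range(self.n)]


class DiagonalScan(Trajectory):
    def __init__(self, n):
        self.n = n

    def points(self):
        return [(i, i) for i in range(self.n)]


class ScanHead(Component):
    def __init__(self, trajectory):
        # Accept any subclass of the abstract Trajectory base.
        if not isinstance(trajectory, Trajectory):
            raise TypeError("trajectory must be a Trajectory subclass")
        self.trajectory = trajectory

    def frame(self):
        return self.trajectory.points()


def build(spec):
    """Instantiate a component from a config dict.

    Nested specs (dicts with a "class" key) are built first, so complex
    objects are assembled by aggregation at run time.
    """
    cls = REGISTRY[spec["class"]]
    kwargs = {}
    for key, value in spec.get("args", {}).items():
        if isinstance(value, dict) and "class" in value:
            kwargs[key] = build(value)
        else:
            kwargs[key] = value
    return cls(**kwargs)
```

For example, `build({"class": "ScanHead", "args": {"trajectory": {"class": "DiagonalScan", "args": {"n": 3}}}})` returns a ScanHead whose `frame()` is `[(0, 0), (1, 1), (2, 2)]`. Changing `"DiagonalScan"` to `"LineScan"` in the config swaps the scan pattern without touching `ScanHead`, which only knows the abstract `Trajectory` base.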

LR: In the near future, say 5 years, who will be maintaining the core HelioScan codebase and leading new developments?
We are currently in a transition phase. First, both Marcel and I just left Fritjof Helmchen’s lab. While Marcel is going to continue as a postdoc in the field and will still use HelioScan, I will leave academia. Second, we are currently working on a major new version of HelioScan that we expect to bring about another quantum leap in flexibility. The new version will be based on a framework for distributed cross-platform signal processing (Murmex) that we are developing. We have been developing this framework in our free time and plan to continue doing so. Murmex as the new core will allow HelioScan components to be written not only in LabVIEW, but also in various other well-known programming languages. We believe that this will lower the entry barrier for new developers enough for HelioScan to become a project that can be handed over to the community. New components will be collected in a central repository, provided that they meet some basic quality standards. This will be similar to certain ImageJ distributions, such as FIJI.

LR: Do you have design principles that you follow to ensure that your LabVIEW programs don’t become spaghetti?
First, we try to stick as closely as possible to the style rules from “The LabVIEW Style Book” (Peter A. Blume, Prentice Hall) [Also covered here. -Labrigger]. That’s also the book we usually recommend to newcomers before they start to develop their own components.

Second, HelioScan is implemented using LabVIEW object-oriented programming (LVOOP), which naturally already structures the code by bundling related data and functionality into LabVIEW classes.

Third, we have had very good experiences with LabVIEW’s so-called XControls. XControls are your own reusable user interface controls, which can provide arbitrarily complex functionality. For example, we made an XControl that loads and displays multi-page TIFF files, where the user can scroll through the frames, draw different types of ROIs, load and save ROI selections, and display file meta-information. On the VI block diagram, all of this functionality is represented by a single terminal, hiding the underlying complexity from the developer.

Fourth, we rely heavily on a couple of design patterns for certain recurring tasks.

Fifth, the HelioScan framework enforces a lot of structure. When we develop a new component, we subclass an abstract class of the framework and override some of its methods. For example, when we need a new scan pattern for galvanometric mirrors, we subclass a generic Trajectory class. We override (among others) the initialise method of the class and put the code initialising the pattern exactly there (and nowhere else).
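The subclass-and-override structure in the fifth point resembles what is often called the template method pattern. Below is a hedged Python sketch of that idea, not HelioScan’s actual API: the framework-side `prepare()` method, the `RasterTrajectory` subclass, and the volt-scaled waveform are assumptions for illustration. The base class fixes the call sequence, and the subclass puts its pattern-specific set-up in `initialise()` and nowhere else.

```python
from abc import ABC, abstractmethod


class Trajectory(ABC):
    """Hypothetical framework base class: it owns the call order,
    subclasses only fill in the pattern-specific step."""

    def __init__(self):
        self.x = []
        self.y = []
        self.ready = False

    def prepare(self):
        # Framework-controlled sequence; subclasses never call initialise() directly.
        self.x, self.y = [], []
        self.initialise()
        if len(self.x) != len(self.y):
            raise ValueError("x and y waveforms must have the same length")
        self.ready = True
        return list(zip(self.x, self.y))

    @abstractmethod
    def initialise(self):
        """Compute the mirror waveforms for this scan pattern (and nothing else)."""


class RasterTrajectory(Trajectory):
    """Simplified unidirectional raster: x ramps across each line,
    y steps once per line. No flyback or timing is modelled."""

    def __init__(self, pixels_per_line, lines, amplitude_volts=1.0):
        super().__init__()
        self.pixels_per_line = pixels_per_line
        self.lines = lines
        self.amplitude = amplitude_volts

    def _ramp(self, i, n):
        # Map index 0..n-1 linearly onto [-amplitude, +amplitude].
        if n == 1:
            return 0.0
        return -self.amplitude + 2 * self.amplitude * i / (n - 1)

    def initialise(self):
        # All pattern-specific set-up lives here.
        for row in range(self.lines):
            for col in range(self.pixels_per_line):
                self.x.append(self._ramp(col, self.pixels_per_line))
                self.y.append(self._ramp(row, self.lines))
```

For example, `RasterTrajectory(3, 2).prepare()` yields six (x, y) pairs, starting at (-1.0, -1.0) and ending at (1.0, 1.0). A new scan pattern would subclass `Trajectory` the same way and override only `initialise()`; a real galvo waveform would also need flyback and timing, which this sketch leaves out.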


If you’d like to read some deep, wide-ranging philosophical discussions about R, an open statistics software package, then I wonder why we’re friends. But since we are, here are some links, you psychopath:
“R is a programming language missing a GUI”
“R is really important to the point that it’s hard to overvalue it”
“…it’s object oriented rather than data record oriented…”

If you just want to get on with it, and run stats on your data, then keep reading.

I like R and have used it in a lot of my work. I recommend it. However, it is all command-line based (there are GUIs for R, but I haven’t tried them; here’s one). As we all know, not everyone is spellbound by command-line interfaces. SOFA is an open statistics software package with a well-thought-out GUI, including a database interface, chart generation, and more.

Hat tip to CSH.

Springer is starting a new open access journal called “Scio Cell Biology”. It’s intended to be the first of what they hope will be a whole family of “Scio”-branded journals. They’re a gold open access/libre publisher with a new business model that allows them to offer two excellent features:

1. No author fees.
No authors pay to publish. This includes big rich labs. No color charges, no publication fees, no author fees whatsoever. No institutional fees, society dues, or subscription fees either.

2. Reviewers get paid.
This is the first time I’ve seen anything like this. The amount they’ll get paid hasn’t been announced yet, but they’re hoping that this will let the editors go back repeatedly to the best, most constructive reviewers. They also hope that it will raise the prestige of the journal. The idea is that the reviewers are financially motivated to be as constructive as possible and so authors will want to have their papers reviewed by them.

So what’s their funding model? As a hint, here are some excerpts from their upcoming inaugural issue:


Nature just published a 4-page perspective article on the importance of releasing the source code for programs used in scientific research. The authors emphasize reproducibility for results that depend on computation.

Labrigger has already covered releasing source code here, including the sympathetic CRAPL. In that license, it is acknowledged that the code offered is not pretty and no promise of support is offered (in fact, quite the opposite is promised). These are among the key concerns about releasing code. Perhaps the software relies on expensive, proprietary hardware and/or software. Perhaps the code is buggy as hell, thread hostile, devoid of error handling, and relies on core dumps as the main data output interface. Or perhaps the source code is poorly commented, or not commented at all, and all variable names are single characters. Perhaps the code is written in Whitespace with inline Brainfuck.

The authors spend a long time explaining that there is no substitute for releasing the source code. That is, pseudo code, mathematical, or natural language descriptions are never enough. Of course they’re right in principle, but I appreciate the alternative descriptions sometimes, so I wouldn’t want a source code release to replace those alternative descriptions. For example, I don’t want to have to sift through someone’s crap code just to find how they performed a specific bootstrap analysis. The description in the methods section should be sufficiently clear and detailed so that I can code it up myself.
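As an illustration of the level of detail that lets a reader re-implement an analysis without the original code, consider a hypothetical methods sentence such as “95% confidence interval of the mean from 10,000 bootstrap resamples (percentile method, resampling with replacement, fixed seed)”. The Python sketch below is not taken from any particular paper; it simply shows that each choice in such a sentence maps to one explicit parameter.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean.

    Every choice a methods section should state is an explicit parameter:
    the statistic (mean), the number of resamples, the CI level (1 - alpha),
    the interval method (percentile), and the random seed.
    """
    if len(data) < 2:
        raise ValueError("need at least two observations")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    # Resample n observations with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Percentile method: read the alpha/2 and 1 - alpha/2 quantiles off the
    # sorted bootstrap distribution, using symmetric (rounded) indices.
    lo_idx = round(alpha / 2 * n_resamples)
    hi_idx = n_resamples - 1 - lo_idx
    return means[lo_idx], means[hi_idx]
```

Even the quantile index convention used here (symmetric indices rounded from alpha/2 times n_resamples) is a choice that can shift results slightly, which is exactly the kind of detail worth stating in the methods text.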

One point the authors make is that errors can be detected when the entire source code is released. Sometimes, even commercial programs have bugs that change results. E.g., GraphPad had a rather unfortunate bug that resulted in data groups being flagged as significantly different when they weren’t.

Perhaps you’ve heard about the Elsevier Boycott aka the Cost of Knowledge. I’m not going to rehash the problem we all know so well. But here are a few links I wanted to share. Just FYI, Elsevier’s take-home (EBITDA) is about $500 million/year. That’s about a quarter of what Starbucks makes, or about 7-fold more than Newport Corporation makes. And as the graph above shows, a good chunk of the revenue is from their scientific publications. Making money is great. I only bring it up to highlight the magnitude of money that is being made. It’s not peanuts.

I’m inclined to believe that most publishers are well aware that their business model needs to evolve. I think most of them know that although their contribution to the scientific process is important, it is only a small part of the entire process now that distribution (and even basic typesetting) is quite trivial. I have this positive view because I’ve spoken personally with editors at top tier journals and this is the impression they left me with. They seemed very interested in finding new ways to contribute to the scientific process and increase the value of their services.

Apparently not all of the commercial scientific publishing world is so reasonable.

A DrugMonkey post pointed to this article by a UK publishing industry representative named Graham Taylor. DrugMonkey dissects it quite well. Here’s an excerpt:

From Graham Taylor’s article: “Public funds have not paid for the peer-reviewed articles that are based on research supported by agencies such as the National Institutes of Health (NIH). They have only paid for the research itself and whatever reports the researchers are required to submit to the agency.”

DrugMonkey: Another falsehood, wrapped up with a disingenuous misdirecting belittlement. “Only” for the research? ONLY????? These publishers seem universally unaware that the VAST, VAST majority of the value of an academic article is the bloody research. The damn content. That is what has value. The fancy layout? That’s nice and all but we can do without that. The science is the thing. Trying to dismiss this as a minor contribution is…well…..that takes some serious chutzpah.

If publishers fail to adapt to the changes in technology that threaten their business model, then that’s a problem that scientists don’t necessarily have to worry about. There are plenty of new journals with excellent editorial oversight that are more than capable of taking up the slack. However, the big problem we all need to pay attention to is when publishers try to change laws to prevent the free distribution of research results. The current effort is a bill called the “Research Works Act”. As Michael Eisen describes in his blog, the bill is written to outlaw the free dissemination of taxpayer-funded research in order to protect the anachronistic business model of publishers like Elsevier.

The bill reads:

No Federal agency may adopt, implement, maintain, continue, or otherwise engage in any policy, program, or other activity that:
(1) causes, permits, or authorizes network dissemination of any private-sector research work without the prior consent of the publisher of such work; or
(2) requires that any actual or prospective author, or the employer of such an actual or prospective author, assent to network dissemination of a private-sector research work.

And the bill’s definition of “private sector” includes all NIH-funded research carried out at universities.

They are using intentionally misleading language to classify works funded by the government but carried out by a non-governmental institution as “private sector research”. Thus, under this bill, works funded by the NIH but carried out at a university would be “private sector research”.

There are some open access journals that seem to have relatively loose editorial standards. And by “editorial” I mean “ethical”, and by “relatively loose”, I mean “no”. These publishers have been called “predatory” open access publishers. The idea is simple: solicit submissions via spam email, accept submissions, and then charge publication fees that more than cover the cost of your spamming operation. Here’s a list of predatory open access publishers. Richard Poynder did a very in-depth story on this phenomenon. If you’d like to know more, read the PDF linked to on this blog post. The bottom line is, it works. At least some people send papers to predatory open access journals and pay to have them published. And that’s why any email address you’ve used as a corresponding author will get inundated with spam from these outfits.

What’s new, to me at least, is what seems to be predatory micro funding for scientific research. Microfinance has been used for many years to get enterprises off the ground. More recently, companies like Kickstarter have developed web sites to finance proposed creative and technology projects. Kickstarter is cool. It’s all above the table, as far as I can tell, and has many success stories.

By contrast, the Open Source Science Project is sketchy as hell. Here’s the model: researchers post project proposals and funders browse and decide what they want to fund. This is very similar to Kickstarter. However, the business makes money by charging the researchers monthly subscription fees. At this point, I’m gone. That’s the only red flag I need. But there are other red flags too: zero success stories, endless attempts to look legit by association (a bunch of university and industry logos all over their site), the “Privacy Policy” and “Terms of Use” do not actually link to anything, and the identity of the people running the organization is not revealed anywhere. There is no evidence that anyone has ever had a project funded through their system.

Thorlabs and Newport have offered 3D models of their products for a long time. However, they’re typically in formats for expensive programs like SolidWorks and AutoCAD. In the past year or two, Newport has been slowly adding to their library of Google SketchUp models.

I still prefer SolidWorks, but I’m optimistic that I’ll eventually switch to SketchUp. Regardless, it’s nice to see a company supporting free tools.

ScanImage is an excellent software package for controlling 2p scopes. It’s free and open source. It’s been actively developed and released to the public since its inception. Right now the personnel involved are trying to renew their funding. To help keep this resource actively developed and free, please fill out their survey. It’s very, very short. Don’t take the resource for granted. It takes a lot of salaried time to keep development going and to add new features.

By the way, ScanImage 3.8 (new features) and 4.0 (for Thorlabs scopes) are out now (3.5 and 3.6 are no longer supported; 3.7.1 is the current stable release) (link). If you haven’t tried out a new version of ScanImage yet, you should. It doesn’t take too long and the feedback is very helpful. Don’t assume that everyone else is already sending in the same feedback.