Posts tagged with dissemination

Writing a first draft is an almost completely different sport compared to editing and revising. It’s the most creative part of the writing effort. The workflow incorporates outlines, research notes, references, organization, re-organization, and more. I’ve recently been trying out a new program for writing first drafts: Scrivener.

Scrivener (Win and OSX) is a program designed to seamlessly combine word processing with several other writing tools including note organization, outlining, equation editing, and layout. It can work with EndNote (youtube), too.

It’s great to have all of these other tools right at hand and integrated. There’s less task-switching time.

The only shortcoming I can see is the lack of a “Track Changes” function like Word has. So it’s not great for collaboration, but it might reduce the time until the first draft gets done by reducing the time it takes to refer back to notes and other materials.

Oldest: Reading hard-copy tables of contents. Pleasant, but impractical.
Older: Checking out the updates for Index Medicus. Totally reasonable. In 1988.
Old: Getting eTOCs emailed to you. Welcome to the year 2000.
Middle-aged: Subscribing to RSS feeds for journals. Okay, but still needs filtering.
Present: Automated, keyword-based filtering of RSS feeds. Better.

RSS feeds

E.g., here are the ones for Nature journals (including the AOPs) (link)
And for Science (link).
It doesn’t take much time to find feeds for the top 20 journals in your field. Feed links shouldn’t change frequently, but they can change.

Filtering RSS feeds

This is an early (2005) work in the field of filtering RSS feeds from journals: BaRF (Bioinformatics aggregated RSS feeds) is a tool for keeping up with bioinformatics articles across multiple journals’ RSS feeds.

Presently, there are a bunch of different ways to filter RSS feeds. Fascinatingly, they’re all inadequate. So, although this is a good approach, I’m not sure it’s worth the time to set up and maintain just yet.

At any rate, if you want to take a stab at it, here are some of the services to check out. Feed Sifter, Scraper, Superfeedr, Feed Rinse. To be honest, none of these worked completely for me. I’ve tried others as well, including the powerful Yahoo Pipes (too buggy). If you have a system worked out that you’d like to share, please let us know.
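
If none of the hosted services work for you, the core of keyword-based filtering is small enough to sketch yourself. The following is a minimal sketch, not a replacement for any of the services above. It assumes a plain RSS 2.0 feed with un-namespaced `item`, `title`, `description`, and `link` elements (many journal feeds use RSS 1.0/RDF with namespaces and would need extra handling). The sample feed, the function name `filter_items`, and the keywords are all made up for illustration, and a real setup would download the feed text first.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for a downloaded journal feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Journal AOP</title>
    <item>
      <title>Two-photon imaging of dendritic spines</title>
      <description>We image spines in vivo.</description>
      <link>http://example.org/a1</link>
    </item>
    <item>
      <title>A survey of soil bacteria</title>
      <description>Metagenomics of farmland soil.</description>
      <link>http://example.org/a2</link>
    </item>
    <item>
      <title>Cortical circuits for vision</title>
      <description>Recordings from visual cortex using electrodes.</description>
      <link>http://example.org/a3</link>
    </item>
  </channel>
</rss>"""


def filter_items(rss_text, keywords):
    """Return (title, link) pairs for items that mention any keyword.

    Matching is a case-insensitive substring test against the item's
    title and description.
    """
    root = ET.fromstring(rss_text)
    wanted = [k.lower() for k in keywords]
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        haystack = (title + " " + description).lower()
        if any(k in haystack for k in wanted):
            matches.append((title, item.findtext("link", default="")))
    return matches


if __name__ == "__main__":
    for title, link in filter_items(SAMPLE_RSS, ["two-photon", "visual cortex"]):
        print(title, link)
```

Plain substring matching is crude (no stemming, no author or journal weighting), which is probably part of why the existing services feel inadequate, but it is easy to maintain.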

Getting every last drop

It’s also possible to set up PubMed search result updates. But there can be weeks between when an article is put online and when PubMed picks it up, so this isn’t ideal. However, it covers all of the journals that PubMed indexes, so it can bring papers to your attention that might have otherwise fallen through the cracks of your RSS feeds.

Services like Hubmed and Mendeley are trying to serve this need from a different angle, but at present don’t offer the immediacy of an RSS feed of AOPs.

Springer is starting a new open access journal called “Scio Cell Biology”. It’s intended to be the first of what they hope will be a whole family of “Scio”-branded journals. They’re a gold open access/libre publisher with a new business model that allows them to offer two excellent features:

1. No author fees.
No authors pay to publish. This includes big rich labs. No color charges, no publication fees, no author fees whatsoever. No institutional fees, society dues, or subscription fees either.

2. Reviewers get paid.
This is the first time I’ve seen anything like this. The amount they’ll get paid hasn’t been announced yet, but they’re hoping that this will let the editors go back repeatedly to the best, most constructive reviewers. They also hope that it will raise the prestige of the journal. The idea is that the reviewers are financially motivated to be as constructive as possible and so authors will want to have their papers reviewed by them.

So what’s their funding model? As a hint, here are some excerpts from their upcoming inaugural issue:

.

These two sites have a bunch of poster design tips, case studies, and other links.
Designing conference posters
Better Posters

Nature just published a 4-page perspective article on the importance of releasing the source code for programs used in scientific research. The authors emphasize the importance of reproducibility for results that depend on computation.

Labrigger has already covered releasing source code here, including the sympathetic CRAPL. In that license, it is acknowledged that the code offered is not pretty and no promise of support is offered (in fact, quite the opposite is promised). These are among the key concerns about releasing code. Perhaps the software relies on expensive, proprietary hardware and/or software. Perhaps the code is buggy as hell, thread hostile, devoid of error handling, and relies on core dumps as the main data output interface. Or perhaps the source code is poorly commented, or not commented at all, and all variable names are single characters. Perhaps the code is written in Whitespace with inline Brainfuck.

The authors spend a long time explaining that there is no substitute for releasing the source code. That is, pseudo code, mathematical, or natural language descriptions are never enough. Of course they’re right in principle, but I appreciate the alternative descriptions sometimes, so I wouldn’t want a source code release to replace those alternative descriptions. For example, I don’t want to have to sift through someone’s crap code just to find how they performed a specific bootstrap analysis. The description in the methods section should be sufficiently clear and detailed so that I can code it up myself.
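
To illustrate how little code a well-described analysis can take, here is a minimal sketch of one common variant, a percentile bootstrap confidence interval for the mean. It is not taken from the Nature article or from any particular paper. The function name `bootstrap_mean_ci`, the default parameters, and the sample numbers are assumptions made up for the example.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`.

    Resamples the data with replacement `n_resamples` times, computes the
    mean of each resample, and returns the alpha/2 and 1 - alpha/2
    percentiles of those means as (lower, upper).
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(data)
    means = sorted(
        statistics.mean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    sample = [2.1, 2.5, 1.9, 3.0, 2.7, 2.2, 2.8, 2.4]
    print(bootstrap_mean_ci(sample))
```

A methods section that states the statistic, the number of resamples, and how the interval was taken from the resampled distribution gives a reader everything needed to write something like this.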

One point the authors make is that errors can be detected when the entire source code is released. Sometimes, even commercial programs have bugs that change results. E.g., GraphPad had a rather unfortunate bug that resulted in data groups being flagged as significantly different when they weren’t.

Perhaps you’ve heard about the Elsevier Boycott aka the Cost of Knowledge. I’m not going to rehash the problem we all know so well. But here are a few links I wanted to share. Just fyi, Elsevier’s take home (EBITDA) is about $500 million/year. That’s about a quarter of what Starbucks makes. Or about 7-fold more than Newport Corporation makes. And as the graph above shows, a good chunk of the revenue is from their scientific publications. Making money is great. I only bring it up to highlight the magnitude of money that is being made. It’s not peanuts.

I’m inclined to believe that most publishers are well aware that their business model needs to evolve. I think most of them know that although their contribution to the scientific process is important, it is only a small part of the entire process now that distribution (and even basic typesetting) is quite trivial. I have this positive view because I’ve spoken personally with editors at top tier journals and this is the impression they left me with. They seemed very interested in finding new ways to contribute to the scientific process and increase the value of their services.

Apparently not all of the commercial scientific publishing world is so reasonable.

A DrugMonkey post pointed to this article by a UK publishing industry representative named Graham Taylor. DrugMonkey dissects it quite well. Here’s an excerpt:

From Graham Taylor’s article: “Public funds have not paid for the peer-reviewed articles that are based on research supported by agencies such as the National Institutes of Health (NIH). They have only paid for the research itself and whatever reports the researchers are required to submit to the agency.”

DrugMonkey: Another falsehood, wrapped up with a disingenuous misdirecting belittlement. “Only” for the research? ONLY????? These publishers seem universally unaware that the VAST, VAST majority of the value of an academic article is the bloody research. The damn content. That is what has value. The fancy layout? That’s nice and all but we can do without that. The science is the thing. Trying to dismiss this as a minor contribution is…well…..that takes some serious chutzpah.

If publishers fail to adapt to the changes in technology that threaten their business model, then that’s a problem that scientists don’t necessarily have to worry about. There are plenty of new journals with excellent editorial oversight that are more than capable of taking up the slack. However, the big problem we all need to pay attention to is when publishers try to change laws to prevent the free distribution of research results. The current effort is a bill called the “Research Works Act”. As Michael Eisen describes in his blog, the bill is written to outlaw the free dissemination of taxpayer-funded research in order to protect the anachronistic business model of publishers like Elsevier.

The bill reads:

No Federal agency may adopt, implement, maintain, continue, or otherwise engage in any policy, program, or other activity that:
(1) causes, permits, or authorizes network dissemination of any private-sector research work without the prior consent of the publisher of such work; or
(2) requires that any actual or prospective author, or the employer of such an actual or prospective author, assent to network dissemination of a private-sector research work.

And the bill’s definition of “private sector” includes all NIH-funded research carried out at universities.

They are using intentionally misleading language to classify works funded by the government but carried out by a non-governmental organization as “private sector research”. Thus, under this bill, works funded by the NIH but carried out at a university would be “private sector research”.

Visualizations

Visual.ly recently posted a list of the top visualizations of 2011. This is a map of the world in which Twitter tweets are plotted using their GeoIP info and color-coded based on language. Looks cool, but there is no message. I could have guessed how this map would look. Germans tweet in German, Brazilians tweet in Portuguese, and so on. Big urban centers have a lot of tweets. Developing economic areas and sparsely populated regions don’t. I don’t see anything interesting. And so many of the colors are hard to distinguish that, if there is any new information, it’s difficult to find. They should have added more text labels on the map to identify languages near where they are found. Really? This is #1 on the list?

The field of data visualization is not well defined. The comeliness of a visualization is important, but if nothing is revealed by the visualization (no story told), then it fails. There seem to be quite a number of people who enjoy studying data visualization who are almost purely interested in the aesthetics. This makes it difficult to really get good advice. A recent book review in Science by Robert Kosara seems to be wrestling with this issue. Two books are covered, Yau’s Visualize This, and Lima’s Visual Complexity. The former is a practical guide to creating effective visualizations; the latter is a collection of breathtakingly beautiful works of art with no apparent message. Kosara notes that Lima “never attempts to explain what viewers can learn from any of the examples.”

Tufte has already said much of what needs to be said about data visualization. For example: “Have a message.” “No chart junk.” “Maximize the data-ink ratio.” Tufte himself said of Lima’s book, “One useful question to ask of each image is: What did I learn from this, in addition to seeing an elegant architecture?”

Yau does an excellent job following Tufte’s principles. He has a website that is worth checking out, flowingdata.com. And don’t forget Tufte’s own website. Junk Charts is good too. It just posted this holiday Venn diagram:

Tycho Brahe

Like the Mayan astronomers over 600 years before him, Tycho Brahe was a data factory. A data factory in the same vein as the Human Genome Project. Or as the Allen Institute for Brain Science is today.

In most formulations of the scientific method, the hypothesis is generated somewhere in the middle. What comes first is careful observation. What comes last are the hypothesis-testing experiments and controls. Often these individual steps are handled by different scientists and groups. For example, the Human Genome Project’s primary goal was one of observation, not hypothesis testing. Perhaps the same is true for the Mayans who observed the movement of Venus across the sky.

Tycho Brahe did what the Mayans did, 600 years later and in even more detail. He had the best primary data for the positions of celestial objects in the sky at the time. It was this high quality data that enabled Kepler to work out elliptical orbits for the planets.

A new website, Neurotycho.org, has set out on a similar mission. There, you can download data from primate experiments and reanalyze it. The setup seems as if they’ll accept data from other people at some point, but so far, it’s a one-lab show. That one lab is Naotaka Fujii’s RIKEN lab.

There are similar efforts elsewhere in neuroscience. What’s unique about Neurotycho is that they seem to be reaching out to a very general audience. They also have a wiki with more details.

For the next two weeks, you can get a personal 1 year subscription to any of the Nature publications for a price equal to that journal’s impact factor. (link) (via xcorr)

Wolfram is pushing a new document format called Computable Document Format (CDF). It looks like PDF + embedded Java apps.

On the one hand, there’s the xkcd viewpoint. This is basically just another step in the evolution of Mathematica’s native file format. And right now, Mathematica 8 is the only way to author a CDF file. Alternatively, one can simply author a webpage with embedded Java apps, or maybe even HTML5. Then everyone can use it, anyone can modify it, and most platforms will play it. On the other hand, web pages don’t print out as nicely as PDFs (CDFs should print out as nicely), and it can be a bit messy to download and view a webpage with embedded apps offline.

So maybe there’s a future for CDF. Although the same basic results can be obtained using HTML5 or Java, Mathematica makes it very easy to create some types of interactive infographics. Arguably, it’s exactly what Elsevier’s Executable Paper Challenge is looking for. However, it’s a closed format. Although Wolfram says the specification is public, the restrictions are perhaps enough to prevent wider adoption. For people who don’t own Mathematica, there is only a player (~500 MB download).

It’s hard to get people to install another player on their computer. Flash had success because of streaming video. Shockwave got people to install it for games. I think it will be a challenge to get readers of the NY Times to download another player for their browser en masse just so that they can see an infographic.

See that 23% bar for the RealOne player? And that’s for streaming media, a very broad market. What does Wolfram expect for such a narrow application? No matter how optimistic they are, why would they want to do this? They have to support this software for a bunch of different platforms, including mobile devices. This includes direct customer support and keeping up with changes to the platforms, security, etc. All for free. If Mathematica offered an “Export to HTML5” option, they’d sell more software. Because then authors would know that everyone can see what they produce, without having to download another player.

So perhaps CDF is a bit like Wolfram Alpha: utterly useless outside of a very narrow field of applications, in which it performs utterly beautifully.

But that’s where the problem lies. I downloaded the player and tried out several of the CDFs. They were ugly and inefficient.

Note the lack of antialiasing. The animation wasn’t very smooth either. I’ve seen much better results with Java and HTML5. I think Processing is probably a better path towards this sort of thing. And that is free and outputs Java.

You can hide source code inline, and that’s nice. But it’s not a particularly innovative feature. Basically there’s a place to click on the right margin that expands a block of source code.

To cap it all off, the typesetting doesn’t seem to be as rich as that of PDF. It really looks like a webpage. The equations look good, as we would expect, but they don’t always select and highlight in intuitive ways. So if you want to cut and paste, it can be frustrating.

Overall, it’s hard to get too excited about CDF. It’s a way to get people to see a Mathematica document when they don’t have the software. But it only works when the intended audience is more likely to download the player than the author is likely to write it up in more standard web technology.