Posts tagged with collaboration
If you like Python, want to analyze data online, and are interested in a standardized environment that can be easily shared, read on:
Continuum Analytics is offering a new beta: Wakari. You can register to try it out!
If you’re an academic frustrated by setting up computing environments and annoyed that your colleagues can’t easily run your code, Wakari is made for you. Wakari handles all of the problems related to setting up a Python scientific computing environment. Because Wakari builds on Anaconda, useful libraries like scikit-learn, mpi4py, and NumPy are right at your fingertips without compilation gymnastics.
Since you run code on our servers through a web browser, it is easy for your colleagues to re-run your code to repeat your analysis, or to try out variations of their own. At Continuum, we understand that reproducibility is an important part of the scientific process: your results should be consistent for reviewers and colleagues.
That’s who is credited with creating Boston-NeuroTalks (look at the bottom of their page). It turns out that Boston’s a bit of a college town, and thus it’s difficult to keep up with all of the neuroscience talks at the various institutions. Boston-NeuroTalks started out as a mailing list, then evolved into a calendar service, and now is a wiki. It’s not a bad model and seems to work pretty well in practice.
Perhaps other places (e.g., New York, Bay Area, London, San Diego, Research Triangle) would also benefit from this kind of system.
When I’m in a tight spot and don’t have CAD software handy, there are a couple of sites I use to view CAD files online.
ShareCAD lets you upload CAD files in a lot of different formats and displays the 3D model. It offers very minimal interaction, but you can pan and zoom. You also get a link that you can share via email with collaborators so that they can view and download the file. Handy.
ProfiCAD’s AutoCAD Viewer is more limited, but sometimes works where ShareCAD doesn’t. For example, here is an AutoCAD drawing that ShareCAD couldn’t handle.
Ars Technica has a short but nice article on the US Government’s role in AT&T’s decision to release Unix to a wider audience.
Bell Labs’ Unix operating system is often credited as being the catalyst that kick-started large-scale, open source software collaboration. They released Unix with no tech support, so the users formed their own community and quickly began sharing ideas. This, in time, inspired other communities, including the recently developed open source hardware community.
Why did Bell Labs release Unix without tech support? The larger company, AT&T, was constantly under the threat of being busted under anti-trust laws. In the 1950s, the government asked AT&T to narrow their business focus. So when Unix came along, AT&T wanted to make it clear that they weren’t in the software business. That’s why they released it under the terms that they did. It was free-as-in-beer, not free-as-in-speech, but it was close enough to let the community grow.
Did AT&T’s refusal to provide technical support hurt Unix? Quite the opposite, argues Salus. Instead, the policy had an “immediate effect: it forced the users to share with one another. They shared ideas, information, programs, bug fixes, and hardware fixes.”
A paper on data organization just came out in Nature Methods (Millard et al. 2011, commentary by Swedlow et al.). They believe, as do I, that using an XML schema to organize data is a good way to simplify automated analysis. For the primary container, they use HDF5. Fun fact for all the MATLAB users in the audience (from Tim at Imperial College):
The .mat file format is simply an HDF5 file with a pointless header prepended.
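That fun fact applies to files saved with MATLAB’s -v7.3 flag, which are valid HDF5 files carrying the MATLAB header in a 512-byte “user block”. Since every HDF5 file starts with a fixed 8-byte signature, possibly after a user block of 512, 1024, 2048, … bytes, you can check for yourself with nothing but the standard library. A minimal sketch (the function name is my own):

```python
# HDF5 files begin with this 8-byte signature, optionally preceded by a
# "user block" of 512, 1024, 2048, ... bytes. MATLAB's -v7.3 .mat files
# keep their own header in a 512-byte user block, so the signature shows
# up at offset 512 instead of 0.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

def hdf5_signature_offset(path):
    """Return the byte offset of the HDF5 signature, or None if absent."""
    with open(path, "rb") as f:
        offset = 0
        while offset <= 1 << 20:  # give up after 1 MiB of user block
            f.seek(offset)
            if f.read(8) == HDF5_SIGNATURE:
                return offset
            offset = 512 if offset == 0 else offset * 2
    return None
```

A plain .h5 file should report offset 0; a -v7.3 .mat file should report 512, which is the “pointless header” Tim is referring to.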
They call the XML metadata+HDF5 data combos “SDCubes”, for Semantically typed Data hyperCubes. Why cubes? That suggests that they are the same size along all axes, which they probably aren’t. If you don’t like hyperrectangle, you can use orthotope. One point that is lost in the figure I put above is the idea that the axes do not have to be continuous. There can be gaps and jumps. There can also be piles of data that all share one point on an axis, if that suits the data.
I like this approach because it is very general and simple. It consists of two file formats that are already being used by many researchers. In a way, the authors didn’t “create” anything. Hopefully this paper will give the strategy some added credibility and help to standardize it. Then people can concentrate on developing tools for working with data in this system, rather than developing new formats all the time.
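For flavor, here is what the metadata half of such a combo might look like, built with Python’s standard library. The element and attribute names below are my own invention, not the schema from the paper; the point is just that an XML file can name the HDF5 container and describe axes with gaps and jumps:

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata for an HDF5 container, loosely in the spirit of
# the SDCube idea. All names here are invented for illustration.
root = ET.Element("dataset", name="drug_response")
ET.SubElement(root, "container", format="HDF5", path="drug_response.h5")
axes = ET.SubElement(root, "axes")
# Axes need not be continuous: list the actual sample points.
ET.SubElement(axes, "axis", name="dose_uM", values="0 0.1 1 50")
ET.SubElement(axes, "axis", name="timepoint_h", values="0 6 48")
xml_text = ET.tostring(root, encoding="unicode")
```

An analysis script would parse this file first, then open the named HDF5 container and pull out only the slabs it needs.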
On the topic of the recent app post, I’ve been thinking about an old blog post from Peter Keane. The title of his post is, “What is Data’s Killer App?” In it, he pondered the problem of data management.
After a not entirely apt analogy to HTML, Keane suggested that there is no killer app coming. Instead, “our solution will likely be a set of protocols, formats, and practices all of which will enable the creation of end-user applications that can ‘hide the plumbing.’” I think that’s a valuable insight and I submit that it already exists. Moreover, people are already migrating to it. Just not in any organized way.
I suggest that this is a minimal skeleton for data and lab management upon which most comprehensive solutions can be built:
For protocols and other miscellaneous information: a lab wiki.
Primary data should be stored in, or referenced by, XML files.
Standard collaborative tools should be used where appropriate. E.g., Google Docs, MS Office, Zoho, Buzzword, etc.
Standard database tools should be used to organize stereotyped datasets. That means MySQL-based tools like dbForge, or FileMaker, or the like. Excel is not a database.
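To make the database point concrete, here is a toy sketch using Python’s built-in SQLite. The table and column names are my own invention, standing in for whatever a MySQL-based tool or FileMaker solution would hold for a stereotyped dataset:

```python
import sqlite3

# A toy schema for a stereotyped dataset; names are made up for
# illustration. Any relational tool expresses the same structure.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE animal (
        animal_id INTEGER PRIMARY KEY,
        strain    TEXT NOT NULL
    );
    CREATE TABLE recording (
        recording_id INTEGER PRIMARY KEY,
        animal_id    INTEGER NOT NULL REFERENCES animal(animal_id),
        recorded_on  TEXT NOT NULL,   -- ISO 8601 date
        data_path    TEXT NOT NULL    -- raw data lives in files, not cells
    );
""")
conn.execute("INSERT INTO animal VALUES (1, 'C57BL/6')")
conn.execute("INSERT INTO recording VALUES (1, 1, '2011-08-02', 'rec001.h5')")
# Unlike a spreadsheet, the database enforces the structure and gives
# you joins for free.
rows = conn.execute("""
    SELECT a.strain, r.data_path
    FROM recording r JOIN animal a ON a.animal_id = r.animal_id
""").fetchall()
```

Note that the primary data stays in files on disk; the database only stores the paths and the structured facts about each recording.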
When Karl Deisseroth started publishing his work on Channelrhodopsin-2, he set up a website to share the resources, including plasmid information, protocols for expression systems, and hardware details. His site, optogenetics.org, is an excellent source. However, it is focused on Deisseroth lab information.
For a more broadly focused resource, Josh Siegle (Matt Wilson lab, MIT) and others have consolidated a great deal of information in wiki format at OpenOptogenetics.org. The wiki format is ideal for this sort of information since it is changing all the time, and the relevant personnel changes over time as well.
There’s already a good amount of information on the site, but there are several opportunities to contribute and fill in the gaps as well. I encourage you to pitch in.
eBird is a website where bird watchers (or “birders”, in the parlance of the field) upload their observations. This project, by Cornell’s Lab of Ornithology and the National Audubon Society, has been a beautiful success, with over 1.5 million observations. I know firsthand that birders are a unique community, but they are not the only amateur community that could contribute to research. Amateur astronomers are a similarly fertile resource. What other communities could be organized to further scientific research?
A similar approach is being used by protein chemists. Predicting tertiary structure from amino acid sequences is a huge problem that is being attacked on multiple fronts. One effort to contribute better structure-solving algorithms is a bit like “The Last Starfighter.” Researchers at the University of Washington have developed a game, Foldit, which trains people to accurately predict tertiary structure from amino acid sequences. Apparently, this is something that some people have a natural knack for. The gamers fold proteins with known structures, so the game can grade their work. By observing the strategies that expert gamers use, the researchers are hoping to come up with better algorithms, and maybe even to have gamers design proteins themselves.