Friday, December 2, 2011

Quick summary: ARL / DLF E-Science Institute Capstone -- Atlanta

Waiting for my flight home from Atlanta after the ARL / DLF E-Science Institute Capstone event.  Overall, it was a very productive event, especially for discussions with Rob Olendorf (my collaborator and a data management librarian at UNM) and Dale Hendrickson (head of Library IT at UNM).  Almost all of the attendees were library personnel, and I learned a lot from my interactions and the ideas presented.  I thought I would jot down some ideas and action items.

First, action items.  We were encouraged to develop "next steps" for when we return to our institutions.  Here are some of ours:

1.  Incorporate Library interactions into the undergraduate physics course (PHYC 308L, electronics lab) I am teaching next semester.  This is a new course for me, and I won't have the time or familiarity to diverge much from the very good plan that prior instructors have developed.  But I know enough that Rob, Dale, and I came up with some concrete ideas that will be great for spurring data management at UNM and among these budding scientists:

  • Guest lecture by Rob describing data management and related library services.  I think this would fit best in the second lecture period of the course.  Rob will describe issues of data management, and we will announce our intention to integrate library data management into the course (below).  Rob will also give an overview of GitHub and a quick "how to."
  • A substantial part of the course (as I understand from talking to prior instructors and students) involves developing LabVIEW code for circuit design and simulation.  I'm guessing (pretty sure) that no source code control or versioning is used.  I think this presents a good (not perfect) opportunity to teach the students how to use GitHub for versioning and source code sharing.  I'm thinking it will be an integrated requirement for all of the coding during the semester.  The reason it's not perfect is that LabVIEW uses binary files, so some of the forking and merging functionality will not be appreciated (see the sketch after this list).  Many of the students are experienced in Matlab, though, and where possible I will encourage moving to that platform.  Regardless of how this plays out, I think for sure the students will come away from the course with a fundamental knowledge of GitHub and how wonderful it is for protecting and sharing code.  I think I will also require LaTeX for their final reports, which will work well with GitHub.
  • Incorporate data management, using the Library Institutional Repository.  Some infrastructure and coordination with the library will be necessary here, because I don't think we've done it before at UNM.  Dale's idea is to create a "community" in the DSpace IR for our course, e.g. "Junior Lab 308L."  The students will be in charge of uploading their final data sets (testing their circuits) into permanent, curated objects in the IR.  There may be difficulties with this, but I am confident that the students will come away with a good appreciation of the power of good data management and, hopefully, a real, curated data set as part of their career portfolio.
2. Participation in the data management "group meeting."  The library currently holds some kind of regular meeting like this, and I will visit one of their upcoming meetings.

3. (Mostly for Rob)--"finish" our pilot data management project.  Rob has been working on this for a long time, and it hasn't been easy.  He is curating and archiving one of Andy Maloney's complete kinesin gliding assay data sets.  The uncurated data can be seen on our server.  I don't really understand how Rob is doing this, but he's done a lot of coding and is close to putting a curated version of that data set into our institutional repository.  There are 500,000 images in the set, and I think Rob said that describing it involves more than 50 million lines of (XML?) code.  I may be getting terminology and numbers wrong ("schema," etc.), but the point is that Rob is writing a lot of code to do it "right."  A finished product will serve as a great example to everyone on campus (and beyond), especially researchers, of what the library can provide for data management.  I think this will be a huge step for us at UNM and in convincing more researchers to collaborate with the Library for research data management.
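For what it's worth, the scale Rob mentioned is plausible: even ~100 lines of XML per image, times 500,000 images, is 50 million lines.  I don't know the details of Rob's actual pipeline or schema, but here is a toy Python sketch (all field names and paths invented for illustration) of the general idea: walk a directory of images and emit a minimal Dublin Core-style XML record for each one.

```python
# Toy sketch only -- NOT Rob's actual pipeline or schema.  Walks a
# directory of images and writes one Dublin Core-style XML record per
# image, plus an integrity checksum the repository could verify later.
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

def describe_image(image_path: Path) -> ET.Element:
    record = ET.Element("record")
    ET.SubElement(record, "title").text = image_path.name
    ET.SubElement(record, "format").text = "image/tiff"
    ET.SubElement(record, "source").text = str(image_path)
    # checksum ties the metadata record to the exact bytes archived
    digest = hashlib.md5(image_path.read_bytes()).hexdigest()
    ET.SubElement(record, "identifier").text = digest
    return record

def build_manifest(data_dir: Path, out_file: Path) -> None:
    root = ET.Element("dataset", attrib={"name": data_dir.name})
    for image in sorted(data_dir.glob("*.tif")):
        root.append(describe_image(image))
    ET.ElementTree(root).write(out_file, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    build_manifest(Path("gliding_assay_images"), Path("manifest.xml"))
```

And on the LabVIEW point in item 1: git can't merge binary .vi files, but it can be told to stop pretending they're text.  A minimal .gitattributes sketch (file extensions assumed, not tested against a real LabVIEW project):

```
# Mark LabVIEW virtual instruments and controls as binary so git skips
# textual diff and merge attempts on them (history and rollback still work).
*.vi  binary
*.ctl binary
```

Even without meaningful diffs, the students still get the big wins: complete history, painless rollback, and easy sharing.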

There were many more "next steps," but they aren't coming to mind now.  More than just next steps, there were a lot of visionary ideas presented by groups at the capstone event.  Here are some that stuck in my mind:

1.  Graduate students are key to connecting data management librarians with research groups.  The best idea to emerge was that a pipeline to graduate students already exists: the "ethics / responsible conduct of research" courses generally required as part of NIH/NSF training grants.  Good data management is often part of these courses, and in my mind it is essential for responsible / ethical research.  Given how these courses are usually implemented, I think it would be fairly easy for data management librarians to obtain one or more time slots to discuss data management with the graduate students.  Best would be "hands-on" coursework, where the students are asked to bring their own data to the course.  This was discussed a bit on a FriendFeed thread.

2.  Our institutional group and at least one other (I can't remember the institution) more than once mentioned a vision for the library providing more than just data curation / preservation / storage.  I don't have a good term for this area, but it involves capturing / helping with workflow (especially custom software used in labs for data management / processing) and data visualization.  In my mind, a ripe area for connecting with researchers is to work backwards from the traditional publication.  Currently, many libraries have an institutional repository that allows researchers to post PDFs of research papers.  And usually that's about it (from what I can see).  Working back upstream, what I think would be very useful is to provide a computational workspace (through the library) where researchers can process data and produce the figures in those papers.  As an example, my graduate student logs into the library workspace and uploads the data needed to produce the final figures.  The graduate student and I then use software on that workspace (maybe R, Matlab, Excel) to create the figures for the paper.  A versioning system keeps track of the code used to process the figures and the many versions created.  When the paper is submitted for peer review (the current standard), it is seamless to link each figure to the data sets and the code used to generate those figures, using either permanent URLs or DOIs (see the sketch after this list).  As a researcher, I would LOVE such a system.  And talking with Dale and Rob, it doesn't seem too much of a pipe dream.  It's a lot of work, but I think it would be a huge improvement in data management and data sharing in research.  Successful implementation would also be a really great way to recruit more researchers into data management partnerships with the library.  An important component I forgot to describe above is that there would be experts in the Library (such as Rob) who can work side-by-side (virtually) with us to develop the data visualization code and figures.

3.  Related to item 1 above, I think connections with graduate students could be greatly accelerated by a data management grant competition.  A $1000 research grant prize, awarded directly to graduate students for "the best data management," would, I think, be very effective.  Compared to what we need to accomplish to transform research and the library's involvement, $1000 every so often is not a lot.  But it would mean a lot to the graduate students in the competition.

4. The NSF Data Management Plan (DMP) requirement has already done a lot to connect researchers with data management librarians.  Rob estimates that more than 30 faculty connections have been made for him at UNM because of DMPs.  I think this is just one great outcome of the DMP requirement.  And it illuminates a huge opportunity that I see for researchers and libraries.  In my specific case, if I get tenure at UNM, I want to pursue a couple of training grants.  One I would specifically like to try for is an "open science" NSF REU program.  REU is "research experience for undergraduates," usually involving summer research internships for undergraduates from other institutions around the country.  I think an REU proposal with a heavy focus on "open science" and advanced data management would look very appealing to the NSF.  Of course, I also think it would be very effective in training the next generation of researchers.  Importantly, though, I would need a lot of help to write this grant.  The Library's experience with DMPs can be extended to this effort, and people like Rob and others will be essential in planning, writing, and executing the grant.  Moreover, I think other people on campus who are planning training grants would get a big "broader impacts" boost from this kind of data management or "open science" collaboration with the library.  So, hopefully, our Research office can help coordinate these endeavors.
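Returning to the computational workspace in item 2: nothing like this exists at UNM yet (as far as I know), but the core provenance bookkeeping seems simple.  Here is a hypothetical sketch (every file name and the empty DOI field are invented for illustration) of the record such a workspace could write each time a figure is generated:

```python
# Hypothetical sketch of the provenance record a library "computational
# workspace" could write whenever a figure is generated.  All file names
# and the DOI placeholder are invented for illustration.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def record_figure(figure: Path, data_files: list[Path], script: Path) -> None:
    manifest = {
        "figure": figure.name,
        "generated": datetime.now(timezone.utc).isoformat(),
        # hashes pin the exact data and code versions behind this figure
        "data": {f.name: sha256(f) for f in data_files},
        "code": {script.name: sha256(script)},
        # assigned by the repository once the objects are deposited
        "doi": None,
    }
    out = figure.with_name(figure.name + ".provenance.json")
    out.write_text(json.dumps(manifest, indent=2))

# e.g.: record_figure(Path("fig3a.png"), [Path("gliding_speeds.csv")], Path("make_fig3a.py"))
```

With records like this deposited alongside the figures, the "link each figure to its data and code" step at submission time becomes a lookup rather than an archaeology project.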

Many, many more ideas, but I think I'm out of steam for now.  Overall, a great conference, and I'm excited to pursue these ideas!

Tuesday, November 22, 2011

An idea for wealthy donors: as an alternative to direct research funding, fund libraries to help with e-research

Next week, I am attending the E-Science Institute Capstone event, along with Rob Olendorf and Dale Hendrickson from U. New Mexico Libraries.  As part of our preparation for this event, we are interviewing several people around the university to capture their views on e-research.  Today, Rob and I interviewed Martha Bedard, Dean of the UNM Libraries.  Rob and Dale figured it would be good to have me lead the interview, since I'm coming from outside the library and thus would ask different questions.  At least from my perspective, this was a success, and I learned a lot in the generous hour that Martha gave us.

At this point, I can't share the interview notes publicly, but I did want to share one idea that emerged during our discussion (and there were several good ideas!).  I'm having trouble putting the idea into writing, so maybe by blogging it poorly, someone else can turn it into a good idea, if it's sensible at all.  Here's what I'm thinking: wealthy donors, or a group of donors, who want to make a big impact on research at their university have at least the following two choices:

1.  Provide substantial money to fund research in a specific field, for example by providing tens of millions of dollars to fund a nanomedicine research center.  Or to build a new biomedical engineering building.  Etc.

2.  Provide substantial money (say $10 million) to the university library in order to vastly improve the ability of ALL researchers at the university to conduct e-research.  The money would go towards hiring many new library faculty and staff members and procuring and implementing storage and networking infrastructure.  The goal would be a completely transformed library that would make it easy and almost automatic for all university researchers to conduct connected, networked, open, archived, discoverable, etc. research.

Option 1 is common and makes a big impact on specific research fields.  Performing research in excellent facilities, with dependable funding, is a great thing for researchers.  As far as I know, option 2 is less common, and I'm not aware of a good example.  But I think there'd be tremendous leverage compared to option 1.  The reason there is so much leverage is that currently the huge potential of "e-research" remains almost untapped.  There are shining examples of successes.  (For an excellent overview of the successes and the vast, untapped potential, read Michael Nielsen's excellent book.)  But in reality, for most researchers it's really difficult to manage data, share data, provide open access publications, etc.  And this is true even for researchers like me, who've decided to be as open as possible yet are finding it difficult to do so effectively!  So, it's basically true that huge technical barriers prevent most researchers from maximizing the impact of their research by sharing.  Because we're so bad at it and because it's so difficult, I think there's a ton of room to make a huge impact at a university with a medium-sized grant.  I think the university library is the natural and only choice to lead the effort.  And by doing so, it would impact all of the researchers across all of the disciplines (humanities, science, medicine, etc.).  How would they implement option #2?  I don't actually know, and that's a big reason why I want the library to do it!  Rob Olendorf, my collaborator at UNM on open data projects, has a vision for how to make it seamless and almost automatic for researchers like me to connect, archive, and share our research and data.  I don't understand how that can work, and I don't have time to understand.  But I would LOVE to participate in that system.

That's the final key to the idea.  I think a university would gain a huge competitive advantage by becoming the "e-research leader."  There is a perception that most researchers are content with limited sharing and the status quo.  This may or may not be true.  But regardless, it looks like there is a lot of momentum, driven by the public interest, for funding agencies to go much further with data sharing, data management, and open data mandates.  These mandates are scary to many researchers.  Even if researchers want to have excellent data management and share their data, it's almost impossible to do so now.  So, compliance will be a huge new headache for researchers.  If a university could boast that compliance is "seamless and easy," it would be a real and strong recruitment incentive.  This probably sounds questionable to some, but I really see it as a huge incentive.  It would be just as appealing as the opportunity to work in a fancy new research facility.

Thursday, November 3, 2011

The inevitable spread of open science

Two things have happened this week that make me really happy about the research in our lab and the spread of open science.  First, we have a new undergraduate REU student, Alex Haddad, who has started her own open notebook science under the mentorship of Anthony Salvagno.  Her notebook is on wordpress.com and can be found here.  This is Alex's first experience in a research lab, and she has immediately embraced open notebook science and is excited about it.  One cool thing I've noticed already is that her notebook entries are automatically linked in Anthony's notebook when she links to them.  Some kind of trackback thingy that I don't fully understand, but it's great as far as good notebooks go.  An example can be found in Anthony's notebook entry, which automatically links to Alex's entry providing more information (see the trackback at the bottom of the page).  Welcome, Alex, to open notebook science!
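(For the curious: I believe the "trackback thingy" is WordPress's pingback mechanism.  When a post links to another blog, the publishing blog automatically sends a small XML-RPC call to the linked blog, which then displays the link as a comment.  A rough sketch of that call, following the Pingback 1.0 spec, with invented example URLs standing in for the real notebooks:)

```python
# Rough sketch of a Pingback 1.0 ping -- what WordPress does automatically
# behind the scenes when one post links to another.  URLs are examples only.
import xmlrpc.client

# Alex's post (the one containing the link) is the source; Anthony's post
# (the one being linked to) is the target.
source = "http://alexs-notebook.example.com/2011/11/some-entry/"
target = "http://anthonys-notebook.example.com/2011/10/linked-entry/"

# The target blog advertises its pingback endpoint via an X-Pingback HTTP
# header or a <link rel="pingback"> tag; WordPress serves it at /xmlrpc.php.
server = xmlrpc.client.ServerProxy("http://anthonys-notebook.example.com/xmlrpc.php")
print(server.pingback.ping(source, target))
```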

The second thing that happened is that our former PhD student, Andy Maloney, just started a new postdoc at UT-Austin with Hugh Smyth.  This is going to be a very productive experience for both Andy and Hugh's lab, I am confident.  Most exciting, though, is that Andy and Hugh have decided to incorporate open science into their projects!  I think this is very big news and a success for the spread of open science.  Major props to both Andy and Hugh for their willingness to carry out major parts of their research using open science!  I had some further thoughts on this and the implications for the spread of open science.  Instead of rewriting them, I'll just quote my comments on the FriendFeed thread:

I think big factors are Andy's commitment to open science and his new PI's commitment to making an impact in science and medicine.  I met Hugh Smyth a few times when he was at UNM and only detected awesomeness, both in his research and in his mentoring and concern for students.  Openness is probably going to be more challenging for them, though.  One reason is that their research is much more applied and medical, and thus IP plays a major role.  The field is probably a lot more competitive.  And their lab is much more successful with funding.  As Nielsen and others have pointed out, the current reward system stacks the cards against openness.  So they will have to be careful.  But I think they're clever enough to figure out how to do it, and their success will pave a lot of roads for future openness.  I've been thinking about it pseudo-mathematically, and I think the fact that they're even willing to try is a success.  I've had two PhD students graduate so far.  One is likely in industry for a long time and unlikely to be open for a long time, if ever.  The other, Andy, is now at least partially doing open science.  The subsequent students in our lab (Anthony, Alex, Nadia, Pranav) are still performing open science.  A former intern, Diego Ramallo Pardo, is in grad school at Stanford and has a passion for openness, but is not able to be open yet.  Dozens of undergraduate lab students have performed open notebook science in my lab course, and there have been a few instances of continuing ONS after the course (most do not continue in research careers).  So, at first glance it appears that there isn't a high rate of spread of openness from our research and teaching labs.  But it occurs to me that it doesn't matter.  If we were to model openness as an infection, it's a powerful one.  I think it's even a latent infection in almost all scientists.  Participating in openness awakens the infection for life, and it sheds constantly.  The immune reaction is our current system of practicing and rewarding science, and it's quite powerful.  So it wins in a lot of cases.  Nevertheless, openness is slowly winning more often, and the immune system is not going to adapt to get stronger.  On the contrary, the immune system is going to take major hits in the coming years.  Funding agencies are going to change rules.  Tenure and promotion and hiring committees are going to add members who value openness.  Closed-access publishing for profit is going to topple precipitously.  And at that point, openness will spread and emerge naturally and quickly.  It seems plain as day to me.  Now, one of you all can translate that into epidemiological mathematics and fiddle with some exponents.
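Taking my own bait on the "epidemiological mathematics": a minimal sketch (my invented symbols, just to fix ideas) is a standard SIS-style model in which the "immune response" of the current reward system decays over time:

\[
\frac{dO}{dt} \;=\; \beta \, O \, \frac{N - O}{N} \;-\; \gamma(t)\, O, \qquad \gamma(t) = \gamma_0 \, e^{-t/\tau},
\]

where $O$ is the number of open scientists out of $N$ total, $\beta$ is the "transmission" rate from working alongside open colleagues, and $\gamma(t)$ is the rate at which the reward system pushes people back to closed practice.  In the standard SIS result, openness persists only once $\beta > \gamma$, settling toward $O^* = N(1 - \gamma/\beta)$; so as funding agencies and committees erode $\gamma$, the model predicts exactly the late, rapid takeover I described above.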

Friday, October 28, 2011

Open Access Week event at U. Arizona: Reproducibility, Open Data

Earlier this week I was lucky to participate in the Open Access Week event at the University of Arizona: The Future of Data: Open Access and Reproducibility.  The event was hosted by Chris Kollen and Dan Lee of the University of Arizona Libraries.  I am very grateful for the invitation and the opportunity to meet them, some active members of the audience, and the other speakers, Victoria Stodden and Eric Kansa.

Victoria Stodden gave an excellent talk, framed around the computational sciences, with the major point: instead of promoting "open data," we should promote "reproducibility" in science.  She argued, very convincingly, that good science requires reproducibility, and thus scientists should be easily convinced that we need very high standards for reproducible results.  For computational research, the only way to ensure reproducibility is to publish much more open data and open code than is normally done now.  If your result is computational, how can anyone hope to replicate and build upon your results if you haven't provided the source code and the data sets?  They can't, but publications without code and data are by far the most common these days.  It's a failure of science that is probably caused by many factors.  One that comes to mind is that computational scientists have been forced to fit their "publications" into standard peer-reviewed articles, where the system is not set up to accept and / or host source code and data.  (As an aside, this is clearly a routine failure of peer review, as referees obviously are not ensuring reproducibility of the research, which should be a primary criterion for publication.)  Scientists understand that reproducibility is an essential element of research.  For example, two years in a row, my undergraduate physics majors identified reproducibility as the most important element of good science (see brainstorming 2010).  Since scientists understand this, they will naturally practice open publishing of data, code, and methods when they realize that reproducibility is impossible without those elements.  As Victoria argued, demanding "open data" leads to confusion, resistance, and ultimately, probably, lack of compliance.  In contrast, demanding "reproducible research" is already a cultural norm, and it naturally leads to open data and open code of the most helpful variety for reproducibility.  Victoria's slides can be found here.

The notes for my presentation can be found on linked mindmaps, starting here.  (Click on the tiny right arrows to navigate.)  My notes are probably not too meaningful if you weren't at the symposium.  In contrast to Victoria's high-level talk about policies that could make a major impact, I told a few stories about open data and open notebook science in our own teaching and research labs, and the successful impact we've already had.  I think (hope) it provided concrete examples of the benefits of open science.  On the one hand, I showed that open science, especially open notebook science, strongly promotes reproducibility.  This has been seen best in the undergraduate physics lab that I teach.  Students read the notebooks of other students from prior weeks and prior years.  They build upon these previous results, which allows them to get the experiment working much more quickly and have more time to explore new aspects of the experiment or to develop new data analysis methods.  They are doing real science!  I showed an example of an excellent primary notebook from Alex Andrego and Anastasia Ierides.  However, I think I also showed that open data and open science make an impact beyond just reproducibility.  This impact is in the reuse and repurposing of data.  I told two stories where theory and research groups have already been able to use data we publicly shared on YouTube.  One group has already used our data in a theory preprint on the arXiv.  Both groups expressed delight and gratitude that our data was freely available.  There are two important features of these stories.  First, both groups used our data for a purpose that we had not (and probably would not have) imagined!  Clearly the impact of our data was multiplied by being public.  Second, we used the easiest and simplest sharing method we could find (YouTube), yet we still made an impact.  We are currently working with Rob Olendorf, a data curation librarian at UNM, to vastly improve our sharing.  This will include permanent citation links, vastly improved metadata (at least 10x more than the data itself), hosting by the institutional repository (much safer than our lab server), and links to other data sets.  It stands to reason that if we could make an impact with the imperfect system we tried first, then the impact will be much higher with the data shared via Rob and the institutional repository.

The final talk was by Eric Kansa, who described the amazing work that he and his colleagues have done on Open Context, a platform for sharing and linking archaeological data.  His notes from the event can be found here.  And his slides are also available: A More Open Future for the Past.  Despite my being far from the field of archaeology, it was easy to see the vast impact that Eric and his colleagues are making via the Open Context project.  A large amount of time, sweat, and money is expended collecting archaeological data.  Without opening, curating, and linking these data, their potential impact is severely limited.  The Open Context team has developed a method for collecting these data, archiving them, and linking them to other data sets.  The method is very effective and, importantly, requires far less work than was required to collect the data in the first place.  This seemed to me clearly a case of the huge power of data reuse and repurposing.  In contrast to computational science, the power of data reuse seemed to trump the need for open data for reproducibility.  This is not surprising, given how different the two fields are.  But it was an interesting and somewhat confusing contrast for me between the needs for open data in computational research versus archaeology.

There were several engaged audience members.  One of them was Nirav Merchant, with the iPlant Collaborative.  Victoria and I were highly impressed by the computational platform that iPlant has developed already, only three years into the NSF cyberinfrastructure project.  I was simply amazed, and I can't do it justice in a description.  The ability to ensure reproducibility of computational research with the iPlant platform is vast.  One example is how easy it is to save an image of a virtual machine and then share this image with other users.  They demonstrated this for us, and it took only a few clicks and less than a minute.  I highly recommend reading more about iPlant at their site linked above.  The iPlant team that we met was energized, engaged, and collectively brilliant.  I'd love to know how they assembled their team, as they've clearly done an excellent job.  I intend to keep in contact with the iPlant folks and am even hoping to introduce the computational platform to my Junior Lab students this year.  I think the exposure to these state-of-the-art, "open" tools will be invaluable for their future research.

Overall, the one-day Open Access Week event was highly successful for me.  I met some amazing people and gained a lot of clarity in my thinking about the imperative for much more openness and sharing in science. Incidentally, maybe not coincidentally, during my flights I was able to read Michael Nielsen's fantastic new book on the untapped potential of connected, open science: Reinventing Discovery.  Despite having met Michael and having heard him speak a few times, I still found the book riveting and I learned a lot.  I absolutely recommend the book to anyone interested in the practice of science!

Friday, March 4, 2011

I am maximally-skeptical that there currently exists any evidence that drinking deuterium-depleted water has health benefits or will cure disease.

Because of our lab's interest in the biophysical effects of heavy water--both heavy-hydrogen water, D2O, and heavy-oxygen water, H2O18--I received a very friendly email inquiry today.  The person suffers from a health problem and currently hopes that drinking deuterium-depleted water will help with that condition.

As a scientist and a health consumer, I am maximally-skeptical of any medical claims related to drinking deuterium-depleted water.  This is despite the fact that I think there's a good chance that cells may behave differently if deprived of deuterium, which exists in all natural water sources.  The reasoning for my skepticism is very straightforward.  There is a dearth of published scientific or medical research utilizing deuterium-depleted water.  As I will note below, there are fewer than a dozen research papers on the topic.  So we really don't know.  There is almost no evidence.  We don't know whether drinking large quantities of deuterium-depleted water will be helpful, harmful, or negligible.

There is much more evidence, though, that the quantity of water that would need to be consumed is quite large.  Because deuterium is naturally occurring, there's a lot of it in your body!  It would take a long time of drinking lots of D-depleted water to have a systemic effect.  My interpretation of the existing evidence is that by far the most likely outcome of this therapy is that it will generate profit for whoever is selling the D-depleted therapeutic water.
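To put rough numbers on "a long time" (my own back-of-the-envelope, not from any study): treat body water as a single well-mixed compartment of volume $W \approx 40$ L with a turnover of roughly $r \approx 2.5$ L/day, and suppose all intake is replaced with water at deuterium concentration $C_{\mathrm{in}}$.  Then the body-water concentration relaxes from the natural level $C_0 \approx 150$ ppm as

\[
C(t) \;=\; C_{\mathrm{in}} + \left(C_0 - C_{\mathrm{in}}\right) e^{-r t / W},
\]

with time constant $W/r \approx 16$ days.  So even drinking nothing but depleted water, it would take on the order of a month to get most of the way to the new level, which is part of why any honest test of this therapy would be lengthy and expensive.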

Because I think it's a shame that a Google search for "deuterium-depleted water" is overrun by claims of cures for horrible diseases, I asked the person who wrote me if I could post my response on my blog instead of replying privately, so that perhaps our discussion could benefit more people.  The person kindly agreed, and so I will post his email:

Dear Steve,
I enjoyed reading your blogs and noted that you work with D2O.
I have a medical condition that I want to treat with alternative methods - one of them is drinking "light" water.
Do you know, or can you suggest any resources for the following:
1. how to make "light" water, with a D2O concentration below 50 ppm
2. who does D2O concentration testing in the US for water samples
3. who makes light water (for sale)
4. any scholarly literature on this topic...
Any info will be much appreciated and shared with fellow friends who are in need.
Thank you very very much!!

Here is the reply I would have sent, but instead post publicly:

Dear ___, 
Thank you for your kind message.  I am sorry to hear of your medical condition.  I am not an expert on the medical effects of deuterium-depleted water.  In fact, I am not aware of any medical experts on this topic.  As a scientist, I am maximally-skeptical of any claims of currently-known medical benefits of drinking deuterium-depleted water.  I'm not saying it will help or hurt you; I'm saying that I don't think anyone is close to knowing whether it will be helpful, harmful, or negligible.  There is almost no published, rigorous research on the subject (your question #4), and thus any claims are probably speculation.  I would suggest talking to a medical doctor, which I'd guess you've done plenty of, since they know almost infinitely more about the human body than I do.  However, I would think that any medical doctor, or indeed any living person, would merely have to guess, because I do not see any experimental evidence beyond fewer than a dozen published reports, which have yet to be challenged or supported. 
Below I will put responses to your specific questions, and I wish you the best, 
Sincerely,
Steve 
1. I don't know of an efficient method for producing mildly-deuterium-depleted water.  The deuterium-depleted water we use in our research is much more depleted.  We obtain it from Sigma, a chemical supply company, and it is roughly $100 per 100 milliliters (a few ounces).  As you may know, you would probably need to drink a lot of water over many days to appreciably deplete deuterium from your body.  This would surely be expensive.  And like I said above, as far as I can ascertain, it's unknown whether it would be helpful, harmful, or negligible.
2. I don't know who does D2O testing.  I'd be skeptical of anyone offering these services related to this medical purpose.  Incidentally, deuterium-rich water is inexpensive.  You could easily mix D-rich water with regular water and see if the purported D2O-testing company is able to correctly discern the difference. 
3. We so far have only purchased from Sigma.  See for example product #195294.
4. I have read two scholarly papers on the subject, both from a research group out of Hungary.  I found both papers very interesting, but I am also highly skeptical of the interpretation of their results.  A good place to find scholarly papers related to biology or medicine is PubMed.  This link will hopefully take you to a search for articles related to deuterium-depleted water.  I can see only one that is freely available.  Google Scholar is another place to search, but it will not be limited to biological articles. 
I actually find this topic fascinating, as far as whether life has evolved a beneficial use for naturally-occurring deuterium.  We have a side project in our lab to see whether we can notice any effects on tobacco seed growth.  We're using tobacco seeds because they are tiny, so we don't need much water to see an effect.  We got this idea from Gilbert N. Lewis, who did the initial studies in the 1930s showing that too much deuterium affects life.  One of the reasons I find this side project on deuterium depletion so fascinating is that I see it as an open mystery.  That correlates well with my skepticism of claims related to therapeutic effects of drinking light water.

Below, I will embed a comment thread from FriendFeed, and also there are potential comments on the blog itself.  I expect them to be a mix of helpful and derisive...hopefully more of the helpful type!

Saturday, February 5, 2011

An open data success story

Over the winter break, Andy Maloney and our lab enjoyed an open data success story.  Andy shares his data publicly with a CC0 / public domain license.  Some scientists ran across the data, I think by Google searching, and contacted us to ask if they could use our data to support their research.  Since it is CC0, they didn't have to ask, but like most scientists, they were courteous and did contact us.  I shared this story at the ScienceOnline2011 "Data Discoverability: Institutional Support Strategies" session, and I think people liked the story.  Jean-Claude Bradley mentioned it in his blog summary of the conference, and Lucy Power saw this and contacted me for more details.  Lucy is studying e-Research for her Ph.D. dissertation.  I sent her a reply, and instead of rewording it, I will just paste it below.  I can answer questions on the FriendFeed thread.  Yay Open Data!


Hi Lucy – I definitely should write up a blog post about it and I will try to do that soon.  I think it’s a great little success story for open data and data reuse.  In a nutshell (and I can answer questions): Some people found Andy’s microtubule gliding assay data on youtube and emailed us to say it was very interesting to their theoretical work and could they use our data in a pre-print.  We replied “of course!” “woo hoo!” and we told them that it’s all public domain data so they are free to do whatever.  As a courtesy, we said we’d like a shout-out.  They went further and offered co-authorship, but Andy and I decided an acknowledgment was more appropriate at this time.  Andy suggested they acknowledge open notebook science, etc. and they did in their pre-print.  You can find the pre-print here: http://arxiv.org/PS_cache/arxiv/pdf/1101/1101.2225v1.pdf see Figure 3A for Andy’s data and the acknowledgments section.
 I think it’s a great success story because (A) they never would have known about our data if it weren’t open.  It didn’t necessarily have to have an open license, but it needed to be discoverable.  (B) we never would have thought to use our data for this purpose.  So obviously value was created via openness. OK, I’ll try to write up the story in a blog or something soon!  (Maybe I should just post the above and not worry about wording it better? J )
 --Steve

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.