What are useful ways to share and disseminate research data that has been painstakingly collected as part of the research process? Is it enough to share research reports, or should we be sharing datasets too? What are good practices for licensing reports and data openly so that others can reuse and build upon them? How do we archive and store data for long-term preservation? How do we make more of data, graphs and charts as tools for research communication?
As the research projects in ‘Exploring the Impacts of Open Data in Developing Countries’ (ODDC) near completion, these were some of the questions reflected upon in a monthly web meeting hosted by the Web Foundation with the research partners on 15th April 2014. Melanie Brunet (Research Librarian at IDRC) and Barbara Porrett (IM Systems Analyst, IDRC) talked about sharing and archiving research findings and reports through the IDRC Digital Library (IDL). Lars Holm Nielsen (CERN) gave a very useful presentation on Zenodo, an archive for storing and sharing research data.
This post shares some of what was discussed.
The IDL Repository: Sharing research publications for maximum reach
The IDL is a repository that includes research outputs in many forms: articles, books, technical publications, theses, conference papers, videos and so on, all generated by IDRC-funded projects. As far as possible, IDRC encourages all funded projects to use Creative Commons (CC) licenses for publications, maximising their accessibility.
When publishing, it’s important for researchers to consider whether the journals they are submitting to might restrict access to the publications by claiming copyright and only making articles available for a fee. Sometimes it is important for researchers to publish in highly ranked journals that are not open access, but in these cases they should still look for ‘hybrids’: journals where the article becomes open access upon payment of an author fee, or where the author has the right to archive pre-print copies of the paper.
A number of tools exist to help researchers locate journals to publish in: the Directory of Open Access Journals allows browsing by title, subject, country, license and publication fee, and SHERPA/RoMEO can be used to see whether a journal allows self-archiving in an institutional repository (pre-print, post-print or publisher’s PDF). Not all ‘author-pays’ journals, however, are committed to good scholarly dissemination: Beall’s List of predatory open access publishers names journals to avoid.
Zenodo: A tool for archiving research data and sharing outputs
One of the most unfortunate things that can happen to research data is its loss, in part or in whole. This can happen in myriad ways, from a lost USB drive to a corrupted hard disk, wiping out all your hard work in one swoop. Archiving research data is thus essential for long-term preservation. There are other reasons to archive too: one is that research data from even 10 years ago can be difficult to read with today’s software.
A 2010 study showed that only 25% of researchers share their data openly and 20% store it in a digital archive, suggesting that researchers cannot be relied on for the long-term preservation of data. So why are researchers poor at storing and sharing data? There are two reasons. First, it is not easy to share research data; second, researchers do not get credit for their data in the same way they do for publications. Many publishers are addressing the second problem, for example by creating data journals that publish dataset descriptions.
Zenodo addresses the first, ‘not easy to share’ problem. Developed at CERN, its primary goal is to get research data off a local hard drive and into a proper digital archive where it can be preserved for the future, by putting CERN’s infrastructure behind an easy-to-use interface. Publishing data on Zenodo is a simple and quick process involving three steps:
1. Choose a file from the local hard drive and upload
2. Describe: Title, Author
3. Submit: Files go straight online
Zenodo can handle large files, reportedly without an upper size limit, and can also pull files in from Dropbox. One of its major advantages is that files should remain readable years from now. Great care is taken over preservation to protect the data against file corruption: data is re-read from time to time and proper backups are kept.
Zenodo encourages researchers sharing data to use a Creative Commons license, although it supports other licenses, including a range of open source software licenses. Another very useful feature is the embargo function, for researchers who are worried about their data being used before their research is published. With it, data can be uploaded now but released only at a future date the researchers choose.
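For those who prefer scripting to the web form, the ‘describe’ step, the license choice and the embargo can be pictured as one small metadata record. The sketch below only builds that record and sends nothing to Zenodo. The field names (`upload_type`, `creators`, `access_right`, `embargo_date`, `license`) follow my understanding of Zenodo’s deposit API and should be checked against the current documentation; the function name, the `cc-by` license identifier and the example values are illustrative assumptions, not something shown in the meeting.

```python
import json
from datetime import date


def build_deposit_metadata(title, authors, description,
                           embargo_until=None, license_id="cc-by"):
    """Build a metadata record for a dataset upload (hypothetical helper).

    Field names are assumed to match Zenodo's deposit API; verify them
    against the current documentation before relying on this.
    """
    metadata = {
        "upload_type": "dataset",
        "title": title,
        "creators": [{"name": name} for name in authors],
        "description": description,
        "license": license_id,      # assumed identifier for CC BY
        "access_right": "open",
    }
    if embargo_until is not None:
        # Files are uploaded now but only become public on this date.
        metadata["access_right"] = "embargoed"
        metadata["embargo_date"] = embargo_until.isoformat()
    return {"metadata": metadata}


# Made-up example values, for illustration only.
payload = build_deposit_metadata(
    title="Example survey dataset",
    authors=["Doe, Jane"],
    description="Hypothetical dataset used to illustrate the record fields.",
    embargo_until=date(2030, 1, 1),
)
print(json.dumps(payload, indent=2))
```

Leaving out `embargo_until` gives an openly accessible record; supplying a date switches the record to embargoed access until that day.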
Start from the beginning: preparing to archive research data
These are issues that researchers need to start thinking of before collecting the data.
Researchers are advised to think about file formats when archiving. Open formats like CSV are a better choice than Excel. Researchers also need to consider privacy requirements. If the data contains personally identifiable information, they need to get consent from the people concerned to publish data about them, or make sure published files do not contain any personal or private information. Intellectual Property Rights (IPR) issues need attention too: keeping a log of secondary data that might be mixed in with a research dataset is a useful way to ensure IPR issues don’t become a barrier to sharing data later.
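As a rough illustration of the format and privacy points, the sketch below drops directly identifying columns from a small table and writes the rest out as CSV using only Python’s standard library. The column names, the sample rows and the list of ‘personal’ columns are all hypothetical; a real dataset needs its own review, and removing columns alone does not guarantee anonymity.

```python
import csv
import io

# Hypothetical list of columns that identify a person directly.
PERSONAL_COLUMNS = {"name", "email", "phone"}


def strip_personal_columns(rows, personal_columns=PERSONAL_COLUMNS):
    """Return copies of the rows without the directly identifying columns."""
    return [
        {key: value for key, value in row.items()
         if key.lower() not in personal_columns}
        for row in rows
    ]


def to_csv_text(rows):
    """Serialise a list of dictionaries as CSV text with a header row."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up survey responses, for illustration only.
survey_rows = [
    {"respondent_id": "r001", "name": "A. Person", "email": "a@example.org",
     "district": "North", "uses_open_data": "yes"},
    {"respondent_id": "r002", "name": "B. Person", "email": "b@example.org",
     "district": "South", "uses_open_data": "no"},
]

public_rows = strip_personal_columns(survey_rows)
csv_text = to_csv_text(public_rows)
print(csv_text)
```

Keeping the identifying columns in a separate, unpublished file means the public CSV can be regenerated at any time without handling the private fields by hand.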
Metadata is essential to ensuring research data can be re-used. When uploading data for future reference, researchers are encouraged to describe every variable in the file, such as the codes used and their meanings. Including clear descriptions and metadata in the Zenodo record will also help other researchers understand how they can re-use the data in the future.
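One lightweight way to record codes and their meanings is a small codebook file kept next to the data. The sketch below shows one possible shape for such a codebook and a check that flags columns nobody has described yet. The variable names, codes and helper function are invented for illustration and are not a Zenodo requirement.

```python
import json

# Hypothetical codebook: one entry per column in the data file.
codebook = {
    "respondent_id": {
        "description": "Anonymous identifier assigned to each respondent",
    },
    "uses_open_data": {
        "description": "Whether the respondent has used open government data",
        "codes": {"1": "yes", "0": "no", "9": "no answer"},
    },
}


def undocumented_columns(header, codebook):
    """List the columns in the data file that the codebook does not describe."""
    return [column for column in header if column not in codebook]


# Header row of the (made-up) data file being checked.
header = ["respondent_id", "district", "uses_open_data"]
missing = undocumented_columns(header, codebook)

print(json.dumps(codebook, indent=2))
print("Columns still to document:", missing)
```

Running a check like this before each upload is a cheap way to catch a variable that was added to the data but never described.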
Researchers who want to follow interest in their datasets on social media can use Article level Metrics, an external service, to see how many people tweeted about their data and how many readers it has had, among other things. Zenodo assigns each upload a Digital Object Identifier (DOI), something SlideShare and Dropbox do not offer. As a DOI is persistent and does not change over time, it allows researchers to see what is being cited and makes counting citations of research data much easier. A DOI is also resolvable, i.e. it leads to the web location where the data is stored.
If researchers want to submit their data to multiple platforms, Zenodo allows them to submit a dataset that is already hosted elsewhere. They still need to upload the data files, but if the master version lives in, say, a Dataverse, they can take the Dataverse DOI and include it in their Zenodo submission. It is better for a dataset to have one DOI.
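To make ‘resolvable’ concrete: a DOI becomes a web address when it is appended to the doi.org resolver. The sketch below normalises the common ways a DOI gets written and returns that address. The helper name is hypothetical and the DOI in the example is a placeholder, not a real record.

```python
def doi_to_url(doi):
    """Turn a DOI, however it was written, into its resolver address."""
    cleaned = doi.strip()
    # Accept DOIs pasted as full links or with a 'doi:' label in front.
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not cleaned.startswith("10."):
        raise ValueError("not a DOI: " + repr(doi))
    return "https://doi.org/" + cleaned


# Placeholder DOI for illustration; it does not point at a real dataset.
print(doi_to_url("doi:10.5281/zenodo.0000000"))
```

Citing the resolver address rather than the archive’s own page address means the citation keeps working even if the archive later reorganises its website.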
Sharing research data from the Open Data Barometer
During the Web Meeting, Tim Davies outlined how sharing research data alongside the research publications had helped increase the impact of the Open Data Barometer.
For the Open Data Barometer (ODB), the Web Foundation released the report and the datasets on the same day, because we wanted to emphasize that it is not just about the report but also the data. Two days before the report release, Tim Davies, the lead co-ordinator of the ODB, uploaded the datasets to Zenodo with an embargo set to lift when the research was released.
The ODB team shared the datasets, described each of them carefully, and converted some of the data in the Excel workbook into individual CSV files to ensure readability. Before the release, we sent out a press release highlighting some of the key findings from our work, to maximise coverage and support secondary use.
The ODB team also pulled some of the charts out of the report as images for people to tweet or share on other social media alongside the key findings. By taking a conventional report, breaking it down into parts and showing the datasets alongside it, the Barometer was able to reach more people on launch day. And using the data that had been shared, the ODI created the first interactive visualisation of the data within just a few days, showing the potential for re-users to make more of open data and provide new ways to explore research findings.
So it is critical to think of reports and datasets as separate outputs and to start data management planning early, which makes sharing a lot easier and lets it be integrated into the research workflow.