Research Data Curation Bibliography


The Research Data Curation Bibliography includes over 680 selected English-language articles, books, and technical reports that are useful in understanding the curation of digital research data in academic and other research institutions.

The "digital curation" concept is still evolving. In "Digital Curation and Trusted Repositories: Steps toward Success," Christopher A. Lee and Helen R. Tibbo define digital curation as follows:

Digital curation involves selection and appraisal by creators and archivists; evolving provision of intellectual access; redundant storage; data transformations; and, for some materials, a commitment to long-term preservation. Digital curation is stewardship that provides for the reproducibility and re-use of authentic digital data and other digital assets. Development of trustworthy and durable digital repositories; principles of sound metadata creation and capture; use of open standards for file formats and data encoding; and the promotion of information management literacy are all essential to the longevity of digital resources and the success of curation efforts.

The Research Data Curation Bibliography covers topics such as research data creation, acquisition, metadata, provenance, repositories, management, policies, support services, funding agency requirements, peer review, publication, citation, sharing, reuse, and preservation.

This bibliography does not cover digital media works (such as MP3 files), editorials, e-mail messages, interviews, letters to the editor, presentation slides or transcripts, unpublished e-prints, or weblog postings. Coverage of conference papers and technical reports is very selective.

Most sources have been published from January 2009 through September 2017; however, a limited number of earlier key sources are also included. The bibliography includes links to freely available versions of included works. If such versions are unavailable, links to the publishers' descriptions are provided.

Such links, even to publisher versions and versions in disciplinary archives and institutional repositories, are subject to change. URLs may alter without warning (or automatic forwarding) or they may disappear altogether. Inclusion of links to works on authors' personal websites is highly selective. Note that e-prints and published articles may not be identical.

Abstracts are included in this bibliography if a work is under a Creative Commons Attribution License (BY and national/international variations), a Creative Commons public domain dedication (CC0), or a Creative Commons Public Domain Mark and this is clearly indicated in the work (see the "Note on the Inclusion of Abstracts" below for more details). In cases where the license has changed since publication, the most current license is described.

For broader coverage of the digital curation literature, see the author's Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, which presents over 650 English-language articles, books, and technical reports, and the Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, 2012 Supplement, which presents over 130 additional sources.


In memory of Paul Evan Peters (1947-1996), founding Executive Director of the Coalition for Networked Information, whose visionary leadership at the dawn of the Internet era fostered the development of scholarly electronic publishing.



Aalbersberg, IJsbrand Jan, Sophia Atzeni, Hylke Koers, Beate Specker, and Elena Zudilova-Seinstra. "Bringing Digital Science Deep inside the Scientific Article: The Elsevier Article of the Future Project." LIBER Quarterly 23, no. 4 (2014): 275-299.

In 2009, Elsevier introduced the "Article of the Future" project to define an optimal way for the dissemination of science in the digital age, and in this paper we discuss three of its key dimensions. First we discuss interlinking scientific articles and research data stored with domain-specific data repositories—such interlinking is essential to interpret both article and data efficiently and correctly. We then present easy-to-use 3D visualization tools embedded in online articles: a key example of how the digital article format adds value to scientific communication and helps readers to better understand research results. The last topic covered in this paper is automatic enrichment of journal articles through text-mining or other methods. Here we share insights from a recent survey on the question: how can we find a balance between creating valuable contextual links, without sacrificing the high-quality, peer-reviewed status of published articles?

This work is licensed under a Creative Commons Attribution 4.0 License.

Aalbersberg, IJsbrand, Judson Dunham, and Hylke Koers. "Connecting Scientific Articles with Research Data: New Directions in Online Scholarly Publishing." Data Science Journal 12 (2013): WDS235-WDS242.

Researchers across disciplines are increasingly utilizing electronic tools to collect, analyze, and organize data. However, when it comes to publishing their work, there are no common, well-established standards on how to make that data available to other researchers. Consequently, data are often not stored in a consistent manner, making it hard or impossible to find data sets associated with an article—even though such data might be essential to reproduce results or to perform further analysis. Data repositories can play an important role in improving this situation, offering increased visibility, domain-specific coordination, and expert knowledge on data management. As a leading STM publisher, Elsevier is actively pursuing opportunities to establish links between the online scholarly article and data repositories. This helps to increase usage and visibility for both articles and data sets and also adds valuable context to the data. These data-linking efforts tie in with other initiatives at Elsevier to enhance the online article in order to connect with current researchers' workflows and to provide an optimal platform for the communication of science in the digital era.

This work is licensed under a Creative Commons Attribution 3.0 License.

Abrams, Stephen, Patricia Cruse, Carly Strasser, Perry Willett, Geoffrey Boushey, Julia Kochi, Megan Laurance, and Angela Rizk-Jackson. "DataShare: Empowering Researcher Data Curation." International Journal of Digital Curation 9, no. 1 (2014): 110-118.

Researchers are increasingly being asked to ensure that all products of research activity—not just traditional publications—are preserved and made widely available for study and reuse as a precondition for publication or grant funding, or to conform to disciplinary best practices. In order to conform to these requirements, scholars need effective, easy-to-use tools and services for the long-term curation of their research data. The DataShare service, developed at the University of California, is being used by researchers to: (1) prepare for curation by reviewing best practice recommendations for the acquisition or creation of digital research data; (2) select datasets using intuitive file browsing and drag-and-drop interfaces; (3) describe their data for enhanced discoverability in terms of the DataCite metadata schema; (4) preserve their data by uploading to a public access collection in the UC3 Merritt curation repository; (5) cite their data in terms of persistent and globally-resolvable DOI identifiers; (6) expose their data through registration with well-known abstracting and indexing services and major internet search engines; (7) control the dissemination of their data through enforceable data use agreements; and (8) discover and retrieve datasets of interest through a faceted search and browse environment. Since the widespread adoption of effective data management practices is highly dependent on ease of use and integration into existing individual, institutional, and disciplinary workflows, the emphasis throughout the design and implementation of DataShare is to provide the highest level of curation service with the lowest possible technical barriers to entry by individual researchers. By enabling intuitive, self-service access to data curation functions, DataShare helps to contribute to more widespread adoption of good data curation practices that are critical to open scientific inquiry, discourse, and advancement.

This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License.

Abrams, Stephen, John Kratz, Stephanie Simms, Marisa Strong, and Perry Willett. "Dash: Data Sharing Made Easy at the University of California." International Journal of Digital Curation 11, no. 1 (2016): 118-127.

Scholars at the ten campuses of the University of California system, like their academic peers elsewhere, increasingly are being asked to ensure that data resulting from their research and teaching activities are subject to effective long-term management, public discovery, and retrieval. The new academic imperative for research data management (RDM) stems from mandates from public and private funding agencies, pre-publication requirements, institutional policies, and evolving norms of scholarly discourse. In order to meet these new obligations, scholars need access to appropriate disciplinary and institutional tools, services, and guidance. When providing help in these areas, it is important that service providers recognize the disparity in scholarly familiarity with data curation concepts and practices. While the UC Curation Center (UC3) at the California Digital Library supports a growing roster of innovative curation services for University use, most were intended originally to meet the needs of institutional information professionals, such as librarians, archivists, and curators. In order to address the new curation concerns of individual scholars, UC3 realized that it needed to deploy new systems and services optimized for stakeholders with widely divergent experiences, expertise, and expectations. This led to the development of Dash, an online data publication service making campus data sharing easy. While Dash gives the appearance of being a full-fledged repository, in actuality it is only a lightweight overlay layer that sits on top of standards-compliant repositories, such as UC3's existing Merritt curation repository. The Dash service offers intuitive, easy-to-use interfaces for dataset submission, description, publication, and discovery. By imposing minimal prescriptive eligibility and submission requirements; automating and hiding the mechanical details of DOI assignment, data packaging, and repository deposit; and featuring a streamlined, self-service user experience that can be integrated easily into scholarly workflows, Dash is an important new service offering with which UC scholars can meet their RDM obligations.

This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License.

Accomazzi, Alberto, Edwin Henneken, Christopher Erdmann, and Arnold Rots. "Telescope Bibliographies: An Essential Component of Archival Data Management and Operations." Proceedings of SPIE 8448 (2012): 84480K-1-84480K-10.

Adamick, Jessica, Rebecca C. Reznik-Zellen, and Matt Sheridan. "Data Management Training for Graduate Students at a Large Research University." Journal of eScience Librarianship 1, no. 3 (2013): e1022.

Adams, Sam, and Peter Murray-Rust. "Chempound—A Web 2.0-Inspired Repository for Physical Science Data." Journal of Digital Information 13, no. 1 (2012).

Addison, Aaron, and Jennifer Moore. "Teaching Users to Work with Research Data: Case Studies in Architecture, History and Social Work." IASSIST Quarterly 39, no. 4 (2015): 39-43.

Addison, Aaron, Jennifer Moore, and Cynthia Hudson-Vitale. "Forging Partnerships: Foundations of Geospatial Data Stewardship." Journal of Map & Geography Libraries 11, no. 3 (2015): 359-375.

Burton, Adrian, Hylke Koers, Paolo Manghi, Sandro La Bruzzo, Amir Aryani, Michael Diepenbroek, and Uwe Schindler. "The Data-Literature Interlinking Service: Towards a Common Infrastructure for Sharing Data-Article Links." Program 51, no. 1 (2017): 75-100.

Akers, Katherine G. "Going Beyond Data Management Planning: Comprehensive Research Data Services." College & Research Libraries News 75, no. 8 (2014): 435-436.

———. "Looking Out for the Little Guy: Small Data Curation." Bulletin of the American Society for Information Science and Technology 39, no. 3 (2013): 58-59.

Akers, Katherine G., and Jennifer Doty. "Differences among Faculty Ranks in Views on Research Data Management." IASSIST Quarterly 36 (2012): 16-20.

———. "Disciplinary Differences in Faculty Research Data Management Practices and Perspectives." International Journal of Digital Curation 8, no. 2 (2013): 5-26.

Academic librarians are increasingly engaging in data curation by providing infrastructure (e.g., institutional repositories) and offering services (e.g., data management plan consultations) to support the management of research data on their campuses. Efforts to develop these resources may benefit from a greater understanding of disciplinary differences in research data management needs. After conducting a survey of data management practices and perspectives at our research university, we categorized faculty members into four research domains—arts and humanities, social sciences, medical sciences, and basic sciences—and analyzed variations in their patterns of survey responses. We found statistically significant differences among the four research domains for nearly every survey item, revealing important disciplinary distinctions in data management actions, attitudes, and interest in support services. Serious consideration of both the similarities and dissimilarities among disciplines will help guide academic librarians and other data curation professionals in developing a range of data-management services that can be tailored to the unique needs of different scholarly researchers.

This work is licensed under a Creative Commons Attribution License.

Akers, Katherine G., and Jennifer A. Green. "Towards a Symbiotic Relationship between Academic Libraries and Disciplinary Data Repositories: A Dryad and University of Michigan Case Study." International Journal of Digital Curation 9, no. 1 (2014): 119-131.

In addition to encouraging the deposit of research data into institutional data repositories, academic librarians can further support research data sharing by facilitating the deposit of data into external disciplinary data repositories.

In this paper, we focus on the University of Michigan Library and Dryad, a repository for scientific and medical data, as a case study to explore possible forms of partnership between academic libraries and disciplinary data repositories. We found that although few University of Michigan researchers have submitted data to Dryad, many have recently published articles in Dryad-integrated journals, suggesting significant opportunities for Dryad use on our campus. We suggest that academic libraries could promote the sharing and preservation of science and medical data by becoming Dryad members, purchasing vouchers to cover researchers' data submission costs, and hosting local curators who could directly work with campus researchers to improve the accuracy and completeness of data packages and thereby increase their potential for re-use.

By enabling the use of both institutional and disciplinary data repositories, we argue that academic librarians can achieve greater success in capturing the vast amounts of data that presently fail to depart researchers' hands and making that data visible to relevant communities of interest.

This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License.

Akers, Katherine G., Fe C. Sferdean, Natsuko H. Nicholls, and Jennifer A. Green. "Building Support for Research Data Management: Biographies of Eight Research Universities." International Journal of Digital Curation 9, no. 2 (2014): 171-191.

Academic research libraries are quickly developing support for research data management (RDM), including both new services and infrastructure. Here, we tell the stories of how eight different universities have developed programs of RDM support, focusing on the prominent role of the library in educating and assisting researchers with managing their data throughout the research lifecycle. Based on these stories, we construct timelines for each university depicting key steps in building support for RDM, and we discuss similarities and dissimilarities among universities in motivation to provide RDM support, collaborations among campus units, assessment of needs and services, and changes in staffing.

This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License.

Akmon, Dharma, Ann Zimmerman, Morgan Daniels, and Margaret Hedstrom. "The Application of Archival Concepts to a Data-Intensive Environment: Working with Scientists to Understand Data Management and Preservation Needs." Archival Science 11, no. 3/4 (2011): 329-348.

Albani, Sergio, and David Giaretta. "Long-Term Preservation of Earth Observation Data and Knowledge in ESA through CASPAR." International Journal of Digital Curation 4, no. 3 (2009): 4-16.

Aleixandre-Benavent, Rafael, Luz María Moreno-Solano, Antonia Ferrer Sapena, and Enrique Alfonso Sánchez Pérez. "Correlation between Impact Factor and Public Availability of Published Research Data in Information Science and Library Science Journals." Scientometrics 107, no. 1 (2016): 1-13.

Allard, Suzie. "DataONE: Facilitating eScience through Collaboration." Journal of eScience Librarianship 1, no. 1 (2012): e1004.

Almas, Bridget. "Perseids: Experimenting with Infrastructure for Creating and Sharing Research Data in the Digital Humanities." Data Science Journal 16, no. 19 (2017).

The Perseids project provides a platform for creating, publishing, and sharing research data, in the form of textual transcriptions, annotations and analyses. An offshoot and collaborator of the Perseus Digital Library (PDL), Perseids is also an experiment in reusing and extending existing infrastructure, tools, and services. This paper discusses infrastructure in the domain of digital humanities (DH). It outlines some general approaches to facilitating data sharing in this domain, and the specific choices we made in developing Perseids to serve that goal. It concludes by identifying lessons we have learned about sustainability in the process of building Perseids, noting some critical gaps in infrastructure for the digital humanities, and suggesting some implications for the wider community.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Alqasab, Mariam, Suzanne M. Embury, and Sandra de F. Mendes Sampaio. "Amplifying Data Curation Efforts to Improve the Quality of Life Science Data." International Journal of Digital Curation 12, no. 1 (2017): 1-12.

In the era of data science, datasets are shared widely and used for many purposes unforeseen by the original creators of the data. In this context, defects in datasets can have far reaching consequences, spreading from dataset to dataset, and affecting the consumers of data in ways that are hard to predict or quantify. Some form of waste is often the result. For example, scientists using defective data to propose hypotheses for experimentation may waste their limited wet lab resources chasing the wrong experimental targets. Scarce drug trial resources may be used to test drugs that actually have little chance of giving a cure.

Because of the potential real world costs, database owners care about providing high quality data. Automated curation tools can be used to an extent to discover and correct some forms of defect. However, in some areas human curation, performed by highly-trained domain experts, is needed to ensure that the data represents our current interpretation of reality accurately. Human curators are expensive, and there is far more curation work to be done than there are curators available to perform it. Tools and techniques are needed to enable the full value to be obtained from the curation effort currently available.

In this paper, we explore one possible approach to maximising the value obtained from human curators, by automatically extracting information about data defects and corrections from the work that the curators do. This information is packaged in a source independent form, to allow it to be used by the owners of other databases (for which human curation effort is not available or is insufficient). This amplifies the efforts of the human curators, allowing their work to be applied to other sources, without requiring any additional effort or change in their processes or tool sets. We show that this approach can discover significant numbers of defects, which can also be found in other sources.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Altman, Micah, Margaret O. Adams, Jonathan Crabtree, Darrell Donakowski, Marc Maynard, Amy Pienta, and Copeland H. Young. "Digital Preservation through Archival Collaboration: The Data Preservation Alliance for the Social Sciences." American Archivist 72, no. 1 (2009): 170-184.

Altman, Micah, Christine Borgman, Mercè Crosas, and Maryann Martone. "An Introduction to the Joint Principles for Data Citation." Bulletin of the Association for Information Science and Technology 41, no. 3 (2015): 43-45.

Altman, Micah, Eleni Castro, Mercè Crosas, Philip Durbin, Alex Garnett, and Jen Whitney. "Open Journal Systems and Dataverse Integration—Helping Journals to Upgrade Data Publication for Reusable Research." Code4Lib Journal, no. 30 (2015).

This article describes the novel open source tools for open data publication in open access journal workflows. This comprises a plugin for Open Journal Systems that supports a data submission, citation, review, and publication workflow; and an extension to the Dataverse system that provides a standard deposit API. We describe the function and design of these tools, provide examples of their use, and summarize their initial reception. We conclude by discussing future plans and potential impact.

This work is licensed under a Creative Commons Attribution 3.0 United States License.

Altman, Micah, and Mercè Crosas. "The Evolution of Data Citation: From Principles to Implementation." IASSIST Quarterly 37, no. 1-4 (2013): 62-70.

Altman, Micah, and Gary King. "A Proposed Standard for the Scholarly Citation of Quantitative Data." D-Lib Magazine 13, no. 3/4 (2007).

Anastasiadis, Stergios V., Syam Gadde, and Jeffrey S. Chase. "Scale and Performance in Semantic Storage Management of Data Grids." International Journal on Digital Libraries 5, no. 2 (2005): 84-98.

Anderson, W. L. "Some Challenges and Issues in Managing, and Preserving Access to, Long Lived Collections of Digital Scientific and Technical Data." Data Science Journal 3 (2004): 191-201.

One goal of the Committee on Data for Science and Technology is to solicit information about, promote discussion of, and support action on the many issues related to scientific and technical data preservation, archiving, and access. This brief paper describes four broad categories of issues that help to organize discussion, learning, and action regarding the work needed to support the long-term preservation of, and access to, scientific and technical data. In each category, some specific issues and areas of concern are described.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Andreoli-Versbach, Patrick, and Frank Mueller-Langer. "Open Access to Data: An Ideal Professed but Not Practised." Research Policy 43, no. 9 (2014): 1621-1633.

Androulakis, Steve, Ashley M. Buckle, Ian Atkinson, David Groenewegen, Nick Nicholas, Andrew Treloar, and Anthony Beitz. "ARCHER—e-Research Tools for Research Data Management." International Journal of Digital Curation 4, no. 1 (2009): 22-33.

Angevaare, Inge. "Taking Care of Digital Collections and Data: 'Curation' and Organisational Choices for Research Libraries." LIBER Quarterly: The Journal of European Research Libraries 19, no. 1 (2009): 1-12.

This article explores the types of digital information research libraries typically deal with and what factors might influence libraries' decisions to take on the work of data curation themselves, to take on the responsibility for data but market out the actual work, or to leave the responsibility to other organisations. The article introduces the issues dealt with in the LIBER Workshop 'Curating Research,' held in The Hague on 17 April 2009, and in this corresponding issue of LIBER Quarterly.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Aquino, Janine, John Allison, Robert Rilling, Don Stott, Kathryn Young, and Michael Daniels. "Motivation and Strategies for Implementing Digital Object Identifiers (DOIs) at NCAR’s Earth Observing Laboratory—Past Progress and Future Collaborations." Data Science Journal 16, no. 7 (2017).

In an effort to lead our community in following modern data citation practices by formally citing data used in published research and implementing standards to facilitate reproducible research results and data, while also producing meaningful metrics that help assess the impact of our services, the National Center for Atmospheric Research (NCAR) Earth Observing Laboratory (EOL) has implemented the use of Digital Object Identifiers (DOIs) (DataCite 2017) for both physical objects (e.g., research platforms and instruments) and datasets. We discuss why this work is important and timely, and review the development of guidelines for the use of DOIs at EOL by focusing on how decisions were made. We discuss progress in assigning DOIs to physical objects and datasets, summarize plans to cite software, describe a current collaboration to develop community tools to display citations on websites, and touch on future plans to cite workflows that document dataset processing and quality control. Finally, we will review the status of efforts to engage our scientific community in the process of using DOIs in their research publications.

Arora, Ritu, Maria Esteva, and Jessica Trelogan. "Leveraging High Performance Computing for Managing Large and Evolving Data Collections." International Journal of Digital Curation 9, no. 2 (2014): 17-27.

The process of developing a digital collection in the context of a research project often involves a pipeline pattern during which data growth, data types, and data authenticity need to be assessed iteratively in relation to the different research steps and in the interest of archiving. Throughout a project's lifecycle curators organize newly generated data while cleaning and integrating legacy data when it exists, and deciding what data will be preserved for the long term. Although these actions should be part of a well-oiled data management workflow, there are practical challenges in doing so if the collection is very large and heterogeneous, or is accessed by several researchers contemporaneously. There is a need for data management solutions that can help curators with efficient and on-demand analyses of their collection so that they remain well-informed about its evolving characteristics. In this paper, we describe our efforts towards developing a workflow to leverage open science High Performance Computing (HPC) resources for routinely and efficiently conducting data management tasks on large collections. We demonstrate that HPC resources and techniques can significantly reduce the time for accomplishing critical data management tasks, and enable a dynamic archiving throughout the research process. We use a large archaeological data collection with a long and complex formation history as our test case. We share our experiences in adopting open science HPC resources for large-scale data management, which entails understanding usage of the open source HPC environment and training users. These experiences can be generalized to meet the needs of other data curators working with large collections.

This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License.

Aydinoglu, Arsev Umur, Guleda Dogan, and Zehra Taskin. "Research Data Management in Turkey: Perceptions and Practices." Library Hi Tech 35, no. 2 (2017): 271-289.

Aschenbrenner, Andreas, Harry Enke, Thomas Fischer, and Jens Ludwig. "Diversity and Interoperability of Repositories in a Grid Curation Environment." Journal of Digital Information 12, no. 2 (2011).

Asher, Andrew, and Lori M. Jahnke. "Curating the Ethnographic Moment." Archive Journal, no. 3 (2013).

Ashley, Kevin. "Data Quality and Curation." Data Science Journal 12 (2013): GRDI65-GRDI68.

Data quality is an issue that touches on every aspect of the research data landscape and is therefore appropriate to examine in the context of planning for future research data infrastructures. As producers, researchers want to believe that they produce high quality data; as consumers, they want to obtain data of the highest quality. Data centres typically have stringent controls to ensure that they only acquire and disseminate data of the highest quality. Data managers will usually say that they improve the quality of the data they are responsible for. Much of the infrastructure that will emit, transform, integrate, visualise, manage, analyse, and disseminate data during its life will have dependencies, explicit or implicit, on the quality of the data it is dealing with.

This work is licensed under a Creative Commons Attribution 4.0 International License.

——— "Research Data And Libraries: Who Does What." Insights: the UKSG Journal 25, no. 2 (2012): 155-157.

A range of external pressures are causing research data management (RDM) to be an increasing concern at senior level in universities and other research institutions. But as well as external pressures, there are also good reasons for establishing effective research data management services within institutions which can bring benefits to researchers, their institutions and those who publish their research. In this article some of these motivating factors, both positive and negative, are described. Ways in which libraries can play a role—or even lead—in the development of RDM services that work within the institution and as part of a national and international research data infrastructure are also set out.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Assante, Massimiliano, Leonardo Candela, Donatella Castelli, and Alice Tani. "Are Scientific Data Repositories Coping with Research Data Publishing?" Data Science Journal 15, no. 6 (2016): 1-24.

Research data publishing is intended as the release of research data to make it possible for practitioners to (re)use them according to "open science" dynamics. There are three main actors called to deal with research data publishing practices: researchers, publishers, and data repositories. This study analyses the solutions offered by generalist scientific data repositories, i.e., repositories supporting the deposition of any type of research data. These repositories cannot make any assumption on the application domain. They are actually called to face with the almost open ended typologies of data used in science. The current practices promoted by such repositories are analysed with respect to eight key aspects of data publishing, i.e., dataset formatting, documentation, licensing, publication costs, validation, availability, discovery and access, and citation. From this analysis it emerges that these repositories implement well consolidated practices and pragmatic solutions for literature repositories. These practices and solutions can not totally meet the needs of management and use of datasets resources, especially in a context where rapid technological changes continuously open new exploitation prospects.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Austin, Claire C., Theodora Bloom, Sünje Dallmeier-Tiessen, Varsha K. Khodiyar, Fiona Murphy, Amy Nurnberger, Lisa Raymond, Martina Stockhause, Jonathan Tedds, Mary Vardigan, and Angus Whyte. "Key Components of Data Publishing: Using Current Best Practices to Develop a Reference Model for Data Publishing." International Journal on Digital Libraries 18, no. 2 (2017): 77-92.

Austin, Claire C., Susan Brown, Nancy Fong, Chuck Humphrey, Amber Leahey, and Peter Webster. "Research Data Repositories: Review of Current Features, Gap Analysis, and Recommendations for Minimum Requirements." IASSIST Quarterly 39, no. 4 (2015): 24-38.

Bache, Richard, Simon Miles, Bolaji Coker, and Adel Taweel. "Informative Provenance for Repurposed Data: A Case Study using Clinical Research Data." International Journal of Digital Curation 8, no. 2 (2013): 27-46.

Repurposing heterogeneous, distributed data for originally unintended research objectives is a non-trivial problem because the mappings required may not be precise. A particular case is clinical data collected for patient care being used for medical research. Because research repositories record data differently, assumptions must be made as to how to transform this data. Records of provenance that document how this process has taken place will enable users of the data warehouse to utilise the data appropriately and ensure that future data added from another source is transformed using comparable assumptions. For a provenance-based approach to be reusable and supportable with software tools, the provenance records must use a well-defined model of the transformation process. In this paper, we propose such a model, including a classification of the individual 'sub-functions' that make up the overall transformation. This model enables meaningful provenance data to be generated automatically. A case study is used to illustrate this approach, and an initial classification of transformations that alter the information is created.

This work is licensed under a Creative Commons Attribution License.

Baker, Karen S., Ruth E. Duerr, and Mark A. Parsons. "Scientific Knowledge Mobilization: Co-evolution of Data Products and Designated Communities." International Journal of Digital Curation 10, no. 2 (2015): 110-135.

Digital data are accumulating rapidly, yet issues relating to data production remain unexamined. Data sharing efforts in particular are nascent, disunited and incomplete. We investigate the development of data products tailored for diverse communities with differing knowledge bases. We explore not the technical aspects of how, why, or where data are made available, but rather the socio-scientific aspects influencing what data products are created and made available for use. These products differ from compact data summaries often published in journals. We report on development by a national data center of two data collections describing the changing polar environment. One collection characterizes sea ice products derived from satellite remote sensing data, whose development unfolds over three decades. The second collection characterizes the Greenland Ice Sheet melt, where development of an initial collection of data products over a period of several months was informed by insights gained from earlier experience. In documenting the generation of these two collections, a data product development cycle supported by a data product team is identified as key to mobilizing scientific knowledge. The collections reveal a co-evolution of data products and designated communities where community interest may be triggered by events such as environmental disturbance and new modes of communication. These examples of data product development in practice illustrate knowledge mobilization in the earth sciences; the collections create a bridge between data producers and a growing number of audiences interested in making evidence-based decisions.

This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License.

Baker, Karen S., and Lynn Yarmey. "Data Stewardship: Environmental Data Curation and a Web-of-Repositories." International Journal of Digital Curation 4, no. 2 (2009): 12-27.

Balkestein, Marjan, and Heiko Tjalsma. "The ADA Approach: Retro-archiving Data in an Academic Environment." Archival Science 7, no. 1 (2007): 89-105.

Ball, Alexander, Kevin Ashley, Patrick McCann, Laura Molloy, and Veerle Van den Eynden. "Show Me The Data: The Pilot UK Research Data Registry." International Journal of Digital Curation 9, no. 1 (2014): 132-141.

The UK Research Data (Metadata) Registry (UKRDR) pilot project is implementing a prototype registry for the UK's research data assets, enabling the holdings of subject-based data centres and institutional data repositories alike to be searched from a single location. The purpose of the prototype is to prove the concept of the registry, and uncover challenges that will need to be addressed if and when the registry is developed into a sustainable service. The prototype is being tested using metadata records harvested from nine UK data centres and the data repositories of nine UK universities.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Ball, Alexander, Sean Chen, Jane Greenberg, Cristina Perez, Keith Jeffery, and Rebecca Koskela. "Building a Disciplinary Metadata Standards Directory." International Journal of Digital Curation 9, no. 1 (2014): 142-151.

The Research Data Alliance (RDA) Metadata Standards Directory Working Group (MSDWG) is building a directory of descriptive, discipline-specific metadata standards. The purpose of the directory is to promote the discovery, access and use of such standards, thereby improving the state of research data interoperability and reducing duplicative standards development work.

This work builds upon the UK Digital Curation Centre's Disciplinary Metadata Catalogue, a resource created with much the same aim in mind. The first stage of the MSDWG's work was to update and extend the information contained in the catalogue. In the current, second stage, a new platform is being developed in order to extend the functionality of the directory beyond that of the catalogue, and to make it easier to maintain and sustain. Future work will include making the directory more amenable to use by automated tools.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Ball, Alexander, Mansur Darlington, Thomas Howard, Chris McMahon, and Steve Culley. "Visualizing Research Data Records for Their Better Management." Journal of Digital Information 13, no. 1 (2012).

Ball, Joanna. "Research Data Management for Libraries: Getting Started." Insights: The UKSG journal 26, no. 3 (2013): 256-260.

Many libraries are keen to take on new roles in providing support for effective research data management (RDM), but lack the necessary skills and resources to do so. This article explores the approach used by the University of Sussex to engage with academic departments about their RDM practices and requirements in order to develop relevant library support services. It describes a project undertaken with three Academic Schools to inform a list of recommendations for senior management, to include areas which should be taken forward by the Library, IT and Research Office in order to create a sustainable RDM service. The article is unflinchingly honest in sharing the differing reactions to the project and the lessons learnt along the way.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Barateiro, José, Gonçalo Antunes, Manuel Cabral, José Borbinha, and Rodrigo Rodrigues. "Digital Preservation of Scientific Data." Lecture Notes in Computer Science 5173 (2008): 388-391.

———. "Using a Grid for Digital Preservation." Lecture Notes in Computer Science 5362 (2008): 225-235.

Barbrow, Sarah, Denise Brush, and Julie Goldman. "Research Data Management and Services: Resources for Novice Data Librarians." College & Research Libraries News 78, no. 5 (2017): 274-278.

Bardi, Alessia, and Paolo Manghi. "Enhanced Publications: Data Models and Information Systems." LIBER Quarterly 23, no. 4 (2014): 240-273.

"Enhanced publications" are commonly intended as digital publications that consist of a mandatory narrative part (the description of the research conducted) plus related "parts", such as datasets, other publications, images, tables, workflows, devices. The state-of-the-art on information systems for enhanced publications has today reached the point where some kind of common understanding is required, in order to provide the methodology and language for scientists to compare, analyse, or simply discuss the multitude of solutions in the field. In this paper, we thoroughly examined the literature with a two-fold aim: firstly, introducing the terminology required to describe and compare structural and semantic features of existing enhanced publication data models; secondly, proposing a classification of enhanced publication information systems based on their main functional goals.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Bardyn, Tania P., Taryn Resnick, and Susan K. Camina. "Translational Researchers' Perceptions of Data Management Practices and Data Curation Needs: Findings from a Focus Group in an Academic Health Sciences Library." Journal of Web Librarianship 6, no. 4 (2012): 274-287.

Baru, Chaitanya. "Sharing and Caring of eScience Data." International Journal on Digital Libraries 7, no. 1/2 (2007): 113-116.

Baum, Benjamin, Christian R. Bauer, Thomas Franke, Harald Kusch, Marcel Parciak, Thorsten Rottmann, Nadine Umbach, and Ulrich Sax. "Opinion Paper: Data Provenance Challenges in Biomedical Research." it—Information Technology 59, no. 4 (2017): 191-196.

Baykoucheva, Svetla. Managing Scientific Information and Research Data. Waltham, MA: Elsevier, 2015.

Beagrie, Neil, Robert Beagrie, and Ian Rowlands. "Research Data Preservation and Access: The Views of Researchers." Ariadne, no. 60 (2009).

Beagrie, Neil, Julia Chruszcz, and Brian Lavoie. Keeping Research Data Safe: A Cost Model and Guidance for UK Universities. London: JISC, 2008.

Beagrie, Neil, and John Houghton. The Value and Impact of Data Sharing and Curation: A Synthesis of Three Recent Studies of UK Research Data Centres. London: JISC, 2014.

Beale, Gareth, and Hembo Pagi. Datapool Imaging Case Study: Final Report. Southampton: University of Southampton, 2013.

Beaujardière, Jeff De La. "NOAA Environmental Data Management." Journal of Map & Geography Libraries 12, no. 1 (2016): 5-27.

Beckett, Mark G., Chris R. Allton, Christine T. H. Davies, Ilan Davis, Jonathan M. Flynn, Eilidh J. Grant, Russell S. Hamilton, Alan C. Irving, R. D. Kenway, Radoslaw H. Ostrowski, James T. Perry, Jason R. Swedlow, and Arthur Trew. "Building a Scientific Data Grid with DiGS." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 367, no. 1897 (2009): 2471-2481.

Belter, Christopher W. "Measuring the Value of Research Data: A Citation Analysis of Oceanographic Data Sets." PLoS ONE 9, no. 3 (2014): e92590.

Evaluation of scientific research is becoming increasingly reliant on publication-based bibliometric indicators, which may result in the devaluation of other scientific activities—such as data curation—that do not necessarily result in the production of scientific publications. This issue may undermine the movement to openly share and cite data sets in scientific publications because researchers are unlikely to devote the effort necessary to curate their research data if they are unlikely to receive credit for doing so. This analysis attempts to demonstrate the bibliometric impact of properly curated and openly accessible data sets by attempting to generate citation counts for three data sets archived at the National Oceanographic Data Center. My findings suggest that all three data sets are highly cited, with estimated citation counts in most cases higher than 99% of all the journal articles published in Oceanography during the same years. I also find that methods of citing and referring to these data sets in scientific publications are highly inconsistent, despite the fact that a formal citation format is suggested for each data set. These findings have important implications for developing a data citation format, encouraging researchers to properly curate their research data, and evaluating the bibliometric impact of individuals and institutions.

This work is licensed under a Creative Commons Public Domain Dedication.

Bender, Stefan, and Jörg Heining. "The Research-Data-Centre in Research-Data-Centre Approach: A First Step towards Decentralised International Data Sharing." IASSIST Quarterly 35, no. 3 (2011): 10-16.

Berman, Elizabeth A. "An Exploratory Sequential Mixed Methods Approach to Understanding Researchers' Data Management Practices at UVM: Integrated Findings to Develop Research Data Services." Journal of eScience Librarianship 5, no. 1 (2017): e1104.

Berman, Francine. "Got Data? A Guide to Data Preservation in the Information Age." Communications of the ACM 51, no. 12 (2008): 50-56.

Bethune, Alec, Butch Lazorchak, and Zsolt Nagy. "GeoMAPP: A Geospatial Multistate Archive and Preservation Partnership." Journal of Map & Geography Libraries 6, no. 1 (2009): 45-56.

Bird, Colin, Simon Coles, Iris Garrelfs, Tom Griffin, Magnus Hagdorn, Graham Klyne, Mike Mineter, and Cerys Willoughby. "Using Metadata Actively." International Journal of Digital Curation 11, no. 1 (2016): 76-85.

Almost all researchers collect and preserve metadata, although doing so is often seen as a burden. However, when that metadata can be, and is, used actively during an investigation or creative process, the benefits become apparent instantly. Active use can arise in various ways, several of which are being investigated by the Collaboration for Research Enhancement by Active use of Metadata (CREAM) project, which was funded by Jisc as part of their Research Data Spring initiative. The CREAM project is exploring the concept through understanding the active use of metadata by the partners in the collaboration. This paper explains what it means to use metadata actively and describes how the CREAM project characterises active use by developing use cases that involve documenting the key decision points during a process. Well-documented processes are accordingly more transparent, reproducible, and reusable.

This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License.

Bird, Colin L., Cerys Willoughby, Simon J. Coles, and Jeremy G. Frey. "Data Curation Issues in the Chemical Sciences." Information Standards Quarterly 25, no. 3 (2013): 4-12.

Bishoff, Carolyn, and Lisa Johnston. "Approaches to Data Sharing: An Analysis of NSF Data Management Plans from a Large Research University." Journal of Librarianship and Scholarly Communication 3, no. 2 (2015): eP1231.

INTRODUCTION Sharing digital research data is increasingly common, propelled by funding requirements, journal publishers, local campus policies, or community-driven expectations of more collaborative and interdisciplinary research environments. However, it is not well understood how researchers are addressing these expectations and whether they are transitioning from individualized practices to more thoughtful and potentially public approaches to data sharing that will enable reuse of their data. METHODS The University of Minnesota Libraries conducted a local opt-in study of data management plans (DMPs) included in funded National Science Foundation (NSF) grant proposals from January 2011 through June 2014. In order to understand the current data management and sharing practices of campus researchers, we solicited, coded, and analyzed 182 DMPs, accounting for 41% of the total number of plans available. RESULTS DMPs from seven colleges and academic units were included. The College of Science and Engineering accounted for 70% of the plans in our review. While 96% of DMPs mentioned data sharing, we found a variety of approaches for how PIs shared their data, where data was shared, the intended audiences for sharing, and practices for ensuring long-term reuse. CONCLUSION DMPs are useful tools to investigate researchers' current plans and philosophies for how research outputs might be shared. Plans and strategies for data sharing are inconsistent across this sample, and researchers need to better understand what kind of sharing constitutes public access. More intervention is needed to ensure that researchers implement the sharing provisions in their plans to the fullest extent possible. These findings will help academic libraries develop practical, targeted data services for researchers that aim to increase the impact of institutional research.

This work is licensed under a Creative Commons Attribution 4.0 License.

Bishop, Bradley Wade, Tony H. Grubesic, and Sonya Prasertong. "Digital Curation and the GeoWeb: An Emerging Role for Geographic Information Librarians." Journal of Map & Geography Libraries: Advances in Geospatial Information, Collections & Archives 9, no. 3 (2013): 296-312.

Bishop, Libby, and Arja Kuula-Luumi. "Revisiting Qualitative Data Reuse." SAGE Open 7, no. 1 (2017): 2158244016685136.

Secondary analysis of qualitative data entails reusing data created from previous research projects for new purposes. Reuse provides an opportunity to study the raw materials of past research projects to gain methodological and substantive insights. In the past decade, use of the approach has grown rapidly in the United Kingdom to become sufficiently accepted that it must now be regarded as mainstream. Several factors explain this growth: the open data movement, research funders’ and publishers’ policies supporting data sharing, and researchers seeing benefits from sharing resources, including data. Another factor enabling qualitative data reuse has been improved services and infrastructure that facilitate access to thousands of data collections. The UK Data Service is an example of a well-established facility; more recent has been the proliferation of repositories being established within universities. This article will provide evidence of the growth of data reuse in the United Kingdom and in Finland by presenting both data and case studies of reuse that illustrate the breadth and diversity of this maturing research method. We use two distinct data sources that quantify the scale, types, and trends of reuse of qualitative data: (a) downloads of archived data collections held at data repositories and (b) publication citations. Although the focus of this article is on the United Kingdom, some discussion of the international environment is provided, together with data and examples of reuse at the Finnish Social Science Data Archive. The conclusion summarizes the major findings, including some conjectures regarding what makes qualitative data attractive for reuse and sharing.

This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Borgman, Christine L. "The Conundrum of Sharing Research Data." Journal of the American Society for Information Science and Technology 63, no. 6 (2012): 1059-1078.

Borgman, Christine L., Milena S. Golshan, Ashley E. Sands, Jillian C. Wallis, Rebekah L. Cummings, Peter T. Darch, and Bernadette M. Randles. "Data Management in the Long Tail: Science, Software, and Service." International Journal of Digital Curation 11, no. 1 (2016): 128-149.

Scientists in all fields face challenges in managing and sustaining access to their research data. The larger and longer term the research project, the more likely that scientists are to have resources and dedicated staff to manage their technology and data, leaving those scientists whose work is based on smaller and shorter term projects at a disadvantage. The volume and variety of data to be managed varies by many factors, only two of which are the number of collaborators and length of the project. As part of an NSF project to conceptualize the Institute for Empowering Long Tail Research, we explored opportunities offered by Software as a Service (SaaS). These cloud-based services are popular in business because they reduce costs and labor for technology management, and are gaining ground in scientific environments for similar reasons. We studied three settings where scientists conduct research in small and medium-sized laboratories. Two were NSF Science and Technology Centers (CENS and C-DEBI) and the third was a workshop of natural reserve scientists and managers. These laboratories have highly diverse data and practices, make minimal use of standards for data or metadata, and lack resources for data management or sustaining access to their data, despite recognizing the need. We found that SaaS could address technical needs for basic document creation, analysis, and storage, but did not support the diverse and rapidly changing needs for sophisticated domain-specific tools and services. These are much more challenging knowledge infrastructure requirements that require long-term investments by multiple stakeholders.

This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License.

Borgman, Christine L., Jillian C. Wallis, and Noel Enyedy. "Little Science Confronts the Data Deluge: Habitat Ecology, Embedded Sensor Networks, and Digital Libraries." International Journal on Digital Libraries 7, no. 1 (2007): 17-30.

Borgman, Christine L., Jillian C. Wallis, and Matthew S. Mayernik. "Who's Got the Data? Interdependencies in Science and Technology Collaborations." Computer Supported Cooperative Work 21, no. 6 (2012): 485-523.

Bracke, Marianne Stowell. "Emerging Data Curation Roles for Librarians: A Case Study of Agricultural Data." Journal of Agricultural & Food Information 12, no. 1 (2011): 65-74.

Bradić-Martinović, Aleksandra, and Aleksandar Zdravković. "Researchers' Interest in Data Service in Bosnia and Herzegovina, Croatia, and Serbia." IASSIST Quarterly 38, no. 2 (2014): 22-28.

Brandt, D. Scott, and Eugenia Kim. "Data Curation Profiles as a Means to Explore Managing, Sharing, Disseminating or Preserving Digital Outcomes." International Journal of Performance Arts and Digital Media 10, no. 1 (2014): 21-34.

Bresnahan, Megan M., and Andrew M. Johnson. "Assessing Scholarly Communication and Research Data Training Needs." Reference Services Review 41, no. 3 (2013): 413-433.

Brewerton, Gary. "Research Data Management: A Case Study." Ariadne, no. 74 (2015).

Briney, Kristin. Data Management for Researchers: Organize, Maintain and Share Your Data for Research Success. Pelagic Publishing, 2015.

Briney, Kristin, Abigail Goben, and Lisa Zilinski. "Do You Have an Institutional Data Policy? A Review of the Current Landscape of Library Data Services and Institutional Data Policies." Journal of Librarianship and Scholarly Communication 3, no. 2 (2015): eP1232.

INTRODUCTION Many research institutions have developed research data services in their libraries, often in anticipation of or in response to funder policy. However, policies at the institution level are either not well known or nonexistent. METHODS This study reviewed library data services efforts and institutional data policies of 206 American universities, drawn from the July 2014 Carnegie list of universities with "Very High" or "High" research activity designation. Twenty-four different characteristics relating to university type, library data services, policy type, and policy contents were examined. RESULTS The study has uncovered findings surrounding library data services, institutional data policies, and content within the policies. DISCUSSION Overall, there is a general trend toward the development and implementation of data services within the university libraries. Interestingly, just under half of the universities examined had a policy of some sort that either specified or mentioned research data. Many of these were standalone data policies, while others were intellectual property policies that included research data. When data policies were discoverable, not behind a login, they focused on the definition of research data, data ownership, data retention, and terms surrounding the separation of a researcher from the institution. CONCLUSION By becoming well versed in research data policies, librarians can provide support for researchers by navigating the policies at their institutions, facilitating the activities needed to comply with the requirements of research funders and publishers. This puts academic libraries in a unique position to provide insight and guidance in the development and revisions of institutional data policies.

This work is licensed under a Creative Commons Attribution 4.0 License.

Broeder, Daan, and Laurence Lannom. "Data Type Registries: A Research Data Alliance Working Group." D-Lib Magazine 20, no. 1/2 (2014).

Brown, Rebecca A., Malcolm Wolski, and Joanna Richardson. "Developing New Skills For Research Support Librarians." The Australian Library Journal 64, no. 3 (2015): 224-234.

Brownlee, Rowan. "Research Data and Repository Metadata: Policy and Technical Issues at the University of Sydney Library." Cataloging & Classification Quarterly 47, no. 3/4 (2009): 370-379.

Burgess, Lucie, Neil Jefferies, Sally Rumsey, John Southall, David Tomkins, and James A. J. Wilson. "From Compliance to Curation: ORA-Data at the University of Oxford." Alexandria 26, no. 2 (2016): 107-135.

Burgi, Pierre-Yves, Eliane Blumer, and Basma Makhlouf-Shabou. "Research Data Management in Switzerland." IFLA Journal 43, no. 1 (2017): 5-21.

Burnette, Margaret H., Sarah C. Williams, and Heidi J. Imker. "From Plan to Action: Successful Data Management Plan Implementation in a Multidisciplinary Project." Journal of eScience Librarianship 5, no. 1 (2016): e1101.

Burton, A., D. Groenewegen, C. Love, A. Treloar, and R. Wilkinson. "Making Research Data Available in Australia." IEEE Intelligent Systems 27, no. 3 (2012): 40-43.

Burton, Adrian, and Andrew Treloar. "Designing for Discovery and Re-use: The 'ANDS Data Sharing Verbs' Approach to Service Decomposition." International Journal of Digital Curation 4, no. 3 (2009): 44-56.

Buys, Cunera M., and Pamela L. Shaw. "Data Management Practices Across an Institution: Survey and Report." Journal of Librarianship and Scholarly Communication 3, no. 2 (2015): eP1225.

INTRODUCTION Data management is becoming increasingly important to researchers in all fields. The E-Science Working Group designed a survey to investigate how researchers at Northwestern University currently manage data and to help determine their future needs regarding data management. METHODS A 21-question survey was distributed to approximately 12,940 faculty, graduate students, postdoctoral candidates, and selected research-affiliated staff at Northwestern's Evanston and Chicago Campuses. Survey questions solicited information regarding types and size of data, current and future needs for data storage, data retention and data sharing, what researchers are doing (or not doing) regarding data management planning, and types of training or assistance needed. There were 831 responses and 788 respondents completed the survey, for a response rate of approximately 6.4%. RESULTS Survey results indicate investigators need both short and long term storage and preservation solutions. However, 31% of respondents did not know how much storage they will require. This means that establishing a correctly sized research storage service will be difficult. Additionally, research data is stored on local hard drives, departmental servers or equipment hard drives. These types of storage solutions limit data sharing and long term preservation. Data sharing tends to occur within a research group or with collaborators prior to publication, expanding to more public availability after publication. Survey responses also indicate a need to provide increased consulting and support services, most notably for data management planning, awareness of regulatory requirements, and use of research software.

This work is licensed under a Creative Commons Attribution 4.0 License.

Byatt, Dorothy, Federico De Luca, Harry Gibbs, Meriel Patrick, Sally Rumsey, and Wendy White. Supporting Researchers with Their Research Data Management: Professional Service Training Requirements—A DataPool Project Report. Southampton, UK: University of Southampton, 2013.

Through the JISC-funded Institutional Data Management Blueprint (IDMB) project, the University of Southampton developed its 10-year blueprint (Brown et al., 2011) for building the required infrastructure. It did this by investigating what researchers were currently doing with their data and what they thought they required. As well as the blueprint, the IDMB project also developed a draft research data management policy to underpin this work. In DataPool: Engaging with our Research Data Management Policy, White and Brown (2013) detail how this draft policy was refined and approved. The policy on its own is insufficient, but it is an important step in enabling the development of the supporting infrastructure, both technological and personnel. The training strand of the DataPool project included an assessment of professional development requirements for staff supporting researchers in managing their data throughout the research life cycle. This report will focus on the investigation undertaken to assess the level of expertise in the relevant support staff groups, identify the training needs of those staff, and consider what networks need to be developed to enable collaborative support of researchers in the area of research data management. It will report on the results of the survey carried out at the University of Southampton.

This work is licensed under a Creative Commons Attribution 2.5 Generic License.

Byatt, Dorothy, Mark Scott, Gareth Beale, Simon J. Cox, and Wendy White. Developing Researcher Skills in Research Data Management: Training for the Future—A DataPool Project Report. Southampton, UK: University of Southampton, 2013.

This report will look at the multi-level approach to developing researcher skills in research data management in the University of Southampton, developed as part of the training strand of the JISC DataPool project, and embedded into the University engagement with research data management. It will look at how:

  • the multi-level approach to research data management training provides opportunities for cross- and multi-disciplinary sharing events as well as bespoke subject specific sessions;
  • co-delivery with active researchers and/or other professional support services benefits the presentation and relevance of the material to the researchers;
  • focussing the event and matching content to the expected audience is key;
  • using the Institutional Data Management Blueprint dual approach of bottom-up (researchers' needs)/top-down (institutional policies and infrastructure) worked.

This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Byatt, Dorothy, and Wendy White. Research Data Management Planning, Guidance and Support: A DataPool Project Report. Southampton: University of Southampton, 2013.

This report will review the development of research data management support at the University of Southampton following the approval of its research data management policy in February 2012. In her report DataPool: Engaging with our Research Data Management Policy, Wendy White (2013) discusses the rationale and approach to the development of the policy. This report will look at the development of the research data management web pages, including the supporting policy guidance, and will then focus on the ResearchData@soton email, phone, and desk-side service launched to provide research data support to the University.

This work is licensed under a Creative Commons Attribution 2.5 Generic License.

Callaghan, Sarah. "Preserving the Integrity of the Scientific Record: Data Citation and Linking." Learned Publishing 27, no. 5 (2014): 15-24.

Callaghan, Sarah, Steve Donegan, Sam Pepler, Mark Thorley, Nathan Cunningham, Peter Kirsch, Linda Ault, Patrick Bell, Rod Bowie, Adam Leadbetter, Roy Lowry, Gwen Moncoiffé, Kate Harrison, Ben Smith-Haddon, Anita Weatherby, and Dan Wright. "Making Data a First Class Scientific Output: Data Citation and Publication by NERC's Environmental Data Centres." International Journal of Digital Curation 7, no. 1 (2012): 107-113.

The NERC Science Information Strategy Data Citation and Publication project aims to develop and formalise a method for formally citing and publishing the datasets stored in its environmental data centres. It is believed that this will act as an incentive for scientists, who often invest a great deal of effort in creating datasets, to submit their data to a suitable data repository where it can properly be archived and curated. Data citation and publication will also provide a mechanism for data producers to receive credit for their work, thereby encouraging them to share their data more freely.

This work is licensed under a Creative Commons Attribution License.

Callaghan, Sarah, Jonathan Tedds, John Kunze, Varsha Khodiyar, Rebecca Lawrence, Matthew S. Mayernik, Fiona Murphy, Timothy Roberts, and Angus Whyte. "Guidelines on Recommending Data Repositories as Partners in Publishing Research Data." International Journal of Digital Curation 9, no. 1 (2014): 152-163.

This document summarises guidelines produced by the UK Jisc-funded PREPARDE data publication project on the key issues of repository accreditation. It aims to lay out the principles and the requirements for data repositories intent on providing a dataset as part of the research record and as part of a research publication. The data publication requirements that repository accreditation may support are rapidly changing, hence this paper is intended as a provocation for further discussion and development in the future.

This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License.

Candela, Leonardo, Donatella Castelli, Paolo Manghi, and Sarah Callaghan. "On Research Data Publishing." International Journal on Digital Libraries 18, no. 2 (2017): 73-75.

Candela, Leonardo, Donatella Castelli, Paolo Manghi, and Alice Tani. "Data Journals: A Survey." Journal of the Association for Information Science and Technology 66, no. 9 (2015): 1747-1762.

Capó-Lugo, Carmen E., Abel N. Kho, Linda C. O'Dwyer, and Marc B. Rosenman. "Data Sharing and Data Registries in Physical Medicine and Rehabilitation." PM&R 9, no. 5 (2017): S59-S74.

Carlson, Jake. "Demystifying the Data Interview: Developing a Foundation for Reference Librarians to Talk with Researchers about Their Data." Reference Services Review 40, no. 1 (2012): 7-23.

——— "Opportunities and Barriers for Librarians in Exploring Data: Observations from the Data Curation Profile Workshops." Journal of eScience Librarianship 2, no. 2 (2013): 17-33.

Carlson, Jake, Megan Sapp Nelson, Lisa R. Johnston, and Amy Koshoffer. "Developing Data Literacy Programs: Working with Faculty, Graduate Students and Undergraduates." Bulletin of the Association for Information Science and Technology 41, no. 6 (2015): 14-17.

Carlson, Jake, and Marianne Stowell-Bracke. "Data Management and Sharing from the Perspective of Graduate Students: An Examination of the Culture and Practice at the Water Quality Field Station." portal: Libraries and the Academy 13, no. 4 (2013): 343-361.

Carroll, Michael W. "Sharing Research Data and Intellectual Property Law: A Primer." PLOS Biology 13, no. 8 (2015): e1002235.

Sharing research data by depositing it in connection with a published article or otherwise making data publicly available sometimes raises intellectual property questions in the minds of depositing researchers, their employers, their funders, and other researchers who seek to reuse research data. In this context or in the drafting of data management plans, common questions are (1) what are the legal rights in data; (2) who has these rights; and (3) how does one with these rights use them to share data in a way that permits or encourages productive downstream uses? Leaving to the side privacy and national security laws that regulate sharing certain types of data, this Perspective explains how to work through the general intellectual property and contractual issues for all research data.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Castro, Eleni, and Alex Garnett. "Building a Bridge Between Journal Articles and Research Data: The PKP-Dataverse Integration Project." International Journal of Digital Curation 9, no. 1 (2014): 176-184.

A growing number of funding agencies and international scholarly organizations are requesting that research data be made more openly available to help validate and advance scientific research. Thus, this is an opportune moment for research data repositories to partner with journal editors and publishers in order to simplify and improve data curation and publishing practices. One practical example of this type of cooperation is currently being facilitated by a two-year (2012-2014), one-million-dollar Sloan Foundation grant, integrating two well-established open source systems: the Public Knowledge Project's (PKP) Open Journal Systems (OJS), developed by Stanford University and Simon Fraser University; and Harvard University's Dataverse Network web application, developed by the Institute for Quantitative Social Science (IQSS). To help make this interoperability possible, an OJS Dataverse plugin and Data Deposit API are being developed, which together will allow authors to submit their articles and datasets through an existing journal management interface, while the underlying data are seamlessly deposited into a research data repository, such as the Harvard Dataverse. This practice paper will provide an overview of the project, and a brief exploration of some of the specific challenges to and advantages of this integration.

This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License.

Chad, Ken, and Suzanne Enright. "The Research Cycle and Research Data Management (RDM): Innovating Approaches at the University of Westminster." Insights: The UKSG Journal 27, no. 2 (2014): 147-153.

This article presents a case study based on experience of delivering a more joined-up approach to supporting institutional research activity and processes, research data management (RDM) and open access (OA). The result of this small study, undertaken at the University of Westminster in 2013, indicates that a more holistic approach should be adopted, embedding RDM more fully into the wider research management landscape and taking researchers' priorities into consideration. Rapid development of an innovative pilot system followed closely on from a positive engagement with researchers, and today a purpose-built, integrated and fully working set of tools are functioning within the virtual research environment (VRE). This provides a coherent 'thread' to support researchers, doctoral students and professional support staff throughout the research cycle. The article describes the work entailed in more detail, together with the impact achieved so far and what future work is planned.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Chao, Tiffany C., Melissa H. Cragin, and Carole L. Palmer. "Data Practices and Curation Vocabulary (DPCVocab): An Empirically Derived Framework of Scientific Data Practices and Curatorial Processes." Journal of the Association for Information Science and Technology 66, no. 3 (2015): 616-633.

Chapple, Michael J. "Speaking the Same Language: Building a Data Governance Program for Institutional Impact." EDUCAUSE Review 48, no. 6 (2013): 14-27.

Charbonneau, Deborah H. "Strategies for Data Management Engagement." Medical Reference Services Quarterly 32, no. 3 (2013): 365-374.

Charbonneau, Deborah H., and Joan E. Beaudoin. "State of Data Guidance in Journal Policies: A Case Study in Oncology." International Journal of Digital Curation 10, no. 2 (2015): 136-156.

This article reports the results of a study examining the state of data guidance provided to authors by 50 oncology journals. The purpose of the study was the identification of data practices addressed in the journals' policies. While a number of studies have examined data sharing practices among researchers, little is known about how journals address data sharing. Thus, what was discovered through this study has practical implications for journal publishers, editors, and researchers. The findings indicate that journal publishers should provide more meaningful and comprehensive data guidance to prospective authors. More specifically, journal policies requiring data sharing, should direct researchers to relevant data repositories, and offer better metadata consultation to strengthen existing journal policies. By providing adequate guidance for authors, and helping investigators to meet data sharing mandates, scholarly journal publishers can play a vital role in advancing access to research data.

This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License.

Chervenak, Ann, Ian Foster, Carl Kesselman, Charles Salisbury, and Steven Tuecke. "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets." Journal of Network and Computer Applications 23, no. 3 (2000): 187-200.

Childs, Sue, Julie McLeod, Elizabeth Lomas, and Glenda Cook. "Opening Research Data: Issues and Opportunities." Records Management Journal 24, no. 2 (2014): 142-162.

Chiware, Elisha R.T., and Zanele Mathe. "Academic Libraries' Role in Research Data Management Services: A South African Perspective." South African Journal of Libraries and Information Science 81, no. 2 (2015).

Chou, Chiu-chuang Lu. "50 Years of Social Science Data Services: A Case Study from the University of Wisconsin-Madison." International Journal of Librarianship 2, no. 1 (2017): 42-52.

The Data and Information Services Center (DISC), formerly known as the Data and Program Library Services (DPLS) has provided learning, teaching and research support to students, staff and faculty in social sciences at the University of Wisconsin-Madison for 50 years. What changes have our organization, collections, and services experienced? How has DISC evolved with the advancement of technology? What role does DISC play in the current and future landscape of social science data services on our campus and beyond? This paper gives answers to these questions and recommends a few simple steps in adding social science data services in academic libraries.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Choudhury, G. Sayeed. "Case Study in Data Curation at Johns Hopkins University." Library Trends 57, no. 2 (2008): 211-220.

——— "Data Curation: An Ecological Perspective." College & Research Libraries News 71, no. 4 (2010): 194-196.

Choudhury, Sayeed, Tim DiLauro, Alex Szalay, Ethan Vishniac, Robert J. Hanisch, Julie Steffen, Robert Milkey, Teresa Ehling, and Ray Plante. "Digital Data Preservation for Scholarly Publications in Astronomy." International Journal of Digital Curation 2, no. 2 (2007): 20-30.

Claibourn, Michele P. "Bigger on the Inside: Building Research Data Services at the University of Virginia." Insights: The UKSG Journal 28, no. 2 (2015): 100-106.

Every story has a beginning, where the narrator chooses to start, though this is rarely the genesis. This story begins with the launch of the University of Virginia Library's new Research Data Services unit in October 2013. Born from the conjoining of a data management team and a data analysis team, Research Data Services expanded to encompass data discovery and acquisitions, research software support, and new expertise in the use of restricted data. Our purpose is to respond to the challenges created by the growing ubiquity and scale of data by helping researchers acquire, analyze, manage, and archive these resources. We have made serious strides toward becoming 'the face of data services at U.Va.' This article tells a bit of our story so far, relays some early challenges and how we've responded to them, outlines several initial successes, and summarizes a few lessons going forward.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Clement, Ryan, Amy Blau, Parvaneh Abbaspour, and Eli Gandour-Rood. "Team-based Data Management Instruction at Small Liberal Arts Colleges." IFLA Journal 43, no. 1 (2017): 105-118.

Clements, Anna. "Research Information Meets Research Data Management... in the Library?" Insights: The UKSG Journal 26, no. 3 (2013): 298-304.

Research data management (RDM) is a major priority for many institutions as they struggle to cope with the plethora of pronouncements including funder policies, a G8 statement, REF2020 consultations, all stressing the importance of open data in driving everything from global innovation through to more accountable governance; not to mention the more direct possibility that non-compliance could result in grant income drying up. So, at the coalface, how do we become part of this global movement?

Special Section: Research Data Access & Preservation

Tracking citations and altmetrics for research data: Challenges and opportunities


  • Stacy Konkiel, science data management librarian at Indiana University


Editor's Summary

Methods for determining research quality have long been debated but with little lasting agreement on standards, leading to the emergence of alternative metrics. Altmetrics are a useful supplement to traditional citation metrics, reflecting a variety of measurement points that give different perspectives on how a dataset is used and by whom. A positive development is the integration of a number of research datasets into the ISI Data Citation Index, making datasets searchable and linking them to published articles. Yet access to data resources and tracking the resulting altmetrics depend on specific qualities of the datasets and the systems where they are archived. Though research on altmetrics use is growing, the lack of standardization across datasets and system architecture undermines its generalizability. Without some standards, stakeholders' adoption of altmetrics will be limited.

The recently announced San Francisco Declaration on Research Assessment [1], which calls for the abandonment of the journal impact factor as a means to determine the quality of research, highlights how important and contested the measurement of scholarly impact has become. Measuring impact for research data is also complicated. Data citation itself is not yet a standard practice [2, 3], and there is no authoritative agreement on how and when data should be cited [4]. Altmetrics, which track scholarship's usage on the social and scholarly web, comprise a nebulous group of metrics that use an ever-shifting list of web services' APIs as a source of their data [5]. As with data citations, standards do not yet exist to record or report the impact of different types of altmetrics. In light of these challenges, a panel was convened at the ASIS&T Research Data Access & Preservation Summit 2013 (RDAP13) to discuss new developments in exactly how researchers track the impact of data.

Overview of Data Metrics

Though discussions of data citation practices have occurred since the 1980s, it is in recent years that domain specialists, scientometricians and data curators have attempted to define standards for the citation of data and other data-related metrics. The closest the field has come to defining a standard is establishing DataCite [6], an organization that registers permanent identifiers (PIDs) for data and indexes associated metadata for discovery.

Such standards were the subject of the National Academies' Board on Research Data and Information workshop, “For Attribution—Developing Data Attribution and Citation Practices and Standards: Summary of an International Workshop” (2012), a full report of which is available at the National Academies Press website [7]. Various stakeholders, including researchers, librarians and publishers, put forth their positions on what attribution for data should look like (citation versus varied metrics), what functions it should serve (attribution, showing provenance or defining the impact of researchers overall), how its infrastructure should behave (characteristics of host repositories, executable papers or linked data) and which communities are responsible for its development and implementation (libraries, publishers, data centers or researchers). No single position or suite of recommendations emerged from the meeting nor from a similar meeting, “Bridging Data Lifecycles: Tracking Data Use via Data Citations Data Workshop” [8], held earlier that year.

Other researchers are tackling the problem of tracking impact with a bottom-up approach. The Data Usage Index (DUI) has been proposed for the field of biodiversity, based on a variety of metrics culled from the Global Biodiversity Information Facility (GBIF) repository [4]. The authors call for a move beyond data citations, which mimic the citation of traditional publications, primarily because existing metrics do not “recognize all players involved in the life cycle of those data from collection to publication” nor are they yet standardized. Based on usage logs from the GBIF servers, Ingwersen and Chavan conceptualized a set of measures that are either “absolute” or “relative”: number of searched records, download frequencies, number of datasets, download densities and number of searches, to name a few of the 14 metrics. These measures are intended to show value to the researcher and to be used to demonstrate impact in a manner analogous to other altmetrics. While the study has implications for further development of related DUIs, the authors acknowledge that their index is specific to the GBIF repository and therefore not generalizable to all research data repositories.
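The absolute/relative split that Ingwersen and Chavan describe can be illustrated with a minimal sketch. The log structure, field names, and dataset identifiers below are invented for illustration; the actual DUI defines 14 such measures over GBIF server logs.

```python
# Hypothetical sketch of two DUI-style indicators computed from usage-log
# records: download frequency (an "absolute" count) and download density
# (a "relative" measure: records downloaded / records held).
from collections import Counter

# Each invented log entry: (dataset_id, event, number_of_records_involved)
log = [
    ("gbif:1001", "download", 250),
    ("gbif:1001", "download", 100),
    ("gbif:1002", "download", 4000),
    ("gbif:1001", "search", 250),
]

# Invented holdings: total records per dataset
dataset_records = {"gbif:1001": 500, "gbif:1002": 80000}

downloads = Counter()           # absolute: download events per dataset
records_downloaded = Counter()  # absolute: records retrieved per dataset

for dataset_id, event, n_records in log:
    if event == "download":
        downloads[dataset_id] += 1
        records_downloaded[dataset_id] += n_records

# Relative: download density per dataset
density = {
    ds: records_downloaded[ds] / dataset_records[ds]
    for ds in records_downloaded
}

print(downloads["gbif:1001"])  # 2
print(density["gbif:1001"])    # 0.7
print(density["gbif:1002"])    # 0.05
```

As the paragraph above notes, such measures are specific to the repository whose logs they come from, which is why they do not generalize across archives.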

These limitations are the starting point for the study, “The Product and System Specificities of Measuring Impact: Indicators of Use in Research Data Archives,” presented at the 2013 International Digital Curation Conference [9]. The overall aim of the group's research is to develop a suite of metrics that can expose the value that data curators add to a dataset, which in itself is an intriguing concept. The researchers' conceptual framework is especially interesting in that they acknowledge that so-called “specificities” of systems and products—that is, the various sociotechnical factors that influence a system's or an organization's design and development—have more to do with the value of metrics that can be extracted than external factors.

Data curation work is related to both system and product specificities. It is reliant on a system's specificities—architecture and arrangement that dictate how the user can interact with an archive—in that such specificities have an influence on metrics like the number of search hits and the number of unique users who can discover the content. Equally important are product specificities, which are "the qualities and properties of the datasets themselves—their file structure, format and size—that affect the way a user can interact with the archive in consuming and discovering data" [9]. Though the researchers do not go into detail about the effects of particular data-curation activities (such as describing data using metadata standards and controlled vocabularies, or reorganizing data for understandability and consumption) on data metrics, the area is tantalizingly open for further study.

Another major study in this area that addresses the various metrics, stakeholders and infrastructure considerations from a 10,000-foot view is the report “The Value of Research Data: Metrics for Datasets from a Cultural and Technical Point of View” accessible at the Knowledge Exchange website [10]. The authors give a rich overview of the challenges and opportunities that lie in capturing metrics for data and report on stakeholder views of the viability of the currently available metrics. Chief among the challenges are culture and infrastructure.

The authors posit that researchers have little reason to value data metrics (including citations) as yet, since they are not considered as valuable as citations to traditional publications. Researchers also have little reason to adopt practices that would make data metrics easy to track, such as standardized citations for data or the assignment of permanent identifiers such as DOIs (digital object identifiers) to datasets, because the technical infrastructure largely does not yet support such practices. This presents a chicken-or-egg conundrum for those building data citation infrastructure: the infrastructure remains suboptimal because researchers' lack of interest suggests there is no need for it, and researchers remain uninterested while the infrastructure fails to support them. Results from stakeholder interviews and environmental scans inform much of the report.

At RDAP13, Kathleen Fear, University of Michigan; Elizabeth Moss, ICPSR; and Heather Piwowar, ImpactStory/Duke University, presented their work and research related to measuring the impact of data. While all three researchers agreed that data citation is a good way to measure scholarly impact, they also shared their ideas on how to capture a fuller picture of the impact of data, including how the data has been reused and by which audiences.

The Impact of Data Reuse: A Pilot Study of Five Measures

Fear, a PhD candidate at the University of Michigan, began the panel by sharing her research into the many ways that citations and usage statistics such as downloads can be used to track various degrees of impact for social science datasets [11]. The impacts boil down to five categories: data reuse, quality of publications that reuse data, diversity of publications that reuse data, size of network stemming from a single dataset and number of unique individuals who download a dataset.

Reuse was measured as the number of times a dataset has been cited. While most datasets in Fear's sample had never been cited, many were cited two to ten times over the course of their lifecycle, with some receiving as many as 30 citations in journal articles. Fear measured the quality and diversity of publications that cite (reuse) the data by determining the citation rates for articles that cite the datasets and the breadth of those publications. She noted that reuse rates can be affected by the publications in which a dataset was cited and also by disciplinary differences.

By counting the number of unique individuals who download a dataset, repositories can make general estimates of the data's popular impact. However, we cannot be sure if downloads mean that the dataset has been used in any way, just as we cannot be sure that downloads of journal articles guarantee a paper has been read [9].

The final metric, the size of publication network that stems from a single dataset, is still being researched. The other measures are, interestingly, for the most part all interrelated. Fear found that data reuse counts had little to do with unique downloaders or the data's secondary impact.

The results of Fear's study are interesting, but are they generalizable to all data and data repositories? In our current environment, the answer is, “No.” In working with social science datasets culled from the Inter-university Consortium for Political and Social Research (ICPSR), Fear was able to track reuse using the repository's Bibliography of Data-Related Literature (which is described in more detail below). The bibliography is, by necessity, a manually curated list; data citation standards have not yet been fully developed or implemented in a way that can automate the tracking for all data held in the ICPSR.

However, in a future where data is cited as strictly as prior publications are cited, one could imagine that Fear's measures of impact take on great importance. Data potentially could have a much broader impact than publications, because they are open to interpretation and analysis: different communities often repurpose data in many different ways with many different results. Determining the scope and quality of that impact could speak volumes about the quality and utility of the data itself.

Viable Data Citation: Expanding the Impact of Social Science Research

ICPSR has done much in the years since its launch to track the citations for data stored in its repository via its Bibliography of Data-Related Literature—a manually curated list of more than 60,000 articles that are based in whole or part on findings culled from ICPSR data. In her presentation, the bibliography's chief architect, Elizabeth Moss, stressed the importance of cultivating a culture of data citation: “Impact can be better measured if data use is readily discernible.” [12]

Impact is broken down by ICPSR to help understand who uses the data and to what effect. There are certain measures that ICPSR's own website tracks easily: download statistics, unique sessions and users and the names of ICPSR member institutions where downloads of datasets occur. These metrics track who uses the data, while the Bibliography of Data-Related Literature more broadly tracks the data's impact in the literature.

ICPSR has engineered some aspects of its repository to encourage citation of both data and related publications, as well as to support different uses by its various audiences. Within the bibliography, literature is searchable and exportable to reference manager programs. Item records for publications link back to related datasets. This tool can be used in teaching students how to conduct and document their own research, helping researchers perform literature reviews, allowing researchers and funders to track how data is used and enabling reporters and policymakers to see both statistics and the related reports [12]. Digital object identifiers (DOIs) are also issued for data, both at the collection and the study level, with links resolving to the web page with the richest metadata that can help users understand the dataset. These system specificities likely have an effect on how the data is cited and on the other metrics that are collected, as described in the previous paragraph.

Despite ICPSR's efforts to encourage good citation practices, Moss finds that data is rarely explicitly referred to in the literature or discoverable within academic databases. Often, ICPSR staff must comb through articles' methods descriptions and figures to uncover the original dataset a project might be based upon. Most academic databases do not index data—it is simply out of their scope—and current full-text search capabilities are not sophisticated enough for the nuanced search techniques that are currently required to uncover references to datasets. Moss's current strategy to overcome these challenges is to combine text-mining scripts with Google Alerts, which can alert Moss whenever a dataset's creator is mentioned or its DOI is referenced.

ICPSR's recent partnership with the Institute for Scientific Information's (ISI) new Data Citation Index (DCI) initiative aims to address some of these issues by integrating its datasets and the Bibliography of Data-Related Literature, as well as many other repositories' data and related citations, into the DCI database. Within the DCI, datasets are fully searchable and are treated as research objects that are on par with journal articles, conference proceedings and other traditional outputs. The database search functionality for the DCI as well as related databases like the Web of Knowledge is being converted to meet the needs of those searching for data. As a result, articles can be more easily linked to data, leading to increased data discovery, which is itself a reward for data citation and also rewards those who make their data easily citable—all these benefits from a search interface that many researchers are already using to find emerging research.

Moss concluded by explaining how ICPSR helps “build a culture of viable data citation to improve measures of impact” by providing principal investigators and users with citations, metrics and DOIs for data. Moss encouraged the audience to join groups and attend conferences to advocate for viable data citation practices, including DataCite, iASSIST and the Research Data Alliance. She also advocated that journal editors, domain repositories and funders work together to support repositories and change publishing practices, by requiring authors to better steward and clearly cite the data that underpins their studies.

No More Waiting! Tools that Work Today to Reveal Dataset Use

Heather Piwowar, co-founder of the altmetrics service ImpactStory, discussed the responsibility of librarians, metrics providers and data scientists to go beyond citations when considering the impact of dataset reuse [13]. Altmetrics can track many types of engagement (views, saves, discussions, formal references and recommendations) that many different types of user groups (researchers, teachers, students, policy makers and practitioners) can have with a single dataset. These are characterized as "impact flavors," and tools such as ImpactStory and Plum Analytics are well suited to help aggregate and display them.

Piwowar laid out three ways in which the community can help encourage more diverse research metrics for dataset reuse: by exposing more metrics, supporting more types of engagement with datasets and lobbying and negotiating for Open Access to research.

Taking the ICPSR and its metrics as an example, Piwowar argued that content providers not only should provide information on dataset usage (downloads and pageviews for descriptive information), but also other rich metrics such as institutions from which a dataset was downloaded and classifications of unique users (into categories such as graduate students, undergraduates, university staff or faculty). However, many repositories do not expose any metrics, especially at the dataset level [10]. It is the responsibility of data curators and repository administrators to expose such metrics.

Second, datasets are complicated research products. The scholarly community has not yet settled on an efficient or standardized way to support peer-reviewed data publications. It follows that scholarly social media sites like Faculty of 1000 or Mendeley would have difficulty addressing datasets and their usage. Piwowar called upon service providers, including altmetrics services, to report metrics for all types of engagement with data.

Finally, Piwowar advocated for advocacy itself, as it relates to data metrics. As data curators, librarians, researchers and university administrators, Piwowar argued that it is our duty to lobby and negotiate for open access to research, including open-text mining of articles, open data from repositories and open metrics from aggregators.

Piwowar's last point led to a general discussion of whether repositories like ICPSR should allow commercial, toll-access services such as the DCI to index their metadata, much of which is the result of manual curation. Moss proposed the idea that any exposure to data, whether via the open web or a service like the DCI, is beneficial to the data creator and end user alike. Piwowar, as the founder of a service that relies on open APIs to report metrics, acknowledged that toll-access services and closed APIs inhibit both the ability of end-users to find datasets and platforms such as hers to track their impact.


Data citations are just one metric that can be tracked to determine the impact of datasets made available through repositories. Altmetrics and usage statistics can determine the impact of data and publications beyond the academy and are useful supplements to citations. The technical infrastructure of repositories and the characteristics of the datasets stored in them can sometimes dictate which metrics can be applied to fully evaluate the impact of data. No metrics can be fully implemented until certain standards, such as DOI usage or commonly agreed-upon best practices for data citation, are widely adopted. Even then, manual intervention to link data to publications and other research outputs may be necessary, making the role of repository staff and librarians ever more essential.



Copyright © 2013 American Society for Information Science and Technology

Resources Mentioned in the Article

  1. The San Francisco Declaration on Research Assessment (DORA):
  2. Borgman, C. L. (2012). Why are the attribution and citation of scientific data important? In P. F. Uhlir (Rapporteur) & Board on Research Data and Information Policy and Global Affairs, National Research Council, For attribution - developing data attribution and citation practices and standards: Summary of an international workshop (pp. 1–10). Washington, DC: National Academies Press. Retrieved June 19, 2013, from
  3. Mooney, H., & Newton, M. (2012). The anatomy of a data citation: Discovery, reuse and credit. Journal of Librarianship and Scholarly Communication, 1(1), eP1035. doi:10.7710/2162-3309.1035. Retrieved June 19, 2013, from
  4. Ingwersen, P., & Chavan, V. (2011). Indicators for the Data Usage Index (DUI): An incentive for publishing primary biodiversity data through global information infrastructure. BMC Bioinformatics, 12(Suppl 15), S3. doi:10.1186/1471-2105-12-S15-S3. Retrieved June 19, 2013, from
  5. Priem, J., Taraborelli, D., Groth, P., & Neylon, C. (2010). Alt-metrics: A manifesto. Retrieved October 26, 2010, from
  6. DataCite:
  7. Uhlir, P. F. (Rapporteur), & The National Research Council. (2012). For attribution - developing data attribution and citation practices and standards: Summary of an international workshop. Washington, DC: The National Academies Press. Retrieved June 19, 2013, from
  8. University Corporation for Atmospheric Research (UCAR). (2012). Bridging Data Lifecycles: Tracking Data Use via Data Citations Data Workshop. Retrieved June 19, 2013, from
  9. Weber, N. M., Thomer, A. K., Mayernik, M. S., Dattore, B., Ji, Z., & Worley, S. (2013). Indicators of use in research data archives. 8th International Digital Curation Conference (IDCC). Amsterdam, The Netherlands.
  10. Costas, R., Meijer, I., Zahedi, Z., & Wouters, P. (2013). The value of research data: Metrics for datasets from a cultural and technical point of view. Copenhagen, Denmark: Knowledge Exchange. Retrieved June 19, 2013, from
  11. Fear, K. (2013). The impact of data reuse: A pilot study of five measures [PowerPoint slides]. Research Data Access & Preservation Summit. Baltimore, MD. Retrieved June 19, 2013, from
  12. Moss, E. (2013). Viable data citation: Expanding the impact of social science research [PowerPoint slides]. Research Data Access & Preservation Summit. Baltimore, MD. Retrieved June 19, 2013, from
  13. Piwowar, H. A. (2013). No more waiting! Tools that work today to reveal dataset use [PowerPoint slides]. Research Data Access & Preservation Summit. Baltimore, MD. Retrieved June 19, 2013, from
