Reproducibility and SC: Embracing the Challenge

The Library of Stuttgart. Photo by Max Langelott, shared on Unsplash.


New for SC19

This year, the conference is taking its Reproducibility Initiative to a whole new level. The most visible change: an Artifact Description (AD) Appendix is now required for all technical papers (it remains optional for posters and workshop papers). The HPC community is embracing the challenge of publishing reproducible work, and we are confident that this change is timely.


In Support of Authors

To support SC authors in this requirement, we have crafted a standard AD form that is integrated into the submission system. We invite you to visit a new “Author-Kit” GitHub repository that contains a draft of the AD form.

The new AD form contains a set of questions that specifically address eligibility for an Artifacts Available badge. The ACM guidance on this badge is:

  • Author-created artifacts relevant to this paper have been placed on a publicly accessible archival repository.
  • A DOI or link to this repository along with a unique identifier for the object is provided.

In the SC19 AD form, this guidance is made explicit by the following criteria of eligibility:

  • Software Artifact Availability: All author-created software artifacts are maintained in a public repository under an OSI-approved license.
  • Hardware Artifact Availability: All author-created hardware artifacts are available and comply with the Open Source Hardware Definition.
  • Data Artifact Availability: All author-created data artifacts are maintained in a public repository with a stable identifier, such as a DOI.

The form will also ask for the URLs of author-created artifacts, and the AD/AE Appendices Committee will check these as part of the review process. This new committee will operate in a “double-open” review format, and will also work with authors who need to improve their appendices during paper revision.


Guaranteeing Persistence

A common misconception about artifact availability, particularly software, is that hosting the code on GitHub is sufficient. It is not: owners can delete a GitHub repository at any time. An archival repository guarantees persistence for a long period (at least 10 years) and provides a global identifier for the artifact (like a DOI).

Pro tip: Make a tagged release of the version of your software that you used for your paper, and archive that release through a GitHub integration, for example with the Zenodo service. See “Making Your Code Citable” in the GitHub Guides. Also, remember to always include a LICENSE file in your code repository!
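As a small aid for the LICENSE reminder, here is a minimal Python sketch of a pre-release check you could run before tagging. The function name `find_license_file` and the list of accepted file names are assumptions made for illustration; they are not part of the SC19 AD form or of any GitHub or Zenodo tooling.

```python
from pathlib import Path

# Common top-level license file names. This list is an illustrative
# assumption, not an SC19 or OSI requirement.
LICENSE_NAMES = ("LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING")


def find_license_file(repo_root):
    """Return the first license file found at the top of repo_root, or None."""
    root = Path(repo_root)
    for name in LICENSE_NAMES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_license_file(".")` from the top of your repository returns a path if one of the listed files exists, and `None` otherwise. It only checks that a file is present; whether the license is OSI-approved still needs a human look.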

Hosting code or data on your own website, or on your institution’s website, is also insufficient. Use one of the many data repositories that satisfy the persistence and identifier requirements, e.g., DataVerse, Dryad, Figshare, OSF, and Zenodo. Some fields have community-recognized repositories, like the Protein Data Bank. Whatever host you choose for your data, make sure it follows the FAIR Principles for data stewardship.
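To make the distinction concrete, here is a minimal Python sketch that sorts an artifact URL into "archival", "code-hosting", or "unknown" based on its host name. The host lists are assumptions chosen to mirror the examples in this post; they are not an official SC19 list, and this is not how the AD/AE Appendices Committee checks URLs.

```python
from urllib.parse import urlparse

# Illustrative host lists (assumptions, not an official SC19 list).
ARCHIVAL_HOSTS = {
    "doi.org",
    "zenodo.org",
    "figshare.com",
    "datadryad.org",
    "osf.io",
    "dataverse.harvard.edu",
}
CODE_HOSTING = {"github.com", "gitlab.com", "bitbucket.org"}


def classify_artifact_url(url):
    """Classify a URL by host: 'archival', 'code-hosting', or 'unknown'."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # treat www.zenodo.org like zenodo.org
    if host in ARCHIVAL_HOSTS:
        return "archival"
    if host in CODE_HOSTING:
        return "code-hosting"
    return "unknown"
```

With these lists, a `https://github.com/...` URL comes back as "code-hosting" and a `https://doi.org/...` link as "archival". A personal or institutional web page falls into "unknown", which is a prompt to move the artifact to a repository with a persistent identifier.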


Questions?

In collaboration with the AD/AE Appendices Committee (chaired by John Linford), we will be offering more education on reproducibility via this blog and other channels, and we invite your questions via email.

–––

Lorena Barba, SC19 Reproducibility Chair (George Washington University)
Feel free to also ping me on Twitter!
