Going FAIR and Reproducible: a User-Centric Perspective on Data-Driven Workflows
Practitioners are asked to embrace Open Science, follow the FAIR principles, and make sure their work is reproducible. At the same time, the amount of data to be processed is growing at a massive rate. Most data-driven scientific research is organized around three pillars at the core of every scientist’s day-to-day work: data, code, and computing environments. Data and code are tightly interconnected via workflows, which run in a specific computing environment (for example, with a given version of a Python package or of an operating system). Virtualization and containerization technologies such as Docker and Singularity are increasingly used to replicate a specific computational environment and thereby aid the repeatability of data analyses.
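As a minimal sketch of this idea (the script name, base image, and package versions below are illustrative assumptions, not taken from the talk), a Dockerfile can pin the exact environment a workflow step needs, so the same analysis can be re-run elsewhere without reconstructing the environment by hand:

```dockerfile
# Minimal sketch: pinning a computational environment with Docker.
# Base image and package versions are illustrative assumptions.
FROM python:3.11-slim

# Pin exact package versions so the environment can be rebuilt identically
RUN pip install --no-cache-dir numpy==1.26.4 pandas==2.2.2

# Copy a hypothetical analysis script into the image
COPY analysis.py /work/analysis.py
WORKDIR /work

# Run the workflow step when the container starts
CMD ["python", "analysis.py"]
```

Building this image once (`docker build -t my-analysis .`) and archiving it alongside the data and code lets collaborators re-run the workflow step (`docker run my-analysis`) in the same environment.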
In this talk, I argue that supporting the creation and reuse of these three pillars through appropriate middleware such as Renku can enable the smooth, joint use of Cloud and HPC resources in FAIR and reproducible ways.