Popper: Creating Reproducible Computational and Data Science Experimentation Pipelines

Series: HPC Best Practices Webinars

Current approaches used in computational and data science research can consume significant time without necessarily advancing scientific understanding. For example, researchers may spend many hours reformatting data and writing code in an attempt to reproduce previously published research. What if the scientific community could find a better way to create and publish workflows, data, and models that are easy to reproduce, thus streamlining scientific analysis? Popper is a protocol and command-line interface (CLI) tool for implementing scientific exploration pipelines following a DevOps approach, which unifies software development and operations in order to manage complexity in large codebases. Popper repurposes DevOps practices for scientific exploration, so that researchers can leverage existing tools and technologies to enable reproducibility. This webinar will introduce the Popper protocol, including a demo of the CLI tool and HPC examples.


Presenter Bio

Ivo Jimenez is a PhD candidate at the UC Santa Cruz Computer Science Department and a member of the Systems Research Lab. He is interested in large-scale distributed data management systems. His thesis focuses on the practical aspects of the reproducible evaluation of systems research, work for which he was awarded the 2018 Better Scientific Software Fellowship. Ivo is originally from Mexico, where he earned his B.S. in Computer Science from Universidad de Sonora. From 2006 to 2010, he worked as a research associate in the Database Research Lab at HP Labs. His goal in life is to make a difference through science.