Best Practices for HPC Software Developers (Webinars)

Jump to: Upcoming webinars | Past webinars | 2018 | 2017 | 2016

The HPC Best Practices webinars address issues faced by developers of computational science and engineering (CSE) software on high-performance computing (HPC) systems. The sessions are independent, so join any or all.

Who should attend: Participation is free and open to the public; however, registration is required for each event. This series is designed for HPC software developers who are seeking help in increasing their team’s productivity, as well as for facility staff who interact extensively with users.

Schedule and format: The webinars occur approximately monthly and last about one hour each. They usually take place on a Wednesday at 1:00-2:00 pm ET, though this can change due to speaker availability. Audience questions and discussion are encouraged; however, because of the number of participants, we handle them in written form using the webinar tool’s chat capability and a shared Google Doc. Recordings of the webinars, along with the presentation slides, will be posted.

Notifications: If you’d like to receive announcements of upcoming webinars and other IDEAS-organized events, as well as follow-ups when recordings become available, please subscribe to our mailing list.

Organizers: These webinars have been organized by the IDEAS project in collaboration with the DOE/ASCR computing facilities (ALCF, NERSC, and OLCF), and the Exascale Computing Project (ECP).

Suggestions Welcome! Want to request another topic?  Want to give a webinar?  Email us at IDEASProductivity@gmail.com.

Upcoming Webinars

Webinars are free and open to the public, but registration is required.

  1. Open Source Best Practices: From Continuous Integration to Static Linters [Register]

    • Date and Time:  Wednesday, October 17, 2018, 1:00-2:00 pm ET
    • Presenters: Daniel Smith and Ben Pritchard, Molecular Sciences Software Institute (MolSSI)
    • Archives: Slides (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: This webinar will continue the discussion of open source software (OSS) opportunities within the scientific ecosystem to include the many cloud and local services available to OSS free of charge. The services to be discussed include continuous integration, code coverage, and static analysis. The presenters will demonstrate the usefulness of these tools and how a small time investment at the beginning is traded for long-term benefits. These services and ideas are agnostic to software language or HPC software application and should apply to any party interested in tools that help ease the burden of software maintenance.

Past Webinars

Listed in reverse chronological order.

2018

  1. Modern CMake (2018-09-19)

    • Presenter: Bill Hoffman, Kitware
    • Archives: Slides (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: Bill Hoffman, the creator of the CMake project, will give an introduction to development with modern CMake constructs. CMake is 17 years old and has evolved over time into the most widely used C++ build tool in the world. In the past 5 years, many new features have been added to CMake to make the creation of cross-platform build files easier. This webinar will provide best practices for development and maintenance of a CMake build system. The webinar will cover the “target centric” approach to writing CMake files. In addition, testing and quality dashboards with CDash will be covered. Kitware’s experience with HPC systems and CMake will also be discussed.
  2. Software Sustainability — Lessons Learned from Different Disciplines (2018-08-21)

    • Presenter: Neil Chue Hong, Software Sustainability Institute (University of Edinburgh)
    • Archives: Slides (PDF @ FigShare) | Video (YouTube) | Q&A (PDF)
    • Description: How do you make software sustainable? How much is it about process and how much about practice? Does it vary between countries or disciplines? In this webinar, I’ll present what the UK’s Software Sustainability Institute has learned from 8 years of work in this area including efforts around understanding the scale of software use in research, raising the profile of software as a key part of the research ecosystem, and how we can enable researchers and developers to build better software.
  3. How Open Source Software Supports the Largest Computers on the Planet (2018-07-18)

    • Presenter: Ian Lee, Lawrence Livermore National Laboratory
    • Archives: Slides (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: This talk will provide an overview of the work at Lawrence Livermore National Laboratory to revamp our open source project offerings, release processes, and engagements across the Department of Energy and the US government through efforts such as DOECode and Code.gov. We will also discuss ongoing work to make it easier for our staff to engage with open source communities, via both the creation of new projects and contributions to existing open source projects. We believe that these experiences and insights may be useful to a wide range of developers of high-performance scientific software.
  4. Popper: Creating Reproducible Computational and Data Science Experimentation Pipelines (2018-06-13)

    • Presenter: Ivo Jimenez, UC Santa Cruz
    • Archives: Slides (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: Current approaches used in computational and data science research may require significant time without necessarily advancing scientific understanding. For example, researchers may spend countless hours reformatting data and writing code to attempt to reproduce previously published research. What if the scientific community could find a better way to create and publish workflows, data, and models that are easy to reproduce, thus streamlining scientific analysis? Popper is a protocol and command-line interface (CLI) tool for implementing scientific exploration pipelines following a DevOps approach of unifying software development and operation in order to handle complexity in large codebases. Popper repurposes DevOps practices in the context of scientific explorations, so that researchers can leverage existing tools and technologies to enable reproducibility. This webinar will introduce the Popper protocol, including a demo of the CLI tool and HPC examples.
  5. On-demand Learning for Better Scientific Software: How to Use Resources & Technology to Optimize your Productivity (2018-05-09)

    • Presenter: Elaine Raybourn, Sandia National Laboratories
    • Archives: Slides (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: Continual advances in new technologies for computational science often require members of the HPC community to learn new tools, techniques, or processes on demand or outside of a formal education setting. While the variety of media and deluge of content make on-demand learning a reality, very few learners apply guiding principles from learning science to set themselves up for success. Applying on-demand learning strategies for self-paced “learning in the wild” can augment professional learning courses from edX, Udacity, and YouTube. Employing use cases and examples from Python and Git, this webinar will demonstrate how to develop a personalized learning framework leveraging massive open online courses (MOOCs), podcasts, social media, videos, and more. A walkthrough of relevant learning applications will be provided. Participants will take away practical strategies, resources, and tools that can be applied toward learning more productively in general, and toward software development specifically.
  6. Software Citation Today and Tomorrow (2018-04-18)

    • Presenter: Daniel S. Katz, NCSA and University of Illinois at Urbana-Champaign
    • Archives: Slides (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: Software is increasingly important in research, and some of the scholarly communications community (for example, FORCE11) has been pushing the concept of software citations as a method to allow software developers and maintainers to get academic credit for their work: software releases are published and assigned DOIs, and software users then cite these releases when they publish research that uses the software. This webinar will discuss the state of software citation, starting with the history of work done by the FORCE11 Software Citation Working Group, which led to a published set of software citation principles (https://doi.org/10.7717/peerj-cs.86), as well as other prior work. It will also talk about where the community is going, what the obstacles to progress are, and how they may be overcome.
  7. Scientific Software Development with Eclipse (2018-03-28)

    • Presenter: Greg Watson, ORNL
    • Archives: Slides (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: The Eclipse IDE is one of the most popular IDEs available, and its support for multiple languages, particularly C, C++, and Fortran, has made it the go-to IDE for scientific software development. Although an IDE like Eclipse can provide advanced development capabilities such as code recommendation and refactoring, these features can be difficult to utilize for complex code bases. Other challenges, such as ease of installation and use, reliability, and compatibility with existing development practices, also play a role. Ultimately, the usefulness of the tool is a tradeoff between the capabilities it provides and the challenges of incorporating it into the development workflow. This webinar will demonstrate some of the latest features available in Eclipse that are particularly useful for scientific application development, and examine how they can be used in a variety of different scenarios using realistic sample codes.
  8. Jupyter and HPC: Current State and Future Roadmap (2018-02-28)
    • Presenters: Matthias Bussonnier (UC Berkeley), Suhas Somnath (ORNL), and Shreyas Cholia (NERSC)
    • Archives: Slides (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: During the last few years, the Jupyter notebook has become one of the tools of choice for the data science and high-performance computing (HPC) communities. This webinar will provide an overview of why Jupyter is gaining traction in education, data science, and HPC, with emphasis on how notebooks can be used as interactive documents for exploration and reporting. We will present an overview of how Jupyter works and how the network protocol can be leveraged for both local single-machine and remote-cluster work. We will discuss the nuts and bolts of how Jupyter has been deployed at NERSC as a case study in implementation of Jupyter in an HPC environment. This work involves learning the Jupyter ecosystem to take advantage of its powerful abstractions to develop custom infrastructure that satisfies policies and user needs.
      The webinar will show, as a use case, how Jupyter notebooks have transformed data discovery, visualization, and interactive analysis for the scanning probe and electron microscopy communities at Oak Ridge National Laboratory. It will also show how notebooks can seamlessly accommodate measurements from a wide variety of instruments through Pycroscopy, a framework for instrument-agnostic data storage and analysis.
  9. Bringing Best Practices to a Long-Lived Production Code (2018-01-17)
    • Presenter: Charles R. Ferenbaugh, LANL
    • Archives: Slides (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: How can you introduce best software practices to a long-lived scientific production code, with a significant user base, that has “gotten along fine” for years doing things its own way? Often developers in such projects must struggle with overly complex code, inadequate documentation, little or no software process, and a “just write the code fast” culture; these are challenges to software quality that are generally not issues for new projects. In this presentation we’ll discuss some of the peculiar problems faced by long-lived scientific codes, and present a case study of how we’re dealing with these issues at LANL in the xRage radiation-hydrodynamics simulation code.

2017

  1. Better Scientific Software (https://bssw.io): So your code will see the future (2017-12-06)
    • Presenters: Mike Heroux, SNL, and Lois McInnes, ANL
    • Archives: Slides (PDF) | Video (YouTube)
    • Description: Better Scientific Software (BSSw) is an organization dedicated to improving developer productivity and software sustainability for computational science and engineering (CSE). This presentation will introduce a new website (https://bssw.io), a community exchange for scientific software improvement. We’re creating a clearinghouse to gather, discuss, and disseminate experiences, techniques, tools, and other resources to improve software productivity and sustainability for CSE. Site users can find information on scientific software topics and can propose to curate or create new content based on their own experiences. The backend enables collaborative content development using standard GitHub tools and processes. We need your contributions to build the BSSw site into a vibrant resource, with content and editorial processes provided by volunteers throughout the international CSE community. Join us!
  2. Managing Defects in HPC Software Development (2017-11-01)
    • Presenter: Tom Evans, ORNL
    • Archives: Slides (PDF) | Video (YouTube)
    • Description: Software Quality Engineering (SQE) is often thought to be incompatible with methods research and scientific investigation. In reality, however, they are not only compatible; SQE is required in order to have confidence in the results of even basic scientific computations. This is especially true for parallel software. In this talk we will look at methods for performing software verification. Software verification is a method for removing defects at code construction time; these techniques can help in both algorithm and method development, and can also increase productivity.
  3. Barely Sufficient Project Management: A few techniques for improving your scientific software development efforts (2017-09-13)
    • Presenter: Michael A. Heroux, SNL
    • Archives: Slides (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: Software development is an essential activity for many scientific teams. Modeling, simulation, and data analysis using team-developed software are increasingly valuable for scientific discovery and engineering. Many teams use informal, ad hoc approaches for managing their software efforts. While sufficient for many efforts, a modest emphasis on team models and processes can substantially improve developer productivity and software sustainability. In this presentation, we discuss several lightweight techniques for managing scientific software efforts. Using checklists, policy statements, and a Kanban workflow system, we emphasize techniques for managing the initiation and exit of team members, approaches to synthesizing team culture, and ways to improve communication within a team and with its stakeholders.
  4. Using the Roofline Model and Intel Advisor (2017-08-16)
    • Presenters: Sam Williams, LBNL, and Tuomas Koskela, NERSC
    • Archives: Slides (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: In this webinar, we will begin by introducing the Roofline Model and its “Cache-Aware” variant. We will proceed with some general guidelines and historical approaches to Roofline-based program analysis. Next, we will provide a short discussion of how changes in data locality and arithmetic intensity of two canonical benchmarks visually manifest in the context of these two Roofline formulations. Subsequently, we will provide two demonstrations of using Intel Advisor and the Roofline model within Intel Advisor. The first demo will be primarily instructive on how to compile, benchmark, and use Advisor. The second demo will focus on using variants of a simple benchmark to highlight changes in the Roofline model as well as providing correlation to Advisor’s other capabilities. We will conclude with a few comments on future directions.
  5. Intermediate Git (2017-07-12)
    • Presenter: Roscoe A. Bartlett, SNL
    • Archives: Slides (PDF) | Git Tutorial and Reference Collection (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: This presentation will emphasize intermediate-level tutorial and reference information about the Git version control (VC) system. This overview takes the view that the best way to learn to use Git effectively is to learn it as a data structure and a set of algorithms to manipulate that data structure. This perspective is important because the Git command-line interface is widely considered to be overly complex and confusing. For example, a Git command like ‘checkout’ can do wildly different things depending on the other arguments passed into the command or the state of the Git repository.  But Git is still the dominant VC system; many people consider that Git has won the version control wars due to its power and flexibility. 
  6. Python in HPC (2017-06-07)
    • Presenters: Rollin Thomas, NERSC; William Scullin, ANL; Matt Belhorn, ORNL
    • Archives: Slides (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: Python’s powerful elegance has driven its adoption at HPC centers for job orchestration, visualization, exploratory data analysis, and even simulation. But maximizing performance from Python applications can be challenging, especially on supercomputing architectures. This webinar will explain those challenges with a practical emphasis on using Python at NERSC, ALCF, and OLCF. We will outline a variety of performance optimization strategies, describe tools for measuring and addressing performance problems, and establish best practices for Python in HPC.

2016

  1. Basic Performance Analysis and Optimization – An Ant Farm Approach (2016-08-09)
    • Presenter: Jack Deslippe, NERSC
    • Archives: Slides (PDF) | Video (YouTube)
    • Description: How is optimizing HPC applications like an Ant Farm? Attend this presentation to find out. We’ll discuss the basic concepts around optimizing code for the HPC systems of today and tomorrow. These systems require codes to effectively exploit both parallelism between nodes and an ever-growing amount of parallelism on-node. We’ll discuss profiling strategies, tools (for profiling and debugging), and common issues with both internode communication and on-node parallelism. We will give an overview of traditional optimization areas in HPC applications, like parallel IO and MPI strong and weak scaling, as well as topics relevant for modern GPU and many-core systems, like threading, SIMD/AVX, SIMT, and effectively using cache and memory hierarchies. The “Ant Farm” approach places a heavy emphasis on the roofline performance model and encourages users to understand the compute, bandwidth, and latency sensitivity of their applications and kernels through a series of easy-to-perform experiments and an easy-to-follow flow chart. Finally, we’ll discuss what we expect to change in the optimization process as we move towards exascale computers.
  2. An Introduction to High-Performance Parallel I/O (2016-07-28)
    • Presenter: Feiyi Wang, OLCF. Feiyi Wang received his Ph.D. in Computer Engineering from North Carolina State University (NCSU). Before he joined Oak Ridge National Laboratory as a research scientist, he worked at Cisco Systems and the Microelectronics Center of North Carolina (MCNC) as a lead developer and principal investigator for several DARPA-funded projects. His current research interests include high-performance storage systems, parallel I/O and file systems, fault tolerance and system simulation, and scientific data management and integration. Dr. Wang is a Joint Faculty Professor in the EECS Department at the University of Tennessee and a senior member of IEEE.
    • Archives: Slides (PDF) | Video (YouTube)
    • Description: Parallel data management is a complex problem in large-scale HPC environments. The HPC I/O stack can be viewed as a multi-layered cake and presents a high-level abstraction to the scientists. While this abstraction shields the users from many of the I/O system details, it is very hard to obtain parallel I/O performance or functionality without understanding the end-to-end hierarchical I/O stack in today’s complex HPC environments. This talk will introduce the basic parallel I/O concepts and will provide guidelines on obtaining better I/O performance on large-scale parallel platforms.
  3. How the HPC Environment is Different from the Desktop (and Why) (2016-07-14)
    • Presenter: Katherine Riley, ALCF
    • Archives: Slides (PDF) | Video (YouTube)
    • Description: High-performance computing has transformed how science and engineering research is conducted. Answering a question in 30 minutes that used to take 6 months can quickly change the way one asks questions. Large computing facilities provide access to some of the world’s largest computing, data, and network resources. Indeed, the DOE complex has the highest concentration of supercomputing capability in the world. However, the very nature of the largest computers in the world makes using them a challenging and unique task. This talk will discuss how supercomputers are unique and explain how that impacts their use.
  4. Testing and Documenting your Code (2016-06-15)
    • Presenter: Alicia Klinvex, SNL
    • Archives: Slides (PDF) | Video (YouTube)
    • Description: Software verification and validation are needed for high-quality and reliable scientific codes. For software with moderate to long lifecycles, a strong automated testing regime is indispensable for continued reliability. Similarly, comprehensive and comprehensible documentation is vital for code maintenance and extensibility. This presentation will provide guidelines on testing and documentation that can help to ensure high-quality and long-lived HPC software. We will present methodologies, with examples, for developing tests and adopting regular automated testing. We also will provide guidelines for minimum, adequate, and good documentation practices depending on the available resources of the development team.
  5. Distributed Version Control and Continuous Integration Testing (2016-06-02)
    • Presenter: Jeff Johnson, LBNL
    • Archives: Slides (PDF) | Video (YouTube)
    • Description: Recently, many tools and workflows have emerged in the software industry that have greatly enhanced the productivity of development teams. GitHub, a site that hosts projects in Git repositories, is a popular platform for open source and closed source projects. GitHub has encoded several best practices into easily followed procedures such as pull requests, which enrich the software engineering vocabularies of non-professionals and professionals alike. GitHub also provides integration with other services (for example, continuous integration services such as Travis CI, which allow code changes to be automatically tested before they are merged into a master development branch). This presentation will discuss how to set up a project on GitHub, illustrate the use of pull requests to incorporate code changes, and show how Travis CI can be used to boost confidence that changes will not break existing code.
  6. Developing, Configuring, Building, and Deploying HPC Software (2016-05-18)
    • Presenter: Barry Smith, ANL
    • Archives: Slides (PDF) | Video (YouTube)
    • Description: The process of developing HPC software requires consideration of issues in software design as well as practices that support the collaborative writing of well-structured code that is easy to maintain, extend, and support.  This presentation will provide an overview of development environments and how to configure, build, and deploy HPC software using some of the tools that are frequently used in the community.  We will also discuss ways in which these and other tools are best utilized by various categories of scientific software developers, ranging from small teams (for example, a faculty member and graduate students who are writing research code intended primarily for their own use) through moderate/large teams (for example, collaborating developers spread among multiple institutions who are writing publicly distributable code intended for use by others in the community).
  7. What All Codes Should Do: Overview of Best Practices in HPC Software Development (2016-05-04)
    • Presenter: Anshu Dubey, ANL
    • Archives: Slides (PDF) | Video (YouTube)
    • Description: Scientific code developers have increasingly been adopting software processes derived from the mainstream (non-scientific) community.  Software practices are typically adopted when continuing without them becomes impractical. However, many software best practices need modification and/or customization, partly because the codes are used for research and exploration, and partly because of the combined funding and sociological challenges. This presentation will describe the lifecycle of scientific software and important ways in which it differs from other software development.  We will provide a compilation of software engineering best practices that have generally been found to be useful by science communities, and we will provide guidelines for adoption of practices based on the size and the scope of the project.