Best Practices for HPC Software Developers (Webinars)


The HPC Best Practices webinars address issues faced by developers of computational science and engineering (CSE) software on high-performance computers (HPC). The sessions are independent, so join any or all.

Who should attend: Participation is free and open to the public; however, registration is required for each event. This series is designed for HPC software developers who are seeking help in increasing their team’s productivity, as well as facility staff who interact extensively with users.

Schedule and format: The webinars will occur approximately monthly and last about one hour each.  Audience questions and discussion are encouraged; however, due to the number of participants, we handle them in written form using the webinar tool’s chat capability and a shared Google Doc.  Recordings of the webinars along with the presentation slides will be posted.

Notifications: If you’d like to receive announcements of upcoming webinars and other IDEAS-organized events, and follow-ups when recordings become available, please subscribe to our mailing list.

Organizers: These webinars have been organized by the IDEAS project in collaboration with the DOE/ASCR computing facilities (ALCF, NERSC, and OLCF), and the Exascale Computing Project (ECP).


Upcoming Webinars

  1. Better Scientific Software (https://bssw.io): So your code will see the future  [Register]
    • Date and Time:  Wednesday, December 6, 2017, 1:00-2:00pm ET
    • Presenters: Mike Heroux, SNL, and Lois McInnes, ANL
    • Archives: Slides (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: Better Scientific Software (BSSw) is an organization dedicated to improving developer productivity and software sustainability for computational science and engineering (CSE).  This presentation will introduce a new website (https://bssw.io), a community exchange for scientific software improvement.  We’re creating a clearinghouse to gather, discuss, and disseminate experiences, techniques, tools, and other resources to improve software productivity and sustainability for CSE. Site users can find information on scientific software topics and can propose to curate or create new content based on their own experiences. The backend enables collaborative content development using standard GitHub tools and processes.  We need your contributions to build the BSSw site into a vibrant resource, with content and editorial processes provided by volunteers throughout the international CSE community.  Join us!

Planned Future Topics

  • Software Sustainability
  • Automated Testing
  • Coding Standards
  • Code Review
  • CMake
  • Spack for Package Management

Suggestions: Want to request another topic?  Want to give a webinar?  Email us at IDEASProductivity@gmail.com.

Past Webinars

Listed in reverse chronological order.

2017

  1. Managing Defects in HPC Software Development (2017-11-01)
    • Presenter: Tom Evans, ORNL
    • Archives: Slides (PDF) | Video (YouTube)
    • Description: Software Quality Engineering (SQE) is often thought to be incompatible with methods research and scientific investigation.  In reality, they are not only compatible: SQE is required in order to have confidence in the results of even basic scientific computations.  This is especially true for parallel software.  In this talk we will look at methods for performing software verification.  Software verification is a method for removing defects at code construction time; these techniques can help in both algorithm and method development, as well as increase productivity.
  2. Barely Sufficient Project Management: A few techniques for improving your scientific software development efforts (2017-09-13)
    • Presenter: Michael A. Heroux, SNL
    • Archives: Slides (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: Software development is an essential activity for many scientific teams.  Modeling, simulation, and data analysis using team-developed software are increasingly valuable for scientific discovery and engineering. Many teams use informal, ad hoc approaches for managing their software efforts.  While sufficient for many efforts, a modest emphasis on team models and processes can substantially improve developer productivity and software sustainability. In this presentation, we discuss several lightweight techniques for managing scientific software efforts.  Using checklists, policy statements, and a Kanban workflow system, we emphasize techniques for managing the initiation and exit of team members, approaches to synthesizing team culture, and ways to improve communication within a team and with its stakeholders.
  3. Using the Roofline Model and Intel Advisor (2017-08-16)
    • Presenters: Sam Williams, LBNL, and Tuomas Koskela, NERSC
    • Archives: Slides (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: In this webinar, we will begin by introducing the Roofline Model and its “Cache-Aware” variant. We will proceed with some general guidelines and historical approaches to Roofline-based program analysis. Next, we will provide a short discussion of how changes in data locality and arithmetic intensity of two canonical benchmarks visually manifest in the context of these two Roofline formulations. Subsequently, we will provide two demonstrations of using Intel Advisor and the Roofline model within Intel Advisor. The first demo will be primarily instructive on how to compile, benchmark, and use Advisor. The second demo will focus on using variants of a simple benchmark to highlight changes in the Roofline model as well as providing correlation to Advisor’s other capabilities. We will conclude with a few comments on future directions.
  4. Intermediate Git (2017-07-12)
    • Presenter: Roscoe A. Bartlett, SNL
    • Archives: Slides (PDF) | Git Tutorial and Reference Collection (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: This presentation will emphasize intermediate-level tutorial and reference information about the Git version control (VC) system. This overview takes the view that the best way to learn to use Git effectively is to learn it as a data structure and a set of algorithms to manipulate that data structure. This perspective is important because the Git command-line interface is widely considered to be overly complex and confusing. For example, a Git command like ‘checkout’ can do wildly different things depending on the other arguments passed into the command or the state of the Git repository.  But Git is still the dominant VC system; many people consider that Git has won the version control wars due to its power and flexibility. 
  5. Python in HPC (2017-06-07)
    • Presenters: Rollin Thomas, NERSC; William Scullin, ANL; Matt Belhorn, ORNL
    • Archives: Slides (PDF) | Video (YouTube) | Q&A (PDF)
    • Description: Python’s powerful elegance has driven its adoption at HPC centers for job orchestration, visualization, exploratory data analysis, and even simulation.  But maximizing performance from Python applications can be challenging, especially on supercomputing architectures.  This webinar will explain those challenges with a practical emphasis on using Python at NERSC, ALCF, and OLCF.  We will outline a variety of performance optimization strategies and tools for measuring and addressing performance problems, and establish best practices for Python in HPC.

2016

  1. Basic Performance Analysis and Optimization – An Ant Farm Approach (2016-08-09)
    • Presenter: Jack Deslippe, NERSC
    • Archives: Slides (PDF) | Video (YouTube)
    • Description: How is optimizing HPC applications like an Ant Farm? Attend this presentation to find out. We’ll discuss the basic concepts around optimizing code for the HPC systems of today and tomorrow. These systems require codes to effectively exploit both parallelism between nodes and an ever-growing amount of parallelism on-node. We’ll discuss profiling strategies, tools (for profiling and debugging), and common issues with both internode communication and on-node parallelism. We will give an overview of traditional optimization areas in HPC applications, like parallel I/O and MPI strong and weak scaling, as well as topics relevant for modern GPU and many-core systems, like threading, SIMD/AVX, SIMT, and effectively using cache and memory hierarchies. The “Ant Farm” approach places a heavy emphasis on the roofline performance model and encourages users to understand the compute, bandwidth, and latency sensitivity of their applications and kernels through a series of easy-to-perform experiments and an easy-to-follow flow chart. Finally, we’ll discuss what we expect to change in the optimization process as we move towards exascale computers.
  2. An Introduction to High-Performance Parallel I/O (2016-07-28)
    • Presenter: Feiyi Wang, OLCF. Feiyi Wang received his Ph.D. in Computer Engineering from North Carolina State University (NCSU). Before he joined Oak Ridge National Laboratory as a research scientist, he worked at Cisco Systems and the Microelectronics Center of North Carolina (MCNC) as a lead developer and principal investigator for several DARPA-funded projects.  His current research interests include high-performance storage systems, parallel I/O and file systems, fault tolerance and system simulation, and scientific data management and integration.  Dr. Wang is a Joint Faculty Professor in the EECS Department of the University of Tennessee and a senior member of IEEE.
    • Archives: Slides (PDF) | Video (YouTube)
    • Description: Parallel data management is a complex problem in large-scale HPC environments. The HPC I/O stack can be viewed as a multi-layered cake and presents a high-level abstraction to scientists. While this abstraction shields users from many of the I/O system details, it is very hard to obtain parallel I/O performance or functionality without understanding the end-to-end hierarchical I/O stack in today’s complex HPC environments. This talk will introduce the basic parallel I/O concepts and will provide guidelines on obtaining better I/O performance on large-scale parallel platforms.
  3. How the HPC Environment is Different from the Desktop (and Why) (2016-07-14)
    • Presenter: Katherine Riley, ALCF
    • Archives: Slides (PDF) | Video (YouTube)
    • Description: High-performance computing has transformed how science and engineering research is conducted.  Answering a question in 30 minutes that used to take 6 months can quickly change the way one asks questions.  Large computing facilities provide access to some of the world’s largest computing, data, and network resources.  Indeed, the DOE complex has the highest concentration of supercomputing capability in the world.  However, by nature of their existence, making use of the largest computers in the world can be a challenging and unique task. This talk will discuss how supercomputers are unique and explain how that impacts their use.
  4. Testing and Documenting your Code (2016-06-15)
    • Presenter: Alicia Klinvex, SNL
    • Archives: Slides (PDF) | Video (YouTube)
    • Description: Software verification and validation are needed for high-quality and reliable scientific codes. For software with moderate to long lifecycles, a strong automated testing regime is indispensable for continued reliability. Similarly, comprehensive and comprehensible documentation is vital for code maintenance and extensibility. This presentation will provide guidelines on testing and documentation that can help to ensure high-quality and long-lived HPC software. We will present methodologies, with examples, for developing tests and adopting regular automated testing. We also will provide guidelines for minimum, adequate, and good documentation practices depending on the available resources of the development team.
  5. Distributed Version Control and Continuous Integration Testing (2016-06-02)
    • Presenter: Jeff Johnson, LBNL
    • Archives: Slides (PDF) | Video (YouTube)
    • Description: Recently, many tools and workflows have emerged in the software industry that have greatly enhanced the productivity of development teams. GitHub, a site that hosts projects in Git repositories, is a popular platform for open source and closed source projects.  GitHub has encoded several best practices into easily followed procedures such as pull requests, which enrich the software engineering vocabularies of non-professionals and professionals alike.  GitHub also provides integration with other services (for example, continuous integration services such as Travis CI, which allow code changes to be automatically tested before they are merged into a master development branch).  This presentation will discuss how to set up a project on GitHub, illustrate the use of pull requests to incorporate code changes, and show how Travis CI can be used to boost confidence that changes will not break existing code.
  6. Developing, Configuring, Building, and Deploying HPC Software (2016-05-18)
    • Presenter: Barry Smith, ANL
    • Archives: Slides (PPTX) | Video (YouTube)
    • Description: The process of developing HPC software requires consideration of issues in software design as well as practices that support the collaborative writing of well-structured code that is easy to maintain, extend, and support.  This presentation will provide an overview of development environments and how to configure, build, and deploy HPC software using some of the tools that are frequently used in the community.  We will also discuss ways in which these and other tools are best utilized by various categories of scientific software developers, ranging from small teams (for example, a faculty member and graduate students who are writing research code intended primarily for their own use) through moderate/large teams (for example, collaborating developers spread among multiple institutions who are writing publicly distributable code intended for use by others in the community).
  7. What All Codes Should Do: Overview of Best Practices in HPC Software Development (2016-05-04)
    • Presenter: Anshu Dubey, ANL
    • Archives: Slides (PDF) | Video (YouTube)
    • Description: Scientific code developers have increasingly been adopting software processes derived from the mainstream (non-scientific) community.  Software practices are typically adopted when continuing without them becomes impractical. However, many software best practices need modification and/or customization, partly because the codes are used for research and exploration, and partly because of the combined funding and sociological challenges. This presentation will describe the lifecycle of scientific software and important ways in which it differs from other software development.  We will provide a compilation of software engineering best practices that have generally been found to be useful by science communities, and we will provide guidelines for adoption of practices based on the size and the scope of the project.