Data-Intensive High Performance Computing Pedagogic Modules
About
These open source pedagogic modules are developed as part of NSF Grants 1849559 and 2042155 (PI: Gowanlock).
These modules are used in Mike Gowanlock's high performance computing class at Northern Arizona University.
Overview
These pedagogic modules teach high performance computing (HPC) through the lens of data-intensive computing, which surfaces key concepts studied in typical HPC courses and grounds them in real-world scenarios that arise when working with data. Examples explored in these modules include:
- Algorithm performance that varies as a function of data distribution.
- Load imbalance that arises from skewed data distributions (see the sketch following this list).
- Memory-bound applications that may benefit more from scaling out (adding nodes, and thus aggregate memory bandwidth) than from scaling up (adding CPU cores).
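As a minimal illustration of the load-imbalance point above, the C/OpenMP sketch below (written for this overview, not taken from the modules; the skewed work array, the `process` function, and the chunk size are all illustrative choices) shows how clustering the expensive items in one region of the input causes static work partitioning to leave most threads idle, while dynamic scheduling smooths the imbalance.

```c
/* Minimal sketch (not taken from the modules) of load imbalance caused by
 * a skewed data distribution. Compile with OpenMP, e.g.:
 *   gcc -O2 -fopenmp imbalance.c -o imbalance */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 100000L

/* Simulated per-item cost: item i requires work[i] units of computation. */
static double process(long units)
{
    double x = 0.0;
    for (long k = 0; k < units; k++)
        x += (double)k * 0.5; /* dummy arithmetic standing in for real work */
    return x;
}

int main(void)
{
    long *work = malloc(N * sizeof(long));

    /* Skewed distribution: the first 1% of items carry most of the work,
     * mimicking a dense region in spatially clustered data. */
    for (long i = 0; i < N; i++)
        work[i] = (i < N / 100) ? 100000 : 100;

    double sink = 0.0;

    /* Static scheduling assigns equal-sized index ranges to threads; the
     * thread that owns the expensive region finishes long after the rest. */
    double t0 = omp_get_wtime();
    #pragma omp parallel for schedule(static) reduction(+:sink)
    for (long i = 0; i < N; i++)
        sink += process(work[i]);
    printf("static:  %.3f s\n", omp_get_wtime() - t0);

    /* Dynamic scheduling hands out small chunks on demand, smoothing the
     * imbalance at the cost of some scheduling overhead. */
    t0 = omp_get_wtime();
    #pragma omp parallel for schedule(dynamic, 64) reduction(+:sink)
    for (long i = 0; i < N; i++)
        sink += process(work[i]);
    printf("dynamic: %.3f s\n", omp_get_wtime() - t0);

    printf("checksum: %g\n", sink); /* keep the work from being optimized away */
    free(work);
    return 0;
}
```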
These pedagogic modules are targeted at small clusters, so that students can obtain resources shortly after submitting a job to the queue. Many of the key concepts in the modules can be explored on a typical workstation with roughly 24 physical cores; however, the modules typically exceed the capacity of a laptop. In most cases, the input sizes and computational intensity can be increased to scale to more nodes/cores than presented in these modules.
Audience
Computer scientists and computational/domain scientists can benefit from these modules.
Questions
Please e-mail me if:
- You have any questions.
- You are an instructor who wants to use these modules and would like solutions to the problems.
- You have detected bugs in the modules or material that should be clarified.
Acknowledgments
This material is based upon work supported by the National Science Foundation under Grants 1849559 and 2042155.