CAREER: Exploiting Parallel Heterogeneous Architectures to Enable Time-domain Astronomy in the LSST era


The Vera C. Rubin Observatory will carry out the Legacy Survey of Space and Time (LSST) over a ten-year period. This project focuses on the cyberinfrastructure (CI) that supports LSST in the context of Solar System science. The project will directly support the Solar System Notification and Alert Processing System (SNAPS), which is a Rubin-approved alert broker. SNAPS will send alerts to the astronomy community so that other telescopes can perform follow-up observations of objects of interest in the Solar System.

To ensure that rapid follow-up opportunities are possible, two types of unsupervised learning algorithms are needed: (i) near real-time outlier detection that runs during nighttime observing and alerts the community to interesting events on small bodies; and (ii) outlier detection that determines whether a small body is intrinsically interesting relative to all other small bodies. Meeting these two goals requires new outlier detection algorithms for two very different workloads, which presents several interesting computational challenges. These new algorithms will be incorporated into the SNAPS CI and made publicly available for incorporation into other alert brokers and for standalone use.
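As a rough illustration of goal (ii), the sketch below scores each object by the distance to its k-th nearest neighbor in feature space, a common distance-based baseline for unsupervised outlier detection. It is not necessarily the approach developed in this project. The function names, the two-dimensional feature tuples, and the threshold are all hypothetical, and the brute-force loop ignores the scalability concerns that motivate GPU acceleration.

```python
import math


def knn_outlier_scores(points, k):
    """Return, for each point, the distance to its k-th nearest neighbor.

    Brute-force O(n^2) baseline; larger scores mean more isolated points.
    """
    if not 1 <= k <= len(points) - 1:
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_outliers(points, k, threshold):
    """Return indices of points whose k-NN distance exceeds the threshold."""
    return [i for i, s in enumerate(knn_outlier_scores(points, k)) if s > threshold]


# Hypothetical 2-D features: four clustered objects and one isolated object.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_outlier_scores(features, k=2)
outliers = flag_outliers(features, k=2, threshold=1.0)
print(outliers)
```

With these made-up features, only the isolated object at index 4 exceeds the threshold. The pairwise distance computation dominates the cost, which is why this kind of workload is a natural candidate for parallel hardware.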

Fast, scalable outlier detection algorithms are the missing step between Rubin data measurements and comprehensive scientific investigations, both for Solar System science and for the science cases of other alert brokers that need to detect objects exhibiting interesting (outlying) behavior.

The project will address new facets of parallel heterogeneous computing including:

  • Examining the potential of new application-specific integrated circuits (ASICs), such as the tensor and ray tracing cores found on recent generations of GPUs, for accelerating outlier detection tasks.
  • Examining the scalability of heterogeneous systems, including compute nodes with multi-core CPUs and GPUs, the latter of which are equipped with the above-mentioned ASICs.
  • Systematically exploring the algorithm design space to understand which hardware configurations and algorithm design choices are best suited to a given workload or science case.

NSF Award Information

NSF Grant No. 2042155

Broader Impact: Pedagogic Modules [Coming Soon]

Part of this project includes pedagogic modules for teaching general-purpose computing on graphics processing units using data-intensive applications. The modules will also feature the use of ASICs found on GPUs (tensor and ray tracing cores). These modules are intended to be integrated into courses at the undergraduate and graduate levels. Once available, the modules will be posted at the following link: Website


  • CUDA-DClust+: Revisiting Early GPU-Accelerated DBSCAN Clustering Designs [PDF]
    Poudel, M., & Gowanlock, M.
    Proceedings of the 28th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2021), pp. 354–363. DOI:


Most of the code in the publications above is publicly available in the repositories below.