N-Ways to GPU Programming Bootcamp
Updated Oct 10, 2024 - Jupyter Notebook
Easier, quicker command-line CUDA profiling
Profiling with NVIDIA Nsight Tools Bootcamp
Julia tools for NVIDIA Nsight Systems
References content from the OLCF CUDA Training Series (https://github.com/olcf/cuda-training-series).
Reproducible Pascal GPU Unified Memory benchmark with Nsight and nvprof profiling
Lecture slides and exercise files of the CUDA 4 Dummies course (2025)
Profile-first ML systems project optimizing a multi-camera end-to-end driving model for hardware efficiency using PyTorch, CUDA streams, NVTX instrumentation, and Nsight Systems.
Real-time CUDA physics engine for N-body gravity, SPH fluids, and rigid-body collisions. Uses shared-memory tiling, kernel fusion, and spatial hashing on RTX 4080/4090.
Profiling with Precision. Documenting with Style.
libHPC is a high-performance computing library focused on Linux and Windows environments. It provides SIMD-optimized kernels, concurrent data structures, GPU utilities, and HPC-oriented memory management components.
This project demonstrates the integration of a CUDA kernel within an NVIDIA Holoscan application. It consists of two custom operators: one for memory allocation and data initialization, and another for executing the CUDA kernel. The application was profiled with Nsight Systems, and the kernel with Nsight Compute.
A systematic framework for studying MoE inference optimization strategies including expert-aware batching, expert placement (co-located vs. distributed), and scaling analysis across single-GPU, multi-GPU, and multi-node configurations.