gpu-kernels

Runtime correctness checker for custom CUDA kernels. Attach a single decorator to periodically verify outputs against a reference implementation, with outlier-biased sampling and zero training graph impact.

deep-learning gpu cuda python3 pytorch triton gpu-kernels pytohn

Updated Mar 13, 2026
Python

fulvius31 / triton-cache-tracker

A lightweight utility for monitoring and analyzing Triton kernel compilation cache behavior.

gpu cache cuda triton gpu-kernels triton-openai

Updated Jun 30, 2025
Python

kalyani-25 / Reimplementation_flash-attention-from-scratch

16-step CUDA optimization of FlashAttention-2 achieving 99.2% of the official implementation's performance on an A100 (Ampere architecture)

deep-learning cuda pytorch ampere gpu-kernels nsight llm-inference flashattention

Updated Mar 6, 2026
Cuda

SergiuDeveloper / self-attention-cuda-kernel-comparison

Benchmarking hand-written CUDA C, Numba, and Triton self-attention kernels against PyTorch's SDPA - how fast can you go depending on the tool?

deep-learning cuda torch pytorch transformer triton cuda-kernels numba gpu-kernels self-attention

Updated Mar 18, 2026
Python

anviit / triton-llm-kernels

LLM primitives rebuilt in Triton — FlashAttention 2.52×, fused AdamW 3.45×, Bias+GELU 14.65× faster than PyTorch

deep-learning cuda inference pytorch triton gpu-kernels llm flash-attention

Updated Mar 18, 2026
Python

DanielChaeS / GEM-A-Deep-Learning-Workload-Optimizer

Triton optimizations run on AMD GPUs

deep-learning amd triton rocm gpu-kernels

Updated Oct 20, 2025
Python

InboraStudio / Vulkan-AMD-developed-RadeonRays

A ray intersection acceleration library for heterogeneous hardware and software systems that uses AMD-developed RadeonRays to help developers make the most of GPUs, APUs, and CPUs.

kernel amd gpu drivers gpu-kernels vulkna

Updated Mar 4, 2026
C++

gpu-kernels

Here are 21 public repositories matching this topic...

ROCm / rocprofiler-compute

xmartlabs / cuda-calculator

dlsys-course / assignment2-2017

eyalroz / gpu-kernel-runner

upenn-acg / gpudrano-static-analysis_v1.0

beehive-lab / beehive-spirv-toolkit

anoojpatel / metaxu

hliadis / High-Performance-Computing

sean1832 / Macho

martini9393 / gpu-executor

shyamsridhar123 / MedAssist_MOE

poyea / lollipop

AregGevorgyan / JaxonFlow

SergiuDeveloper / cuda-kernel-verifier

fulvius31 / triton-cache-tracker

kalyani-25 / Reimplementation_flash-attention-from-scratch

SergiuDeveloper / self-attention-cuda-kernel-comparison

anviit / triton-llm-kernels

DanielChaeS / GEM-A-Deep-Learning-Workload-Optimizer

InboraStudio / Vulkan-AMD-developed-RadeonRays
