Skills tagged with #cuda

@huggingface

cuda-kernels

Provides guidance for writing and benchmarking optimized CUDA kernels for NVIDIA GPUs (H100, A100, T4) targeting the HuggingFace diffusers and transformers libraries. Supports models such as LTX-Video, Stable Diffusion, LLaMA, Mistral, and Qwen. Integrates with the HuggingFace Kernels Hub (get_kernel) for loading pre-compiled kernels, and includes benchmarking scripts to compare kernel performance against baseline implementations.

huggingface/kernels
18d ago
498 · 0
@Apra-Labs

aprapipes-devops

Diagnose and fix ApraPipes CI/CD build failures across all platforms (Windows, Linux x64/ARM64, Jetson, macOS, Docker). Handles vcpkg dependencies, GitHub Actions workflows, self-hosted CUDA runners, and platform-specific issues. Use when builds fail or when modifying CI configuration.

Apra-Labs/ApraPipes
18d ago
33 · 0
@NVIDIA-NeMo

cuda-graphs

Validate and use CUDA graph capture in Megatron Bridge, including local full-iteration graphs and Transformer Engine scoped graphs for attention, MLP, and MoE modules.

NVIDIA-NeMo/Megatron-Bridge+9 more
18d ago
497 · 0
@maxiaosong1124

ncu-cuda-profiling

Automated NCU (Nsight Compute) profiling workflow with full metrics collection and persistent storage.

#cuda #profiling #ncu #performance #optimization
maxiaosong1124/ncu-cuda-profiling-skill
18d ago
61 · 0
@sgl-project

add-jit-kernel

Step-by-step tutorial for adding a new lightweight JIT CUDA kernel to sglang's jit_kernel module.

sgl-project/sglang+5 more
18d ago
24.4K · 0
@jaccen

3dgs-code-reviewer

Review 3D Gaussian Splatting implementation code for correctness, performance bugs, and best practices. Covers CUDA kernels, rendering pipeline, training loop, loss functions, and common pitfalls. Detects 42+ known bug patterns.

#3dgs #gaussian-splatting #code-review #cuda #debugging #performance
jaccen/Awesome-Gaussian-Skills+3 more
13h ago
7 · 0