cuda-kernels
Provides guidance for writing and benchmarking optimized CUDA kernels for NVIDIA GPUs (H100, A100, T4), targeting the HuggingFace diffusers and transformers libraries. Supports models such as LTX-Video, Stable Diffusion, LLaMA, Mistral, and Qwen. Integrates with the HuggingFace Kernels Hub (get_kernel) for loading pre-compiled kernels, and includes benchmarking scripts to compare kernel performance against baseline implementations.
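The benchmarking pattern such scripts follow (warm up, time repeated runs, report the median) can be sketched on CPU. This is a hypothetical harness, not the skill's actual script; real CUDA kernel timing would additionally call torch.cuda.synchronize or use torch.cuda.Event before reading the clock.

```python
import time
import statistics

def bench(fn, *args, warmup=3, iters=20):
    """Median wall-clock time of fn(*args) after a few warmup runs."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

# Hypothetical baseline vs. "optimized kernel" pair:
# a pure-Python loop against the C-level builtin.
def baseline_sum(xs):
    total = 0.0
    for v in xs:
        total += v
    return total

data = [float(i) for i in range(100_000)]
t_baseline = bench(baseline_sum, data)
t_kernel = bench(sum, data)
print(f"speedup: {t_baseline / t_kernel:.1f}x")
```

The warmup runs matter most on GPU, where the first launch pays one-time compilation and caching costs that would otherwise skew the comparison.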
magpie
Performs GPU kernel correctness and performance evaluation, as well as LLM inference benchmarking, with Magpie. Analyzes single or multiple kernels (HIP/CUDA/PyTorch), compares kernel implementations, runs vLLM/SGLang benchmarks with profiling and TraceLens, and performs gap analysis on torch traces. Creates kernel config YAMLs, discovers kernels in a project, and queries GPU specs. Use when the user mentions Magpie, kernel analysis or comparison, HIP/CUDA kernel evaluation, vLLM/SGLang benchmarks, gap analysis, TraceLens, creating kernel configs, or discovering GPU kernels.
agentic-data-science-competition
AI agent-driven Kaggle competition workflow. Learn from real competition experience: score stabilization patterns, submission troubleshooting, kernel workflows, GPU task delegation, and the spec-driven development approach that achieved top leaderboard positions. Use when: working on any Kaggle competition, analyzing submission failures, setting up automated pipelines, or replicating top notebook solutions.
diffusion-perf
Deprecated alias (merged into diffusion-kernel).
ipsw
Apple firmware and binary reverse engineering with the ipsw CLI tool. Use when analyzing iOS/macOS binaries, disassembling functions in dyld_shared_cache, dumping Objective-C headers from private frameworks, downloading IPSWs or kernelcaches, extracting entitlements, analyzing Mach-O files, or researching Apple security. Triggers on requests involving Apple RE, iOS internals, kernel analysis, KEXT extraction, or vulnerability research on Apple platforms.
awq-quantization
Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.
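As a rough illustration of the 4-bit grouped storage format such methods produce, here is a numpy sketch under assumed conventions (group size of 128, asymmetric round-to-nearest with a per-group zero point). It is not the actual AWQ implementation, which additionally rescales salient weight channels based on activation statistics before quantizing.

```python
import numpy as np

def quantize_4bit_groups(w, group_size=128):
    """Asymmetric round-to-nearest 4-bit quantization per group of weights.
    Hypothetical sketch: each group stores a float scale and an integer
    zero point alongside its int4 codes."""
    w = w.reshape(-1, group_size)
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 15.0          # 4 bits -> 16 levels
    zero = np.round(-wmin / scale)        # shifts the range to [0, 15]
    q = np.clip(np.round(w / scale) + zero, 0, 15)
    return q.astype(np.uint8), scale, zero

def dequantize(q, scale, zero):
    return (q.astype(np.float32) - zero) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4096,)).astype(np.float32)
q, scale, zero = quantize_4bit_groups(w)
w_hat = dequantize(q, scale, zero).reshape(-1)
max_err = np.abs(w - w_hat).max()
```

Packing two 4-bit codes per byte gives the roughly 4x memory reduction that lets 7B-70B models fit on smaller GPUs; the per-group scale and zero point are the small float overhead on top.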
Io.Github.RightNow AI/Forge MCP Server
Turn PyTorch code into fast CUDA/Triton kernels on real datacenter GPUs, with up to 14x speedup.
Kernel MCP Server
Access Kernel's cloud-based browsers and app actions via MCP (remote HTTP + OAuth).
add-jit-kernel
Step-by-step tutorial for adding a new lightweight JIT CUDA kernel to sglang's jit_kernel module.
anti-cheat-systems
Guide for modern game anti-cheat architecture, Windows kernel monitoring, and detection tradeoffs. Use this skill when analyzing EAC, BattlEye, Vanguard, FACEIT AC, kernel callbacks, handle protection, manual-map detection, boot-start drivers, BYOVD, DMA threats, or behavioral telemetry in game security research.
3dgs-code-reviewer
Review 3D Gaussian Splatting implementation code for correctness, performance bugs, and best practices. Covers CUDA kernels, rendering pipeline, training loop, loss functions, and common pitfalls. Detects 42+ known bug patterns.
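For concreteness, one correctness property such a review checks is the 3D covariance construction from the scale and rotation parameterization, Sigma = R S S^T R^T, which is positive semi-definite by construction. The following numpy sketch assumes the common (w, x, y, z) quaternion layout used in reference 3DGS code; the function names are illustrative, not the skill's API.

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix from a quaternion (w, x, y, z)."""
    w, x, y, z = q / np.linalg.norm(q)  # normalizing here; skipping this is a common pitfall
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance_3d(scales, quat):
    """Sigma = R S S^T R^T: symmetric and PSD by construction."""
    R = quat_to_rot(np.asarray(quat, dtype=float))
    S = np.diag(np.asarray(scales, dtype=float))
    M = R @ S
    return M @ M.T

cov = covariance_3d([0.5, 1.0, 2.0], [0.9, 0.1, 0.2, 0.3])
eigvals = np.linalg.eigvalsh(cov)
```

Because R is orthogonal, the eigenvalues of the covariance are exactly the squared scales; a review can assert that, plus symmetry, as a cheap sanity check on any reimplementation.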
add-reference-tests
Add pytest tests to validate reference implementations in flashinfer_trace against FlashInfer or SGLang ground truth. Use when validating kernel definitions, adding tests for new op_types, or verifying reference implementations are correct.
HelioSPICE
Spacecraft ephemeris made easy — auto-managed SPICE kernels for heliophysics
channel-telegram
Run and operate a Telegram channel worker for LionClaw using the kernel channel bridge APIs.
kernel-analysis-vphone600
Analyze vphone600 kernel artifacts using the local symbol database and XNU source tree. Use when working on kernel reverse engineering, address-to-symbol lookup, release-vs-research kernel comparison, or patch analysis for vphone600 variants in this repository.
XHelio-SPICE
Spacecraft ephemeris made easy — auto-managed SPICE kernels for heliophysics
kernelgen-flagos
Unified GPU kernel operator generation skill. Automatically detects the target repository type (FlagGems, vLLM, or general Python/Triton) and dispatches to the appropriate specialized sub-skill. Also includes a feedback submission sub-skill for bug reports. Use this skill when the user wants to generate a GPU kernel operator, create a Triton kernel, or says things like "generate an operator", "create a kernel for X", or "/kernelgen-flagos". This single skill replaces the need to install kernelgen-general, kernelgen-for-flaggems, kernelgen-for-vllm, and kernelgen-submit-feedback separately.