swift-mlx-lm
MLX Swift LM - Run LLMs and VLMs on Apple Silicon using MLX. Covers local inference, streaming, wired memory coordination, tool calling, LoRA fine-tuning, embeddings, and model porting.
hf-mem
CLI to estimate the memory required to load either Safetensors or GGUF model weights for inference from the Hugging Face Hub
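At its core the estimate is parameter count times bytes per dtype; a minimal sketch of that arithmetic, where the 20% overhead factor and the 7B example are illustrative assumptions rather than hf-mem's actual defaults:

```python
# Rough inference-memory estimate: parameters x bytes per dtype, plus overhead.
BYTES_PER_DTYPE = {"F32": 4, "F16": 2, "BF16": 2, "I8": 1}

def estimate_gb(param_counts: dict, overhead: float = 1.2) -> float:
    """param_counts maps dtype name -> number of parameters stored in it.
    The 1.2x overhead factor is an assumption, not hf-mem's default."""
    raw = sum(BYTES_PER_DTYPE[d] * n for d, n in param_counts.items())
    return raw * overhead / 1e9

# e.g. a 7B model shipped as BF16 weights:
print(f"{estimate_gb({'BF16': 7_000_000_000}):.1f} GB")  # ~16.8 GB
```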
massive-context-mcp
Handles 10M+ token contexts with chunking, sub-queries, and local Ollama inference.
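The pattern behind such servers is map-reduce over chunks; a minimal sketch against Ollama's default local endpoint, where the model name, chunk size, and prompts are assumptions, not this server's actual configuration:

```python
import requests

OLLAMA = "http://localhost:11434/api/generate"  # Ollama's default generate endpoint

def ask(prompt: str, model: str = "llama3.1") -> str:  # model name is an assumption
    r = requests.post(OLLAMA, json={"model": model, "prompt": prompt, "stream": False})
    r.raise_for_status()
    return r.json()["response"]

def map_reduce(document: str, question: str, chunk_chars: int = 8_000) -> str:
    # Map: run the sub-query over each chunk independently.
    chunks = [document[i:i + chunk_chars] for i in range(0, len(document), chunk_chars)]
    partials = [ask(f"Context:\n{c}\n\nQuestion: {question}") for c in chunks]
    # Reduce: merge the partial answers into one response.
    return ask("Combine these partial answers:\n" + "\n---\n".join(partials))
```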
magpie
Evaluates GPU kernel correctness and performance and benchmarks LLM inference with Magpie. Analyzes single or multiple kernels (HIP/CUDA/PyTorch), compares kernel implementations, runs vLLM/SGLang benchmarks with profiling and TraceLens, and runs gap analysis on torch traces. Creates kernel config YAMLs, discovers kernels in a project, and queries GPU specs. Use when the user mentions Magpie, kernel analyze or compare, HIP/CUDA kernel evaluation, vLLM/SGLang benchmark, gap analysis, TraceLens, creating kernel configs, or discovering GPU kernels.
zustand-state-management
Build type-safe global state in React with Zustand. Supports TypeScript, persist middleware, devtools, slices pattern, and Next.js SSR with hydration handling. Prevents 6 documented errors. Use when setting up React state, migrating from Redux/Context, or troubleshooting hydration errors, TypeScript inference, infinite render loops, or persist race conditions.
Automated TikTok-style video review of every PR.
Scaffold a tiktest.md config file in the current project by inferring the dev-server URL, start command, and auth flow from README, package.json, and framework configs. Use when the user wants to set up tik-test for a new project, or when /tiktest:run or /tiktest:quick reports "no tiktest.md found".
alterlab-arboreto
Infer gene regulatory networks (GRNs) from gene expression data using scalable algorithms (GRNBoost2, GENIE3). Use when analyzing transcriptomics data (bulk RNA-seq, single-cell RNA-seq) to identify transcription factor-target gene relationships and regulatory interactions. Supports distributed computation for large-scale datasets. Part of the AlterLab Academic Skills suite.
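For orientation, a minimal sketch of the arboreto API this skill wraps; the file paths are placeholders:

```python
import pandas as pd
from arboreto.algo import grnboost2

# Expression matrix: rows = cells/samples, columns = genes (placeholder path).
expr = pd.read_csv("expression.tsv", sep="\t", index_col=0)

# Optional list of transcription factors to restrict candidate regulators.
tf_names = pd.read_csv("tfs.txt", header=None)[0].tolist()

# Returns a DataFrame of (TF, target, importance) edges, strongest first.
network = grnboost2(expression_data=expr, tf_names=tf_names)
network.to_csv("grn.tsv", sep="\t", index=False)
```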
remote-training
Manages remote training infrastructure on Nebius VMs. Use for building/pushing Docker images, starting/stopping VM machines (train, train2, train3), running training jobs, dataset generation, and starting inference servers.
Add Morpheus — Decentralized Inference for NanoClaw
huggingface-transformers
Hugging Face Transformers best practices including model loading, tokenization, fine-tuning workflows, and inference optimization. Use when working with transformer models, fine-tuning LLMs, implementing NLP tasks, or optimizing transformer inference.
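As a baseline for the loading-and-inference workflow the skill covers, a minimal sketch using the standard Auto classes; the checkpoint id is a placeholder, and device_map="auto" assumes accelerate is installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # requires accelerate
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```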
io.github.wattcoin-org/wattcoin-mcp-server
Earn WATT tokens on Solana — tasks, skills marketplace, AI inference, and bounties.
ATOM Pricing Intelligence
The Global Price Benchmark for AI Inference. 1,600+ SKUs, 40+ vendors, 14 price indexes.
orchestrator-worktree-conventions
Project folder + git-worktree conventions for Agent Orchestrator. Use when creating a new project under ~/GitHub, converting an existing repo into a master/workN worktree layout, adding worktrees, choosing safe ports, or when the orchestrator needs to infer project/worktree paths from the folder structure.
io.github.octid-io/osmp
Agentic AI instruction encoding. 60%+ compression over JSON. Inference-free decode. Any channel.
awq-quantization
Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.
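A minimal quantization sketch in the style of the AutoAWQ package that implements this method; the model path is a placeholder and the quant_config values follow the commonly shown 4-bit settings from its examples:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-hf"  # placeholder model
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Calibrates on activations, then packs weights to 4-bit.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("llama-2-7b-awq")
tokenizer.save_pretrained("llama-2-7b-awq")
```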
livekit-agents
Build voice AI agents with LiveKit Cloud and the Agents SDK. Use when the user asks to "build a voice agent", "create a LiveKit agent", "add voice AI", "implement handoffs", "structure agent workflows", or is working with LiveKit Agents SDK. Provides opinionated guidance for the recommended path: LiveKit Cloud + LiveKit Inference. REQUIRES writing tests for all implementations.
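For orientation, a sketch of the worker entrypoint pattern from the Agents SDK; the AgentSession wiring follows the 1.x Python API, and the stt/llm/tts model strings are assumptions standing in for whichever LiveKit Inference models you select:

```python
from livekit import agents
from livekit.agents import Agent, AgentSession

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()
    # Model identifier strings below are assumptions, not required values.
    session = AgentSession(
        stt="assemblyai/universal-streaming",
        llm="openai/gpt-4o-mini",
        tts="cartesia/sonic-2",
    )
    await session.start(
        agent=Agent(instructions="You are a concise voice assistant."),
        room=ctx.room,
    )

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```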
CSV Data Profiler
Analyzes CSV datasets to produce column-level statistics, missing value reports, type inference, and data quality scores.
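The column-level pass such a profiler performs reduces to a few pandas aggregations; a minimal sketch, with a toy completeness-based quality score standing in for whatever scoring the tool actually uses:

```python
import pandas as pd

def profile(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)  # pandas infers column types on read
    report = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(2),
        "unique": df.nunique(),
    })
    # Toy quality score: share of non-missing cells per column (an assumption).
    report["quality"] = (1 - df.isna().mean()).round(3)
    return report

print(profile("data.csv"))  # placeholder path
```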
brev-cli
Manage GPU and CPU cloud instances with the Brev CLI for ML workloads and general compute. Use when users want to create instances, search for GPUs or CPUs, SSH into instances, open editors, copy files, port forward, manage organizations, or work with cloud compute. Supports fine-tuning, reinforcement learning, training, inference, batch processing, and other ML/AI workloads. Trigger keywords - brev, gpu, cpu, instance, create instance, ssh, vram, vcpu, A100, H100, cloud gpu, cloud cpu, remote machine, finetune, fine-tune, RL, RLHF, training, inference, deploy model, serve model, batch job.
add-model
Add a new AI model to the Pipelex inference system. Guides through all required steps: backend TOML configuration (OpenAI, Azure, Anthropic, Google, etc.), kit sync, test profile collections, and fixture regeneration. Use when the user says "add a model", "add GPT-X", "add Claude X", "new model", "register a model", "add Gemini X", "support model X", "add model to backend", or any variation of introducing a new AI model to the inference configuration. Also use when the user mentions a model name that doesn't exist in the backend configs yet and wants to add it.
Code Pathfinder
Code intelligence MCP server: call graphs, type inference, and symbol search for Python/Go.
benchmark-model
Benchmark inference performance for a specific model
r-bayes
Patterns for Bayesian inference in R using brms, including multilevel models, DAG validation, and marginal effects. Use when performing Bayesian analysis.
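As a reference point for the multilevel patterns involved, the varying-intercept model that a brms formula like y ~ x + (1 | group) fits; the priors shown are illustrative, not brms defaults:

```latex
\begin{aligned}
y_{ij} &\sim \mathcal{N}(\alpha_j + \beta x_{ij},\ \sigma^2)
  && \text{observation } i \text{ in group } j \\
\alpha_j &\sim \mathcal{N}(\mu_\alpha,\ \tau^2)
  && \text{group-level intercepts (partial pooling)} \\
\sigma,\ \tau &\sim \mathrm{Exponential}(1)
  && \text{illustrative priors}
\end{aligned}
```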
ai-services
Configure DigitalOcean Gradient AI serverless inference and Agent Development Kit. Use when adding LLM inference, model access keys, serverless AI endpoints, or building AI agents with ADK on App Platform.
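Serverless inference is OpenAI-compatible, so the client side is a base-URL swap; in this sketch the endpoint URL and model id are assumptions to verify against the Gradient console:

```python
from openai import OpenAI

# Base URL and model id below are assumptions -- check the Gradient AI
# console for the actual endpoint and available models.
client = OpenAI(
    base_url="https://inference.do-ai.run/v1",  # assumed endpoint
    api_key="YOUR_MODEL_ACCESS_KEY",
)
resp = client.chat.completions.create(
    model="llama3.3-70b-instruct",  # assumed model id
    messages=[{"role": "user", "content": "Hello from App Platform"}],
)
print(resp.choices[0].message.content)
```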
io.github.kimbo128/drain-mcp
Pay for AI inference with USDC micropayments on Polygon. No API keys needed.
docker-model-runner
Skills for using Docker Model Runner to run local LLM inference
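Model Runner exposes an OpenAI-compatible endpoint, so local inference can be exercised like any hosted API; the host port and model tag below are assumptions to check against your Docker Desktop settings:

```python
from openai import OpenAI

# Host port (12434) and model tag are assumptions -- enable host TCP access
# in Docker Desktop and pull the model first, e.g. `docker model pull ai/smollm2`.
client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="ai/smollm2",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```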
Terradev
Complete GPU infrastructure for Claude Code — 192 MCP tools for provisioning, training, inference
vowline
General operating skill for AI agents handling meaningful work across domains: ambiguous requests, multi-step execution, tool use, coding, debugging, research, writing, artifacts, planning, review, decisions, visual work, prompt work, and handoff. Use when intent inference, safe action, evidence, verification, concise reporting, or completion criteria matter, including alongside narrower active skills. Skip only trivial one-shot replies.
abstract-domain-explorer
Applies abstract interpretation using different abstract domains (intervals, octagons, polyhedra, sign, congruence) to statically analyze program variables and infer invariants, value ranges, and relationships. Use when analyzing program properties, inferring loop invariants, detecting potential errors, or understanding variable relationships through static analysis.
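The interval domain illustrates the core mechanics; a minimal sketch of join and widening, the two operations that make loop invariant inference terminate:

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def join(self, other: "Interval") -> "Interval":
        # Least upper bound: the smallest interval covering both.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def widen(self, other: "Interval") -> "Interval":
        # Jump unstable bounds to infinity so fixpoint iteration terminates.
        lo = self.lo if other.lo >= self.lo else -math.inf
        hi = self.hi if other.hi <= self.hi else math.inf
        return Interval(lo, hi)

# i = 0; while ...: i += 1  -- widening infers the invariant i in [0, +inf)
i = Interval(0, 0)
i = i.widen(i.join(Interval(i.lo + 1, i.hi + 1)))
print(i)  # Interval(lo=0, hi=inf)
```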
inference-server
Start and test the prime-rl inference server. Use when asked to run inference, start vLLM, test a model, or launch the inference server.
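Since the server speaks vLLM's OpenAI-compatible API, a smoke test is a one-call script; the port and model id here match vLLM defaults and are assumptions, not prime-rl's actual configuration:

```python
from openai import OpenAI

# After the server is up (e.g. `vllm serve Qwen/Qwen2.5-0.5B-Instruct`;
# the model id is a placeholder), hit the OpenAI-compatible endpoint:
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```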