rune-diff
Compare a RUNE specification against its implementation to detect drift. Use when verifying that code still matches its spec after modifications, or when auditing existing implementations against their contracts.
cuda-kernels
Provides guidance for writing and benchmarking optimized CUDA kernels for NVIDIA GPUs (H100, A100, T4) targeting HuggingFace diffusers and transformers libraries. Supports models like LTX-Video, Stable Diffusion, LLaMA, Mistral, and Qwen. Includes integration with HuggingFace Kernels Hub (get_kernel) for loading pre-compiled kernels. Includes benchmarking scripts to compare kernel performance against baseline implementations.
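A hedged sketch of the Kernels Hub integration mentioned above, loading a pre-compiled kernel with get_kernel; the repo id "kernels-community/activation" and the gelu_fast entry point are illustrative assumptions, not guarantees of this skill:

    import torch
    from kernels import get_kernel

    # Fetch a pre-compiled kernel from the HuggingFace Hub (repo id is a placeholder).
    activation = get_kernel("kernels-community/activation")

    x = torch.randn(16, 1024, device="cuda", dtype=torch.float16)
    out = torch.empty_like(x)
    activation.gelu_fast(out, x)  # the kernel writes its result into `out`

A loaded kernel can then be benchmarked against the eager PyTorch baseline (e.g. torch.nn.functional.gelu) to quantify the speedup.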
Npm Search
npm search MCP — find packages, view details, compare, check downloads.
solodit
Search Solodit for similar smart contract security findings. Use when reviewing vulnerabilities, comparing to known issues, or researching prior art from real audits.
Umbrela Eval
Analyze and compare umbrela evaluation results across backends, models, and configurations.
daily-review
End-of-day review - compare planned vs. actual, update task statuses. Part of the chief-of-staff system.
Io.Github.Houtini Ai/Fanout
Multi-URL comparative content analysis with topical gap detection
skill-forge-benchmark
Benchmark Claude Code skill performance with variance analysis, tracking pass rate, execution time, and token usage across iterations. Runs multiple trials per eval for statistical reliability, aggregates results into benchmark.json, and generates comparison reports between skill versions. Use when user says "benchmark skill", "measure skill performance", "skill metrics", "compare skill versions", "skill performance", "track skill improvement", "skill regression test", or "skill A/B test".
Npm Plus
npm MCP — search packages, bundle sizes, vulnerabilities, compare downloads.
Ticket Discovery
Search and compare concert ticket prices across platforms
docs-audit-and-refresh
Audit the repository's docs/ content against the current codebase, find missing, incorrect, or stale documentation, and refresh the affected pages. Use when the user asks to review docs coverage, find outdated docs, compare docs with the current repo, or fix documentation drift across features, settings, tools, or integrations.
Io.Github.Newageflyfish Max/Volthq
Compute price oracle for AI agents — compare pricing across 8 providers in real time.
Verify Composite Expansion
Verify the correctness of composite action expansion by comparing CLI output with actual GitHub Actions logs.
clone-patterns
Learn From the Best — analyze patterns from any codebase and apply them to yours. Use when user wants to adopt best practices from another repo, compare code quality, or learn how top projects are structured.
magpie
Performs GPU kernel correctness and performance evaluation, plus LLM inference benchmarking, with Magpie. Analyzes single or multiple kernels (HIP/CUDA/PyTorch), compares kernel implementations, runs vLLM/SGLang benchmarks with profiling and TraceLens, and runs gap analysis on torch traces. Creates kernel config YAMLs, discovers kernels in a project, and queries GPU specs. Use when the user mentions Magpie, kernel analyze or compare, HIP/CUDA kernel evaluation, vLLM/SGLang benchmark, gap analysis, TraceLens, creating kernel configs, or discovering GPU kernels.
bom
BOM (Bill of Materials) management for electronics projects — the primary orchestrator skill that coordinates DigiKey, Mouser, LCSC, element14, JLCPCB, PCBWay, and KiCad skills into a unified workflow. Create, update, and maintain BOMs with part numbers, costs, quantities stored as KiCad symbol properties. ALWAYS trigger this skill for any task involving component sourcing, pricing, ordering, distributor searches, BOM export, or fabrication preparation — even if the user names a specific distributor or fab house (e.g. "search DigiKey for...", "generate JLCPCB BOM", "order from Mouser"). This skill decides which distributor/fab skills to invoke and in what order. Also trigger on phrases like "what parts do I need", "order components", "how much will this cost", "export for JLCPCB", "find parts for this board", "cost estimate", "compare pricing", or "check stock".
Agentry — AI Agent Directory
AI agent directory — search and compare 120+ agents across 12 categories.
design-compare
Compare Figma designs against implementation screenshots, identifying layout, typography, color, and sizing discrepancies. Generates a structured visual review table and an interactive HTML comparison page with swipe and side-by-side modes. Use when the user asks to compare design with preview, compare Figma with screenshot, check design implementation, or provides a Figma URL alongside a screenshot.
Koko Finance
AI credit card advisor - search cards, compare portfolios, and optimize rewards
Io.Github.N0isy/Mcp Icon Visual
Search, retrieve, compare, and render SVG icons visually for AI agents.
dotnet-inspect
Query .NET APIs across NuGet packages, platform libraries, and local files. Search for types, list API surfaces, compare versions, find extension methods and implementors. Use whenever you need to answer questions about .NET library contents.
Rideshare Comparison
Compare Uber and Lyft prices for any route
b0
Delegate tasks to AI agents via Box0. Use when the user asks to review code, check security, run tests, compare tools, get multiple perspectives, research a topic, analyze data, write docs, or any task that could benefit from specialized or parallel execution. Also use when the user mentions agent names or says "ask", "delegate", "get opinions from", or "have someone".
context-analysis
Analyze plain text documents to understand their semantic structure and token distribution. Use when asked to analyze context, visualize token usage, segment text, identify components, create waffle charts, or compare multiple documents.
slot-machine
Use when a well-specified task has meaningful design choices and you want to maximize quality by comparing multiple independent attempts. Works for coding, writing, and custom task types. Triggers on "slot-machine", "best-of-N", "pull the lever", "parallel implementations", or when quality matters more than speed and the spec is clear enough for independent work.
sidecar
Spawn conversations with other LLMs (Gemini, GPT, ChatGPT, Codex, o3, DeepSeek, Qwen, Grok, Mistral, etc.) and fold results back into your context. TRIGGER when: user asks to talk to, chat with, use, call, or spawn another LLM or model; user mentions Gemini, GPT, ChatGPT, Codex, o3, DeepSeek, Claude (as a sidecar target), Qwen, Grok, Mistral, or any non-current model by name; user asks to get a second opinion from another model; user wants parallel exploration with a different model; user says "sidecar", "fork", or "fold". CRITICAL RULES: (1) ALWAYS launch sidecar CLI commands with Bash tool's run_in_background: true. Never run sidecar start/resume/continue in the foreground. (2) The fold summary returns on stdout when the user clicks Fold in the GUI or the headless agent finishes. Use TaskOutput to read it when the background task completes. (3) Use --prompt for the start command (NOT --briefing). --briefing is only for subagent spawn. (4) NEVER use o3 or o3-pro unless the user explicitly asks for it by name. These models are extremely expensive ($10-60+ per request). If the user asks for o3, warn them about the cost before proceeding. Default to gemini for most tasks. (5) When the user asks to query MULTIPLE LLMs simultaneously (e.g., "ask Gemini AND ChatGPT", "compare Gemini vs GPT"), ALWAYS use --no-ui (headless) for all of them unless the user explicitly requests interactive. Opening multiple Electron windows at once is disruptive. Launch them all in parallel with run_in_background: true.
ab-test-generator
Generate A/B test variants for affiliate content. Triggers on: "create A/B test", "test my headline", "optimize my CTA", "generate variants", "split test ideas", "improve click-through rate", "test my landing page copy", "headline alternatives", "CTA variations", "which version is better", "optimize conversions", "test my email subject line", "compare approaches".
pdf-diff
Visual PDF regression test comparing current Beamer output against a baseline branch. Use when checking if slide changes introduced visual regressions.
analytics-diagnostic-method
The spine of analytics investigation. Use whenever interpreting analytics numbers, answering "why did X change", reading funnels, comparing cohorts, or presenting findings. Teaches a five-step method (load profile, frame the question, build a MECE hypothesis tree, triangulate, present with Pyramid Principle), how to separate signal from noise, and how to spot Simpson's paradox before it misleads you.
Io.Github.PersistenceOne/Bridgekitty
Cross-chain bridge aggregator — compares LI.FI, deBridge, Relay, Across & more for best rates.
android-check-pr-translations
Audit translation accuracy for strings changed in a GitHub PR. Use when asked to check, review, audit, or verify translations in a pull request. Identifies inaccurate translated strings by comparing PR changes against the English source.
Hoteloracle
Hotel Intelligence MCP — search, price compare, area guides, price calendars via Google Hotels
Unity Agent Harness for Claude Code — auto-compile, test pipelines, cross-model review, out of the box.
Compare the current branch against develop and generate two review documents: architecture change diagram + PR review checklist.
sync-acp-spec
Sync the ACP (Agent Client Protocol) schema implementation with the official reference repo by comparing Rust source types against our Python Pydantic models.
climpred-forecast-verification
Verify weather and climate forecasts using climpred. Use when computing forecast skill metrics (RMSE, ACC, CRPS, etc.), comparing hindcasts to observations, bootstrapping significance, removing bias, or working with HindcastEnsemble/PerfectModelEnsemble objects. Triggers on: forecast verification, prediction skill, hindcast, climate prediction, skill score, predictability.
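A minimal climpred sketch of the verification flow this skill covers; the file names are placeholders, and the datasets are assumed to carry climpred-conforming dimensions (init, lead, member for the hindcast; time for observations):

    import xarray as xr
    from climpred import HindcastEnsemble

    hind_ds = xr.open_dataset("hindcast.nc")     # dims: init, lead, member (placeholder file)
    obs_ds = xr.open_dataset("observations.nc")  # dim: time (placeholder file)

    # Wrap the hindcast and attach the verifying observations.
    hindcast = HindcastEnsemble(hind_ds).add_observations(obs_ds)

    # Deterministic skill: RMSE of the ensemble mean, aligned on shared verification dates.
    skill = hindcast.verify(
        metric="rmse",
        comparison="e2o",
        dim="init",
        alignment="same_verifs",
    )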
benchmark-context
Automatically benchmark your custom memory implementation against established systems like Supermemory. Set up a public benchmark or create your own. Compare solutions on quality, latency, features, and cost with a simple UI and CLI.
evalyn-analyze
Use when analyzing evalyn evaluation results, investigating failures, comparing runs, or understanding agent performance
golang-benchmark
Golang benchmarking, profiling, and performance measurement. Use when writing, running, or comparing Go benchmarks, profiling hot paths with pprof, interpreting CPU/memory/trace profiles, analyzing results with benchstat, setting up CI benchmark regression detection, or investigating production performance with Prometheus runtime metrics. Also use when the developer needs deep analysis on a specific performance indicator - this skill provides the measurement methodology, while golang-performance provides the optimization patterns.
Philidor DeFi Vault Risk Analytics
Search 700+ DeFi vaults, compare risk scores, analyze protocols. No API key needed.
bls-oews-api
Query the BLS Occupational Employment and Wage Statistics (OEWS) API for market wage data by occupation, geography, and industry. Trigger for any mention of BLS, Bureau of Labor Statistics, OEWS, OES, occupational wages, market wages, salary data, wage percentiles, median wage, mean wage, labor market rates, SOC codes, or geographic wage differentials. Also trigger when the user needs to compare wages across metro areas, benchmark contractor labor rates against market data, support IGCE development with market wage research, or validate price proposals against BLS data. Complements the GSA CALC+ skill (ceiling rates from awarded contracts) by providing independent market wage data from employer surveys. Together they form a complete pricing toolkit - BLS OEWS for what the market pays, CALC+ for what GSA contractors charge.
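A sketch of the underlying BLS public API v2 request such a query boils down to; the series id below is a structural placeholder rather than a real OEWS series, and a registration key is only needed above the anonymous rate limit:

    import requests

    resp = requests.post(
        "https://api.bls.gov/publicAPI/v2/timeseries/data/",
        json={
            "seriesid": ["OEUN000000000000000000000"],  # placeholder OEWS series id
            "startyear": "2023",
            "endyear": "2023",
        },
        timeout=30,
    )
    for series in resp.json()["Results"]["series"]:
        for point in series["data"]:
            print(series["seriesID"], point["year"], point["value"])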
Io.Github.Shin Bot Litellm/Litellm Mcp
Give AI agents access to 100+ LLMs. Call any model, compare outputs.
dependabot-pr-rollup
Find open Dependabot PRs for the current GitHub repo, compare each PR head to its base branch, replay only the net dependency changes in a fresh worktree and branch, run npm validation, and optionally commit, push, and open a PR. Use when you want to batch or manually replicate active Dependabot updates.
browse-environments
Discover and inspect verifiers environments through the Prime ecosystem. Use when asked to find environments on the Hub, compare options, inspect metadata, check action status, pull local copies for inspection, or choose environment starting points before evaluation, training, or migration work.
Flightoracle
Flight Intelligence MCP — search, cheapest dates, multi-city, airline compare via Google Flights
Io.Github.KylinMountain/Web Fetch Mcp
MCP server for web content fetching, summarizing, comparing, and extracting information
compare-erb-js
Compare ERB and JavaScript template outputs for the offline scoring SPA. Use when working on ERB-to-JS conversion, debugging template parity issues, or verifying that changes to scoring views work correctly in both ERB and SPA modes.
Clinicaltrialsgov Mcp Server
MCP server for ClinicalTrials.gov v2 API. Search, retrieve, compare, and analyze trials.
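For orientation, the v2 REST API behind such a server can also be exercised directly; the query parameters below are illustrative:

    import requests

    resp = requests.get(
        "https://clinicaltrials.gov/api/v2/studies",
        params={"query.term": "glioblastoma", "pageSize": 5},
        timeout=30,
    )
    for study in resp.json()["studies"]:
        ident = study["protocolSection"]["identificationModule"]
        print(ident["nctId"], ident["briefTitle"])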
check-config-diff
This skill should be used when the user asks to "check config diff", "compare upstream config", "diff rubocop config", "sync with Rails rubocop", or wants to see differences between the upstream Rails .rubocop.yml and this repository's config/rails.yml.
academic-pdf-to-gfm
Convert academic PDF papers to GitHub-renderable GFM markdown with inline figures and correctly formatted math equations. Use this skill when converting research papers, technical reports, or math-heavy PDFs for display on GitHub or GitLab. Also use it when GFM math equations are broken or not rendering on GitHub, when someone asks about the $$-vs-```math decision, when equations look garbled on GitHub, when KaTeX validation is needed, or when investigating why LaTeX renders locally but not on GitHub. Also use when comparing GitHub vs GitLab math rendering, when asking about self-hosting GitLab for math documents, or when looking for a platform that requires less LaTeX workarounds. Covers PDF type detection (Word vs LaTeX vs scanned), tool selection (pymupdf4llm/pdftotext/marker-pdf), image extraction, GitHub math rendering rules ($$-vs-```math decision), GitLab native math support (no workarounds needed), KaTeX validation, and multi-agent adversarial equation verification.
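For the extraction step, a minimal pymupdf4llm sketch (file names and the image directory are placeholders; the GFM math fix-ups described above happen afterwards):

    import pathlib
    import pymupdf4llm

    # Convert the paper to Markdown, saving embedded figures as image files.
    md = pymupdf4llm.to_markdown(
        "paper.pdf",
        write_images=True,
        image_path="figures",
    )
    pathlib.Path("paper.md").write_text(md, encoding="utf-8")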
Encode stage -- stop before the loop, compare ALL inputs:
    ref_ckpts = {"preloop": Checkpoint(save=True, stop=True)}
    run_reference_pipeline(ref_ckpts)
    ref_data = ref_ckpts["preloop"].data
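The comparison implied by "compare ALL inputs" might then look like the following sketch, assuming the pipeline under test exposes the same checkpoint hook, the saved data is a dict of tensors, and run_test_pipeline is a hypothetical counterpart of run_reference_pipeline:

    import torch

    test_ckpts = {"preloop": Checkpoint(save=True, stop=True)}
    run_test_pipeline(test_ckpts)  # hypothetical: the implementation under test

    # Compare every captured input against the reference run.
    for name, ref_val in ref_data.items():
        test_val = test_ckpts["preloop"].data[name]
        assert torch.allclose(test_val, ref_val, atol=1e-6), f"drift in input {name!r}"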