benchmark-translate
Run a quality benchmark of the /translate skill by selecting stratified test keys, capturing ground truth, translating, judging with sub-agents, and compiling a regression report. Invoke with /benchmark-translate.
Skill Plan: HA alignment improvements
Goal - Improve HA best-practice alignment and UX without introducing regressions.
pyfoma-codebase
Rapidly onboard to pyfoma core internals (regex compiler + FST algorithms), make safe code changes, and avoid common semantic and performance regressions in fst.py, regexparse.py, atomic.py, algorithms.py, paradigm.py, and partition_refinement.py.
skill-forge-benchmark
Benchmark Claude Code skill performance with variance analysis, tracking pass rate, execution time, and token usage across iterations. Runs multiple trials per eval for statistical reliability, aggregates results into benchmark.json, and generates comparison reports between skill versions. Use when user says "benchmark skill", "measure skill performance", "skill metrics", "compare skill versions", "skill performance", "track skill improvement", "skill regression test", or "skill A/B test".
change-reaudit
Re-audit code changes to identify side effects, regression risks, and unhandled edge cases before merging or deploying.
fory-performance-optimization
Run profile-driven bottleneck optimization across Apache Fory implementations (Java, C++, Python/Cython, Go, Rust, Swift, C#, JavaScript/TypeScript, Dart, Kotlin, Scala). Use when improving serialize/deserialize throughput or latency, recovering regressions against a reference commit, diagnosing flamegraphs, fixing perf-related CI failures, or porting proven optimizations across languages without protocol or API regressions.
Super Smoke Test - Post-Execution QA Gate
Automated UAT-level QA pipeline. Exercises what was built, not just "does the page load". Catches CSS collapses, missing hrefs, broken Server Actions, RLS issues, and regressions introduced by auto-fixes.
queue-workflow
Queue-first implementation workflow for clisbot queues. Use when work should keep going past the first pass and needs protection against early stopping, shallow review, naming drift, DRY/KISS regressions, missing docs/tests, or bad fallback behavior.
claude-a11y-audit
Use when reviewing UI diffs, accessibility audits, or flaky UI tests to catch a11y regressions, semantic issues, keyboard/focus problems, and to recommend minimal fixes plus role-based test selectors.
bug-triage
Reproduce, isolate, and fix a bug (or failing build/test), then summarize root cause, fix, and verification steps. Use when the user reports a bug, regression, or failing build/test and wants a fix.
pdf-diff
Visual PDF regression test comparing current Beamer output against a baseline branch. Use when checking if slide changes introduced visual regressions.
skillforge
Intelligent skill router and creator. Analyzes ANY input to recommend existing skills, improve them, or create new ones. Uses deep iterative analysis with 11 thinking models, regression questioning, evolution lens, and multi-agent synthesis panel. Phase 0 triage ensures you never duplicate existing functionality.
torch_bisect
Bisect PyTorch commits to find the regression that breaks TorchTitan. Use when the user wants to bisect PyTorch or invokes /torch_bisect.
ai-slop-cleaner
Clean AI-generated code slop with a regression-safe, deletion-first workflow and an optional reviewer-only mode.
golang-benchmark
Golang benchmarking, profiling, and performance measurement. Use when writing, running, or comparing Go benchmarks, profiling hot paths with pprof, interpreting CPU/memory/trace profiles, analyzing results with benchstat, setting up CI benchmark regression detection, or investigating production performance with Prometheus runtime metrics. Also use when the developer needs deep analysis on a specific performance indicator - this skill provides the measurement methodology, while golang-performance provides the optimization patterns.
code-change-verification
Verify code changes by identifying correctness, regression, security, and performance risks from diffs or patches, then produce prioritized findings with file/line evidence and concrete fixes. Use when reviewing commits, PRs, and merged patches before/after release.
chatcrystal-debug-recall
Recall ChatCrystal memories for debugging tasks involving failing tests, compiler errors, runtime exceptions, dependency issues, environment breakage, or performance regressions. Use when historical root causes, fixes, or pitfalls may accelerate diagnosis before proposing a fix.
javascript-testing-expert
Expert-level JavaScript testing skill focused on writing high-quality tests that find bugs, serve as documentation, and prevent regressions. Advocates for property-based testing with fast-check and protects against indeterministic code in tests. Does not cover black-box e2e testing.
roadmap-safety-execution
Plans and executes roadmap work in one-by-one low-risk change-sets with mandatory gates (feature flags, tests, rollback path, and acceptance checks). Use for multi-phase delivery where regressions must be minimized.
render-topologies
Render all .mmd files to PNG, pixel-diff against main, and open only changed renders as BEFORE/AFTER pairs in Preview. Use after layout or rendering changes to check for visual regressions. Works in worktree mode (fix branch vs main) or standalone mode (current working tree vs main). Companion to the fix-issue skill, which delegates full regression checks here.
codspeed-optimize
Autonomously optimize code for performance using CodSpeed benchmarks, flamegraph analysis, and iterative improvement. Use this skill whenever the user wants to make code faster, reduce CPU usage, optimize memory, improve throughput, find performance bottlenecks, or asks to 'optimize', 'speed up', 'make faster', 'reduce latency', 'improve performance', or points at a CodSpeed benchmark result wanting improvements. Also trigger when the user mentions a slow function, a regression, or wants to understand where time is spent in their code.
Io.Github.KryptosAI/Mcp Observatory
Regression testing for MCP servers. Checks capabilities, invokes tools, detects schema drift.
Io.Github.SepineTam/Stata Mcp
Let an LLM help you run regression analyses with Stata.
Io.Github.Hidai25/Evalview Mcp
Regression testing for AI agents. Golden baselines, CI/CD, LangGraph, CrewAI, OpenAI, Claude.
turbo-benchmark
Run performance benchmarks for TurboAPI. Use when testing performance, checking for regressions, or comparing against FastAPI.
pupu-test-api
Use when running QA / regression tests against PuPu, when verifying a code change actually works in the running app, or when reading PuPu UI/state without screenshotting manually. Triggers on tasks like "test that PuPu still creates chats correctly", "verify the new model selector works end-to-end", "send a message and check the response", "what's PuPu's current state?". Phase 1 covers chat lifecycle, message send (blocking), model/toolkit/character switching, logs, state snapshot, screenshot, eval.
vscode-visual-regression
Write Storybook stories and visual regression tests for the Kilo VS Code extension webview UI.
visual-debug
This skill should be used when the user provides screenshots, videos, screen recordings, or mentions visual bugs, UI glitches, layout shifts, animation issues, or visual regressions. Analyzes media files to create annotated montage grids with diff overlays for visual debugging.
cast-subagents
Use when suggesting exactly one Codex subagent lineup before work begins for multi-lane tasks: branch/PR review across bugs, security, tests, maintainability, docs, or regression risk; codepath tracing plus docs/API verification; option research with tradeoff synthesis; auth/codebase mapping before risk assessment or planning. Advisory only; no auto-spawn; approval required. Do not use for delegated subagent handoffs, trivial single-file fixes, wording-only edits, one fact lookup, unclear requests, or explicit opt-out.
benchmark
Run scalex performance benchmarks, profiling, and timing analysis. Use this skill whenever the user asks to benchmark scalex, measure performance, profile index/query times, compare before/after performance of a change, investigate bottlenecks, or mentions "benchmark", "perf", "how fast", "timing", "hyperfine", "profile", "flame graph", "profiling", "--timings", "slow", "bottleneck", "regression", "memory", "heap", "GC", "allocation". Also use proactively after implementing performance improvements to verify gains. Covers 6 layers: built-in --timings, hyperfine benchmarks, async-profiler flame graphs, JFR recording, microbenchmarks, and memory profiling.
meeseeks-cli-smoketest
End-to-end smoke testing of the Meeseeks CLI via tmux. Use this skill when asked to test the CLI, verify CLI behavior after changes, smoke-test the agent loop, check for regressions, or validate MCP/plugin/session features work correctly through the terminal interface. Also use when debugging CLI crashes, MCP connection issues, or session lifecycle problems that need live reproduction.