regression

Skills tagged with #regression

@shapeshift

benchmark-translate

Run a quality benchmark of the /translate skill by selecting stratified test keys, capturing ground truth, translating, judging with sub-agents, and compiling a regression report. Invoke with /benchmark-translate.

shapeshift/web+4 more
18d ago
2010
@krozgrov

Skill Plan: HA alignment improvements

Goal - Improve HA best-practice alignment and UX without introducing regressions.

krozgrov/ha-omlet-integration
18d ago
300
@mhulden

pyfoma-codebase

Rapidly onboard to pyfoma core internals (regex compiler + FST algorithms), make safe code changes, and avoid common semantic and performance regressions in fst.py, regexparse.py, atomic.py, algorithms.py, paradigm.py, and partition_refinement.py.

mhulden/pyfoma+1 more
19d ago
600
@AgriciDaniel

skill-forge-benchmark

Benchmark Claude Code skill performance with variance analysis, tracking pass rate, execution time, and token usage across iterations. Runs multiple trials per eval for statistical reliability, aggregates results into benchmark.json, and generates comparison reports between skill versions. Use when user says "benchmark skill", "measure skill performance", "skill metrics", "compare skill versions", "skill performance", "track skill improvement", "skill regression test", or "skill A/B test".

AgriciDaniel/skill-forge+6 more
18d ago
420
@comsky

change-reaudit

Re-audit code changes to identify side effects, regression risks, and unhandled edge cases before merging or deploying.

comsky/remy-skill-recipes+7 more
18d ago
90
@apache

fory-performance-optimization

Run profile-driven bottleneck optimization across Apache Fory implementations (Java, C++, Python/Cython, Go, Rust, Swift, C#, JavaScript/TypeScript, Dart, Kotlin, Scala). Use when improving serialize/deserialize throughput or latency, recovering regressions against a reference commit, diagnosing flamegraphs, fixing perf-related CI failures, or porting proven optimizations across languages without protocol or API regressions.

apache/fory
18d ago
4.3K0
@nchemb

Super Smoke Test — Post-Execution QA Gate

Automated UAT-level QA pipeline. Exercises what was built, not just "does the page load". Catches CSS collapses, missing hrefs, broken Server Actions, RLS issues, and regressions introduced by auto-fixes.

nchemb/super-smoke-test
15d ago
80
@longbkit

queue-workflow

Queue-first implementation workflow for clisbot queues. Use when work should keep going past the first pass and needs protection against early stopping, shallow review, naming drift, DRY/KISS regressions, missing docs/tests, or bad fallback behavior.

longbkit/clisbot+1 more
6d ago
440
@SwitchbackTech

claude-a11y-audit

Use when reviewing UI diffs, accessibility audits, or flaky UI tests to catch a11y regressions, semantic issues, keyboard/focus problems, and to recommend minimal fixes plus role-based test selectors.

SwitchbackTech/compass+2 more
18d ago
2140
@jMerta

bug-triage

Reproduce, isolate, and fix a bug (or failing build/test), then summarize root cause, fix, and verification steps. Use when the user reports a bug, regression, or failing build/test and wants a fix.

jMerta/codex-skills+11 more
18d ago
1190
@alohays

pdf-diff

Visual PDF regression test comparing current Beamer output against a baseline branch. Use when checking if slide changes introduced visual regressions.

alohays/paper2pr+1 more
18d ago
50
@tripleyak

skillforge

Intelligent skill router and creator. Analyzes ANY input to recommend existing skills, improve them, or create new ones. Uses deep iterative analysis with 11 thinking models, regression questioning, evolution lens, and multi-agent synthesis panel. Phase 0 triage ensures you never duplicate existing functionality.

tripleyak/SkillForge
18d ago
5450
@pytorch

torch_bisect

Bisect PyTorch commits to find the regression that breaks TorchTitan. Use when the user wants to bisect PyTorch or invokes /torch_bisect.

pytorch/torchtitan
18d ago
5.1K0
@Yeachan-Heo

ai-slop-cleaner

Clean AI-generated code slop with a regression-safe, deletion-first workflow and optional reviewer-only mode.

Yeachan-Heo/oh-my-claudecode+24 more
18d ago
10.8K0
@samber

golang-benchmark

Golang benchmarking, profiling, and performance measurement. Use when writing, running, or comparing Go benchmarks, profiling hot paths with pprof, interpreting CPU/memory/trace profiles, analyzing results with benchstat, setting up CI benchmark regression detection, or investigating production performance with Prometheus runtime metrics. Also use when the developer needs deep analysis on a specific performance indicator - this skill provides the measurement methodology, while golang-performance provides the optimization patterns.

samber/cc-skills-golang+30 more
8d ago
120
@rustfs

code-change-verification

Verify code changes by identifying correctness, regression, security, and performance risks from diffs or patches, then produce prioritized findings with file/line evidence and concrete fixes. Use when reviewing commits, PRs, and merged patches before/after release.

rustfs/rustfs+2 more
19d ago
23.2K0
@ZengLiangYi

chatcrystal-debug-recall

Recall ChatCrystal memories for debugging tasks involving failing tests, compiler errors, runtime exceptions, dependency issues, environment breakage, or performance regressions. Use when historical root causes, fixes, or pitfalls may accelerate diagnosis before proposing a fix.

ZengLiangYi/ChatCrystal+1 more
13d ago
170
@dubzzz

javascript-testing-expert

Expert-level JavaScript testing skill focused on writing high-quality tests that find bugs, serve as documentation, and prevent regressions. Advocates for property-based testing with fast-check and protects against nondeterministic code in tests. Does not cover black-box e2e testing.

dubzzz/fast-check
18d ago
4.8K0
@Vrun-design

roadmap-safety-execution

Plans and executes roadmap work in one-by-one low-risk change-sets with mandatory gates (feature flags, tests, rollback path, and acceptance checks). Use for multi-phase delivery where regressions must be minimized.

Vrun-design/openflowkit
18d ago
1960
@pinin4fjords

render-topologies

Render all .mmd files to PNG, pixel-diff against main, and open only changed renders as BEFORE/AFTER pairs in Preview. Use after layout or rendering changes to check for visual regressions. Works in worktree mode (fix branch vs main) or standalone mode (current working tree vs main). Companion to the fix-issue skill, which delegates full regression checks here.

pinin4fjords/nf-metro
19d ago
460
@CodSpeedHQ

codspeed-optimize

Autonomously optimize code for performance using CodSpeed benchmarks, flamegraph analysis, and iterative improvement. Use this skill whenever the user wants to make code faster, reduce CPU usage, optimize memory, improve throughput, find performance bottlenecks, or asks to 'optimize', 'speed up', 'make faster', 'reduce latency', 'improve performance', or points at a CodSpeed benchmark result wanting improvements. Also trigger when the user mentions a slow function, a regression, or wants to understand where time is spent in their code.

CodSpeedHQ/codspeed+1 more
19d ago
1060
@KryptosAI
MCP

Io.Github.KryptosAI/Mcp Observatory

Regression testing for MCP servers. Checks capabilities, invokes tools, detects schema drift.

mcp, github, ai
KryptosAI/mcp-observatory
19d ago
0
@SepineTam
MCP

Io.Github.SepineTam/Stata Mcp

Let an LLM help you run your regression analysis with Stata.

mcp, github, llm
SepineTam/stata-mcp
19d ago
0
@hidai25
MCP

Io.Github.Hidai25/Evalview Mcp

Regression testing for AI agents. Golden baselines, CI/CD, LangGraph, CrewAI, OpenAI, Claude.

mcp, github, ai
hidai25/eval-view+1 more
19d ago
0
@justrach

turbo-benchmark

Run performance benchmarks for TurboAPI. Use when testing performance, checking for regressions, or comparing against FastAPI.

justrach/turboAPI+4 more
18d ago
470
@haoxiang-xu

pupu-test-api

Use when running QA / regression tests against PuPu, when verifying a code change actually works in the running app, or when reading PuPu UI/state without screenshotting manually. Triggers on tasks like "test that PuPu still creates chats correctly", "verify the new model selector works end-to-end", "send a message and check the response", "what's PuPu's current state?". Phase 1 covers chat lifecycle, message send (blocking), model/toolkit/character switching, logs, state snapshot, screenshot, eval.

haoxiang-xu/PuPu
12d ago
270
@Kilo-Org

vscode-visual-regression

Write Storybook stories and visual regression tests for the Kilo VS Code extension webview UI.

Kilo-Org/kilocode
19d ago
16.7K0
@unknown-studio-dev

visual-debug

This skill should be used when the user provides screenshots, videos, screen recordings, or mentions visual bugs, UI glitches, layout shifts, animation issues, or visual regressions. Analyzes media files to create annotated montage grids with diff overlays for visual debugging.

unknown-studio-dev/hoangsa
18d ago
180
@917Dhj

cast-subagents

Use when suggesting exactly one Codex subagent lineup before work begins for multi-lane tasks: branch/PR review across bugs, security, tests, maintainability, docs, or regression risk; codepath tracing plus docs/API verification; option research with tradeoff synthesis; auth/codebase mapping before risk assessment or planning. Advisory only; no auto-spawn; approval required. Do not use for delegated subagent handoffs, trivial single-file fixes, wording-only edits, one fact lookup, unclear requests, or explicit opt-out.

917Dhj/cast-subagents
11d ago
50
@nguyenyou

benchmark

Run scalex performance benchmarks, profiling, and timing analysis. Use this skill whenever the user asks to benchmark scalex, measure performance, profile index/query times, compare before/after performance of a change, investigate bottlenecks, or mentions "benchmark", "perf", "how fast", "timing", "hyperfine", "profile", "flame graph", "profiling", "--timings", "slow", "bottleneck", "regression", "memory", "heap", "GC", "allocation". Also use proactively after implementing performance improvements to verify gains. Covers 6 layers: built-in --timings, hyperfine benchmarks, async-profiler flame graphs, JFR recording, microbenchmarks, and memory profiling.

nguyenyou/scalex+2 more
18d ago
470
@bearlike

meeseeks-cli-smoketest

End-to-end smoke testing of the Meeseeks CLI via tmux. Use this skill when asked to test the CLI, verify CLI behavior after changes, smoke-test the agent loop, check for regressions, or validate MCP/plugin/session features work correctly through the terminal interface. Also use when debugging CLI crashes, MCP connection issues, or session lifecycle problems that need live reproduction.

bearlike/Assistant+3 more
11d ago
320