benchmark-translate
Run a quality benchmark of the /translate skill by selecting stratified test keys, capturing ground truth, translating, judging with sub-agents, and compiling a regression report. Invoke with /benchmark-translate.
Skill Plan: HA alignment improvements
Goal - Improve HA best-practice alignment and UX without introducing regressions.
pyfoma-codebase
Rapidly onboard to pyfoma core internals (regex compiler + FST algorithms), make safe code changes, and avoid common semantic and performance regressions in fst.py, regexparse.py, atomic.py, algorithms.py, paradigm.py, and partition_refinement.py.
skill-forge-benchmark
Benchmark Claude Code skill performance with variance analysis, tracking pass rate, execution time, and token usage across iterations. Runs multiple trials per eval for statistical reliability, aggregates results into benchmark.json, and generates comparison reports between skill versions. Use when user says "benchmark skill", "measure skill performance", "skill metrics", "compare skill versions", "skill performance", "track skill improvement", "skill regression test", or "skill A/B test".
change-reaudit
Re-audit code changes to identify side effects, regression risks, and unhandled edge cases before merging or deploying.
fory-performance-optimization
Run profile-driven bottleneck optimization across Apache Fory implementations (Java, C++, Python/Cython, Go, Rust, Swift, C#, JavaScript/TypeScript, Dart, Kotlin, Scala). Use when improving serialize/deserialize throughput or latency, recovering regressions against a reference commit, diagnosing flamegraphs, fixing perf-related CI failures, or porting proven optimizations across languages without protocol or API regressions.
Super Smoke Test - Post-Execution QA Gate
Automated UAT-level QA pipeline. Exercises what was built, not just "does the page load". Catches CSS collapses, missing hrefs, broken Server Actions, RLS issues, and regressions introduced by auto-fixes.
queue-workflow
Queue-first implementation workflow for clisbot queues. Use when work should keep going past the first pass and needs protection against early stopping, shallow review, naming drift, DRY/KISS regressions, missing docs/tests, or bad fallback behavior.
claude-a11y-audit
Use when reviewing UI diffs, accessibility audits, or flaky UI tests to catch a11y regressions, semantic issues, keyboard/focus problems, and to recommend minimal fixes plus role-based test selectors.
bug-triage
Reproduce, isolate, and fix a bug (or failing build/test), then summarize root cause, fix, and verification steps. Use when the user reports a bug, regression, or failing build/test and wants a fix.
pdf-diff
Visual PDF regression test comparing current Beamer output against a baseline branch. Use when checking if slide changes introduced visual regressions.
skillforge
Intelligent skill router and creator. Analyzes ANY input to recommend existing skills, improve them, or create new ones. Uses deep iterative analysis with 11 thinking models, regression questioning, evolution lens, and multi-agent synthesis panel. Phase 0 triage ensures you never duplicate existing functionality.
torch_bisect
Bisect PyTorch commits to find the regression that breaks TorchTitan. Use when the user wants to bisect PyTorch or invokes /torch_bisect.
ai-slop-cleaner
Clean AI-generated code slop with a regression-safe, deletion-first workflow and an optional reviewer-only mode.
golang-benchmark
Golang benchmarking, profiling, and performance measurement. Use when writing, running, or comparing Go benchmarks, profiling hot paths with pprof, interpreting CPU/memory/trace profiles, analyzing results with benchstat, setting up CI benchmark regression detection, or investigating production performance with Prometheus runtime metrics. Also use when the developer needs deep analysis on a specific performance indicator - this skill provides the measurement methodology, while golang-performance provides the optimization patterns.
code-change-verification
Verify code changes by identifying correctness, regression, security, and performance risks from diffs or patches, then produce prioritized findings with file/line evidence and concrete fixes. Use when reviewing commits, PRs, and merged patches before/after release.
chatcrystal-debug-recall
Recall ChatCrystal memories for debugging tasks involving failing tests, compiler errors, runtime exceptions, dependency issues, environment breakage, or performance regressions. Use when historical root causes, fixes, or pitfalls may accelerate diagnosis before proposing a fix.
javascript-testing-expert
Expert-level JavaScript testing skill focused on writing high-quality tests that find bugs, serve as documentation, and prevent regressions. Advocates for property-based testing with fast-check and protects against indeterministic code in tests. Does not cover black-box e2e testing.
roadmap-safety-execution
Plans and executes roadmap work in one-by-one low-risk change-sets with mandatory gates (feature flags, tests, rollback path, and acceptance checks). Use for multi-phase delivery where regressions must be minimized.
render-topologies
Render all .mmd files to PNG, pixel-diff against main, and open only changed renders as BEFORE/AFTER pairs in Preview. Use after layout or rendering changes to check for visual regressions. Works in worktree mode (fix branch vs main) or standalone mode (current working tree vs main). Companion to the fix-issue skill, which delegates full regression checks here.
codspeed-optimize
Autonomously optimize code for performance using CodSpeed benchmarks, flamegraph analysis, and iterative improvement. Use this skill whenever the user wants to make code faster, reduce CPU usage, optimize memory, improve throughput, find performance bottlenecks, or asks to 'optimize', 'speed up', 'make faster', 'reduce latency', 'improve performance', or points at a CodSpeed benchmark result wanting improvements. Also trigger when the user mentions a slow function, a regression, or wants to understand where time is spent in their code.
Io.Github.KryptosAI/Mcp Observatory
Regression testing for MCP servers. Checks capabilities, invokes tools, detects schema drift.
Io.Github.SepineTam/Stata Mcp
Let an LLM help you run regression analyses with Stata.
Io.Github.Hidai25/Evalview Mcp
Regression testing for AI agents. Golden baselines, CI/CD, LangGraph, CrewAI, OpenAI, Claude.
turbo-benchmark
Run performance benchmarks for TurboAPI. Use when testing performance, checking for regressions, or comparing against FastAPI.
pupu-test-api
Use when running QA / regression tests against PuPu, when verifying a code change actually works in the running app, or when reading PuPu UI/state without screenshotting manually. Triggers on tasks like "test that PuPu still creates chats correctly", "verify the new model selector works end-to-end", "send a message and check the response", "what's PuPu's current state?". Phase 1 covers chat lifecycle, message send (blocking), model/toolkit/character switching, logs, state snapshot, screenshot, eval.
vscode-visual-regression
Write Storybook stories and visual regression tests for the Kilo VS Code extension webview UI.
visual-debug
This skill should be used when the user provides screenshots, videos, screen recordings, or mentions visual bugs, UI glitches, layout shifts, animation issues, or visual regressions. Analyzes media files to create annotated montage grids with diff overlays for visual debugging.
cast-subagents
Use when suggesting exactly one Codex subagent lineup before work begins for multi-lane tasks: branch/PR review across bugs, security, tests, maintainability, docs, or regression risk; codepath tracing plus docs/API verification; option research with tradeoff synthesis; auth/codebase mapping before risk assessment or planning. Advisory only; no auto-spawn; approval required. Do not use for delegated subagent handoffs, trivial single-file fixes, wording-only edits, one fact lookup, unclear requests, or explicit opt-out.
benchmark
Run scalex performance benchmarks, profiling, and timing analysis. Use this skill whenever the user asks to benchmark scalex, measure performance, profile index/query times, compare before/after performance of a change, investigate bottlenecks, or mentions "benchmark", "perf", "how fast", "timing", "hyperfine", "profile", "flame graph", "profiling", "--timings", "slow", "bottleneck", "regression", "memory", "heap", "GC", "allocation". Also use proactively after implementing performance improvements to verify gains. Covers 6 layers: built-in --timings, hyperfine benchmarks, async-profiler flame graphs, JFR recording, microbenchmarks, and memory profiling.
meeseeks-cli-smoketest
End-to-end smoke testing of the Meeseeks CLI via tmux. Use this skill when asked to test the CLI, verify CLI behavior after changes, smoke-test the agent loop, check for regressions, or validate MCP/plugin/session features work correctly through the terminal interface. Also use when debugging CLI crashes, MCP connection issues, or session lifecycle problems that need live reproduction.