Verified · Git
v1.0.0

agent-eval

by @affaan-m · 0 pulls
URL: openbooklet.com/s/agent-eval
Pinned: openbooklet.com/s/agent-eval@1.0.0
API: GET /api/v1/skills/agent-eval

Head-to-head comparison of coding agents (Claude Code, Aider, Codex, etc.) on custom tasks with pass rate, cost, time, and consistency metrics
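
As a rough illustration of how the four metrics named above could be aggregated, here is a minimal sketch. The `Run` record, the `summarize` function, and the definition of consistency (the share of tasks on which every repeated attempt had the same outcome) are assumptions made for this example. They are not taken from the agent-eval tool itself, which may define and compute these metrics differently.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """One attempt by one agent at one task (hypothetical record shape)."""
    agent: str
    task: str
    passed: bool
    cost_usd: float
    seconds: float


def summarize(runs):
    """Return per-agent pass rate, mean cost, mean time, and consistency."""
    by_agent = {}
    for run in runs:
        by_agent.setdefault(run.agent, []).append(run)

    summary = {}
    for agent, agent_runs in by_agent.items():
        # Group outcomes by task so repeated attempts can be compared.
        outcomes_by_task = {}
        for run in agent_runs:
            outcomes_by_task.setdefault(run.task, set()).add(run.passed)

        # Assumed definition: a task is "consistent" when every attempt
        # at it produced the same pass/fail outcome.
        consistent_tasks = sum(1 for o in outcomes_by_task.values() if len(o) == 1)

        summary[agent] = {
            "pass_rate": mean(1.0 if r.passed else 0.0 for r in agent_runs),
            "mean_cost_usd": mean(r.cost_usd for r in agent_runs),
            "mean_seconds": mean(r.seconds for r in agent_runs),
            "consistency": consistent_tasks / len(outcomes_by_task),
        }
    return summary
```

Calling `summarize` on a list of `Run` records returns one dictionary per agent, which makes it straightforward to print a side-by-side table. Running each task several times per agent is what makes the consistency figure meaningful.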

21 skills from this repo: affaan-m/everything-claude-code
agent-eval (viewing)
/search-first: Research Before Coding (docs/zh-CN/skills/search-first/SKILL.md)

Systematizes the workflow of looking for existing solutions before implementing.

Django Testing with TDD (docs/zh-CN/skills/django-tdd/SKILL.md)

Test-driven development for Django applications using pytest, factory_boy, and Django REST Framework.

Regex vs LLM for Structured Text Parsing (docs/zh-CN/skills/regex-vs-llm-structured-text/SKILL.md)

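A practical decision framework for parsing structured text (quizzes, forms, invoices, documents). The core insight: regular expressions handle 95-98% of cases cheaply and deterministically. Reserve expensive LLM calls for the remaining edge cases.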

Cost-Aware LLM Pipeline (docs/zh-CN/skills/cost-aware-llm-pipeline/SKILL.md)

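Patterns for controlling LLM API costs while maintaining quality. Combines model routing, budget tracking, retry logic, and prompt caching into a composable pipeline.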

Agent Eval Skill (docs/zh-CN/skills/agent-eval/SKILL.md)

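A lightweight CLI tool for head-to-head comparison of coding agents on reproducible tasks. Every "which coding agent is best?" comparison runs on vibes; this tool makes it systematic.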

agent-harness-construction (skills/agent-harness-construction/SKILL.md)

Design and optimize AI agent action spaces, tool definitions, and observation formatting for higher completion rates.

agentic-engineering (.kiro/skills/agentic-engineering/SKILL.md)

Operate as an agentic engineer using eval-first execution, decomposition, and cost-aware model routing.

ai-first-engineering (skills/ai-first-engineering/SKILL.md)

Engineering operating model for teams where AI agents generate a large share of implementation output.

ai-regression-testing (skills/ai-regression-testing/SKILL.md)

Regression testing strategies for AI-assisted development. Sandbox-mode API testing without database dependencies, automated bug-check workflows, and patterns to catch AI blind spots where the same model writes and reviews code.

Android Clean Architecture (docs/zh-CN/skills/android-clean-architecture/SKILL.md)

Clean Architecture patterns for Android and KMP projects. Covers module boundaries, dependency inversion, UseCase/Repository patterns, and data layer design with Room, SQLDelight, and Ktor.

Android Clean Architecture (skills/android-clean-architecture/SKILL.md)

Clean Architecture patterns for Android and KMP projects. Covers module boundaries, dependency inversion, UseCase/Repository patterns, and data layer design with Room, SQLDelight, and Ktor.

Protocol-Based Swift Dependency Injection Testing (docs/zh-CN/skills/swift-protocol-di-testing/SKILL.md)

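Patterns for making Swift code testable by abstracting external dependencies (file system, network, iCloud) behind small, focused protocols. Enables deterministic tests without I/O.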

API Design Patterns (docs/zh-CN/skills/api-design/SKILL.md)

Conventions and best practices for designing consistent, developer-friendly REST APIs.

api-design (.kiro/skills/api-design/SKILL.md)

REST API design patterns including resource naming, status codes, pagination, filtering, error responses, versioning, and rate limiting for production APIs.

architecture-decision-records (skills/architecture-decision-records/SKILL.md)

Capture architectural decisions made during Claude Code sessions as structured ADRs. Auto-detects decision moments, records context, alternatives considered, and rationale. Maintains an ADR log so future developers understand why the codebase is shaped the way it is.

article-writing (.agents/skills/article-writing/SKILL.md)

Write articles, guides, blog posts, tutorials, newsletter issues, and other long-form content in a distinctive voice derived from supplied examples or brand guidance. Use when the user wants polished written content longer than a paragraph, especially when voice consistency, structure, and credibility matter.

backend-patterns (.kiro/skills/backend-patterns/SKILL.md)

Backend architecture patterns, API design, database optimization, and server-side best practices for Node.js, Express, and Next.js API routes.

blueprint (skills/blueprint/SKILL.md)

Turn a one-line objective into a step-by-step construction plan for multi-session, multi-agent engineering projects. Each step has a self-contained context brief so a fresh agent can execute it cold. Includes adversarial review gate, dependency graph, parallel step detection, anti-pattern catalog, and plan mutation protocol. TRIGGER when: user requests a plan, blueprint, or roadmap for a complex multi-PR task, or describes work that needs multiple sessions. DO NOT TRIGGER when: task is completable in a single PR or fewer than 3 tool calls, or user says "just do it".

Blueprint: Construction Plan Generator (docs/zh-CN/skills/blueprint/SKILL.md)

Turns a one-line objective into a step-by-step construction plan that any coding agent can execute cold.

Browser QA: Automated Visual Testing & Interaction (skills/browser-qa/SKILL.md)

Auto-indexed from affaan-m/everything-claude-code

Related Skills

@openbooklet

graceful-error-recovery

Use this skill when a tool call, command, or API request fails. Diagnose the root cause systematically before retrying or changing approach. Do not retry the same failing call without first understanding why it failed.

@openbooklet

audience-aware-communication

Use this skill when writing any explanation, documentation, or response that will be read by someone else. Match vocabulary, depth, and format to the audience's expertise level before writing.

@openbooklet

Refactoring Expert

Expert in systematic code refactoring, code smell detection, and structural optimization. Use PROACTIVELY when encountering duplicated code, long methods, complex conditionals, or any code quality issues. Detects code smells and applies proven refactoring techniques without changing external behavior.

@openbooklet

Research Expert

Specialized research expert for parallel information gathering. Use for focused research tasks with clear objectives and structured output requirements.

@openbooklet

clarify-ambiguous-requests

Use this skill when the user's request is ambiguous, under-specified, or could be interpreted in multiple ways. If proceeding with a wrong assumption would waste significant work, always ask exactly one focused clarifying question before doing anything.

@openbooklet

structured-step-by-step-reasoning

Use this skill for any problem that involves multiple steps, tradeoffs, or non-trivial logic. Think out loud before answering to improve accuracy and transparency. Apply whenever the answer is not immediately obvious.