Verified · Git

v1.0.0

agent-eval

Rating: 5 (89132 reviews)
Author: affaan-m

by @affaan-m · 0 pulls

URL: openbooklet.com/s/agent-eval

Pinned: openbooklet.com/s/agent-eval@1.0.0

API: GET /api/v1/skills/agent-eval
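A minimal sketch of calling the endpoint listed above. The `https://` scheme, the `openbooklet.com` host for the API, and the shape of the response are assumptions; this page only gives the path `/api/v1/skills/agent-eval`. The helper names `skill_endpoint` and `fetch_skill` are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: the API is served from the same host as the skill page, over HTTPS.
BASE_URL = "https://openbooklet.com"


def skill_endpoint(name: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL for a skill, following the /api/v1/skills/<name> path."""
    slug = urllib.parse.quote(name, safe="")
    return f"{base_url.rstrip('/')}/api/v1/skills/{slug}"


def fetch_skill(name: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body as text.

    The response format is not documented on this page, so nothing is parsed here.
    """
    request = urllib.request.Request(skill_endpoint(name), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(skill_endpoint("agent-eval"))
```

Keeping URL construction separate from the network call makes the path logic easy to check without hitting the service.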

Head-to-head comparison of coding agents (Claude Code, Aider, Codex, etc.) on custom tasks with pass rate, cost, time, and consistency metrics
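As a rough illustration of the four metrics named above, the sketch below aggregates per-run records into pass rate, mean cost, mean time, and a consistency score for each agent. It is not the skill's own implementation: the `Run` record, the `summarize` function, and the definition of consistency (the share of tasks on which all repeated runs had the same outcome) are assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """One attempt by one agent on one task (hypothetical record layout)."""
    agent: str
    task: str
    passed: bool
    cost_usd: float
    seconds: float


def summarize(runs: list[Run]) -> dict[str, dict[str, float]]:
    """Aggregate runs into per-agent metrics."""
    by_agent: dict[str, list[Run]] = {}
    for run in runs:
        by_agent.setdefault(run.agent, []).append(run)

    summary: dict[str, dict[str, float]] = {}
    for agent, agent_runs in by_agent.items():
        # Group outcomes by task so repeated runs can be compared.
        outcomes: dict[str, set[bool]] = {}
        for run in agent_runs:
            outcomes.setdefault(run.task, set()).add(run.passed)
        # Assumed definition: a task is consistent if every run agreed.
        consistent = sum(1 for seen in outcomes.values() if len(seen) == 1)

        summary[agent] = {
            "pass_rate": sum(r.passed for r in agent_runs) / len(agent_runs),
            "mean_cost_usd": mean(r.cost_usd for r in agent_runs),
            "mean_seconds": mean(r.seconds for r in agent_runs),
            "consistency": consistent / len(outcomes),
        }
    return summary


if __name__ == "__main__":
    demo = [
        Run("agent-a", "task-1", True, 0.10, 30.0),
        Run("agent-a", "task-1", True, 0.12, 34.0),
        Run("agent-b", "task-1", False, 0.05, 20.0),
    ]
    for name, metrics in summarize(demo).items():
        print(name, metrics)
```

Running each task several times per agent is what makes the consistency figure meaningful; with a single run per task it is trivially 1.0.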

21 skills from this repo: affaan-m/everything-claude-code

agent-eval (viewing)

/search-first: Research Before Coding (docs/zh-CN/skills/search-first/SKILL.md)

Systematizes the workflow of looking for existing solutions before implementing.

Django Testing with TDD (docs/zh-CN/skills/django-tdd/SKILL.md)

Test-driven development of Django applications using pytest, factory_boy, and Django REST Framework.

Regex vs LLM for Structured Text Parsing (docs/zh-CN/skills/regex-vs-llm-structured-text/SKILL.md)

Cost-Aware LLM Pipeline (docs/zh-CN/skills/cost-aware-llm-pipeline/SKILL.md)

Agent Eval Skill (docs/zh-CN/skills/agent-eval/SKILL.md)

agent-harness-construction (skills/agent-harness-construction/SKILL.md)

Design and optimize AI agent action spaces, tool definitions, and observation formatting for higher completion rates.

agentic-engineering (.kiro/skills/agentic-engineering/SKILL.md)

Operate as an agentic engineer using eval-first execution, decomposition, and cost-aware model routing.

ai-first-engineering (skills/ai-first-engineering/SKILL.md)

Engineering operating model for teams where AI agents generate a large share of implementation output.

ai-regression-testing (skills/ai-regression-testing/SKILL.md)

Regression testing strategies for AI-assisted development. Sandbox-mode API testing without database dependencies, automated bug-check workflows, and patterns to catch AI blind spots where the same model writes and reviews code.

Android Clean Architecture (docs/zh-CN/skills/android-clean-architecture/SKILL.md)

Clean Architecture patterns for Android and KMP projects. Covers module boundaries, dependency inversion, UseCase/Repository patterns, and data layer design with Room, SQLDelight, and Ktor.

Android Clean Architecture (skills/android-clean-architecture/SKILL.md)

Clean Architecture patterns for Android and KMP projects. Covers module boundaries, dependency inversion, UseCase/Repository patterns, and data layer design with Room, SQLDelight, and Ktor.

Protocol-Based Swift Dependency Injection Testing (docs/zh-CN/skills/swift-protocol-di-testing/SKILL.md)

API Design Patterns (docs/ja-JP/skills/api-design/SKILL.md)

api-design (.kiro/skills/api-design/SKILL.md)

REST API design patterns including resource naming, status codes, pagination, filtering, error responses, versioning, and rate limiting for production APIs.

architecture-decision-records (skills/architecture-decision-records/SKILL.md)

Capture architectural decisions made during Claude Code sessions as structured ADRs. Auto-detects decision moments, records context, alternatives considered, and rationale. Maintains an ADR log so future developers understand why the codebase is shaped the way it is.

article-writing (.agents/skills/article-writing/SKILL.md)

Write articles, guides, blog posts, tutorials, newsletter issues, and other long-form content in a distinctive voice derived from supplied examples or brand guidance. Use when the user wants polished written content longer than a paragraph, especially when voice consistency, structure, and credibility matter.

backend-patterns (.kiro/skills/backend-patterns/SKILL.md)

Backend architecture patterns, API design, database optimization, and server-side best practices for Node.js, Express, and Next.js API routes.

blueprint (skills/blueprint/SKILL.md)

Turn a one-line objective into a step-by-step construction plan for multi-session, multi-agent engineering projects. Each step has a self-contained context brief so a fresh agent can execute it cold. Includes adversarial review gate, dependency graph, parallel step detection, anti-pattern catalog, and plan mutation protocol. TRIGGER when: user requests a plan, blueprint, or roadmap for a complex multi-PR task, or describes work that needs multiple sessions. DO NOT TRIGGER when: task is completable in a single PR or fewer than 3 tool calls, or user says "just do it".

Blueprint: Construction Plan Generator (docs/zh-CN/skills/blueprint/SKILL.md)

Browser QA: Automated Visual Testing & Interaction (skills/browser-qa/SKILL.md)

Auto-indexed from affaan-m/everything-claude-code


Related Skills

@openbooklet

graceful-error-recovery

Use this skill when a tool call, command, or API request fails. Diagnose the root cause systematically before retrying or changing approach. Do not retry the same failing call without first understanding why it failed.

1.1K0

@openbooklet

audience-aware-communication

Use this skill when writing any explanation, documentation, or response that will be read by someone else. Match vocabulary, depth, and format to the audience's expertise level before writing.

1.1K0

@openbooklet

Refactoring Expert

Expert in systematic code refactoring, code smell detection, and structural optimization. Use PROACTIVELY when encountering duplicated code, long methods, complex conditionals, or any code quality issues. Detects code smells and applies proven refactoring techniques without changing external behavior.

600

@openbooklet

Research Expert

Specialized research expert for parallel information gathering. Use for focused research tasks with clear objectives and structured output requirements.

600

@openbooklet

clarify-ambiguous-requests

Use this skill when the user's request is ambiguous, under-specified, or could be interpreted in multiple ways. If proceeding with a wrong assumption would waste significant work, always ask exactly one focused clarifying question before doing anything.

1.1K0

@openbooklet

structured-step-by-step-reasoning

Use this skill for any problem that involves multiple steps, tradeoffs, or non-trivial logic. Think out loud before answering to improve accuracy and transparency. Apply whenever the answer is not immediately obvious.

1.1K0