langgraph-testing-evaluation
openbooklet.com/s/langgraph-testing-evaluationopenbooklet.com/s/langgraph-testing-evaluation@1.0.0GET /api/v1/skills/langgraph-testing-evaluationUse this skill when you need to test or evaluate LangGraph/LangChain agents: writing unit or integration tests, generating test scaffolds, mocking LLM/tool behavior, running trajectory evaluation (match or LLM-as-judge), running LangSmith dataset evaluations, and comparing two agent versions with A/B-style offline analysis. Use it for Python and JavaScript/TypeScript workflows, evaluator design, experiment setup, regression gates, and debugging flaky/incorrect evaluation results.
Use the write_todos tool effectively for task planning and decomposition in Deep Agents. Use when users want to (1) implement task planning with write_todos, (2) break down complex tasks into subtasks, (3) track agent progress through todos, (4) debug why todos aren't completing, (5) design todo structures for different task types (research, coding, analysis), (6) understand todo status lifecycle and best practices, or (7) visualize todo progression from LangSmith traces.
Initialize, validate, and troubleshoot Deep Agents projects in Python or JavaScript using the `deepagents` package. Use when users need to create agents with built-in planning/filesystem/subagents, configure middleware/backends/checkpointing/HITL, migrate from `create_react_agent` or `create_agent`, scaffold projects with repo scripts, validate agent config files, and confirm compatibility with current LangChain/LangGraph/LangSmith docs.
Implement multi-agent coordination patterns (supervisor-subagent, router, orchestrator-worker, handoffs) for LangGraph applications. Use when users want to (1) implement multi-agent systems, (2) coordinate multiple specialized agents, (3) choose between coordination patterns, (4) set up supervisor-subagent workflows, (5) implement router-based agent selection, (6) create parallel orchestrator-worker patterns, (7) implement agent handoffs, (8) design state schemas for multi-agent systems, or (9) debug multi-agent coordination issues.
Implement LangGraph error handling with current v1 patterns. Use when users need to classify failures, add RetryPolicy for transient issues, build LLM recovery loops with Command routing, add human-in-the-loop with interrupt()/resume, handle ToolNode errors, or choose a safe strategy between retry, recovery, and escalation.
Initialize and configure LangGraph projects with proper structure, langgraph.json configuration, environment variables, and dependency management. Use when users want to (1) create a new LangGraph project, (2) set up langgraph.json for deployment, (3) configure environment variables for LLM providers, (4) initialize project structure for agents, (5) set up local development with LangGraph Studio, (6) configure dependencies (pyproject.toml, requirements.txt, package.json), or (7) troubleshoot project configuration issues.
Design state schemas, implement reducers, configure persistence, and debug state issues for LangGraph applications. Use when users want to (1) design or define state schemas for LangGraph graphs, (2) implement reducer functions for state accumulation, (3) configure persistence with checkpointers (InMemorySaver/MemorySaver, SqliteSaver, PostgresSaver), (4) debug state update issues or unexpected state behavior, (5) migrate state schemas between versions, (6) validate state schema structure, (7) choose between TypedDict and MessagesState patterns, (8) implement custom reducers for lists, dicts, or sets, (9) use the Overwrite type to bypass reducers, (10) set up thread-based persistence for multi-turn conversations, or (11) inspect checkpoints for debugging.
Deploy and operate production agent servers with LangSmith Deployment. Use when work involves choosing Cloud vs Hybrid/Self-hosted-with-control-plane vs Standalone, preparing/validating langgraph.json, creating deployments or revisions, rolling back revisions, wiring CI/CD to control-plane APIs, configuring environment variables and secrets, setting monitoring/alerts/webhooks, or troubleshooting deployment/runtime/scaling issues for LangChain/LangGraph applications.
Fetch, organize, and analyze LangSmith traces for debugging and evaluation. Use when you need to: query traces/runs by project, metadata, status, or time window; download traces to JSON; organize outcomes into passed/failed/error buckets; analyze token/message/tool-call patterns; compare passed vs failed behavior; or investigate benchmark and production failures.
Auto-indexed from Lubu-Labs/langchain-agent-skills
Are you the author? Claim this skill to take ownership and manage it.
Related Skills
graceful-error-recovery
Use this skill when a tool call, command, or API request fails. Diagnose the root cause systematically before retrying or changing approach. Do not retry the same failing call without first understanding why it failed.
audience-aware-communication
Use this skill when writing any explanation, documentation, or response that will be read by someone else. Match vocabulary, depth, and format to the audience's expertise level before writing.
Refactoring Expert
Expert in systematic code refactoring, code smell detection, and structural optimization. Use PROACTIVELY when encountering duplicated code, long methods, complex conditionals, or any code quality issues. Detects code smells and applies proven refactoring techniques without changing external behavior.
Research Expert
Specialized research expert for parallel information gathering. Use for focused research tasks with clear objectives and structured output requirements.
clarify-ambiguous-requests
Use this skill when the user's request is ambiguous, under-specified, or could be interpreted in multiple ways. If proceeding with a wrong assumption would waste significant work, always ask exactly one focused clarifying question before doing anything.
structured-step-by-step-reasoning
Use this skill for any problem that involves multiple steps, tradeoffs, or non-trivial logic. Think out loud before answering to improve accuracy and transparency. Apply whenever the answer is not immediately obvious.