pairwise

Skills tagged with #pairwise

@takeyaqa

MCP

PictMCP

Provides pairwise combinatorial testing capabilities to AI assistants.

Automatically evaluates and compares multiple AI models or agents without pre-existing test data. Generates test queries from a task description, collects responses from all target endpoints, auto-generates evaluation rubrics, runs pairwise comparisons via a judge model, and produces win-rate rankings with reports and charts. Supports checkpoint resume, incremental endpoint addition, and judge model hot-swap. Use when the user asks to compare, benchmark, or rank multiple models or agents on a custom task, or to run an arena-style evaluation.

agentscope-ai/OpenJudge+8 more

2mo ago

4720
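
The last two steps in the description above (pairwise comparisons via a judge model, then win-rate rankings) can be sketched roughly as follows. This is an illustrative sketch only, not the skill's actual implementation: the function `win_rate_ranking`, the `toy_judge` stand-in (which simply prefers the longer response), and the endpoint names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def win_rate_ranking(
    responses: Dict[str, List[str]],
    judge: Callable[[str, str], int],
) -> List[Tuple[str, float]]:
    """Rank endpoints by win rate over all pairwise comparisons.

    responses maps an endpoint name to its answers, one per test query.
    judge(first, second) returns 0 if the first answer wins, 1 otherwise.
    Both the shape of this data and the judge signature are assumptions.
    """
    wins: Dict[str, int] = defaultdict(int)
    games: Dict[str, int] = defaultdict(int)

    # Compare every unordered pair of endpoints on every query.
    for a, b in combinations(responses, 2):
        for resp_a, resp_b in zip(responses[a], responses[b]):
            winner = a if judge(resp_a, resp_b) == 0 else b
            wins[winner] += 1
            games[a] += 1
            games[b] += 1

    # Win rate = wins / comparisons played; 0.0 if an endpoint played none.
    rates = [
        (name, wins[name] / games[name] if games[name] else 0.0)
        for name in responses
    ]
    return sorted(rates, key=lambda item: item[1], reverse=True)


def toy_judge(first: str, second: str) -> int:
    # Hypothetical stand-in for a judge model: prefers the longer answer.
    return 0 if len(first) >= len(second) else 1


# Hypothetical endpoints, each with answers to the same two queries.
responses = {
    "model_a": ["short", "a much longer answer"],
    "model_b": ["medium text", "mid"],
    "model_c": ["x", "yyyyy"],
}

ranking = win_rate_ranking(responses, toy_judge)
for name, rate in ranking:
    print(f"{name}: {rate:.2f}")
```

A real judge would call a model with the generated rubric and would typically also randomize or swap the answer order to reduce position bias; the sketch leaves that out for brevity.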

pairwise

PictMCP

auto-arena