@takeyaqa
MCP
PictMCP
Provides pairwise combinatorial testing capabilities to AI assistants.
mcp, github, ai
takeyaqa/PictMCP
19d ago
0
@agentscope-ai
auto-arena
Automatically evaluate and compare multiple AI models or agents without pre-existing test data. Generates test queries from a task description, collects responses from all target endpoints, auto-generates evaluation rubrics, runs pairwise comparisons via a judge model, and produces win-rate rankings with reports and charts. Supports checkpoint resume, incremental endpoint addition, and judge model hot-swap. Use when the user asks to compare, benchmark, or rank multiple models or agents on a custom task, or run an arena-style evaluation.
agentscope-ai/OpenJudge +8 more
18d ago
4720
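The auto-arena description mentions turning pairwise judge verdicts into win-rate rankings. As a rough illustration of that last step only, here is a minimal sketch. It is not the auto-arena or OpenJudge implementation: the function name `win_rate_ranking`, the `(model_a, model_b, winner)` tuple format, and the choice to count a tie as half a win for each side are all assumptions made for this example.

```python
from collections import defaultdict


def win_rate_ranking(comparisons):
    """Rank models by win rate from pairwise verdicts.

    `comparisons` is an iterable of (model_a, model_b, winner) tuples,
    where `winner` is model_a, model_b, or None for a tie.
    This input format is hypothetical, chosen for illustration.
    """
    wins = defaultdict(float)
    games = defaultdict(int)

    for model_a, model_b, winner in comparisons:
        if winner not in (model_a, model_b, None):
            raise ValueError(f"winner {winner!r} is not one of the compared models")
        games[model_a] += 1
        games[model_b] += 1
        if winner is None:
            # Assumption: a tie counts as half a win for each side.
            wins[model_a] += 0.5
            wins[model_b] += 0.5
        else:
            wins[winner] += 1.0

    rates = {model: wins[model] / games[model] for model in games}
    # Highest win rate first.
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    verdicts = [
        ("A", "B", "A"),
        ("A", "C", "A"),
        ("B", "C", "B"),
        ("B", "C", None),
    ]
    for model, rate in win_rate_ranking(verdicts):
        print(f"{model}: {rate:.2f}")
```

A plain win rate is the simplest aggregate; it ignores opponent strength, so a rating scheme such as Elo or Bradley-Terry may be preferable when models face uneven sets of opponents.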