evidence

Skills tagged with #evidence

agent-spec-builder

Build a Prompt Hardener agent_spec.yaml from an existing codebase (from-code) or through an interactive interview (from-questions). Use when the user wants to create, generate, or scaffold an agent spec, or when they mention agent_spec.yaml creation. Generates agent_spec.yaml, evidence.md, and open_questions.md with confidence tracking and evidence trails.

cybozu/prompt-hardener

18d ago

470

@alchemiststudiosDOTai

agents-md-mapper

This skill should be used when creating, refreshing, or validating a repository `AGENTS.md` so it stays concise, current, and grounded in repository evidence. Use when `AGENTS.md` is missing or stale, after refactors or tooling changes, when new docs become the system of record, or when adding lightweight drift checks.

alchemiststudiosDOTai/harness-engineering+7 more

18d ago

490

@sequenzia

bug-investigator

Executes diagnostic investigation tasks to test debugging hypotheses. Runs tests, traces execution, checks git history, and reports evidence. (converted from agent)

sequenzia/agent-alchemy+31 more

19d ago

330

@basilisk-labs

agentplane-release-and-packaging-operator

Use when preparing, validating, publishing, or recovering an Agentplane release, especially package build ordering, version parity, npm publication, public install smoke tests, hosted publish evidence, or release CI failures.

basilisk-labs/agentplane+1 more

8d ago

390

@PCIRCLE-AI

Agentic Orchestration (Experimental Working-Model Protocol)

**Status: experimental, instrumented, validation in progress.** This skill is shipped to begin collecting evidence about whether a structured verifiability-router protocol changes Claude's behavior in ways that measurably help users. `memesh patterns` exposes a local counter so you can…

PCIRCLE-AI/memesh-llm-memory+2 more

Opentargets

Open Targets MCP server for targets, diseases, drugs, variants, and evidence

mcp, github

nickzren/opentargets-mcp

19d ago

@jellewas

MCP

EU Audit Trail

Tamper-evident audit trail MCP server for EU AI Act & GDPR compliance.

mcp, github, ai

jellewas/eu-audit-mcp

19d ago

@sibyllai

Governance layer for Claude Code Agent Teams — durable auditability, operational controls, and evidence trails for AI-assisted development.

Governance framework for AI agent team coordination, audit trails, and boundary enforcement.

sibyllai/khoregos

18d ago

@collaborative-deep-research

fact-check

Verify a specific claim by searching for evidence across web and academic sources. Use when the user asks to verify, fact-check, or confirm a statement.

collaborative-deep-research/agent-papers-cli+1 more

BrowseAI Dev

Evidence-backed web research for AI agents with citations and confidence scores.

mcp, github, ai, search, web

BrowseAI-HQ/BrowseAI-Dev

19d ago

@openai

agentic-legibility

Score a repository's agentic legibility from repo-visible evidence only. Use when Codex needs to audit how easy a codebase is for coding agents to discover, bootstrap, validate, and navigate, especially for harness-engineering reviews, developer-experience audits, repo cleanup, or before/after comparisons after improving docs, tooling, or architectural constraints.

acquiring-disk-image-with-dd-and-dcfldd

Create forensically sound bit-for-bit disk images using dd and dcfldd while preserving evidence integrity through hash verification.

forensics, disk-imaging, evidence-acquisition, dd, dcfldd, hash-verification

mukul975/Anthropic-Cybersecurity-Skills+242 more

19d ago

2390

@nextor2k

hyperfocus

ADHD-friendly output formatting for Codex. Restructures responses with evidence-based cognitive accessibility: chunking, visual hierarchy, front-loaded key points, and progressive disclosure. Three modes: clean, flow (default), zen. Use when user says "hyperfocus", "focus mode", "adhd mode", "adhd friendly", or invokes /hyperfocus.

nextor2k/hyperfocus

18d ago

@tkersey

codex-upcoming-features

Fetch and summarize upcoming unreleased Codex features using a durable local clone synced from GitHub, with source-file mining as primary evidence. Use when asked for latest upcoming/openai-codex features, what is coming next but not in the latest stable release, or a live release-gap summary with links and as-of timestamp.

tkersey/dotfiles+25 more

18d ago

450

@savvides

assessment-design

Evidence-based assessment design with rubrics, feedback strategies, and formative checkpoints. Aligns each assessment to learning objectives using Bloom's taxonomy. Applies Nicol's 7 principles of good feedback practice. Reads from /learning-objectives manifest and extends it with assessment specs. (idstack)

savvides/idstack+7 more

18d ago

@0verL1nk

agentic_search

Run a local-first research workflow by planning sub-queries, iterating retrieval, validating sources, and producing traceable evidence-backed conclusions.

0verL1nk/PaperSage+3 more

18d ago

320

@ecomfe

figma-design-to-code

Implement or update project-consistent UI code from a Figma selection or nodeId using TemPad Dev MCP. Use when the user wants visible Figma UI recreated, ported, or integrated into the target project's framework, styling system, tokens, and existing components when available. Do not use for design critique, product invention, generic code review, or for guessing hidden states, responsiveness, or behavior not shown in design or project evidence.

code-change-verification

Verify code changes by identifying correctness, regression, security, and performance risks from diffs or patches, then produce prioritized findings with file/line evidence and concrete fixes. Use when reviewing commits, PRs, and merged patches before/after release.

Pretorin Compliance

Access Pretorin compliance systems, controls, evidence, and narratives from your AI tools.

mcp, github, ai

pretorin-ai/pretorin-cli.git

19d ago

@0xSero

evidence-heavy-evaluator

Generate an evidence-first, read-only repository evaluation report with deterministic scoring and actionable recommendations. Use when the user asks to assess readiness, maintainability, release-readiness, documentation gaps, or engineering health and wants auditable artifacts (`json` + `markdown` + raw command logs).

content-scoring

Score content against the 10 GEO criteria with evidence and prioritized fixes. Use when users ask to score, rate, evaluate, or estimate ranking strength.

mverab/eGEOagents+2 more

18d ago

650

@garrytan

browse

Fast headless browser for QA testing and site dogfooding. Navigate any URL, interact with elements, verify page state, diff before/after actions, take annotated screenshots, check responsive layouts, test forms and uploads, handle dialogs, and assert element states. ~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a user flow, or file a bug with evidence. Use when asked to "open in browser", "test the site", "take a screenshot", or "dogfood this".

garrytan/gstack+18 more

18d ago

10.3K 0

@genomoncology

biomcp

Search and retrieve biomedical data - genes, variants, clinical trials, articles, drugs, diseases, pathways, proteins, adverse events, pharmacogenomics, and phenotype-disease matching. Use for gene function, variant pathogenicity, trials, drug safety, pathway context, disease workups, and literature evidence.

buyer-eval

Structured B2B software vendor evaluation for buyers. Researches your company, asks domain-expert questions, engages vendor AI agents via the Salespeak Frontdoor API, scores vendors across 7 dimensions, and produces a comparative recommendation with evidence transparency. Use when asked to evaluate, compare, or research B2B software vendors.

salespeak-ai/buyer-eval-skill

18d ago

490

@harumiWeb

adr-drafter

Draft a new ExStruct ADR or propose an update to an existing ADR from an issue, PR, diff, tests, and specs. Use when an ADR is required or recommended and you need a structured draft with context, decision, consequences, and evidence.

harumiWeb/exstruct+3 more

19d ago

1290

@happier-dev

happier-diagnose

Diagnose a problem with a Happier session, the daemon, a provider (Claude/Codex/OpenCode), auth, or connectivity. Pulls the correct logs, finds a true root cause from evidence only, presents findings, and optionally uploads a private diagnostics bundle to Happier developers and/or files a sanitized public GitHub issue (the two are complementary). Use when the user reports a bug, says Happier is broken/stuck/misbehaving, asks to debug/diagnose/triage/troubleshoot Happier, or shares a Happier session ID and asks what went wrong.

happier-dev/happier+3 more

10d ago

7310

@Blockether

Presenter Reference

Generate self-contained HTML files for technical diagrams, visualizations, and data tables. Use `spel open` to preview and `spel screenshot` to capture evidence.

Blockether/spel+1 more

18d ago

170

@bitflight-devops

Evaluate Options

Do not present options to the user without evidence. Every recommendation must be grounded in research, not assertion.

bitflight-devops/hallucination-detector

18d ago

@mikkelkrogsholm

lab-review

Cross-reference lab results from sundhed.dk against current PubMed and medRxiv research on optimal ranges. Generates a report comparing your values to the latest evidence-based guidelines, meta-analyses, and preprints.

mikkelkrogsholm/ai-laegens-bord+2 more

18d ago

370

@diskd-ai

ccbox

Inspect local agent session logs via `ccbox` CLI and produce quick, evidence-based insights.

diskd-ai/ccbox+1 more

18d ago

320

@jfrog

jfrog

Interact with the JFrog Platform via the JFrog CLI and REST/GraphQL APIs. Use this skill when the user wants to manage Artifactory repositories, upload or download artifacts, manage builds, configure permissions, manage users and groups, work with access tokens, configure JFrog CLI servers, search artifacts, manage properties, set up replication, manage JFrog Projects, run security audits or scans, look up CVE details, query exposures scan results from JFrog Advanced Security, manage release bundles and lifecycle operations, aggregate or export platform data, or perform any JFrog Platform administration task. Also use when the user mentions jf, jfrog, artifactory, xray, distribution, evidence, apptrust, onemodel, graphql, workers, mission control, curation, advanced security, exposures, or any JFrog product name.

jfrog/jfrog-skills+1 more

18d ago

@vercel-labs

dogfood

Systematically explore and test a web application to find bugs, UX issues, and other problems. Use when asked to "dogfood", "QA", "exploratory test", "find issues", "bug hunt", "test this app/site/platform", or review the quality of a web application. Produces a structured report with full reproduction evidence -- step-by-step screenshots, repro videos, and detailed repro steps for every issue -- so findings can be handed directly to the responsible teams.

vercel-labs/agent-browser+2 more

18d ago

22.0K 0

@chojondocho

vowline

General operating skill for AI agents handling meaningful work across domains: ambiguous requests, multi-step execution, tool use, coding, debugging, research, writing, artifacts, planning, review, decisions, visual work, prompt work, and handoff. Use when intent inference, safe action, evidence, verification, concise reporting, or completion criteria matter, including alongside narrower active skills. Skip only trivial one-shot replies.

chojondocho/vowline

8d ago

120