RAGScore
Generate QA datasets & evaluate RAG systems. Privacy-first, any LLM, local or cloud.
Hugging Science
Hugging Science is a curated, LLM-friendly index of scientific datasets, models, blog posts, and interactive demos for ML researchers. Use it when a scientific ML question lands in front of you: it's much higher signal than generic search, and the entries are pre-filtered for quality and openness.
phoenix-cli
Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, review experiments, inspect datasets, and query the GraphQL API. Use when debugging AI/LLM applications, analyzing trace data, working with Phoenix observability, or investigating LLM performance issues.
Mcp
List datasets, schemas, run APL queries, and use prompts for exploration, anomalies, and monitoring.
data-designer
Use when the user wants to create a dataset, generate synthetic data, or build a data generation pipeline.
alterlab-arboreto
Infer gene regulatory networks (GRNs) from gene expression data using scalable algorithms (GRNBoost2, GENIE3). Use when analyzing transcriptomics data (bulk RNA-seq, single-cell RNA-seq) to identify transcription factor-target gene relationships and regulatory interactions. Supports distributed computation for large-scale datasets. Part of the AlterLab Academic Skills suite.
MCP Superset
Apache Superset MCP server — 135+ tools for dashboards, charts, datasets, SQL Lab, and security
remote-training
Manages remote training infrastructure on Nebius VMs. Use for building/pushing Docker images, starting/stopping VMs (train, train2, train3), running training jobs, dataset generation, and starting inference servers.
OECD MCP Server
OECD economic and statistical data via SDMX API. Access 5,000+ datasets across 17 categories.
Io.Github.AdonaiVera/Fiftyone Mcp Server
Control FiftyOne computer vision datasets through AI assistants using 80+ operators.
Aquaview Mcp
AQUAVIEW MCP Server - Search and access global oceanographic and environmental datasets.
CSV Data Profiler
Analyzes CSV datasets to produce column-level statistics, missing value reports, type inference, and data quality scores.
byob
Create custom LLM evaluation benchmarks using the BYOB decorator framework. Use when the user wants to (1) create a new benchmark from a dataset, (2) pick or write a scorer, (3) compile and run a BYOB benchmark, (4) containerize a benchmark, or (5) use LLM-as-Judge evaluation. Triggers on mentions of BYOB, custom benchmark, bring your own benchmark, scorer, or benchmark compilation.
PowerBI MCP Server
PowerBI REST API integration. Query workspaces, datasets, and execute DAX queries via MCP.
sync-demo-seeder
Synchronizes scripts/seed-demo.sql with the current Drizzle schema and migrations. Use this skill whenever a new migration is added or src/lib/db/schema.ts changes, to keep the demo dataset consistent with the database structure.
book-sft-pipeline
This skill should be used when the user asks to "fine-tune on books", "create SFT dataset", "train style model", "extract ePub text", or mentions style transfer, LoRA training, book segmentation, or author voice replication.
Dataset Insight Report
Profiles a CSV dataset, generates analytical SQL queries, and produces chart specifications for key findings.
dataclaw
Export Claude Code, Codex, Gemini CLI, OpenCode, and OpenClaw conversation history to Hugging Face. Use when the user asks about exporting conversations, uploading to Hugging Face, configuring DataClaw, reviewing PII/secrets in exports, or managing their dataset.
Nova Scotia Data Explorer
Query and explore Nova Scotia open datasets via the Socrata SODA API.
Synthetic Data Generation
Generate synthetic data, run a flow, create training data, produce datasets, or author custom flow YAMLs using sdg_hub
Everyrow MCP Server
Give your AI a research team. Forecast, score, classify, or research every row of a dataset.
hf-mcp
Use Hugging Face Hub via MCP server tools. Search models, datasets, Spaces, papers. Get repo details, fetch documentation, run compute jobs, and use Gradio Spaces as AI tools. Available when connected to the HF MCP server.
spss-academic-workflow
Use when the user wants a complete SPSS or SPSS-MCP empirical research workflow: organize source data, prepare datasets, design variables and models, run SPSS analyses, export Chinese result paragraphs and tables, write a Chinese LaTeX paper, and compile the final PDF.
lance-user-guide
Guide code agents to help Lance users write/read datasets and build/choose indices. Use when a user asks how to use Lance (Python/Rust/CLI), how to write_dataset/open/scan, how to build vector indexes (IVF_PQ, IVF_HNSW_*), how to build scalar indexes (BTREE, BITMAP, LABEL_LIST, NGRAM, INVERTED, BLOOMFILTER, RTREE, etc.), how to combine filters with vector search, or how to debug indexing and scan performance.
Intelligence Aeternum
AI training dataset marketplace: 2M+ museum artworks with Golden Codex enrichment
evolve-skill
Run the full skill evolution pipeline -- harvest sessions, discover signals, build golden dataset, eval baseline, evolve via DSPy, compare scores
Ragnarok Dataset Workflow
Detailed reference for ragnarok's dataset-backed generation mode, which combines retrieval and answer generation in a single pipeline.
Mirror - Portrait, Audit & Interview Skill
Surface what the AI knows about the user: as a person, not as a dataset.
Opik MCP Server
Interact with Opik prompts, traces, datasets and metrics through the Model Context Protocol.