dataset

Skills tagged with #dataset

@HZYAI
MCP

RAGScore

Generate QA datasets & evaluate RAG systems. Privacy-first, any LLM, local or cloud.

Tags: mcp, github, ai, rag, llm
HZYAI/RagScore
@K-Dense-AI

Hugging Science

Hugging Science is a curated, LLM-friendly index of scientific datasets, models, blog posts, and interactive demos for ML researchers. Use it when you face a scientific ML question: it gives much higher signal than generic search, and its entries are pre-filtered for quality and openness.

K-Dense-AI/scientific-agent-skills
@Arize-ai

phoenix-cli

Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, review experiments, inspect datasets, and query the GraphQL API. Use when debugging AI/LLM applications, analyzing trace data, working with Phoenix observability, or investigating LLM performance issues.

Arize-ai/openinference+1 more
@axiomhq
MCP

MCP

List datasets and schemas, run APL queries, and use prompts for exploration, anomaly detection, and monitoring.

Tags: mcp
axiomhq/mcp
@NVIDIA-NeMo

data-designer

Use when the user wants to create a dataset, generate synthetic data, or build a data generation pipeline.

NVIDIA-NeMo/DataDesigner+4 more
@AlterLab-IEU

alterlab-arboreto

Infer gene regulatory networks (GRNs) from gene expression data using scalable algorithms (GRNBoost2, GENIE3). Use when analyzing transcriptomics data (bulk RNA-seq, single-cell RNA-seq) to identify transcription factor-target gene relationships and regulatory interactions. Supports distributed computation for large-scale datasets. Part of the AlterLab Academic Skills suite.

AlterLab-IEU/AlterLab-Academic-Skills+60 more
@bintocher
MCP

MCP Superset

Apache Superset MCP server with 135+ tools for dashboards, charts, datasets, SQL Lab, and security.

Tags: mcp, github
bintocher/mcp-superset
@Positronic-Robotics

remote-training

Manages remote training infrastructure on Nebius VMs. Use for building and pushing Docker images, starting and stopping VMs (train, train2, train3), running training jobs, generating datasets, and starting inference servers.

Positronic-Robotics/positronic
@isakskogstad
MCP

OECD MCP Server

OECD economic and statistical data via SDMX API. Access 5,000+ datasets across 17 categories.

Tags: mcp, github, api
isakskogstad/OECD-MCP-server
@AdonaiVera
MCP

FiftyOne MCP Server

Control FiftyOne computer vision datasets through AI assistants using 80+ operators.

Tags: mcp, github, ai
AdonaiVera/fiftyone-mcp-server
@mcp-registry
MCP

AQUAVIEW MCP

AQUAVIEW MCP Server: search and access global oceanographic and environmental datasets.

Tags: mcp, search
@quantumboost

CSV Data Profiler

Analyzes CSV datasets to produce column-level statistics, missing value reports, type inference, and data quality scores.

Tags: csv, data-analysis, profiling
@NVIDIA-NeMo

byob

Create custom LLM evaluation benchmarks using the BYOB decorator framework. Use when the user wants to (1) create a new benchmark from a dataset, (2) pick or write a scorer, (3) compile and run a BYOB benchmark, (4) containerize a benchmark, or (5) use LLM-as-Judge evaluation. Triggers on mentions of BYOB, custom benchmark, bring your own benchmark, scorer, or benchmark compilation.

NVIDIA-NeMo/Evaluator+2 more
@gurvinder-dhillon
MCP

PowerBI MCP Server

PowerBI REST API integration. Query workspaces and datasets, and execute DAX queries via MCP.

Tags: mcp, github, api
gurvinder-dhillon/powerbi-mcp
@aaronjoeldev

sync-demo-seeder

Synchronizes scripts/seed-demo.sql with the current Drizzle schema and migrations. Use this skill whenever a new migration is added or src/lib/db/schema.ts changes, to keep the demo dataset consistent with the database structure.

aaronjoeldev/cashlytics-ai+1 more
@muratcankoylan

book-sft-pipeline

This skill should be used when the user asks to "fine-tune on books", "create SFT dataset", "train style model", "extract ePub text", or mentions style transfer, LoRA training, book segmentation, or author voice replication.

muratcankoylan/Agent-Skills-for-Context-Engineering+5 more
@quantumboost
Workflow

Dataset Insight Report

Profiles a CSV dataset, generates analytical SQL queries, and produces chart specifications for key findings.

Tags: data-analysis, reporting, visualization
@peteromallet

dataclaw

Export Claude Code, Codex, Gemini CLI, OpenCode, and OpenClaw conversation history to Hugging Face. Use when the user asks about exporting conversations, uploading to Hugging Face, configuring DataClaw, reviewing PII/secrets in exports, or managing their dataset.

peteromallet/dataclaw
@SevaSk
MCP

Nova Scotia Data Explorer

Query and explore Nova Scotia open datasets via the Socrata SODA API.

Tags: mcp, github, api
SevaSk/scotiasignal-frontend
@Red-Hat-AI-Innovation-Team

Synthetic Data Generation

Generate synthetic data, run a flow, create training data, produce datasets, or author custom flow YAMLs using sdg_hub.

Red-Hat-AI-Innovation-Team/sdg_hub
@futuresearch
MCP

Everyrow MCP Server

Give your AI a research team. Forecast, score, classify, or research every row of a dataset.

Tags: mcp, github, ai, search
futuresearch/everyrow-sdk+1 more
@huggingface

hf-mcp

Use Hugging Face Hub via MCP server tools. Search models, datasets, Spaces, papers. Get repo details, fetch documentation, run compute jobs, and use Gradio Spaces as AI tools. Available when connected to the HF MCP server.

huggingface/skills+3 more
@EXIST-D

spss-academic-workflow

Use when the user wants a complete SPSS or SPSS-MCP empirical research workflow: organize source data, prepare datasets, design variables and models, run SPSS analyses, export Chinese result paragraphs and tables, write a Chinese LaTeX paper, and compile the final PDF.

EXIST-D/spss-academic-workflow
@lance-format

lance-user-guide

Guide Code Agents to help Lance users write/read datasets and build/choose indices. Use when a user asks how to use Lance (Python/Rust/CLI), how to write_dataset/open/scan, how to build vector indexes (IVF_PQ, IVF_HNSW_*), how to build scalar indexes (BTREE, BITMAP, LABEL_LIST, NGRAM, INVERTED, BLOOMFILTER, RTREE, etc.), how to combine filters with vector search, or how to debug indexing and scan performance.

lance-format/lance
@codex-curator
MCP

Intelligence Aeternum

AI training dataset marketplace: 2M+ museum artworks with Golden Codex enrichment

Tags: mcp, github, ai
codex-curator/intelligence-aeternum-mcp
@iliaal

evolve-skill

Run the full skill evolution pipeline: harvest sessions, discover signals, build a golden dataset, evaluate a baseline, evolve via DSPy, and compare scores.

iliaal/compound-engineering-plugin+24 more
@castorini

Ragnarok Dataset Workflow

Detailed reference for ragnarok's dataset-backed generation mode, which combines retrieval and answer generation in a single pipeline.

castorini/ragnarok+3 more
@rodspeed

Mirror — Portrait, Audit & Interview Skill

Surface what the AI knows about the user — as a person, not as a dataset.

rodspeed/epistemic-memory+1 more
@comet-ml
MCP

Opik MCP Server

Interact with Opik prompts, traces, datasets and metrics through the Model Context Protocol.

Tags: mcp, github
comet-ml/opik-mcp