Skills tagged with #audio

developing-with-prism

Guide for developing with Prism PHP package - a Laravel package for integrating LLMs. Activate or use when working with Prism features including text generation, structured output, embeddings, image generation, audio processing, streaming, tools/function calling, or any LLM provider integration (OpenAI, Anthropic, Gemini, Mistral, Groq, XAI, DeepSeek, OpenRouter, Ollama, VoyageAI, ElevenLabs). Activate for any Prism-related development tasks.

prism-php/prism

audio

Unity audio system: AudioMixer groups, snapshots, spatial audio, audio source pooling, compression per platform.

XeldarAlz/everything-claude-unity+3 more

mcp-video

Use mcp-video for guarded video editing, FFmpeg operations, media analysis, subtitles, audio workflows, Hyperframes rendering, repurposing packages, and release checkpoints through an MCP server, Python client, or CLI. Trigger when an agent needs to inspect, edit, render, validate, or package local media safely.

KyaniteLabs/mcp-video

MiniMax Multi-Modal Toolkit

Generate voice, music, video, and image content via MiniMax APIs, the unified entry for **MiniMax multimodal** use cases (audio + music + video + image). Includes voice cloning & voice design for custom voices, image generation with character reference, and FFmpeg-based media tools for audio/video…

poco-ai/poco-claw

SlideMaster

Create AI presentation videos with slides, narration, TTS audio, and MP4 export from any topic.

Whisper Mcp

Local audio transcription using whisper.cpp. Transcribe with OpenAI Whisper models.

jwulff/whisper-mcp

nlm-cli-skill

Expert guide for the NotebookLM CLI (`nlm`) - a command-line interface for Google NotebookLM. Use this skill when users want to interact with NotebookLM programmatically, including: creating/managing notebooks, adding sources (URLs, YouTube, text, Google Drive), generating content (podcasts, reports, quizzes, flashcards, mind maps, slides, infographics, videos, data tables), conducting research, chatting with sources, or automating NotebookLM workflows. Triggers on mentions of "nlm", "notebooklm", "notebook lm", "podcast generation", "audio overview", or any NotebookLM-related automation task.

jacob-bd/notebooklm-cli

dspy-adapters-multimodal

This skill should be used when the user asks to "choose a DSPy adapter", "use JSONAdapter", "use XMLAdapter", "enable native function calling", "send images, audio, or files to DSPy", mentions `dspy.ChatAdapter`, `dspy.JSONAdapter`, `dspy.XMLAdapter`, `dspy.Image`, `dspy.Audio`, `dspy.File`, structured outputs, or multimodal DSPy signatures.

OmidZamani/dspy-skills+5 more

audio-plugin-coder

Noizefield/audio-plugin-coder+7 more

ffmpeg-mixing

Mix, trim, and concatenate video clips with ffmpeg without audio/video desync. Use when stitching generated clips into an original video, inserting scenes at timestamps, or any ffmpeg filter_complex work involving trim/concat with a continuous audio track.

vargHQ/sdk+1 more

MixMake

Transcript-based audio editing: transcribe audio, edit by word ID, export edited audio.

@anzy-renlab-ai

pronounce-word: speak the word out loud

**Purpose.** When the user asks how to pronounce an English word, and especially a project, product, or programmer-jargon name (`kubectl`, `nginx`, `Pydantic`, `LaTeX`, `JSON`, ...), don't just respond in text. Play the audio so they can hear the *community* reading, then add a short text caption…

anzy-renlab-ai/pronounce

@DoIT-Artificial-Intelligence

youtube-to-docs

(Kitchen Sink) Process a YouTube video with all features (summary, Q&A, infographic, audio, and video).

DoIT-Artificial-Intelligence/youtube-to-docs

audio-hooks

Use whenever the user asks to install, configure, snooze, mute, test, troubleshoot, or change settings for the claude-code-audio-hooks audio notification system. Trigger phrases include "audio hooks", "audio notifications", "snooze audio", "mute claude", "claude is too loud", "test audio", "switch audio theme", "rate limit alerts", "audio webhook", "TTS", "focus flow", and the slash command /audio-hooks. Also use when diagnosing why Claude Code is silent (or noisy) for the user.

ChanMeng666/claude-code-audio-hooks

Doubao TTS: Doubao Speech Synthesis

Generate high-quality speech audio from text using Volcengine's Doubao TTS API. Supports short-form (real-time) and long-form (async, up to 100K characters) synthesis.

xvirobotics/metabot+6 more

Qwen3 ASR â Voice Transcription

Transcribe speech from audio files to text.

second-state/qwen3_asr_rs

Io.Github.Fjnunezp75/Gpu Bridge

30 GPU-powered AI services as MCP tools. LLM, image, video, audio, embeddings & more.

gpu-bridge/mcp-server

2d-animation-pipeline

Define authoring, import, and state machine rules for frame-by-frame and skeletal 2D animations.

MRCalderon3D/everything-game-dev-code+42 more

higgsfield-ugc-prompt

Generate complete, detailed Higgsfield AI Marketing Studio UGC video prompts for product advertising. Use when the user wants to create a UGC video ad prompt for Higgsfield, mentions Higgsfield, wants a marketing video prompt, or provides product/shop reference images and asks for a video prompt. Generates second-by-second prompts with full audio, camera, outfit, and character descriptions in English with Turkish dialogue.

msk3d0ut/claude-skill-ugc-prompt

@Orange-Sky-Software-Inc

Io.Github.Matthew B Simpson/Echosaw

Media intelligence analysis for audio, video, and images via the Echosaw MCP server.

Orange-Sky-Software-Inc/echosaw-com+1 more

acestep

Use ACE-Step API to generate music, edit songs, and remix music. Supports text-to-music, lyrics generation, audio continuation, and audio repainting. Use this skill when users mention generating music, creating songs, music production, remix, or audio continuation.

ace-step/ACE-Step-1.5+5 more

Audio Jingle Skill

Three sub-modes. The active project's `audioKind` decides which one runs:

nexu-io/open-design+19 more

Audio Transcription with Whisper

Transcribe audio files locally using faster-whisper (CPU, int8 quantization). Supports all common audio formats (wav, mp3, m4a, flac, ogg, webm).

muinyc/istota+21 more

check-codesign

Check macOS code signature, hardened runtime, entitlements, and notarization of audio plugin bundles (.vst3, .component, .clap, .app/.appex). Use when user says "check code signing", "check codesign", "check signature", "verify signing", "check notarization", "why won't plugin load", "hardened runtime", "check entitlements", or a plugin fails to load in a signed DAW.

iPlug3/audio-plugin-dev-skills+5 more

video-podcast-maker

Use when user provides a topic and wants an automated video podcast created - handles research, script writing, TTS audio synthesis, Remotion video creation, and final MP4 output with background music

Agents365-ai/video-podcast-maker

notebooklm-research

Full-autopilot AI research agent powered by Google NotebookLM (notebooklm-py v0.3.4). Ingests sources (URL, text, PDF, DOCX, YouTube, Google Drive), runs deep web research, asks cited questions, and generates 10 native artifact types (audio podcast, video, cinematic video, slide deck, report, quiz, flashcards, mind map, infographic, data table, study guide). Produces original content drafts via Claude, with optional publishing to social platforms via threads-viral-agent integration. Use this skill when the user mentions: NotebookLM, research with sources, create notebook, generate podcast from articles, turn research into content, trending topic research, research pipeline, source-based analysis, cited research answers, generate slides, generate quiz, make flashcards, deep web research, create infographic, compare sources, research report, study guide, source analysis, or knowledge synthesis.

jakubs2623/notebooklm-skill

alicloud-ai-audio-asr

Transcribe non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when converting recorded audio files to text, generating transcripts with timestamps, or documenting DashScope/OpenAI-compatible ASR request and response fields.

cinience/alicloud-skills+61 more

ctf-forensics

Provides digital forensics and signal analysis techniques for CTF challenges. Use when analyzing disk images, memory dumps, event logs, network captures, cryptocurrency transactions, steganography, PDF analysis, Windows registry, Volatility, PCAP, Docker images, coredumps, side-channel power traces, DTMF audio spectrograms, packet timing analysis, CD audio disc images, or recovering deleted files and credentials.

ljagiello/ctf-skills+7 more

transcribee

Transcribe YouTube videos and local audio/video files with speaker diarization. Use when user asks to transcribe a YouTube URL, podcast, video, or audio file. Outputs clean speaker-labeled transcripts ready for LLM analysis.

itsfabioroma/transcribee

@mightyhuman101

seedance-prompt-en

Write effective prompts for Jimeng Seedance 2.0 multimodal AI video generation. Use when users want to create video prompts using text, images, videos, and audio inputs with the @ reference system. Covers camera movements, effects replication, video extension, editing, music beat-matching, e-commerce ads, short dramas, and educational content.

mightyhuman101/seedance2-skill

302ai-api-integration

ALWAYS use this skill when user needs ANY API functionality (AI models, image generation, video, audio, text processing, etc.). Automatically search 302.AI's 1400+ APIs and generate integration code. Use proactively whenever APIs or AI capabilities are mentioned.

302ai/302AI-API-Integration-Skill

nlm-skill

Expert guide for the NotebookLM CLI (`nlm`) and MCP server - interfaces for Google NotebookLM. Use this skill when users want to interact with NotebookLM programmatically, including: creating/managing notebooks, adding sources (URLs, YouTube, text, Google Drive), generating content (podcasts, reports, quizzes, flashcards, mind maps, slides, infographics, videos, data tables), conducting research, chatting with sources, or automating NotebookLM workflows. Triggers on mentions of "nlm", "notebooklm", "notebook lm", "podcast generation", "audio overview", or any NotebookLM-related automation task.

jacob-bd/notebooklm-mcp-cli

audio-quality-check

Analyze audio recording quality - echo detection, loudness, speech intelligibility, SNR, spectral analysis. Use when the user wants to check a recording's quality, detect echo or duplication in audio files, measure speech clarity, compare original vs processed audio, diagnose why a recording sounds bad, or analyze audio tracks from Blackbox or any call recording app. Triggers on audio quality, recording analysis, echo detection, check recording, sound quality, analyze audio, speech quality, PESQ, STOI, loudness, SNR, audio diagnostics, recording sounds bad, echo in recording, audio duplication.

tenequm/skills+29 more

Audioscrape Audio Intelligence

The audio intelligence layer. Search podcast transcripts, speakers, and entities across 250K+ shows.

Io.Github.BrightWayAI/Video Analyzer

Analyze videos: extract frames, transcribe audio, generate storyboard breakdowns.

BrightWayAI/video-analyzer

NotebookLM MCP

Automate Google NotebookLM — Q&A with citations, audio, video, content generation

roomi-fields/notebooklm-mcp

Apple Voice Memo Mcp

Access Apple Voice Memos on macOS. List, get audio, extract and generate transcripts.

jwulff/apple-voice-memo-mcp

seedance-20

Generate and direct cinematic AI videos with Seedance 2.0 (ByteDance/Dreamina/Jimeng). Covers text-to-video, image-to-video, video-to-video, and reference-to-video workflows with @Tag asset references, multi-character scenes, audio design, and post-processing. Use when making AI video, writing Seedance prompts, directing a scene, fixing generation errors, or building an AI short film, product ad, or music video.

ai-video, filmmaking, bytedance, seedance, multimodal, lip-sync

Emily2040/seedance-2.0+23 more

analyze-video

Adds visual descriptions to transcripts by extracting and analyzing video frames with ffmpeg. Creates visual transcript with periodic visual descriptions of the video clip. Use when all files have audio transcripts present (transcript) but don't yet have visual transcripts created (visual_transcript).

barefootford/buttercut+3 more

NotebookLM AI Plugin

Supports:
- Chat with Notebook AI (source-grounded Q&A with citations)
- Slide Deck generation (PDF/PPTX)
- Audio Overview (M4A -- deep dive, brief, critique, debate formats)
- Video Overview (MP4 -- classic, whiteboard, kawaii, anime, watercolor styles)
- Mind Map (HTML)
- Flashcards (HTML/JSON)
- …

proyecto26/notebooklm-ai-plugin

Multimodal

Multi-provider media generation — images, video, audio, and transcription via a unified interface

rsmdt/multimodal-mcp
