Multimodal

Skills tagged with #Multimodal

@poco-ai

MiniMax Multi-Modal Toolkit

Generate voice, music, video, and image content via MiniMax APIs: the unified entry for **MiniMax multimodal** use cases (audio + music + video + image). Includes voice cloning & voice design for custom voices, image generation with character reference, and FFmpeg-based media tools for audio/video.

poco-ai/poco-claw
18d ago
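FFmpeg-based tools like the ones this toolkit describes typically shell out to the `ffmpeg` CLI. A minimal sketch of building such a command (the function name and output path are illustrative, and it assumes `ffmpeg` is on `PATH` when actually run):

```python
import subprocess

def build_extract_audio_cmd(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg command that extracts the audio track from a video.

    -vn drops the video stream; -acodec copy keeps the audio as-is.
    """
    return [
        "ffmpeg",
        "-i", video_path,   # input video
        "-vn",              # no video in the output
        "-acodec", "copy",  # copy the audio stream without re-encoding
        audio_path,
    ]

cmd = build_extract_audio_cmd("clip.mp4", "clip.aac")
# subprocess.run(cmd, check=True)  # uncomment when ffmpeg is installed
```

Building the argument list separately from running it keeps the command easy to log and test without invoking FFmpeg.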
@alex-feel
MCP

io.github.alex-feel/mcp-context-server

An MCP server that provides persistent multimodal context storage for LLM agents.

mcp · github · rag · llm
alex-feel/mcp-context-server
19d ago
@Orchestra-Research

awq-quantization

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.

Optimization · AWQ · Quantization · 4-Bit · Activation-Aware · Memory Optimization
Orchestra-Research/AI-research-SKILLs +46 more
18d ago
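AWQ's core idea is that a few salient weight channels, identified from activation statistics, are scaled up before 4-bit rounding so they lose less precision. A toy per-channel round-trip sketch of that idea (illustrative names, not the paper's or the skill's implementation):

```python
import numpy as np

def quantize_4bit(w: np.ndarray, act_scale: np.ndarray) -> np.ndarray:
    """Toy activation-aware 4-bit quantize/dequantize round trip.

    Channels with large activation magnitude are scaled up before
    rounding to a 16-level signed grid, then scaled back down, so
    salient channels keep more effective precision.
    """
    s = np.sqrt(act_scale / act_scale.mean())   # per-channel scale from activations
    ws = w * s                                  # boost salient channels
    qmax = np.abs(ws).max()                     # symmetric quantization range
    q = np.round(ws / qmax * 7).clip(-8, 7)     # 4-bit signed grid [-8, 7]
    return (q * qmax / 7) / s                   # dequantize and undo the boost

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))                     # weight matrix, 8 input channels
act = np.array([10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])  # channel 0 is salient
w_hat = quantize_4bit(w, act)
err = float(np.abs(w - w_hat).mean())
```

The real method searches for the scales and keeps weights stored in 4 bits; this sketch only shows why activation-aware scaling changes where rounding error lands.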
@francis-ros
MCP

Rostro

Make any LLM multimodal: generate images, voices, videos, 3D models, music, and more.

mcp · llm
francis-ros/rostro-mcp-server
19d ago
@zai-org

glmv-caption

Generate captions (descriptions) for images, videos, and documents using the Zhipu GLM-V multimodal model series. Use this skill whenever the user wants to describe, caption, summarize, or interpret the content of images, videos, or files. Supports single/multiple inputs, URLs, local paths, and base64 (images only).

zai-org/GLM-V +2 more
18d ago
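The base64 input path mentioned above generally means embedding the raw image bytes as a base64 string in the request payload. A minimal sketch (the payload shape and model name are illustrative, not the GLM-V wire format):

```python
import base64

def image_to_base64(path: str) -> str:
    """Read an image file and return its base64-encoded contents."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

# Illustrative payload shape only -- check the GLM-V docs for the real schema.
fake_png = b"\x89PNG\r\n\x1a\n"          # stand-in image bytes for the demo
encoded = base64.b64encode(fake_png).decode("ascii")
payload = {"model": "glm-4v", "image_base64": encoded}
```

Base64 inflates payload size by roughly a third, which is why URL and local-path inputs are usually preferred for larger files.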
@nextlevelbuilder

ckm:banner-design

Design banners for social media, ads, website heroes, creative assets, and print. Multiple art direction options with AI-generated visuals. Actions: design, create, generate banner. Platforms: Facebook, Twitter/X, LinkedIn, YouTube, Instagram, Google Display, website hero, print. Styles: minimalist, gradient, bold typography, photo-based, illustrated, geometric, retro, glassmorphism, 3D, neon, duotone, editorial, collage. Uses ui-ux-pro-max, frontend-design, ai-artist, ai-multimodal skills.

nextlevelbuilder/ui-ux-pro-max-skill +5 more
18d ago
@mightyhuman101

seedance-prompt-en

Write effective prompts for Jimeng Seedance 2.0 multimodal AI video generation. Use when users want to create video prompts using text, images, videos, and audio inputs with the @ reference system. Covers camera movements, effects replication, video extension, editing, music beat-matching, e-commerce ads, short dramas, and educational content.

mightyhuman101/seedance2-skill
18d ago
@google-gemini

gemini-interactions-api

Use this skill when writing code that calls the Gemini API for text generation, multi-turn chat, multimodal understanding, image generation, streaming responses, background research tasks, function calling, structured output, or migrating from the old generateContent API. This skill covers the Interactions API, the recommended way to use Gemini models and agents in Python and TypeScript.

google-gemini/gemini-skills +2 more
18d ago
@Emily2040

seedance-20

Generate and direct cinematic AI videos with Seedance 2.0 (ByteDance/Dreamina/Jimeng). Covers text-to-video, image-to-video, video-to-video, and reference-to-video workflows with @Tag asset references, multi-character scenes, audio design, and post-processing. Use when making AI video, writing Seedance prompts, directing a scene, fixing generation errors, or building an AI short film, product ad, or music video.

ai-video · filmmaking · bytedance · seedance · multimodal · lip-sync
Emily2040/seedance-2.0 +23 more
18d ago
@rsmdt
MCP

Multimodal

Multi-provider media generation — images, video, audio, and transcription via a unified interface

mcp · github
rsmdt/multimodal-mcp
19d ago