multimodal

Skills tagged with #multimodal

MiniMax Multi-Modal Toolkit

Generate voice, music, video, and image content via MiniMax APIs: the unified entry for **MiniMax multimodal** use cases (audio + music + video + image). Includes voice cloning & voice design for custom voices, image generation with character reference, and FFmpeg-based media tools for audio/vide

poco-ai/poco-claw

io.github.alex-feel/mcp-context-server

An MCP server that provides persistent multimodal context storage for LLM agents.

mcp, github, rag, llm

alex-feel/mcp-context-server

19d ago

@Orchestra-Research

awq-quantization

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.

Optimization, AWQ, Quantization, 4-Bit, Activation-Aware, Memory Optimization

Orchestra-Research/AI-research-SKILLs +46 more

Rostro

Turn any LLM multimodal; generate images, voices, videos, 3D models, music, and more.

mcp, llm

francis-ros/rostro-mcp-server

19d ago

@zai-org

glmv-caption

Generate captions (descriptions) for images, videos, and documents using the ZhiPu GLM-V multimodal model series. Use this skill whenever the user wants to describe, caption, summarize, or interpret the content of images, videos, or files. Supports single or multiple inputs supplied as URLs, local paths, or base64 (images only).

ckm:banner-design

Design banners for social media, ads, website heroes, creative assets, and print. Multiple art direction options with AI-generated visuals. Actions: design, create, generate banner. Platforms: Facebook, Twitter/X, LinkedIn, YouTube, Instagram, Google Display, website hero, print. Styles: minimalist, gradient, bold typography, photo-based, illustrated, geometric, retro, glassmorphism, 3D, neon, duotone, editorial, collage. Uses ui-ux-pro-max, frontend-design, ai-artist, ai-multimodal skills.

nextlevelbuilder/ui-ux-pro-max-skill +5 more

19d ago

41.4K 0

@mightyhuman101

seedance-prompt-en

Write effective prompts for Jimeng Seedance 2.0 multimodal AI video generation. Use when users want to create video prompts using text, images, videos, and audio inputs with the @ reference system. Covers camera movements, effects replication, video extension, editing, music beat-matching, e-commerce ads, short dramas, and educational content.

mightyhuman101/seedance2-skill

19d ago

@google-gemini

gemini-interactions-api

Use this skill when writing code that calls the Gemini API for text generation, multi-turn chat, multimodal understanding, image generation, streaming responses, background research tasks, function calling, structured output, or migrating from the old generateContent API. This skill covers the Interactions API, the recommended way to use Gemini models and agents in Python and TypeScript.

google-gemini/gemini-skills +2 more

18d ago

2.2K 0

@Emily2040

seedance-20

Generate and direct cinematic AI videos with Seedance 2.0 (ByteDance/Dreamina/Jimeng). Covers text-to-video, image-to-video, video-to-video, and reference-to-video workflows with @Tag asset references, multi-character scenes, audio design, and post-processing. Use when making AI video, writing Seedance prompts, directing a scene, fixing generation errors, or building an AI short film, product ad, or music video.

ai-video, filmmaking, bytedance, seedance, multimodal, lip-sync

Emily2040/seedance-2.0 +23 more

Multimodal

Multi-provider media generation — images, video, audio, and transcription via a unified interface

mcp, github

rsmdt/multimodal-mcp

19d ago