developing-with-prism
Guide for developing with Prism PHP package - a Laravel package for integrating LLMs. Activate or use when working with Prism features including text generation, structured output, embeddings, image generation, audio processing, streaming, tools/function calling, or any LLM provider integration (OpenAI, Anthropic, Gemini, Mistral, Groq, XAI, DeepSeek, OpenRouter, Ollama, VoyageAI, ElevenLabs). Activate for any Prism-related development tasks.
cuda-kernels
Provides guidance for writing and benchmarking optimized CUDA kernels for NVIDIA GPUs (H100, A100, T4) targeting HuggingFace diffusers and transformers libraries. Supports models like LTX-Video, Stable Diffusion, LLaMA, Mistral, and Qwen. Includes integration with HuggingFace Kernels Hub (get_kernel) for loading pre-compiled kernels. Includes benchmarking scripts to compare kernel performance against baseline implementations.
sidecar
Spawn conversations with other LLMs (Gemini, GPT, ChatGPT, Codex, o3, DeepSeek, Qwen, Grok, Mistral, etc.) and fold results back into your context. TRIGGER when: user asks to talk to, chat with, use, call, or spawn another LLM or model; user mentions Gemini, GPT, ChatGPT, Codex, o3, DeepSeek, Claude (as a sidecar target), Qwen, Grok, Mistral, or any non-current model by name; user asks to get a second opinion from another model; user wants parallel exploration with a different model; user says "sidecar", "fork", or "fold". CRITICAL RULES: (1) ALWAYS launch sidecar CLI commands with Bash tool's run_in_background: true. Never run sidecar start/resume/continue in the foreground. (2) The fold summary returns on stdout when the user clicks Fold in the GUI or the headless agent finishes. Use TaskOutput to read it when the background task completes. (3) Use --prompt for the start command (NOT --briefing). --briefing is only for subagent spawn. (4) NEVER use o3 or o3-pro unless the user explicitly asks for it by name. These models are extremely expensive ($10-60+ per request). If the user asks for o3, warn them about the cost before proceeding. Default to gemini for most tasks. (5) When the user asks to query MULTIPLE LLMs simultaneously (e.g., "ask Gemini AND ChatGPT", "compare Gemini vs GPT"), ALWAYS use --no-ui (headless) for all of them unless the user explicitly requests interactive. Opening multiple Electron windows at once is disruptive. Launch them all in parallel with run_in_background: true.
Io.Github.Timmx7/Styx Mcp Server
MCP server for Styx — intelligent AI routing across OpenAI, Anthropic, Google, and Mistral.
LiteLLM Multi-Model API
Access 100+ LLMs with one API: GPT-4, Claude, Gemini, Mistral, and more.