@Orchestra-Research
awq-quantization
Activation-aware weight quantization for 4-bit LLM compression, with roughly 3x faster inference than FP16 and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner. (Usage sketch follows this entry.)
Optimization · AWQ · Quantization · 4-Bit · Activation-Aware · Memory Optimization
Orchestra-Research/AI-research-SKILLs · +46 more
18d ago
5.0K · 0
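
The entry above names no specific commands, so the following is a minimal sketch of 4-bit AWQ quantization using the AutoAWQ library; the choice of library is an assumption, and the model name, output path, and config values are placeholders.

```python
# Minimal AWQ quantization sketch (AutoAWQ is an assumed tool; values are placeholders).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"   # placeholder source model
quant_path = "mistral-7b-instruct-awq"              # placeholder output directory

# Common 4-bit AWQ settings: zero-point quantization, group size 128.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibrate on activations and quantize the weights to 4 bits.
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# Later, load the quantized checkpoint for inference.
quantized = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
```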
@muinyc
Audio Transcription with Whisper
Transcribe audio files locally using faster-whisper (CPU, int8 quantization). Supports all common audio formats (wav, mp3, m4a, flac, ogg, webm). (Usage sketch follows this entry.)
muinyc/istota · +21 more
14d ago
5 · 0
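
A minimal sketch of the setup this entry describes: faster-whisper running on CPU with int8 weights. The model size ("small") and the input file name are placeholder choices, not values taken from the skill.

```python
# Local transcription with faster-whisper on CPU using int8 quantization.
from faster_whisper import WhisperModel

# "small" is a placeholder model size; larger models trade speed for accuracy.
model = WhisperModel("small", device="cpu", compute_type="int8")

# Any common container works (wav, mp3, m4a, flac, ogg, webm).
segments, info = model.transcribe("meeting.m4a")

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:7.2f}s -> {segment.end:7.2f}s] {segment.text}")
```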
@AlexsJones
llmfit-advisor
Detect local hardware (RAM, CPU, GPU/VRAM) and recommend the best-fit local LLM models with optimal quantization, speed estimates, and fit scoring. (Fit-check sketch follows this entry.)
AlexsJones/llmfit
18d ago
16.6K · 0
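
A hypothetical sketch of the kind of fit check this entry describes, not the tool's actual API: estimate whether a model's weights fit in currently available RAM at a given quantization. The bytes-per-parameter table and headroom factor are rough rules of thumb.

```python
import psutil  # RAM detection only; the real tool also probes CPU and GPU/VRAM

# Rough bytes-per-parameter for common quantization levels (rules of thumb).
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q5": 0.65, "q4": 0.55}

def weight_size_gib(params_billions: float, quant: str) -> float:
    """Approximate weight footprint in GiB (ignores KV cache and runtime overhead)."""
    return params_billions * 1e9 * BYTES_PER_PARAM[quant] / 2**30

def fit_score(params_billions: float, quant: str, headroom: float = 1.2) -> float:
    """Available RAM divided by the headroom-padded model size; above 1.0 means it should fit."""
    available_gib = psutil.virtual_memory().available / 2**30
    return available_gib / (weight_size_gib(params_billions, quant) * headroom)

if __name__ == "__main__":
    for quant in ("fp16", "q8", "q4"):
        need = weight_size_gib(7.0, quant)
        print(f"7B @ {quant:>4}: ~{need:.1f} GiB weights, fit score {fit_score(7.0, quant):.2f}")
```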