@Orchestra-Research
awq-quantization
Activation-aware weight quantization for 4-bit LLM compression, with roughly 3x faster inference than FP16 and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner. (Usage sketch follows this entry.)
Optimization · AWQ · Quantization · 4-Bit · Activation-Aware · Memory Optimization
Orchestra-Research/AI-research-SKILLs · +46 more
18d ago
5.0K · 0
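
The entry above names no specific commands, so the following is a minimal sketch of 4-bit AWQ quantization using the AutoAWQ library; the choice of library is an assumption, and the model name, output path, and config values are placeholders.

```python
# Minimal AWQ quantization sketch (AutoAWQ is an assumed tool; values are placeholders).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"   # placeholder source model
quant_path = "mistral-7b-instruct-awq"              # placeholder output directory

# Common 4-bit AWQ settings: zero-point quantization, group size 128.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibrate on activations and quantize the weights to 4 bits.
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# Later, load the quantized checkpoint for inference.
quantized = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
```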
@muinyc
Audio Transcription with Whisper
Transcribe audio files locally using faster-whisper (CPU, int8 quantization). Supports all common audio formats (wav, mp3, m4a, flac, ogg, webm). (Usage sketch follows this entry.)
muinyc/istota · +21 more
14d ago
5 · 0
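
A minimal sketch of the setup this entry describes: faster-whisper running on CPU with int8 weights. The model size ("small") and the input file name are placeholder choices, not values taken from the skill.

```python
# Local transcription with faster-whisper on CPU using int8 quantization.
from faster_whisper import WhisperModel

# "small" is a placeholder model size; larger models trade speed for accuracy.
model = WhisperModel("small", device="cpu", compute_type="int8")

# Any common container works (wav, mp3, m4a, flac, ogg, webm).
segments, info = model.transcribe("meeting.m4a")

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:7.2f}s -> {segment.end:7.2f}s] {segment.text}")
```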
@AlexsJones
llmfit-advisor
Detect local hardware (RAM, CPU, GPU/VRAM) and recommend the best-fit local LLM models with optimal quantization, speed estimates, and fit scoring. (Fit-check sketch follows this entry.)
AlexsJones/llmfit
18d ago
16.6K · 0
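
A hypothetical sketch of the kind of fit check this entry describes, not the tool's actual API: estimate whether a model's weights fit in currently available RAM at a given quantization. The bytes-per-parameter table and headroom factor are rough rules of thumb.

```python
import psutil  # RAM detection only; the real tool also probes CPU and GPU/VRAM

# Rough bytes-per-parameter for common quantization levels (rules of thumb).
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q5": 0.65, "q4": 0.55}

def weight_size_gib(params_billions: float, quant: str) -> float:
    """Approximate weight footprint in GiB (ignores KV cache and runtime overhead)."""
    return params_billions * 1e9 * BYTES_PER_PARAM[quant] / 2**30

def fit_score(params_billions: float, quant: str, headroom: float = 1.2) -> float:
    """Available RAM divided by the headroom-padded model size; above 1.0 means it should fit."""
    available_gib = psutil.virtual_memory().available / 2**30
    return available_gib / (weight_size_gib(params_billions, quant) * headroom)

if __name__ == "__main__":
    for quant in ("fp16", "q8", "q4"):
        need = weight_size_gib(7.0, quant)
        print(f"7B @ {quant:>4}: ~{need:.1f} GiB weights, fit score {fit_score(7.0, quant):.2f}")
```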