Model Compression

Skills tagged with #Model Compression

@plasmate-labs

MCP

io.github.plasmate-labs/plasmate

Agent-native headless browser. HTML in, Semantic Object Model out. 10x token compression.

mcp, github, browser

plasmate-labs/plasmate

19d ago

@Orchestra-Research

awq-quantization

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.

Optimization, AWQ, Quantization, 4-Bit, Activation-Aware, Memory Optimization

Orchestra-Research/AI-research-SKILLs (+46 more)

18d ago

5.0K · 0
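
To make the "activation-aware" part of the awq-quantization description concrete, here is a minimal NumPy sketch of the general idea. It is not the skill's code and not the API of any AWQ library. The function names (`quantize_4bit`, `awq_style_quantize`, `output_mse`), the group size, and the search grid are all assumptions chosen for illustration. The sketch scales each input channel of a weight matrix by a power of its average activation magnitude before 4-bit rounding, then searches for the exponent that gives the lowest output error on calibration data.

```python
import numpy as np


def quantize_4bit(w, group_size=32):
    """Round-to-nearest 4-bit quantization per group; returns dequantized weights."""
    out_f, in_f = w.shape
    g = w.reshape(out_f, in_f // group_size, group_size)
    w_min = g.min(axis=-1, keepdims=True)
    w_max = g.max(axis=-1, keepdims=True)
    scale = np.maximum(w_max - w_min, 1e-8) / 15.0  # 16 levels: 0..15
    q = np.clip(np.round((g - w_min) / scale), 0, 15)
    return (q * scale + w_min).reshape(out_f, in_f)


def output_mse(w_ref, w_hat, x):
    """Mean squared error between layer outputs with reference and quantized weights."""
    return float(np.mean((x @ w_ref.T - x @ w_hat.T) ** 2))


def awq_style_quantize(w, x, group_size=32, n_grid=11):
    """Grid-search a per-channel scale s = act**alpha (illustrative, hypothetical helper)."""
    # Average activation magnitude per input channel, from calibration data x.
    act = np.maximum(np.abs(x).mean(axis=0), 1e-8)
    best_w, best_alpha, best_err = None, None, np.inf
    for alpha in np.linspace(0.0, 1.0, n_grid):
        s = act ** alpha
        s = s / np.sqrt(s.max() * s.min())  # keep scales centred around 1
        # Quantize the scaled weights, then fold the scale back. A real
        # deployment would instead fold 1/s into the preceding operation.
        w_hat = quantize_4bit(w * s, group_size) / s
        err = output_mse(w, w_hat, x)
        if err < best_err:
            best_w, best_alpha, best_err = w_hat, float(alpha), err
    return best_w, best_alpha, best_err


# Synthetic calibration data: a few input channels carry much larger activations.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 64))
x[:, :4] *= 20.0
w = rng.normal(size=(16, 64))

err_plain = output_mse(w, quantize_4bit(w), x)
w_awq, alpha, err_awq = awq_style_quantize(w, x)
print(f"plain 4-bit MSE: {err_plain:.4f}")
print(f"scaled 4-bit MSE: {err_awq:.4f} (alpha={alpha:.1f})")
```

Because the grid includes `alpha = 0` (no scaling), the searched result can never be worse than plain round-to-nearest on the calibration data. How much it helps depends on how uneven the activation magnitudes are.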

@gavdalf

total-recall

The only memory skill that watches on its own. No database. No vectors. No manual saves. Just an LLM observer that compresses your conversations into prioritised notes, consolidates when they grow, and recovers anything missed. Five layers of redundancy, zero maintenance. ~$0.00/month (using free-tier models). While other memory skills ask you to remember to remember, this one just pays attention.

gavdalf/total-recall

19d ago

2030