Jan 8, 2024

FireAttention — Serving Open Source Models 4x Faster than vLLM by Quantizing with ~No Tradeoffs