Jan 8, 2024

FireAttention — Serving Open Source Models 4x Faster than vLLM by Quantizing with ~No Tradeoffs