mlx-bun
Local LLMs on Apple Silicon · No Python
Run a model on your Mac.
One command.
A local LLM server and TypeScript library built on MLX — OpenAI- and Anthropic-compatible, bit-exact to the Python reference, shipped as a single signed binary.
brew install joshuarossi/tap/mlx-bun
or bunx mlx-bun · curl mlx-bun.dev/install.sh | sh · from source
Why it’s different
Runtime
No Python
Bun + mlx-c over bun:ffi. No venv, no sidecar Python server, no segfaults on exit. One binary, one toolchain.
Correctness
Bit-exact
Matched bit-for-bit to the reference it’s run against — mlx-lm in bf16, mlx-optiq when quantized. Identical logits, not “approximately.”
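To make the gap between "bit-exact" and "approximately equal" concrete, here is a minimal TypeScript sketch. It assumes both runs' logits have been exported as `Float32Array`s (a bf16 value widens to float32 exactly, so no bits are lost). The helper names are hypothetical and not part of the mlx-bun API. A bit-exact check compares raw IEEE-754 bit patterns, so it even tells `0` from `-0`, while a tolerance check in the style of `numpy.allclose` accepts small drift.

```typescript
// Hypothetical helpers (not mlx-bun API) for comparing two logit vectors.

// Returns the index of the first element whose bit pattern differs,
// or -1 if the two vectors are bit-for-bit identical.
function firstBitMismatch(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) return Math.min(a.length, b.length);
  // Reinterpret the same bytes as unsigned 32-bit integers.
  const ua = new Uint32Array(a.buffer, a.byteOffset, a.length);
  const ub = new Uint32Array(b.buffer, b.byteOffset, b.length);
  for (let i = 0; i < ua.length; i++) {
    if (ua[i] !== ub[i]) return i;
  }
  return -1;
}

// Tolerance check with numpy.allclose semantics and defaults:
// |a - b| <= atol + rtol * |b| for every element.
function allClose(a: Float32Array, b: Float32Array, rtol = 1e-5, atol = 1e-8): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (Math.abs(a[i] - b[i]) > atol + rtol * Math.abs(b[i])) return false;
  }
  return true;
}
```

"Identical logits" corresponds to `firstBitMismatch` returning `-1`; `allClose` returning `true` is the weaker "approximately" guarantee.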
Speed
Fastest to first token
Served over HTTP, the fastest TTFT and startup of any stack tested — 45–90 ms vs Python’s 220–330 ms — with near-zero server tax.
Protocols
OpenAI + Anthropic
Drop-in chat completions, Anthropic Messages, and OpenAI Responses. Point any SDK — or Claude Code — at a fully local backend.
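As a sketch of what "drop-in" means on the wire, the TypeScript below builds the same one-turn prompt in two of the request shapes named above: OpenAI Chat Completions (`POST /v1/chat/completions`) and Anthropic Messages (`POST /v1/messages`). It assumes mlx-bun serves both at those standard paths; the base URL, port, model id, and API key are hypothetical placeholders.

```typescript
// Hypothetical values: substitute your server's actual address and model id.
const BASE_URL = "http://127.0.0.1:8080";
const MODEL = "local-model";

type Msg = { role: "user" | "assistant"; content: string };
type HttpRequest = { url: string; headers: Record<string, string>; body: string };

// OpenAI Chat Completions: a messages array and bearer-token auth.
function openAIChat(messages: Msg[], apiKey = "placeholder"): HttpRequest {
  return {
    url: `${BASE_URL}/v1/chat/completions`,
    headers: { "content-type": "application/json", authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({ model: MODEL, messages }),
  };
}

// Anthropic Messages: max_tokens is required, auth goes in x-api-key,
// and the API version is pinned with the anthropic-version header.
function anthropicMessages(messages: Msg[], maxTokens = 256, apiKey = "placeholder"): HttpRequest {
  return {
    url: `${BASE_URL}/v1/messages`,
    headers: {
      "content-type": "application/json",
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
    },
    body: JSON.stringify({ model: MODEL, max_tokens: maxTokens, messages }),
  };
}

// Sends either request with the global fetch (Bun, Node 18+).
async function send(req: HttpRequest): Promise<unknown> {
  const res = await fetch(req.url, { method: "POST", headers: req.headers, body: req.body });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}
```

For example, `await send(openAIChat([{ role: "user", content: "Hello" }]))`. The response shapes differ as well: Chat Completions returns text in `choices[0].message.content`, while Messages returns a `content` array of blocks such as `{ type: "text", text }`.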
Multimodal
Tools & vision
Native tool-calling parsed into OpenAI tool_calls; image input via native OS codecs (HEIC, AVIF, WebP…) on vision models.
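In the OpenAI format, each entry of `choices[0].message.tool_calls` carries an `id`, `type: "function"`, and a `function` object whose `arguments` field is a JSON-encoded string. The client runs the named tool and replies with one `role: "tool"` message per call, echoing the call's `id` as `tool_call_id`. The sketch below shows that client-side loop, assuming mlx-bun emits the standard shape as described above; the handler names you register (such as a `get_weather` tool) are hypothetical and up to your application.

```typescript
// Standard OpenAI tool_calls entry shape.
type ToolCall = {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
};

type ToolMessage = { role: "tool"; tool_call_id: string; content: string };
type ToolHandler = (args: Record<string, unknown>) => unknown;

// Runs each requested tool and builds the role:"tool" replies
// to append to the conversation before the next request.
function runToolCalls(calls: ToolCall[], handlers: Record<string, ToolHandler>): ToolMessage[] {
  return calls.map((call): ToolMessage => {
    const handler = handlers[call.function.name];
    if (!handler) throw new Error(`no handler registered for tool "${call.function.name}"`);
    // The model's arguments arrive as text; parse them before dispatching.
    const args = JSON.parse(call.function.arguments) as Record<string, unknown>;
    return { role: "tool", tool_call_id: call.id, content: JSON.stringify(handler(args)) };
  });
}
```

Failing loudly on an unknown tool name is a design choice; an agent loop might instead return an error string to the model so it can recover.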
Shape
Library-first
The server is one consumer of a TypeScript API. Import generation into a Bun process, or embed the binary as a Mac-app sidecar.
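When the binary runs as a sidecar, the host app has to wait for the server to become ready before sending requests; that interval is what the "server start → ready" row in the table below measures. The sketch below is a generic polling helper under stated assumptions: the probe is supplied by the caller, and polling OpenAI's standard `GET /v1/models` route on a hypothetical port (as in the usage note) is an assumption about mlx-bun, as are the timing defaults. Launching the binary is left out, since its serve arguments are not documented here.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `probe` until it resolves true and returns the elapsed milliseconds.
// A probe that throws or rejects (e.g. connection refused while the server
// is still booting) counts as "not ready yet".
async function waitUntilReady(
  probe: () => Promise<boolean>,
  timeoutMs = 5_000,
  intervalMs = 25,
): Promise<number> {
  const start = performance.now();
  while (performance.now() - start < timeoutMs) {
    // Promise.resolve().then(...) also converts a synchronous throw into a rejection.
    const ready = await Promise.resolve().then(probe).catch(() => false);
    if (ready) return performance.now() - start;
    await sleep(intervalMs);
  }
  throw new Error(`server not ready after ${timeoutMs} ms`);
}
```

For example: `await waitUntilReady(() => fetch("http://127.0.0.1:8080/v1/models").then((r) => r.ok))`.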
Measured, not vibes
Head-to-head on an M4 Pro (24 GB), same models, same day, against mlx-lm and mlx-optiq:
| | mlx-bun | mlx-lm | mlx-optiq |
|---|---|---|---|
| TTFT, served (warm) | 45–90 ms | 219–224 ms | 222–331 ms |
| Server start → ready | 0.36–0.47 s | 0.76–0.98 s | 0.79–1.00 s |
| Server tax vs. direct decode | ≈ 0% | −5…−7% | ≈ 0% |
The honest negatives are in the full table on the Benchmarks page.
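One way to reproduce a "TTFT, served" number against any OpenAI-compatible server is to start a clock just before sending a streaming request and stop it at the first server-sent event that carries non-empty `delta.content`. The sketch below does that; it is an assumed method, not necessarily how the table above was produced. OpenAI-style streams usually open with a role-only chunk, which is why empty deltas are skipped.

```typescript
// Extracts generated text from an OpenAI-style streaming line, e.g.
//   data: {"choices":[{"delta":{"content":"Hi"}}]}
// Returns null for comments, keep-alives, role-only deltas, and [DONE].
function deltaContent(line: string): string | null {
  if (!line.startsWith("data:")) return null;
  const payload = line.slice(5).trim();
  if (payload === "[DONE]") return null;
  const content = JSON.parse(payload)?.choices?.[0]?.delta?.content;
  return typeof content === "string" && content.length > 0 ? content : null;
}

// Splits a streamed response body into lines, handling chunks that end mid-line.
async function* lines(body: ReadableStream<Uint8Array>): AsyncGenerator<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buffered = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });
    let nl: number;
    while ((nl = buffered.indexOf("\n")) >= 0) {
      yield buffered.slice(0, nl).replace(/\r$/, "");
      buffered = buffered.slice(nl + 1);
    }
  }
  buffered += decoder.decode();
  if (buffered.length > 0) yield buffered;
}

// Milliseconds from `startedAt` to the first line that carries generated text.
async function timeToFirstToken(
  stream: AsyncIterable<string>,
  startedAt: number,
  now: () => number = () => performance.now(),
): Promise<number | null> {
  for await (const line of stream) {
    if (deltaContent(line) !== null) return now() - startedAt;
  }
  return null; // the stream ended without producing any text
}
```

Usage against a hypothetical local endpoint: record `const t0 = performance.now()`, POST to `http://127.0.0.1:8080/v1/chat/completions` with `stream: true` in the JSON body, then call `await timeToFirstToken(lines(res.body!), t0)`. Taking `t0` before `fetch` means connection setup counts toward TTFT, which matches what a client actually waits for.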