mlx-bun

Local LLMs on Apple Silicon · No Python

Run a model on your Mac.
One command.

A local LLM server and TypeScript library built on MLX — OpenAI- and Anthropic-compatible, bit-exact to the Python reference, shipped as a single signed binary.

brew install joshuarossi/tap/mlx-bun

or bunx mlx-bun · curl mlx-bun.dev/install.sh | sh · from source →

bit-exact: mlx-bun vs mlx-lm (bf16 KV) · mlx-bun vs mlx-optiq (mixed-precision KV). Matched mode for mode, not “close”.

Runtime

No Python

Bun + mlx-c over bun:ffi. No venv, no sidecar Python server, no segfaults on exit. One binary, one toolchain.

Correctness

Bit-exact

Matched bit-for-bit to the reference it’s run against — mlx-lm in bf16, mlx-optiq when quantized. Identical logits, not “approximately.”

Speed

Fastest to first token

Served over HTTP, mlx-bun had the fastest TTFT and startup of any stack tested (45–90 ms TTFT versus 220–330 ms for the Python stacks), with near-zero server tax relative to direct decode.
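For readers who want to check time-to-first-token on their own machine, here is a minimal sketch of one common approach: time the arrival of the first non-empty chunk of a streamed response. It is not the benchmark harness behind the figures on this page. The helper names `timeToFirstChunk` and `measureTtft` are hypothetical, and the URL and request body are supplied by the caller. The result only approximates TTFT, because a server may send headers or a role-only chunk before the first token.

```typescript
// Resolve with the milliseconds elapsed until the first non-empty chunk arrives.
async function timeToFirstChunk(
  stream: ReadableStream<Uint8Array>,
  startedAt: number = performance.now(),
): Promise<number> {
  const reader = stream.getReader();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) throw new Error("stream ended before any data arrived");
      if (value && value.byteLength > 0) return performance.now() - startedAt;
    }
  } finally {
    // Stop reading; ignore cancellation errors so they do not mask the result.
    await reader.cancel().catch(() => {});
  }
}

// Hypothetical usage: the caller passes the endpoint URL and a JSON body
// that asks the server to stream its response.
async function measureTtft(url: string, body: unknown): Promise<number> {
  const startedAt = performance.now();
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);
  return timeToFirstChunk(res.body, startedAt);
}
```

Starting the clock before `fetch` includes connection setup and request upload in the measurement, which is what a client actually experiences.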

Protocols

OpenAI + Anthropic

Drop-in chat completions, Anthropic Messages, and OpenAI Responses. Point any SDK — or Claude Code — at a fully local backend.
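As a rough illustration of what "drop-in" means for the chat completions protocol, the sketch below builds and sends an OpenAI-style request with plain `fetch`. The base URL (`http://localhost:8080/v1`) is an assumption, not a documented default, and the model name is whatever the server has loaded. The helper names `buildChatRequest` and `chat` are hypothetical.

```typescript
// Assumed local endpoint; check the server's startup output for the real address.
const BASE_URL = "http://localhost:8080/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build an OpenAI-style chat completions request without sending it.
function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
): PreparedRequest {
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Send a single user prompt and return the assistant's reply text.
async function chat(model: string, prompt: string): Promise<string> {
  const { url, init } = buildChatRequest(BASE_URL, model, [
    { role: "user", content: prompt },
  ]);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0].message.content;
}
```

An existing OpenAI SDK client should need only its base URL changed to the same address. The request and response shapes are the ones sketched here.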

Multimodal

Tools & vision

Native tool-calling parsed into OpenAI tool_calls; image input via native OS codecs (HEIC, AVIF, WebP…) on vision models.
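On the client side, OpenAI-style `tool_calls` arrive as an array on the assistant message, each carrying a function name and a JSON-encoded argument string. The sketch below shows one way to dispatch them to local handlers and produce the `tool` messages to send back. The `runToolCalls` helper and the handler map are hypothetical names for illustration; only the message shapes follow the OpenAI chat completions format.

```typescript
interface ToolCall {
  id: string;
  type: "function";
  // `arguments` is a JSON-encoded string, not an object.
  function: { name: string; arguments: string };
}

interface ToolResultMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

type ToolHandler = (args: Record<string, unknown>) => unknown;

// Run each tool call against a matching handler and wrap the result
// in a `tool` message that references the originating call id.
function runToolCalls(
  calls: ToolCall[],
  handlers: Record<string, ToolHandler>,
): ToolResultMessage[] {
  return calls.map((call): ToolResultMessage => {
    const handler = handlers[call.function.name];
    let content: string;
    if (!handler) {
      content = JSON.stringify({ error: `unknown tool: ${call.function.name}` });
    } else {
      try {
        const args = JSON.parse(call.function.arguments || "{}");
        content = JSON.stringify(handler(args));
      } catch (err) {
        // Malformed arguments or a throwing handler become an error payload.
        content = JSON.stringify({ error: String(err) });
      }
    }
    return { role: "tool", tool_call_id: call.id, content };
  });
}
```

Returning errors as tool content, rather than throwing, lets the model see the failure and retry with corrected arguments.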

Shape

Library-first

The server is one consumer of a TypeScript API. Import generation into a Bun process, or embed the binary as a Mac-app sidecar.

Head-to-head on an M4 Pro (24 GB), same models, same day, against mlx-lm and mlx-optiq:

Metric                      | mlx-bun     | mlx-lm      | mlx-optiq
TTFT, served (warm)         | 45–90 ms    | 219–224 ms  | 222–331 ms
server start → ready        | 0.36–0.47 s | 0.76–0.98 s | 0.79–1.00 s
server tax vs direct decode | ≈ 0%        | −5% to −7%  | ≈ 0%

The honest negatives are in the same table — see Benchmarks.