mlx-bun

Local LLMs on Apple Silicon · No Python

Run a model on your Mac.
One command.

A local LLM server and TypeScript library built on MLX — OpenAI- and Anthropic-compatible, bit-exact to the Python reference, shipped as a single signed binary.

curl -fsSL mlx-bun.dev/install.sh | sh

or brew install joshuarossi/tap/mlx-bun · bunx mlx-bun · from source →

mlx-bun — zsh

$ mlx-bun

↓ MLX runtime ready · MiniCPM5-1B

↻ serving on http://localhost:8080

↗ opening /#/chat

› what runs on my machine here?

All of it. The model runs on your Mac’s GPU via Metal — no API key, no Python, nothing leaves the laptop.

bit-exact

mlx-bun ≡ mlx-lm (bf16 KV)

mlx-bun ≡ mlx-optiq (mixed-precision KV)

matched mode for mode, not “close”

Why it’s different

Runtime

No Python

Bun + mlx-c over bun:ffi. No venv, no sidecar Python server, no segfaults on exit. One binary, one toolchain.

Correctness

Bit-exact

Matched bit-for-bit to the reference it’s run against — mlx-lm in bf16, mlx-optiq when quantized. Identical logits, not “approximately.”

Speed

Fastest to first token

Served over HTTP, mlx-bun had the fastest time to first token (TTFT) and the fastest startup of any stack tested: 45–90 ms warm TTFT vs 219–331 ms for the Python servers, with near-zero server tax.

Protocols

Drop-in for mlx_lm.server

Same port, endpoints, and request fields — plus Anthropic Messages and OpenAI Responses. Point any SDK, or Claude Code, at a local backend.
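As a rough illustration of what "drop-in" could look like from client code, here is a minimal TypeScript sketch that posts a chat request with plain fetch. It assumes the server from the terminal demo is listening on http://localhost:8080 and exposes the OpenAI-style /v1/chat/completions route that mlx_lm.server uses. The helper names (buildChatRequest, chat) and the model id passed in are illustrative and are not part of mlx-bun's API.

```typescript
// Sketch only: helper names are hypothetical, and the route is assumed to be
// the OpenAI-style /v1/chat/completions path that mlx_lm.server exposes.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the URL and fetch options for a non-streaming chat completion.
function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
): ChatRequest {
  return {
    // Strip trailing slashes so "http://host/" and "http://host" both work.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}

// Send one user prompt and return the assistant's reply text.
async function chat(baseUrl: string, model: string, prompt: string): Promise<string> {
  const { url, init } = buildChatRequest(baseUrl, model, [
    { role: "user", content: prompt },
  ]);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0].message.content;
}
```

With a server running, a call such as chat("http://localhost:8080", "MiniCPM5-1B", "hello") would return the reply text. The model id here is taken from the terminal demo and may not match the identifier the server expects.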

Multimodal

Tools & vision

Native tool-calling parsed into OpenAI tool_calls; image input via native OS codecs (HEIC, AVIF, WebP…) on vision models.
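To make the tool_calls shape concrete, the sketch below reads tool calls from an assistant message in the OpenAI Chat Completions format, where each call carries its arguments as a JSON-encoded string. The types mirror that public format. The helper name parseToolCalls and the get_weather tool mentioned afterwards are hypothetical and do not come from mlx-bun.

```typescript
// Sketch only: types follow the OpenAI Chat Completions tool_calls format;
// parseToolCalls is a hypothetical helper, not an mlx-bun export.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}

interface AssistantMessage {
  role: "assistant";
  content: string | null;
  tool_calls?: ToolCall[];
}

interface ParsedCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

// Decode each tool call's JSON-encoded arguments into an object.
// Returns an empty array when the model answered with plain text.
function parseToolCalls(message: AssistantMessage): ParsedCall[] {
  return (message.tool_calls ?? []).map((call) => ({
    id: call.id,
    name: call.function.name,
    args: JSON.parse(call.function.arguments) as Record<string, unknown>,
  }));
}
```

A caller would typically dispatch on name (for example a hypothetical get_weather tool), run the tool locally, and send the result back in a follow-up message with role "tool" and the matching tool_call_id.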

Shape

Library-first

The server is one consumer of a TypeScript API. Import generation into a Bun process, or embed the binary as a Mac-app sidecar.

Memory

Personal memory

A local, git-tracked Markdown wiki your assistant reads for durable context — yours, editable in Obsidian, never leaves the machine.

The lab

An open playground

LoRA/ORPO fine-tuning on your Mac, speculative decoding research, and an interactive sampling-curve designer at /curves.

Measured, not vibes

Head-to-head on an M4 Pro (24 GB), same models, same day, against mlx-lm and mlx-optiq:

Metric                        mlx-bun       mlx-lm        mlx-optiq
TTFT, served (warm)           45–90 ms      219–224 ms    222–331 ms
server start → ready          0.36–0.47 s   0.76–0.98 s   0.79–1.00 s
server tax vs direct decode   ≈ 0%          −5% to −7%    ≈ 0%
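The "server tax" row can be read as a relative change in decode throughput. The sketch below assumes the tax is defined as (served − direct) / direct, expressed in percent. This page does not state the exact definition the benchmarks use, and the function name and sample figures are illustrative only.

```typescript
// Sketch only: assumes server tax = relative change in decode throughput
// (tokens per second) when going through HTTP instead of decoding directly.
function serverTaxPercent(directTokPerSec: number, servedTokPerSec: number): number {
  if (directTokPerSec <= 0) throw new RangeError("direct throughput must be positive");
  // Multiply before dividing to keep round inputs exact in floating point.
  return ((servedTokPerSec - directTokPerSec) * 100) / directTokPerSec;
}
```

Under this definition, a hypothetical stack that decodes at 100 tok/s directly and 94 tok/s when served has a tax of −6%, which would fall inside the range shown for mlx-lm. A value near 0% means serving over HTTP costs almost nothing.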

The honest negatives are in the same table — see Benchmarks.

Keep reading

Installation: Four ways in: Homebrew, curl, bunx, or source.

Quickstart: First run, the chat UI, and the OpenAI-compatible API.

The HTTP API: Chat completions, tools, vision, prompt caching, LoRA.

Personal memory: A local wiki your assistant reads. Durable, git-tracked, yours.

The lab: Fine-tuning, speculative decoding, and the curve designer.

How it compares: Where mlx-bun sits next to mlx-lm and mlx-optiq.