CLI reference
Commands are shown as mlx-bun <verb>. From a clone the identical command is
bun src/cli.ts <verb>. Model arguments are substring queries against the
registry (e4b, 26B, 12B-it); a query matching more than one model errors
out and lists the candidates, so make the query more specific.
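The matching rule above can be pictured with a small sketch. This is not the CLI's actual code: the `ModelEntry` shape, the `resolveModel` helper, the registry contents, and the case-insensitive comparison are all assumptions made for illustration.

```typescript
// Hypothetical registry entry; the real registry schema is not shown in this reference.
interface ModelEntry {
  id: string;
}

type Resolution =
  | { ok: true; model: ModelEntry }
  | { ok: false; reason: "no-match" | "ambiguous"; candidates: ModelEntry[] };

// Resolve a substring query against the registry.
// Assumption: matching is case-insensitive; the CLI's actual rule may differ.
function resolveModel(query: string, registry: ModelEntry[]): Resolution {
  const needle = query.toLowerCase();
  const candidates = registry.filter((m) => m.id.toLowerCase().includes(needle));
  const [only] = candidates;
  if (candidates.length === 1 && only !== undefined) {
    return { ok: true, model: only };
  }
  return {
    ok: false,
    reason: candidates.length === 0 ? "no-match" : "ambiguous",
    candidates,
  };
}

// Made-up registry contents, for illustration only.
const registry: ModelEntry[] = [
  { id: "example-org/gemma-e4b-it-4bit" },
  { id: "example-org/gemma-12B-it-4bit" },
  { id: "example-org/gemma-26B-it-4bit" },
];

const unique = resolveModel("e4b", registry);      // exactly one match
const ambiguous = resolveModel("gemma", registry); // three matches: caller should list them
console.log(unique.ok, ambiguous.ok);
```

In this sketch an ambiguous query returns the candidate list rather than picking one, which mirrors the documented behavior of erroring out and listing the candidates.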
serve — run the server
Start the OpenAI/Anthropic-compatible server. Bare mlx-bun is an alias for
mlx-bun serve.

    mlx-bun serve gemma --port 8090           # OpenAI-compatible server
    mlx-bun serve gemma --memory-budget 18    # ...with admission control (GB)
    mlx-bun serve e4b --no-open               # don't open the browser chat UI

Common flags (full list in Server configuration):
| Flag | Effect |
|---|---|
| --port <n> | Listen port (default 8090) |
| --memory-budget <GB> | Reject loads/requests that can’t fit the budget |
| --no-open | Don’t auto-open the chat UI |
| --no-kv-quant / --kv-bits <n> | Control mixed-precision KV |
| --adapter id=dir | Mount a LoRA adapter at startup |
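
Because the server is described as OpenAI-compatible, a client can presumably talk to it with a plain HTTP request. The sketch below only builds that request. It assumes the conventional OpenAI route `/v1/chat/completions` on localhost and that the server accepts the model name passed in; neither detail is confirmed by this page, and `buildChatRequest` is a hypothetical helper.

```typescript
// Assumptions (not confirmed by this reference): the server exposes the
// conventional OpenAI route /v1/chat/completions on localhost, and accepts
// the model name passed below.
interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build (but do not send) a chat completion request for the local server.
function buildChatRequest(prompt: string, model: string, port = 8090): ChatRequest {
  return {
    url: `http://localhost:${port}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

const request = buildChatRequest("Say hello in one sentence.", "gemma");
console.log(request.url);

// With the server running (mlx-bun serve gemma), the request could be sent with:
//   const res = await fetch(request.url, request.init);
//   console.log(await res.json());
```

Keeping the request construction separate from the `fetch` call makes it easy to inspect the payload before pointing it at a running server.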
get — download a model
Resumable, checksum-verified download into the standard Hugging Face cache.

    mlx-bun get mlx-community/gemma-4-12B-it-OptiQ-4bit

Downloads resume after an interruption, every blob is sha-verified, and the layout
matches huggingface_hub exactly, so an existing HF cache is picked up as-is.
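
The two mechanisms named above, resuming and checksum verification, can be sketched in a few lines. This is an illustration under assumptions, not mlx-bun's downloader: it assumes resumption uses a standard HTTP Range request and that the checksum is SHA-256, and the helper names `resumeHeaders`, `sha256Hex`, and `verifyBlob` are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Header for resuming a partial download: ask the server only for the bytes
// that are not on disk yet (standard HTTP Range request).
function resumeHeaders(bytesOnDisk: number): Record<string, string> {
  return bytesOnDisk > 0 ? { Range: `bytes=${bytesOnDisk}-` } : {};
}

// Hex-encoded SHA-256 digest of a blob. Assumption: SHA-256 is the hash in use.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// A blob is accepted only if its digest matches the expected value.
function verifyBlob(data: Uint8Array, expectedSha256: string): boolean {
  return sha256Hex(data) === expectedSha256.toLowerCase();
}

// "abc" is the standard SHA-256 test vector, used here instead of a real model blob.
const blob = new TextEncoder().encode("abc");
const expected = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";
console.log(resumeHeaders(1024), verifyBlob(blob, expected));
```

A downloader built this way would append the ranged response to the partial file and run the verification once the file is complete.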
scan — index your cache
Index the models in your HF cache into the registry so ls, serve, and fit
can find them by substring.

    mlx-bun scan

ls — list models
    mlx-bun ls                             # size, params, quant, capabilities
    mlx-bun ls --vision --max-size 10GB    # filter

fit — memory contract
Deterministic memory assessment: whether the model fits, the maximum context, and the predicted tok/s.

    mlx-bun fit gemma --ctx 32768          # for this machine
    mlx-bun fit gemma --ctx 8192 --skus    # across the Apple Silicon lineup

See Choosing a model for how the assessment is computed.
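
To give a feel for why the context length matters to a fit check, here is a rough back-of-the-envelope sketch. It is not the calculation mlx-bun performs. It assumes memory is just the weights plus a dense KV cache that grows linearly with context, and it ignores activations, runtime overhead, and mixed-precision KV. The `ModelShape` fields and the example numbers are made up.

```typescript
// Hypothetical model shape; real values come from a model's config, and
// mlx-bun's own calculation is documented elsewhere and may differ.
interface ModelShape {
  weightBytes: number;       // size of the (quantized) weights
  layers: number;
  kvHeads: number;
  headDim: number;
  kvBytesPerElement: number; // 2 for fp16; lower with a quantized KV cache
}

// KV cache bytes for a given context: keys + values (factor 2) for every layer.
function kvCacheBytes(m: ModelShape, ctx: number): number {
  return 2 * m.layers * m.kvHeads * m.headDim * m.kvBytesPerElement * ctx;
}

// Simplified fit check: weights plus KV cache must stay inside the budget.
function fits(m: ModelShape, ctx: number, budgetBytes: number): boolean {
  return m.weightBytes + kvCacheBytes(m, ctx) <= budgetBytes;
}

// Largest context whose weights + KV cache stay inside the budget.
function maxContext(m: ModelShape, budgetBytes: number): number {
  const perToken = kvCacheBytes(m, 1);
  return Math.max(0, Math.floor((budgetBytes - m.weightBytes) / perToken));
}

const GiB = 1024 ** 3;
// Made-up shape, for illustration only.
const example: ModelShape = {
  weightBytes: 7 * GiB,
  layers: 32,
  kvHeads: 8,
  headDim: 128,
  kvBytesPerElement: 2,
};

console.log(kvCacheBytes(example, 32768) / GiB); // 4 (GiB of KV cache at 32768 tokens)
console.log(fits(example, 32768, 18 * GiB));     // true
console.log(maxContext(example, 18 * GiB));      // 90112
```

Because every input is a fixed number, the same shape and budget always give the same answer, which is the sense in which such an assessment is deterministic.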
evals — recorded benchmark runs
    mlx-bun evals

harness pi — connect pi
Point your own pi install at the local server.

    mlx-bun harness pi

Benchmarks
The head-to-head matrix against mlx-lm/optiq is a script (reboot first for clean
numbers; it’s preflight-gated and resumable, writing benchmarks-h2h-<date>.md):

    ./benchmark.sh