Quickstart
This assumes you’ve installed mlx-bun. Commands
below use mlx-bun; from a clone, the identical command is bun src/cli.ts <verb>.
1. Just run it
Section titled “1. Just run it”mlx-bunBare mlx-bun is an alias for mlx-bun serve. On a fresh machine it pulls the
MLX runtime (if not bundled), downloads the sub-GB MiniCPM5-1B starter, serves
it, and opens the chat UI at http://localhost:8090/#/chat. You’re chatting in
under a minute.
Pass --no-open to skip launching the browser.
2. Pick a specific model
Section titled “2. Pick a specific model”Name a model and mlx-bun substring-matches it against your downloaded models:
mlx-bun serve e4b --port 8090Don’t have it yet? Grab it first (resumable, checksum-verified):
mlx-bun get mlx-community/gemma-4-e4b-it-OptiQ-4bitmlx-bun scan # index your HF cache into the registrymlx-bun ls # list models with size, params, quant, capabilitiesNot sure what fits your Mac? fit gives a deterministic answer:
mlx-bun fit gemma --ctx 8192See Choosing a model for the full list.
3. Call the API
Section titled “3. Call the API”The server is OpenAI-compatible — any OpenAI client works. The request’s
model field is ignored; the loaded model is served and echoed back.
curl http://localhost:8090/v1/chat/completions \ -H 'Content-Type: application/json' \ -d '{ "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 128, "temperature": 0.7 }'Or with the OpenAI SDK — point baseURL at the server and use any non-empty
apiKey:
import OpenAI from "openai";const client = new OpenAI({ baseURL: "http://localhost:8090/v1", apiKey: "local" });
const res = await client.chat.completions.create({ model: "local", messages: [{ role: "user", content: "Hello!" }],});console.log(res.choices[0].message.content);Where to next
Section titled “Where to next”- The HTTP API — tools, vision, prompt caching, LoRA, Anthropic and Responses protocols.
- Using the library — drive generation directly in a Bun process, no server.
- CLI reference — every verb.