
Ollama, a runtime for running large language models locally, has added support for Apple's open-source MLX machine-learning framework. The project also reports better caching performance and now supports Nvidia's NVFP4 model-compression format, which can considerably reduce memory use for certain models.
Taken together, these changes should deliver noticeably improved performance on Macs with Apple Silicon (M1 or later)—and the timing is apt, since local models are starting to gain momentum beyond the research and hobbyist spheres.
OpenClaw's recent breakout, which surged past 300,000 stars on GitHub, grabbed attention with experiments like Moltbook, and became a phenomenon in China in particular, has pushed many people to try running models locally.
As developers grow frustrated with rate limits and the high prices of premium plans for services such as Claude Code or ChatGPT Codex, experimentation with local coding models has intensified. (Ollama has also recently expanded its Visual Studio Code integration.)
The new capability is available in preview (Ollama 0.19) and currently supports only one model: the 35-billion-parameter variant of Alibaba's Qwen3.5. The hardware requirements are steep for typical users: besides an Apple Silicon Mac, you need at least 32GB of RAM, according to Ollama's announcement.