Ollama has added support for Apple's MLX framework, along with improved caching and NVFP4 compression, significantly boosting local AI model performance on Apple Silicon Macs. The update particularly benefits newer M5-series Macs with neural accelerators, and the MLX backend currently supports Alibaba's Qwen3.5 35B model. The release arrives as local models gain popularity amid frustration with cloud API rate limits and costs.
Background
Ollama is a popular runtime for running large language models locally, while MLX is Apple's open-source machine learning framework optimized for Apple Silicon chips.
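For readers unfamiliar with the workflow, Ollama exposes a local REST API (by default at http://localhost:11434) once a model has been pulled. A minimal Python sketch of a single non-streaming request might look like the following; the model tag is illustrative, not a claim about which models ship with the MLX backend.

```python
import requests

# Ollama serves a local HTTP API on port 11434 by default.
# The model tag below is a placeholder; substitute any model
# you have fetched locally with `ollama pull <name>`.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3",  # illustrative local model tag
        "prompt": "Explain what MLX is in one sentence.",
        "stream": False,   # return the full reply in one JSON object
    },
    timeout=120,
)
response.raise_for_status()

# Non-streaming responses carry the generated text in "response".
print(response.json()["response"])
```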
- Source: Ars Technica
- Published: Apr 1, 2026 at 07:00 AM
- Score: 6.0 / 10