Ollama has added support for Apple's MLX framework, along with improved caching and NVFP4 compression, significantly boosting local AI model performance on Apple Silicon Macs. The update particularly benefits newer M5-series Macs with neural accelerators, and the MLX backend currently supports Alibaba's Qwen3.5 35B model. The release arrives as local models gain popularity amid frustration with cloud API rate limits and costs.
Background
Ollama is a popular runtime for running large language models locally, while MLX is Apple's open-source machine learning framework optimized for Apple Silicon chips.
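For readers unfamiliar with the workflow, Ollama exposes a local REST API (by default at http://localhost:11434) once a model has been pulled. A minimal Python sketch of a single non-streaming request might look like the following; the model tag is illustrative, not a claim about which models ship with the MLX backend.

```python
import requests

# Ollama serves a local HTTP API on port 11434 by default.
# The model tag below is a placeholder; substitute any model
# you have fetched locally with `ollama pull <name>`.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3",  # illustrative local model tag
        "prompt": "Explain what MLX is in one sentence.",
        "stream": False,   # return the full reply in one JSON object
    },
    timeout=120,
)
response.raise_for_status()

# Non-streaming responses carry the generated text in "response".
print(response.json()["response"])
```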
- Source: Ars Technica
- Published: Apr 1, 2026 at 07:00 AM
- Score: 6.0 / 10