Microsoft has introduced two new text LLMs: MAI-Thinking-1 (35B parameters), a reasoning model available to select partners, and MAI-Code-1-Flash (5B parameters), optimized for GitHub Copilot and VS Code. Microsoft says both models were trained on clean, commercially licensed data without third-party distillation, an approach that could set a precedent for data licensing in AI development. The small parameter counts suggest a focus on efficiency and cost reduction, and MAI-Thinking-1 reportedly outperforms Sonnet 4.6 in human evaluations despite its relatively compact size.
Background
Large language models face growing scrutiny over training data licensing and copyright, since many were trained on web-scraped data of questionable legality. Microsoft's emphasis on 'clean and appropriately licensed data' is a direct response to these concerns.
- Source: Simon Willison
- Published: Jun 3, 2026 at 06:21 AM
- Score: 7.0 / 10