Mr. Chatterbox is a language model trained exclusively on 28,000 Victorian-era British texts published between 1837 and 1899, so its training data contains no modern or web-scraped material. The 340M-parameter model produces charmingly antiquated but often incoherent responses, demonstrating both the possibilities and the limitations of training only on out-of-copyright text. Though technically limited, it is a notable experiment in building a language model without modern web-scraped data.
Background
Most modern language models are trained on vast amounts of web-scraped data whose copyright status is often unclear, which raises ethical and legal concerns. Researchers have been exploring an alternative: training on public-domain materials, whose provenance is easier to document and audit.
- Source: Simon Willison
- Published: Mar 30, 2026 at 10:28 PM
- Score: 6.0 / 10