Mr. Chatterbox is a (weak) Victorian-era ethically trained model you can run on your own computer

Simon Willison | Mar 30, 2026 at 10:28 PM | 6.0/10

Mr. Chatterbox is a language model trained exclusively on 28,000 Victorian-era British texts published between 1837 and 1899, which makes it an ethically sourced model with no modern data in its training set. The 340M-parameter model produces charmingly antiquated but often incoherent responses, demonstrating both the possibilities and the limitations of copyright-free training. Although it is technically limited, it is an important experiment in building AI models without modern web-scraped data.
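To give a rough sense of why a 340M-parameter model fits on an ordinary computer, here is a back-of-the-envelope sketch of the memory needed for the weights alone. The parameter count comes from the summary; the storage formats and the helper name `weight_memory_gb` are assumptions for illustration, not details from the original post, and real memory use also depends on the runtime and on activations.

```python
# Rough weight-memory estimate for a 340M-parameter model.
PARAMS = 340_000_000  # parameter count stated in the summary

# Bytes per parameter for common storage formats.
# These formats are assumed for illustration; the source does not say
# which one the model is distributed in.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(params: int, bytes_per_param: int) -> float:
    """Return the size of the weights alone, in gigabytes (10**9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for fmt, size in BYTES_PER_PARAM.items():
        print(f"{fmt}: {weight_memory_gb(PARAMS, size):.2f} GB")
```

Even at 4 bytes per parameter the weights come to well under 2 GB, which is consistent with the claim that the model can run locally.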

Background

Most modern language models are trained on vast amounts of web-scraped data whose copyright status is questionable, which raises ethical and legal concerns. Researchers have been exploring alternative approaches that train on public domain material to create more transparent AI systems.

Source: Simon Willison
Published: Mar 30, 2026 at 10:28 PM
Score: 6.0 / 10
