Researchers including Alec Radford (a co-creator of GPT) have released talkie, a 13B-parameter language model trained exclusively on pre-1931 English text. Both base and instruction-tuned versions are available under the Apache 2.0 license, enabling research into historical prediction capabilities and copyright-free AI. The release is a notable advance for "vegan models": models trained entirely on out-of-copyright data.
Background
Large language models typically train on modern web data, which raises copyright concerns. Models trained exclusively on public-domain content offer an alternative, and they also enable unique research into how historical knowledge is represented.
- Source: Simon Willison
- Published: Apr 28, 2026 at 10:47 AM