TAO OF GPT-2 345M

early machine take on the tao te ching

what

renditions of the tao te ching by GPT-2: 8,100 raw samples generated in july 2019 with a 345M-parameter model fine-tuned on translations of the tao te ching, following the process documented by gwern in gpt-2 neural poetry.

this is not a translation. it is what a small, pre-instruction, pre-RLHF language model believed the tao te ching to be.

each of the 81 chapters draws a fresh reading every time you load it, so the book is different every time you open it. if a passage finds you, its badge is a permanent link.

how

100 samples were generated for each of the 81 chapters, conditioned on that chapter's text. splitting the samples at the model's own <|endoftext|> boundaries yielded ~28,000 distinct passages, triaged into strata:

  • 13,097 clear readings: the main pool
  • 1,030 aphorisms: the short ones that escape whole
  • 3,942 deeper strata: abstract, fragmentary, half-dissolved
  • 572 noise-floor specimens: degenerate loops, kept as evidence
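
the splitting step that comes before triage can be sketched roughly as below. this is a minimal illustration, not the project's actual code: it assumes each raw sample is a plain string, and the name split_sample is hypothetical.

```python
EOT = "<|endoftext|>"  # the boundary token GPT-2 emits between documents

def split_sample(raw: str) -> list[str]:
    """Split one raw sample into passages at <|endoftext|> boundaries.

    Hypothetical helper: surrounding whitespace is trimmed and
    empty pieces (e.g. after a trailing boundary) are dropped.
    """
    parts = (piece.strip() for piece in raw.split(EOT))
    return [piece for piece in parts if piece]
```

a sample containing two boundaries can yield up to three passages, which is how 8,100 samples can become a much larger number of passages.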

passages were lightly machine-cleaned (spacing, deduplication) but not edited: the words are the model's own, july 2019 vintage. truncated and headless fragments are kept where they still carry something.
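
one way such light cleaning could look, as a sketch only: the exact rules used here are not documented, so the spacing normalisation and exact-match deduplication below are assumptions, and the names clean and dedupe are hypothetical.

```python
import re

def clean(passage: str) -> str:
    """Collapse runs of spaces/tabs and trim each line; words are untouched."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in passage.splitlines())
    return "\n".join(lines).strip()

def dedupe(passages: list[str]) -> list[str]:
    """Clean each passage and keep the first occurrence of each exact text."""
    seen = set()
    kept = []
    for passage in passages:
        cleaned = clean(passage)
        # assumption: passages that are empty after cleaning are dropped
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            kept.append(cleaned)
    return kept
```

nothing in this sketch rewrites or reorders words, and truncated fragments pass through unchanged.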

the originals shown for comparison are james legge's 1891 public-domain translation.