Transfusion combines the language modeling loss (next-token prediction) with diffusion to train a single transformer over mixed-modality sequences.
56. Transfusion: Predict the Next Token and…
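
To make "combining" the two objectives concrete, here is a minimal sketch. It assumes the two losses are simply added, with a weighting coefficient on the diffusion term: text positions contribute a next-token cross-entropy, image patches contribute a noise-prediction mean squared error. The function names, the input layout, and the `lam` weight are hypothetical, chosen for illustration and not taken from the post.

```python
import math


def cross_entropy(logits, target):
    """Next-token loss for one position: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mse(pred, true):
    """Mean squared error between predicted and true noise vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)


def combined_loss(text_positions, image_patches, lam=1.0):
    """Sum a language modeling loss and a weighted diffusion loss.

    text_positions: list of (logits, target_id) pairs, one per text token.
    image_patches:  list of (predicted_noise, true_noise) pairs, one per patch.
    lam:            hypothetical weight on the diffusion term (an assumption).
    """
    lm = sum(cross_entropy(l, t) for l, t in text_positions) / len(text_positions)
    diffusion = sum(mse(p, n) for p, n in image_patches) / len(image_patches)
    return lm + lam * diffusion
```

In a real model both terms would be computed from the outputs of the same transformer over one mixed sequence, so a single backward pass updates shared weights for both modalities.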