56. Transfusion: Predict the Next Token and…
Transfusion combines the language modeling loss (next-token prediction) with a diffusion loss to train a single transformer over mixed-modality sequences.
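
As a rough illustration of what combining the two losses could look like, here is a minimal sketch in plain Python. It assumes the two terms are joined as a weighted sum, that the diffusion term is a mean squared error on predicted noise, and that text and image positions have already been separated. The function names, the `lam` weight, and its default value are hypothetical and not taken from the summary above.

```python
import math


def lm_loss(logits, targets):
    """Mean next-token cross-entropy over text positions.

    logits:  list of per-position score lists (one score per vocabulary item)
    targets: list of correct next-token indices, one per position
    """
    total = 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp for the softmax normaliser.
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[target]
    return total / len(targets)


def diffusion_loss(pred_noise, true_noise):
    """Mean squared error between predicted and actual noise values
    (assumed noise-prediction form of the diffusion objective)."""
    squared = [(p - n) ** 2 for p, n in zip(pred_noise, true_noise)]
    return sum(squared) / len(squared)


def combined_loss(logits, targets, pred_noise, true_noise, lam=1.0):
    """Weighted sum of the two objectives; `lam` is an illustrative
    balancing weight, not a value stated in the source."""
    return lm_loss(logits, targets) + lam * diffusion_loss(pred_noise, true_noise)
```

In a sketch like this, both terms would be computed from the outputs of the same transformer on one mixed sequence, so a single backward pass updates shared parameters for both modalities.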