Diffusion LLMs scale better than autoregressive models at inference
"If you need to scale up these models and they are actually getting into production, the price per token, or what's needed per token, becomes the key metric that you care about. And what we're seeing with diffusion language models is that they scale better than autoregressive models at inference time. They're cheaper to serve. They're faster. You get more tokens per GPU, which means that the price is actually lower."
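The price-per-token argument can be sketched with back-of-the-envelope arithmetic: serving cost per token is GPU cost per hour divided by token throughput. The numbers below are illustrative assumptions, not measurements of any real model or GPU.

```python
def cost_per_million_tokens(gpu_cost_per_hour: float, tokens_per_second: float) -> float:
    """Dollar cost to generate one million tokens on a single GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000

# Hypothetical figures: a $2/hr GPU serving an autoregressive model
# at 100 tok/s versus a diffusion model at an assumed 4x throughput.
ar_cost = cost_per_million_tokens(2.0, 100)    # ~$5.56 per 1M tokens
diff_cost = cost_per_million_tokens(2.0, 400)  # ~$1.39 per 1M tokens
```

Under these (assumed) numbers, quadrupling tokens per GPU cuts the serving price per token by the same factor, which is the economic claim being made.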
