
DEPLOY VOICE AGENTS

All podcast episode summaries matching DEPLOY VOICE AGENTS — aggregated across every podcast we track.

1 episode · Page 1/1

Quotes & Clips tagged DEPLOY VOICE AGENTS

9 clips on this page

Discrete tokens break the geometry diffusion relies on

“But if you think about text and you take two words, then it's not clear what's in between the meaning of two different words, right? And so there is no real geometry to the space of possible tokens or possible words. And so that makes the idea of denoising much more challenging, because it's not clear what it means to perturb, to add noise to, text.”

— Stefano Ermon - Stanford professor, Inception Labs CEO
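The "no geometry" point can be made concrete with a toy contrast (our illustration, not Inception's code): additive Gaussian noise is well defined on continuous data like pixels, but has no direct analogue on a discrete token sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Continuous data: "a little noise" is well defined -- a slightly
# perturbed pixel vector is still a valid, nearby image patch.
pixel = np.array([0.2, 0.7, 0.4])
noisy_pixel = pixel + rng.normal(0, 0.01, size=3)

# Discrete tokens: there is no point "between" two words, so
# `tokens + noise` is meaningless. Discrete diffusion therefore needs a
# different corruption process (e.g., masking) instead of additive noise.
tokens = ["the", "cat", "sat"]
```

This is why the masking-based corruption described later on this page is used in place of Gaussian noise for text.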

Causal attention masks block reuse of pretrained autoregressive weights

“The real challenge is that the attention mask that you use in a traditional autoregressive model is causal. So the model only knows how to use context to the left as it figures out what to do next. And in a diffusion language model, you really want to be able to have access to the context to the left and to the right as you decide what to change. It's one of the key properties that make these models potentially much higher quality compared to autoregressive models.”

— Stefano Ermon - Stanford professor, Inception Labs CEO
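The two masks Ermon contrasts can be sketched in a few lines (numpy, function names ours): a causal mask lets position i attend only to positions ≤ i, while a diffusion-style denoiser attends bidirectionally, so pretrained causal weights don't transfer directly.

```python
import numpy as np

def causal_mask(n):
    # Autoregressive attention: True where attention is allowed,
    # i.e. position i may only look at positions 0..i (lower triangle).
    return np.tril(np.ones((n, n), dtype=bool))

def bidirectional_mask(n):
    # Diffusion-style denoiser: every position sees the full context,
    # left and right, when deciding what to change.
    return np.ones((n, n), dtype=bool)

n = 4
print(causal_mask(n).sum())         # 10 allowed (query, key) pairs
print(bidirectional_mask(n).sum())  # 16 allowed pairs: all of them
```

A model trained only under the causal mask has never learned to combine right-side context, which is the switching cost the quote describes.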

Big labs face high switching costs to adopt diffusion

“My sense is that, you know, there is a big switching cost. They're very, very focused on Gemini, on their main model. And that's kind of the issue with these big labs: they're only going in one direction, and then it's hard for them to really focus on an alternative direction. As a startup, we're in a much better position to do that, because we're laser focused on one thing, and we can really deliver and build everything that's needed to get that technology to succeed.”

— Stefano Ermon - Stanford professor, Inception Labs CEO

Diffusion models enable controllable generation through external constraints

“Diffusion models, at least for images, are known to be much more suitable for controllable generation. And the reason is that because the object, let's say the image that you're generating, is available to the model from the very beginning, it's very easy for the model to check whether or not the object it's generating is consistent with, say, some constraints or some kind of control signal that you want to use to make sure the output is consistent with whatever you want the model to generate. So I was on some papers where we were doing medical imaging, and the idea is that when you do a CT scan, you're basically taking some projections of your body cross section, and then you're trying to reconstruct what your body looks like from the measurements that you get from the machine.”

— Stefano Ermon - Stanford professor, Inception Labs CEO
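A common way this plays out in diffusion-based reconstruction is a "data consistency" step: after each denoising update, re-impose the known measurements on the current estimate. The toy sketch below (our names and stand-in denoiser, not the paper's algorithm) shows the alternating structure on a vector where two entries are observed.

```python
import numpy as np

def constrained_denoise(x_noisy, denoise_step, observed_idx, observed_vals, steps):
    # Alternate a denoising update with a projection that re-imposes the
    # known measurements -- the whole partial estimate is available at
    # every step, which is what makes this check cheap for diffusion models.
    x = x_noisy.copy()
    for _ in range(steps):
        x = denoise_step(x)
        x[observed_idx] = observed_vals  # project onto the constraint set
    return x

# Stand-in "denoiser": contract toward a known clean signal (illustration only).
target = np.array([1.0, 2.0, 3.0, 4.0])
rng = np.random.default_rng(0)
noisy = target + rng.normal(0.0, 1.0, size=4)
step = lambda x: 0.5 * x + 0.5 * target

out = constrained_denoise(noisy, step, [0, 3], target[[0, 3]], steps=20)
```

The observed entries are exact after every projection, and the free entries converge under the denoiser; real CT reconstruction replaces both the denoiser and the projection with learned and physics-based operators.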

Masking tokens replaces noise in diffusion text models

“One that works pretty well is basically one where you mask out tokens. So you kind of hide them. You take a sentence and then you remove some of the tokens, you hide them from the neural network, and then you ask the neural network: can you predict what those tokens were? And so it's similar in some sense to next token prediction, except that things are done out of order, and the network needs to use context to the left and to the right and combine it in some interesting ways to figure out how to predict all these missing tokens from the sentence.”

— Stefano Ermon - Stanford professor, Inception Labs CEO
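The corruption step described above is easy to sketch (our minimal version; a real model would mask token IDs and vary the mask rate over diffusion time): hide a random subset of tokens and record, per position, what the network should reconstruct.

```python
import random

MASK = "[MASK]"

def corrupt(tokens, mask_prob, rng):
    # Hide a random subset of tokens; the model must reconstruct them
    # using context on both sides, rather than left-to-right only.
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets[i] = tok  # position -> original token to predict
        else:
            corrupted.append(tok)
    return corrupted, targets

rng = random.Random(0)
sentence = "the cat sat on the mat".split()
noisy, targets = corrupt(sentence, 0.5, rng)
```

Training then asks the network to predict every entry of `targets` at once, which is how "things are done out of order" relative to next-token prediction.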

Diffusion LLMs scale better than autoregressive models at inference

“If you need to scale up these models and they are actually getting into production, the price per token, or what's needed per token, becomes the key metric that you care about. And so what we're seeing with diffusion language models is that they scale better than autoregressive models at inference time. They're cheaper to serve. They're faster. You get more tokens per GPU, which means that the price is actually lower.”
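The tokens-per-GPU argument is simple arithmetic; the numbers below are invented purely to show the relationship (fixed GPU cost, higher throughput, lower price per token), not real benchmarks of either model family.

```python
def price_per_million_tokens(gpu_cost_per_hour, tokens_per_second):
    # At a fixed hourly GPU cost, price per token is inversely
    # proportional to sustained throughput on that GPU.
    tokens_per_hour = tokens_per_second * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000

autoregressive = price_per_million_tokens(2.0, 200)   # hypothetical throughput
diffusion      = price_per_million_tokens(2.0, 1000)  # hypothetical 5x throughput
```

With these made-up inputs, a 5x throughput gain translates directly into a 5x lower price per million tokens.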

— Stefano Ermon - Stanford professor, Inception Labs CEO

Mercury 2 matches frontier speed-tier quality 5-10x faster

“The latest model that we announced this week, Mercury 2, is actually matching in quality some of the best speed-optimized models from frontier labs. So think about the Haiku models, the Flash models, the mini models from OpenAI. It's at that quality level. But, again, it's about five to 10x faster in terms of the time it takes you to get an answer, using a diffusion model versus an autoregressive model.”

— Stefano Ermon - Stanford professor, Inception Labs CEO

Existing serving engines cannot run diffusion language models

“I think one of the reasons why there are still no other providers that are able to serve diffusion language models in production today is that you cannot run a diffusion language model on existing serving engines. So if you think about vLLM, SGLang, TensorRT: these frameworks exist, many of them open source, and they are really, really good at serving autoregressive LLMs very efficiently. The space for diffusion language models is much, much less developed, so we had to build our own serving engine.”

— Stefano Ermon - Stanford professor, Inception Labs CEO

Voice agents and fast agentic loops are killer use cases

“We're already seeing a lot of usage. I mean, you nailed the two main ones that we're seeing: voice, a lot of voice, customer support, the educational kind of agents. People love the speed of diffusion language models. They always have this issue that they would want to be able to use a thinking model, like a reasoning model, but usually the latency is just not good enough. Maybe unless they use specialized AI inference chips, but that's too expensive and they cannot scale to large volumes. So we have a bunch of customers that are building voice agents on top of diffusion language models.”

— Stefano Ermon - Stanford professor, Inception Labs CEO
