3 episodes tagged (approximate match across all podcasts)

OPTIMIZE LATENCY

All podcast episode summaries matching OPTIMIZE LATENCY — aggregated across every podcast we track.

3 episodes · Page 1/1

Quotes & Clips tagged OPTIMIZE LATENCY


Naive customers need blueprints; savvy customers need developer kits

Customers that are new on the multi-agentic journey are our more naive users, so we have to offer them prebuilt blueprints that they can take and run for their use cases. There are savvy customers who know what they want to do, and you offer them developer kits. So we have all of these different ranges that we offer to our customers depending on where they come from, what their use case is, and how they can get to the fastest path to production in the least constrained manner possible.

Rashmi Shetty - senior director at Capital One

Content-addressed caching creates network effects in data engineering

The main savings come from the fact that you ran it, you got your job done, and you moved on. Then somebody else, in some department you didn't know existed, runs the same task, but on a newer version. Right now, in most organizations, you can't even find out about it, so you can't even measure that you're spending that time twice. Here, if everybody's entangled, that's detected automatically, and it's detected that the output is the same.

Mikhail Parakhin
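The mechanism Parakhin describes can be sketched as a content-addressed cache: a job's output is keyed by a hash of its task definition plus the exact versions of its inputs, so a second team unknowingly rerunning the same computation gets a cache hit instead of recomputing. A minimal sketch; the in-memory store and all names here are illustrative, not any specific system:

```python
import hashlib
import json

_cache = {}  # stands in for a shared, org-wide result store


def cache_key(task_code: str, input_versions: dict) -> str:
    """Key = hash of the task definition plus exact input versions."""
    payload = json.dumps({"code": task_code, "inputs": input_versions},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def run_cached(task_code, input_versions, compute):
    """Return (result, was_cache_hit); only compute on a miss."""
    key = cache_key(task_code, input_versions)
    if key in _cache:  # someone, anywhere in the org, already ran this
        return _cache[key], True
    result = compute()
    _cache[key] = result
    return result, False


# Team A runs the job; Team B unknowingly repeats it and hits the cache.
out_a, hit_a = run_cached("SELECT sum(x) FROM t", {"t": "v3"}, lambda: 42)
out_b, hit_b = run_cached("SELECT sum(x) FROM t", {"t": "v3"}, lambda: 42)
```

Because the key covers input versions, rerunning on a newer version of `t` would miss the cache and recompute, which is exactly the "detected automatically" behavior the quote describes.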

Specialization through fine-tuning and distillation beats generic reasoning models

You are most successful when you can offer two things: reasoning and specialization. Reasoning capabilities we are bringing to the fore with the agentic frameworks in our platform. Specialization is something that is very, very crucial, and it can be achieved primarily using specialized models and fine-tuning. Teacher-student distillation gives you the control you need for providing personalized experiences, as well as some control over your latency metrics.

Rashmi Shetty - senior director at Capital One
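Teacher-student distillation, as Shetty describes it, trains a small, fast student to match a large teacher's softened output distribution, which is how specialization and latency control are obtained at once. A minimal sketch of the standard distillation loss (temperature-scaled KL divergence); the logits below are made up for illustration:

```python
import math


def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) at temperature T: the student is trained
    to match the larger teacher's softened output distribution."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Matching the teacher gives zero loss; diverging gives positive loss.
same = distill_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
diff = distill_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])
```

In practice this loss is minimized by gradient descent over the student's weights; the snippet only shows the objective being minimized.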

Simulated customer data provides a massive competitive moat

Shopify has decades of history of how people made changes and what it resulted in in terms of sales. It's noisy data, and the websites are usually small. But if you aggregate everything together and apply a denoising, collaborative-filtering-like approach, you can extract a very clear signal. And then you can optimize your agents.

Mikhail Parakhin
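The denoising-by-aggregation idea can be illustrated with simple averaging, used here as a stand-in for the full denoising/collaborative-filtering approach the quote gestures at but does not specify: each small site observes the effect of a change with heavy noise, but pooling many sites recovers the underlying signal. All numbers are hypothetical:

```python
import random

random.seed(0)

true_lift = 0.02  # hypothetical true sales effect of a layout change
n_sites = 5000    # many small, individually noisy storefronts

# Each site observes the effect buried in heavy per-site noise.
observations = [true_lift + random.gauss(0, 0.5) for _ in range(n_sites)]

# Any single site is useless; the aggregate converges on the signal
# (standard error shrinks like 1/sqrt(n_sites)).
aggregated = sum(observations) / n_sites
aggregated_error = abs(aggregated - true_lift)
```

Real collaborative filtering would additionally exploit similarity between sites rather than pooling them uniformly, but the core effect, noise averaging out across many weak observers, is the same.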

Liquid AI architectures outperform transformers for low-latency search

Liquid neural networks, you can think of them as a next step, sort of state space models squared. It's a non-transformer architecture that's more complicated than state space and really difficult to code, if I'm being honest, but it's very efficient. It's subquadratic in the length of your context. It's a very compact way to represent things.

Mikhail Parakhin
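What "subquadratic in context length" means can be shown with the simplest possible state-space recurrence: a fixed-size hidden state is updated once per token, so cost grows linearly with sequence length, versus quadratically for attention, which compares every token pair. This toy scalar recurrence is not Liquid AI's actual architecture, just the general shape of the family:

```python
# Minimal linear state-space recurrence:
#   h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t
# One pass over the sequence: O(T) time, O(1) state,
# vs O(T^2) pairwise comparisons for attention.
def ssm_scan(xs, a=0.9, b=0.5, c=1.0):
    h, ys = 0.0, []
    for x in xs:           # constant work per token
        h = a * h + b * x  # compressed summary of the entire prefix
        ys.append(c * h)
    return ys


# An impulse input decays geometrically through the state.
ys = ssm_scan([1.0, 0.0, 0.0])
```

The hidden state `h` is the "very compact way to represent things": the whole context is summarized in a fixed-size vector (here a single scalar) instead of a growing key-value cache.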

AI tool usage has reached near-universal internal adoption

This is the number of daily active workers. Think of it as DAU, basically daily active users of an AI tool as a percentage of all the people in the company. And you can see that it approaches really 100% by now. It's hard to do your job now without interacting deeply with at least one tool.

Mikhail Parakhin

Reviewing code is the primary bottleneck in agentic workflows

The real problem is not in spending time waiting for a PR. The real problem is that since there's so much more code, the probability of at least some tests failing goes up. And then you keep failing, then you have to find the offending PR, evict it, and retest without that PR. And so the deployment cycle becomes much longer.

Mikhail Parakhin

Auto-research loops outperform human optimization through sheer volume

If I were doing 400 experiments myself, my batting average would have been much higher, I'm sure. But, first of all, it would take me like three years to do 400 experiments. And I didn't have to do them; the machines, for just the price of electricity, did that. And I got one improvement that, honestly, when I was starting that experiment, my thinking was to go and show: hey, Andre, maybe you just don't know how to optimize.

Mikhail Parakhin
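The auto-research loop amounts to running far more trials than a human could afford and keeping the best one, even if most individual trials miss. A minimal random-search sketch; the objective function is a hypothetical stand-in for an expensive training run:

```python
import random

random.seed(1)


def evaluate(cfg):
    """Stand-in for an expensive experiment; in reality this would be
    a full training run. Peak score at lr=0.1, wd=0.01 (made up)."""
    return -(cfg["lr"] - 0.1) ** 2 - (cfg["wd"] - 0.01) ** 2


baseline = {"lr": 0.3, "wd": 0.1}
best, best_score = baseline, evaluate(baseline)

# 400 machine-run trials; a human could only afford a handful,
# so a low per-trial hit rate is fine if the volume is high enough.
for _ in range(400):
    cfg = {"lr": random.uniform(0.0, 0.5), "wd": random.uniform(0.0, 0.2)}
    score = evaluate(cfg)
    if score > best_score:
        best, best_score = cfg, score
```

Even with a poor "batting average" per trial, the loop beats the hand-tuned baseline because it only has to get lucky once across 400 tries.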

Policy-bound agent operations prevent costly chatbot mistakes like rogue discounts

I'm thinking about an example that we've all heard about. I forget the very specific scenario, but a business had a chatbot, I think it happened in Canada, a chatbot on their website, and a customer asked for a discount, and the chatbot basically gave them a discount. This would clearly be disastrous in a car dealer type of scenario. How do you make generative AI and agents safe for these dealers that have a lot at stake?

Sam Charrington - host of TWIML AI Podcast

Treat agentic AI as a system, not isolated models

I think the core is that you really need to treat agentic AI as a system. It's truly a system. You have to start with governed data. You have to have risk controls baked into multiple layers of your application or your system. You have to look at latency as something that needs to be optimized end to end. And understanding that your biggest gains come from post-production telemetry is also critical.

Rashmi Shetty - senior director at Capital One

Chat Concierge handles car buying before customers reach the dealership

Chat Concierge for us was our beachhead initiative around deploying a multi-agentic solution. Chat Concierge is essentially an auto dealership application that was deployed out to our auto dealers to bridge the experience between dealers and their customers and make it very seamless. And we need to understand we are moving to a world where the car buying experience doesn't start at the dealership. It starts before, when customers go to the website and try to figure out, okay, what's the inventory?

Rashmi Shetty - senior director at Capital One

Observability must replay agent reasoning across every tool invocation

All the more important that observability comes to the forefront in stochastic systems like a multi-agentic application. All the more important for us to be able to replay agentic actions and try to understand how they functioned. Agent behavior needs observability along many different dimensions: what tools were involved, what was the reasoning mechanism that led to each tool invocation, and, overall, what was the context that was passed across systems?

Rashmi Shetty - senior director at Capital One
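The replay requirement Shetty describes implies logging every tool invocation together with the reasoning that led to it, so the decision path can be reconstructed after the fact. A minimal sketch of such a trace; the tool name, fields, and canned result are illustrative, not any particular observability product:

```python
import time

trace = []  # append-only log of every agent step


def record(step_type, **fields):
    """Append one timestamped event to the trace."""
    trace.append({"t": time.time(), "type": step_type, **fields})


def invoke_tool(name, args, reasoning):
    """Wrap every tool call so both the call and its result are logged,
    along with the reasoning that triggered the invocation."""
    record("tool_call", tool=name, args=args, reasoning=reasoning)
    result = {"inventory": 12} if name == "check_inventory" else None
    record("tool_result", tool=name, result=result)
    return result


invoke_tool("check_inventory", {"model": "sedan"},
            reasoning="user asked what cars are on the lot")


def replay(trace):
    """Reconstruct the agent's decision path from the log."""
    return [(e["type"], e.get("tool")) for e in trace]
```

Because each event carries the reasoning and the context passed in, a failed interaction can be replayed step by step, which is the dimension of observability the quote calls out as missing in stochastic systems.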

Latency is now a product feature, not a non-functional requirement

What in the past used to be thought of as non-functional requirements, such as latency, is today a product feature. It is baked into the experience of a developer. So we are seeing a paradigm shift in terms of what we need to bring to the fore in the developer experience, and what to keep in mind when you're implementing your systems.

Rashmi Shetty - senior director at Capital One

Multi-agent makes sense only when goals require complex decomposition

We moved from a classic ML world to a world where we have LLMs generating responses. And now we want to move on to a world where actions need to be taken, specific goal-oriented actions. And when the problem that we are working on is a complex one, with multifaceted aspects associated with it, that's where multi-agentic comes into place. So, basically, we have a large, complex goal which we have to break down into specific steps, and each step is narrowed to a specific agent.

Rashmi Shetty - senior director at Capital One
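The decomposition Shetty describes maps each step of a large goal onto a narrow, specialized agent. A minimal pipeline sketch; the agent names and their lambda stand-ins (which would be full LLM-backed agents in a real system) are purely illustrative:

```python
# Hypothetical decomposition: one complex goal, one narrow agent per step.
# Each lambda stands in for a specialized, possibly LLM-backed agent.
AGENTS = {
    "research": lambda goal: f"findings for {goal}",
    "plan":     lambda findings: f"plan based on {findings}",
    "execute":  lambda plan: f"executed {plan}",
}


def run_pipeline(goal):
    """Break the goal into steps and hand each to its dedicated agent."""
    findings = AGENTS["research"](goal)
    plan = AGENTS["plan"](findings)
    return AGENTS["execute"](plan)


result = run_pipeline("car purchase")
```

The point of the structure is that each agent stays narrow; the orchestration layer, not any single model, owns the overall goal.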

Unlimited token budgets prioritize high-quality model critique loops

It's not about just consuming tokens. You can consume tokens, and in fact, the anti-pattern is running too many agents in parallel that don't communicate with each other. That's almost useless compared to just fewer agents that burn tokens very efficiently. Set up the right critique loop, especially with high-quality models, where one agent does something and the other one, ideally with a different model, critiques it and suggests ways to improve it.

Mikhail Parakhin
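The critique loop Parakhin favors can be sketched as a generator and a critic, ideally backed by different models, iterating until the critic has no further feedback. The model calls below are stubbed out with placeholder functions, so only the loop structure is real:

```python
def generate(task, draft=None, feedback=None):
    """Stand-in for the generator model (hypothetical stub)."""
    if feedback:
        return draft + " [revised: " + feedback + "]"
    return "draft answer for " + task


def critique(draft):
    """Stand-in for a second, different model reviewing the draft.
    Returns None when it has no further feedback."""
    if "[revised" in draft:
        return None  # good enough, stop iterating
    return "add error handling"


def critique_loop(task, max_rounds=3):
    """Generator produces, critic reviews, generator revises, until
    the critic is satisfied or the round budget runs out."""
    draft = generate(task)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:
            break
        draft = generate(task, draft, feedback)
    return draft


final = critique_loop("parse logs")
```

Unlike uncoordinated parallel agents, every token spent here feeds back into the next round, which is the efficiency argument in the quote.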
