
Cursor's Third Era: Cloud Agents
Quotes & Clips
10 clipsCloud agents now test their own code before delivering PRs
“The big new thing here is that the agent will test its changes. So you can see here it worked for half an hour. That is because it not only took time to write the tokens of code, it also took time to test them end to end. So it started dev servers, iterate when needed. One of the other intuition pumps we use there is if a human gave you PR, asked you to review it, and you hadn't they hadn't tested it, you'd also be annoyed because you'd be like, only ask me for a review once it's actually ready.”
Video demos replace diff reviews as the new entry point
“Pillar two is the model coming back with a video of what it did. We have found that in this new world where agents can end to end write much more code, reviewing the code is one of these new bottlenecks that crop up. And so reviewing a video is not a substitute for reviewing code, but it is an entry point that is much, much easier to start with than glancing at some giant diff.”
Slash repro turns hard-to-reproduce bugs into 90-second merges
“Here's another example that we found really cool, which is we've actually turned since into a slash command as well, slash repro, where for bugs in particular, the model having full access to the to its own VM, it can first reproduce the bug, Make a video of the bug reproducing, fix the bug, make a video of the bug being fixed. And that has been the single category that has gone from, like, these types of bugs, really hard to reproduce and takes you tons of time locally, to when this happens, you'll merge it in ninety seconds or something like that.”
Slack is becoming the new IDE for collaborative agent work
“One other shift that I've noticed as our cloud agents have really taken off internally has been a shift from primarily individually driven development to almost this collaborative nature of development. For us, Slack is actually almost like a development an IDE, basically. We will have these issue channels or just, like, this product discussion channels where people are always at cursoring, and that kicks off a cloud agent. Oftentimes, I will kick off an investigation, and then sometimes I even ask it to get blamed and then tag people who should be brought in because it can tag people in Slack.”
Best-of-N runs reveal synergy between competing model providers
“And there was an interesting learning that's relevant for these different model providers. It was something that would run a bunch of best of ends, but then synthesize and basically run like a synthesizer layer of models. And what we found was that at the time at least, there were strengths to using models from different model providers as the base level of this process. Like, basically, you could get almost like a synergistic output that was better than having a very unified, like, bottom model tier.”
Grind mode forces alignment before agents work for days
“Internally, we call it grind mode. There's a specific you have to start out by aligning, and there's, like, a planning stage where it will work with you, and it will not get, like, start grind execution mode until it's decided that the plan is amenable to both of you. We found that it's really important where people would give, like, very underspecified prompt and then expect it to come back with magic. If it's gonna go off and work for three minutes, that's one thing. When it's gonna go off and work for three days, probably should spend, like, a few hours upfront making sure that you have communicated what you actually want.”
Throughput, not speed, is the next big coding unlock
“We think that over the coming months, the big unlock is not going to be one person with a model getting more done, like the water flowing faster, and we'll be making the pipe much wider. And so paralyzing more, whether that's swarms of agents or parallel agents, both of those are things that contribute to getting much more done in the same amount of time, but any one of those tasks doesn't necessarily need to get done that quickly.”
Individual developer spend will reach thousands per month
“Phase one, tab autocomplete. People paid, like, $20 bucks a month, and that was great. Phase two, where you were iterating with these local models today, people pay, like, hundreds of dollars a month. I think as we think about these highly parallel kind of agents running off for long times in their own VM system, we are already at that point where people will be spending thousands of dollars a month per human, and I think potentially tens of thousands beyond.”
Cloud agents will surpass 2x local agent volume by year-end
“I have a prediction for you. I predict that by the end of the year, I think it will take longer than people think and longer than we think for cloud and agents working in their own boxes to surpass local agents, but I think that crossover will happen before the end of the year. And probably by the end of the year, agents running in the cloud will be a multi like, more than two x the volume of local agents.”
Self-aware agents that know their own limits feel smarter
“Self awareness broadly has been a really big thing. The agent should understand how its environment works. It should understand how secrets work. Like, it needs to be self aware about its own harness and its environment. This is, like, one of the first things I learned at Dot when we launched was that I we had made the product work very well at a certain number of things, but didn't have complete self awareness of, like, its own boundaries. So people would be like, hey. Can you do this thing? And the thing was there and could be done. And the the product would be like, oh, no. Users will often attribute increased intelligence to a system that is more highly self aware.”
Want to hear more clips?
Get a daily email of the best quotes & audio clips from the top podcasts.