AI models today are frozen like the amnesiac in Memento
“The main protagonist, Leonard Shelby, has a form of this amnesia where he cannot form new memories. So he goes about his life with this cutoff date, after which point he has these long-term memories but really cannot retain anything new that he experiences. And so what he does is he uses sticky notes, where he writes notes to himself. He pulls out his Polaroid camera to capture moments as he goes about his life. And, I mean, he even tattoos some of the memories that he wants to imprint in his memory.”
“Any honest argument about continual learning pretty much has to start with in-context learning, because it genuinely works. We see that with examples like Karpathy's auto research project. The other example we give in the article is OpenCloud. The underlying model was available to anyone, but what really made it a special, magical moment is this orchestration of the context.”
Adversarial security requires updating weights, not prompts
“The first one is essentially in adversarial security. Imagine there is a new jailbreak attack. You have your model deployed in the wild, and it's being used. Imagine you try to update your system prompt to say, don't do this. It's not going to work, right? Because all of the parameters in the model have learned to be helpful to the users. So you really have to encode that kind of knowledge in the weights, where the attackers don't have access to it.”
Library version changes expose limits of context-only learning
“Imagine your favorite JavaScript library, let's say React. Right? You learn through all of your pretraining data that there is a function called x. But at some point a new version of React comes out, and it turns out it's a breaking change: all of a sudden the x function doesn't exist. It's now a y function. No matter how much you say it in the context, you cannot just override what's most intuitive throughout all of the model parameters, which is to say x.”
Learning happens across context, modules, and weights
“We make this, like, very high level framework in terms of just the three buckets of the context, the modules, and also the weights. And the distinction that like, the one callout that I think is important is all of these are learning mechanisms. And even in context learning, it's still a form of continual learning. But context is essentially what we call nonparametric learning, where we don't actually update the weights.”
The ultimate test is models that learn on the job like humans
“We humans are not AGI, but we still learn on the job. We learn from experience, and that's what makes humans unique. And so that's the ultimate test. How do we define that we got to continual learning? Well, is there a system that is able to learn on the job and get better through use, just like humans?”
Out-of-distribution learning post-deployment is the key milestone
“The test that some people use currently is pretty simple. You train a model that has learned on x, y, z data. And once you deploy, you just want to check whether it learns something out of distribution, something that it hasn't seen before. And we are starting to see some examples, like the test-time training done by Yusan, with the discover paper that makes some novel inventions.”
Claude Design wins at marketing landing pages, slides, and creative redesigns
“So those are my three Claude Design use cases. Import your design system and create a new, more marketing-style landing page. Use content plus your design system to create really, really beautiful slides. And then go ham on a redesign of your website, ugly or beautiful, depending on what you think.”
Figma still wins on speed of iteration without LLM calls
“And I have to say this is where Figma really wins, just your ability to drag things, change things, change the font without having to wait for an LLM call, without having to top up your credits. Immediately, there does not need to be a model in the loop. I think we underestimate how nice that is from a speed of iteration perspective when you're building and designing things.”
Hitting Claude limits forced a $200 top-up mid-demo
“Okay. BRB. The number one problem we all have with Anthropic products is there are too many limits. I immediately hit my limit. I've only done about two or three things in Claude Design, and I am blocked until Saturday. And it is Tuesday. So I went and paid my Claude chits to the Anthropic gods. Let's see if it'll let me try again. Yes. $200 later, we are back.”
“And then here's another really cool function of Claude Design. It gives you variations, which, as somebody who has spent so much time in A/B testing in her career, I think is so fun. And I think this is really smart from a design tool perspective, because often those cycles of no, make it better, or make it different can be very slow, and most people coming to Claude Design probably don't have the ability to articulate exactly the changes they want. So it's really smart to put three separate ideas in front of the consumers of Claude Design and let you pick what you like.”
Removing the design system unlocks Claude's wildest creative work
“And then my final, very important Claude Design use case: it gets very fun to imagine crazy versions of your web experience or website. When you give Claude Design no design system, it does the best. So I asked Claude to make a nineties GeoCities-style version of the Lenny's Newsletter homepage. It did that. It is called Lenny's product ZO loan, and it is pretty incredible. And even the tweaks component that we had that was so nice in the polished landing page version is now real ugly. I'm going with bricks. I'm going with Comic Sans, of course, and I want all the things.”
GPT Image 2.0 finally nails typography and layout
“We looked at the new GPT model, GPT Image 2. Two things I think it's really, really good at: layout and typography. It has really nailed those things and has also just up-leveled the quality of design, because this is the first image model that really does some thinking.”
“So I loaded up some images from Midjourney here that I think we use more in our brand kit, and I say, that's not really us. Here are some reference images. Update the brand kit. Oh my gosh. I actually really love it. It makes me so happy. As you can see, it took in these sort of pixelated, landscape-y images that we've been using for ChatPRD. It is certainly more pink, and it has given me kind of a new idea of what we could do with this brand kit.”
A photo-based color analysis could replace a paid stylist
“And then finally, one last fun use case that just finished over here, which is taking my image and doing my color analysis, for the boys in the room. This is something that the ladies often do to figure out what colors look best on what they're wearing. So if you're looking for a fun Mother's Day gift out there, take a photo of your lady, one where she's looking good, throw it in here, ask for her color analysis, and then buy that person some jewelry.”
Agents are the next generation of power users for AI platforms
“You are an example of a company and a product that's going to get an inflection point because agents are going to become your users. Because agents don't get in their heads about being funny or not funny. They don't overthink. They just go straight to the tokens and YOLO something out. I think working with an agent as a user, especially in marketing, just reduces the friction across so many things. It helps you climb Cringe Mountain in a way that's very hard to do as a human.”
Memelord launched as a simple newsletter using Google Slides
“I started Memelord, and we could get into this. I started it just as a newsletter for $6.90 per month, sending you the newest memes, and then I sent you to a Google Slides deck because I didn't know how to code. That's really the evolution of Memelord. It's the same thesis: you just want to be on the current trends and remix them for your brand. The future, you know, no UX is the best UX. Good news for Sam over here is he could use it from any agent now.”
Marketers should be empowered to code their own creative tools
“Let your marketers cook. You have no idea what they're capable of. Either let them cook and let them market their stuff or watch them leave your company. Obviously, I'm biased here, but the last company I was at, they didn't let me cook, and that's why I quit. And then I raised money and built my own company. And you're going to see a lot of that. And I think a lot of marketers and non-technical people are in a revenge mode right now, and they want to cook. So either let them cook and let them market their stuff, or watch them leave your company.”
Raspberry Pi hardware can solve the problem of late-night idea capture
“So I built that using ChatGPT and a Raspberry Pi. The keyboard is in the other room, otherwise I'd get it, but we got the Raspberry Pi here, the whole hookup. I've never built hardware before in my life. I've always wanted to, but besides the robotics kit when I was a kid, this is it. It's just a mini keyboard for $10. When I press Enter, it's essentially a keylogger. So I can lie in bed, write down an idea, press Enter. It sends an API request to Zapier, because again, I don't know how to code.”
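The capture flow described above can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the webhook URL is a placeholder (Zapier's Catch Hook trigger issues a unique URL per Zap), and the `idea` field name and `build_payload`/`send_idea` helpers are invented, not part of any real setup.

```python
import json
import urllib.request

# Placeholder URL -- a real Zapier "Catch Hook" Zap issues its own unique URL.
ZAPIER_WEBHOOK = "https://hooks.zapier.com/hooks/catch/XXXX/YYYY/"

def build_payload(text: str) -> bytes:
    """JSON body the webhook receives; the 'idea' field name is an assumption."""
    return json.dumps({"idea": text}).encode("utf-8")

def send_idea(text: str) -> None:
    """POST one captured idea to the webhook."""
    req = urllib.request.Request(
        ZAPIER_WEBHOOK,
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# Capture loop, commented out so the sketch imports cleanly:
#   while True:
#       idea = input()        # blocks until the Enter key, matching the demo
#       if idea.strip():
#           send_idea(idea)
```

The Zap on the other end can then append each payload to a note or spreadsheet, which is the no-code half of the pipeline.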
Free tools outperform PDF downloads as effective marketing lead magnets
“I would recommend any startup: there's no excuse anymore. Why do you have a PDF download? Build a tool. It actually takes less time to build a tool nowadays. Nothing wrong with PDF downloads, obviously, we do that occasionally, but it's very easy to just build a tool now. Think about what weird tools they could be, and then put them at the bottom of your site where people can try out different tools and weird galleries and even games. We've started screwing around making minigames. These are now just as easy to do as writing an e-book. If you're trying to collect more leads or emails for your newsletter, your business, et cetera, there's nothing better than building a mini tool that solves the first problem that gets people into the bigger problem that your actual company solves.”
AI models often perform better when pushed with aggressive prompting
“I mean, not going to lie, AI is my slave. Not fronting here: be mean to your AI. I don't know why people say thank you, it's a robot. And it performs better under pressure, unlike men. But this is what I mean about the random stuff. I would say push your AI to be more unhinged. It's okay to curse. AI is kind of like somebody on their first day of the job, where they don't really know you.”
“Now I'm glad it's more efficient, because it is expensive. GPT 5.5 is $5 per million input tokens and $30 per million output tokens. And GPT 5.5 Pro, which has powered all this work that I've been doing, is $30 per million input tokens and $180 per million output tokens. So this is a pricey one, but when I reflect on what I was able to achieve with this model in early testing, I'm gonna pay the intelligence tax, because I think what I was able to achieve is really important.”
Model thinking time exceeds seventeen minutes for apps
“First out the gate, it's a thinker. So you can see here it thought for seventeen minutes and twenty-seven seconds about this. You're gonna have this experience with this model. This is gonna be a theme of this mini episode: this thing will think. And it planned an app for advanced subtraction, built the code, all this kind of stuff. Now here's my question. Do we need seventeen minutes of hyperintelligence thinking to build this app? Probably not.”
Security scans and threat assessments improve code quality
“The first thing that I did, which I'm not gonna show you for what will become very obvious reasons, is we used OpenAI's Codex security product to run a threat assessment and security scan on the ChatPRD codebase. And it was pretty good. We're pretty secure. But it did come up with some low-priority or low-severity issues that we needed to remediate. And instead of taking those one by one, what I did is I downloaded the CSV of those issues, uploaded it to Codex, and just said, can you please architecturally review these issues, group them if they're thematic, and then propose a change, and then make those changes.”
Six hour autonomous runs solve complex data migrations
“This thing worked for six hours. It was actually five hours and, like, fifty-seven minutes. Truly, it just banged its head against the wall for six hours. And I did not have to intervene: zero prompts, zero follow-ups, zero steering. I think I had to approve one script call or something for it to have access to run in its sandbox. But otherwise, it just went for six hours. I have not seen this personally. Everybody says, oh, I'm getting my agent to run overnight. I have not seen it until GPT 5.5, in a very constrained use case.”
Bluetooth packet sniffing enabled hacking a proprietary speaker
“So what I did is I spent truly hours downloading a Bluetooth developer profile on my phone for debugging. I then hooked it up to, sorry, I'm crazy, hooked it up to a packet sniffer, so that when I was using the app here on my phone and it sent an image to this computer, it would log and sniff the packets and tell me what Bluetooth was sending to this little guy. I threw these logs and kind of all the information that I had at GPT 5.5, and let me show you what happened.”
Personality commands fix the default baked potato tone
“The only thing I will leave you with is that it has the, as I call it, baked potato personality that we've all come to know and love from Codex. It is a dull, dull dullard. But I learned over the testing of this that if you do slash personality in Codex, you're able to change that to something a little friendlier. And while some of my fellow early testers said it had too much of a Gen Z personality, I said, I like to stay young. Give me that Gen Z, GPT 5.5.”
The managed service provider market is lagging a decade behind modern tech
“The managed service provider market is essentially a hundred billion dollar industry that is stuck a decade behind modern technology. When you look at how most of these MSPs operate today, they are still using legacy ticketing systems and manual processes that haven't changed since the early 2010s. We saw an opportunity to come in and rebuild that stack from the ground up using AI to handle the rote tasks that consume so much human time.”
Pure play software struggles to capture value in services categories
“Pure play software often struggles in these services categories because the friction of adoption is just too high for the end user. You can't just hand a traditional IT shop a new AI tool and expect them to re-engineer their entire workflow overnight. By being the service provider ourselves, we can bake the software directly into the delivery model and prove the value through better margins and faster response times.”
Vertical integration is the primary advantage for modern service firms
“Vertical integration is really the secret sauce when you are trying to disrupt a legacy services industry. By owning the full stack—from the software that routes the tickets to the technicians who actually fix the servers—you eliminate the finger-pointing that happens between vendors. This allows us to capture the full economic benefit of the automation we build, rather than just selling a license and hoping someone uses it correctly.”
Forward deployed engineers signal how companies will adopt AI tools
“The rise of the forward deployed engineers tells us a lot about how AI adoption is actually going to happen in the enterprise. It’s not just about shipping code; it’s about having engineers who sit with the customer to understand the specific edge cases of their business. We are applying that same philosophy to IT services, where the AI handles the standard stuff and our best people focus on the complex, bespoke problems.”
OpenAI ends Microsoft's exclusive model licensing agreement
“OpenAI and Microsoft announced a fundamental rewrite of the partnership that has powered most of the consumer AI revolution. Microsoft loses its exclusive license to OpenAI's models, and OpenAI is now free to sell its products on AWS, Google Cloud, and basically anywhere it wants. So Microsoft was kind of holding this back for quite a while here, and I think OpenAI hated this because you saw Anthropic get massive adoption with enterprise.”
Robots use world models to simulate physical consequences
“Cortex 2 is going to think first before it sees and makes an action. It runs possible actions through a learned model of physics and object behavior. So, you know, it's looking at a stack of books, and it's like, if I move forward too fast and bump that over, this is what's gonna happen. If I move my arm to grab this book, this is what's gonna happen. So it's predicting what will happen as a result of its actions.”
David Silver raises record seed for non-language AI
“Silver thinks that the most important parts of intelligence are what he's calling ineffable, meaning you literally can't capture them in language. So LLMs, which are next-token prediction models trained on human text, he's saying those are gonna hit a ceiling, and his bet is a different path entirely. He's betting on massive-scale reinforcement learning agents that learn from their own experience, world models, and what he and Rich Sutton wrote up last year as the, quote, era of experience.”
Musk versus Altman lawsuit proceeds to federal trial
“Judge Gonzalez Rogers could effectively pause OpenAI's for-profit conversion or rewrite the rules for how nonprofit AI labs become commercial entities, which I think has a lot of cascading impacts onto Anthropic and onto the next generation of these AI models. Over on X, Kara Swisher called it, quote, the case that determines whether AI's foundational nonprofit promise means anything.”
Microsoft loses exclusive moat for Azure cloud growth
“The exclusive sales channel was, you know, Microsoft's actual moat. Equity in OpenAI is awesome, but it doesn't really protect Azure's enterprise pipeline. And the entire reason that Azure outgrew AWS over the last two years was the OpenAI lock-in. So many companies were forced to move over to Azure if they wanted to get the latest and greatest from OpenAI, because that was the only place it was going. AWS now gets to compete on price and surface area with the exact same model.”
“And second, I think this is kind of the formal end of the standoff that leaked in an OpenAI internal memo on April 13, where OpenAI's revenue chief, Dennis Dresser, basically told all of the staff that the Microsoft partnership had, quote, limited our ability to reach enterprise customers, and that demand for the Amazon Bedrock offering was, quote, frankly, staggering. So today's announcement is essentially OpenAI legalizing what they were already doing.”
“The model itself, basically OpenAI's positioning on this is that it is quote unquote a fully agentic model, meaning that it's designed to complete all these multi-step computer tasks with minimal human direction. They specifically highlighted five different categories, analyzing data, writing and debugging code, operating software directly, researching online, creating documents and spreadsheets autonomously.”
SpaceX is manufacturing custom GPUs for internal workloads
“SpaceX is telling its investors it wants to start building its own GPUs. This is a massive deal, because I think this is really just showing us how tight the compute market has gotten. You have a rocket company that is looking at building its own silicon. What this is telling us right now is that we've basically reached the point where the biggest tech companies in the world no longer trust the GPU supply chain enough to just be customers.”
Anthropic Mythos leads the new AI cybersecurity race
“Microsoft is actually integrating that, but it's just a preview. It's directly inside of their secure coding framework, and the idea is that Claude gets used for threat detection, vulnerability scanning, and incident response inside of Microsoft's dev tooling. And at the same time, OpenAI has been briefing US federal agencies, state governments, and also the Five Eyes intelligence partners on a version of their model called GPT 5.5 Cyber.”
Google Cloud launches a dedicated enterprise agent platform
“Google just had their big cloud event in Las Vegas this week. They rolled out the Gemini Enterprise Agent Platform. I think this is basically their attempt to take a swing at OpenAI and Anthropic in the enterprise for the AI agent race. I think first, access to over 200 models through Model Garden is something that they're going to be rolling out. So enterprises aren't going to be locked into just Gemini.”
Tech giants leverage AI to restructure white-collar workforces
“Meta has just sent a memo to all of their employees, announcing that they're cutting about 10% of their workforce. This represents about 8,000 people. Zuckerberg basically told everyone this at the start of the year, when he said 2026 would be quote, the year AI starts to dramatically change the way we work. I think this is a story about how AI is starting to really reshape payroll at the biggest companies on the planet.”
Anthropic valuation surpasses OpenAI on the secondary market
“On the secondaries, OpenAI's valuation, what people are buying the shares for right now, is kind of interesting, because it's an indicator of where the stock will go sometimes. What OpenAI's shares are selling for right now is $850 billion. Anthropic's shares, $1 trillion. So Anthropic is being priced and valued higher. Part of that is maybe not even where they are today, but where people can see them going in the future.”
GPT 5.5 pricing introduces a high-tier pro subscription
“The pricing, I think, is also where this gets very interesting. The standard GPT 5.5 is $5 per million input tokens and $30 per million output tokens. That doubles GPT 5.4 on paper, but OpenAI is claiming that the model uses tokens more efficiently, so the real-world cost per task should be roughly flat or better. GPT 5.5 Pro is $30 in and $180 out, which is very expensive.”
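As a sanity check on those quoted list prices, here is a back-of-envelope cost calculation. The token counts for the sample task are invented purely for illustration; only the per-million-token prices come from the quote above.

```python
# Per-million-token prices quoted above: (input $/1M, output $/1M).
PRICES = {
    "gpt-5.5":     (5.0, 30.0),
    "gpt-5.5-pro": (30.0, 180.0),
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task at the listed per-million-token rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A hypothetical task: 200k tokens of context in, 20k tokens of output.
standard = task_cost("gpt-5.5", 200_000, 20_000)      # $1.00 + $0.60 = $1.60
pro = task_cost("gpt-5.5-pro", 200_000, 20_000)       # $6.00 + $3.60 = $9.60
```

The 6x gap per task is why the efficiency claim matters: if the Pro model genuinely uses fewer tokens to finish the same job, the effective spread narrows.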
“Tim Cook is going to step into a new role as executive chairman. He's not leaving entirely. But John Ternus, Apple's senior vice president of hardware engineering and a longtime Apple guy, will become the next CEO.”
John Ternus will lead Apple as hardware-focused CEO
“I would not be surprised if, under Ternus, they just lean into being a hardware company and maybe scale back on some of these other bets, these software projects, Apple TV, the less profitable parts of their business. I would not be surprised if they really double down on being the hardware company and continuing to make the best hardware that all the other software can run on.”
“At the same time, like, every day now, I use AI apps that just do things for me on my phone that seem clearly like things Siri should be able to do. Right? Because Siri is integrated at that operating system level. It already has the access that it needs, and I wind up having to do all these workarounds just to do these things that are now possible through the state of the art. So there is a huge missed opportunity there.”
Universal basic income is seeing a major resurgence
“I just noticed that various players in the AI space, some of whom are opposed to each other in various ways, seem to all be coming around to UBI at the same time. So Elon Musk did a post about this on X saying he endorsed some form of UBI. He called it universal high income.”
“We need to tax AI and then start distributing the gains as quickly and broadly to the American people as we can. Poverty should be an artifact of the past. GDP is going to roar past $100,000 a head. And at that point, you should be able to put more into people's hands.”
Meta is surveilling employee keystrokes for AI training
“This tool, which is called the Model Capability Initiative, will run on work-related apps and websites on US-based employees' computers and will also take occasional snapshots of the content on employees' screens. This is part of a broad initiative to build AI agents that can perform work tasks autonomously, the company told staffers in internal memos seen by Reuters.”
“On Tuesday, SpaceX posted on X that it had reached an agreement with Cursor to either acquire the company later this year for $60,000,000,000 or just pay it $10,000,000,000 for their work together.”
“They signed a three-year lease for a store. They put $100,000 in a bank account, and they handed a debit card to Luna, which is powered by Claude Sonnet 4.6, and just told it, hey, turn a profit. So there are a few things that have gone awry, Kevin. One of them: it made a bunch of strange inventory choices, including ordering a thousand toilet seat covers for the employee bathroom, then listed them as merchandise.”
“What hasn't changed is what customers are seeking, which is outcomes, right? Outcomes and return on their investment in order to get things done. And of course, now AI is an amazing technology that again helps to get more things done in the enterprise. And that is actually what SAP is standing for, right?”
AI adoption lags behind rapid technological innovation
“And we believe that still will continue, right? Because this is exactly what we're also seeing right now: of course there's tremendous progress, but we also see that AI adoption in the enterprise is still not where we want to see it. There's this Gartner curve, right? There's this AI innovation race, and then there's this AI outcome race, and the gap almost increases versus getting narrower.”
AI re-engineers software through UI, processes, and data
“With AI, the same is happening. It is happening on three levels. It happens, of course, on the UI side. ... Then the second one is, of course, the business processes like an order to cash in the past. ... And then, of course, below that, you have the whole data layer, right? The whole data layer of bringing, of course, SAP has a lot of super valuable data for a company.”
“But SAP and these large customers, they always have a problem of scale. Okay, what do you do with 100 documents? Well, a thousand documents becomes a little harder, a deeper engineering challenge. ... we have 20,000 APIs, right? So just because it's so huge, there are so many things, it becomes this problem of scale.”
“The most important thing from a development perspective is that people actually start writing their evals. I was on this topic for a very long time, because the reason agentic coding works so well, Sarah, is of course that you can verify the outcome. You can say, hey, is the program compiling, or are your unit tests passing?”
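The point about verifiable outcomes can be sketched as a minimal eval harness. The task ("double the input"), the test cases, and the `run_eval` helper below are all invented for illustration; they stand in for real compile checks and unit tests, not any particular eval framework.

```python
from typing import Callable

def run_eval(candidate: Callable[[int], int], cases: list[tuple[int, int]]) -> float:
    """Return the fraction of (input, expected) cases the candidate passes."""
    passed = 0
    for x, expected in cases:
        try:
            if candidate(x) == expected:
                passed += 1
        except Exception:
            pass  # a crashing candidate simply fails that case
    return passed / len(cases)

# Pretend this is model-generated code for the task "double the input".
generated = lambda x: x * 2

score = run_eval(generated, [(1, 2), (3, 6), (10, 20)])  # -> 1.0, outcome verified
```

This is the asymmetry the quote points at: a scored, executable check turns "does the agent's output work?" into a measurable number, which is exactly what most enterprise workflows lack.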
Agent mining captures tribal knowledge from decision traces
“Now we call it agent mining, because we record all these decision traces, these contexts, what the users are entering into the system. And then you can either use it to say, hey, wait a minute, this is actually an anomaly, the folks in, I don't know, the UK from our company or the folks in Australia shouldn't do this because the standard operating procedure is this. Or you say, oh, that's actually a very good improvement.”
LLMs are insufficient for predictive tabular data analysis
“Now, the problem is, of course, still today, if we look at these predictive questions... the challenge is that large language models are not made for this. In the way they generate just one token after another, essentially sequence-to-sequence modeling, I mean, they're language models, and they do this phenomenally well. But if you want to make these predictions, you have to go back to these classical machine learning approaches...”
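A toy illustration of that contrast: a predictive question over tabular rows is naturally a classical-ML fit, not a next-token-prediction fit. The 1-nearest-neighbor predictor and the invoice rows below are invented stand-ins for the gradient-boosting-style models one would actually reach for.

```python
import math

def predict_1nn(train: list[tuple[list[float], str]], row: list[float]) -> str:
    """Label the row with the label of its closest training row (Euclidean)."""
    def dist(a: list[float], b: list[float]) -> float:
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, label = min(train, key=lambda pair: dist(pair[0], row))
    return label

# Invented invoice rows: [amount, days_overdue] -> will it be paid late?
rows = [
    ([120.0, 0.0], "on_time"),
    ([90.0, 2.0], "on_time"),
    ([300.0, 30.0], "late"),
    ([250.0, 45.0], "late"),
]
print(predict_1nn(rows, [280.0, 40.0]))  # -> late
```

The point is structural: the predictor consumes fixed-width numeric features and emits a class, with no token sequence anywhere, which is why this family of questions falls outside what an autoregressive language model is built to do.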
“What we are focusing on is the optimization domains, obviously. If you go into things like logistics, traveling salesman problems, knapsack problems, all these kinds of usual hard problems in computer science, these are problems where we believe a different kind of computing paradigm could be interesting for the future.”
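For concreteness, the knapsack problem named above has a classic dynamic programming solution on conventional hardware, which serves as the baseline any new computing paradigm would have to beat. The items here are invented for illustration.

```python
def knapsack(weights: list[int], values: list[int], capacity: int) -> int:
    """Maximum total value of a subset of items fitting within capacity (0/1 knapsack)."""
    best = [0] * (capacity + 1)  # best[c] = best value achievable at capacity c
    for w, v in zip(weights, values):
        # Iterate capacity downward so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

# Example: items (weight, value) = (2, 3), (3, 4), (4, 5); capacity 5.
print(knapsack([2, 3, 4], [3, 4, 5], 5))  # -> 7 (take the 2- and 3-weight items)
```

The DP runs in O(n x capacity) time, which is pseudo-polynomial; it is the exponential blow-up at large scales that motivates interest in alternative paradigms for this problem family.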