
Evals, Feedback Loops, and the Engineering That Makes AI Work
Key Takeaways
- Model Convergence: The performance gap between proprietary and open-source models is narrowing as engineering efficiencies begin to rival the advantages of raw compute scaling.
- Chinese AI Efficiency: Chinese models are demonstrating rapid advancement that outpaces their relative capital expenditure, signaling a shift toward highly optimized architectural engineering.
- Agentic Benchmarking: The Bash vs. SQL benchmark highlights that giving agents raw computer access is less effective than structured data interaction, necessitating a shift in how developers build autonomous systems.
Episode Description
Martin Casado speaks with Ankur Goyal, founder and CEO of Braintrust, about where engineering actually matters in AI and where it doesn't. They cover the open source vs closed source model cycle, why Chinese models are gaining ground faster than spending suggests, whether AI demand will eventually saturate, and the Bash vs SQL benchmark that challenges the "just give it a computer" approach to agents.