Evals, Feedback Loops, and the Engineering That Makes AI Work

Key Takeaways

• Model Convergence: The performance gap between proprietary and open-source models is narrowing as engineering efficiencies begin to rival the advantages of raw compute scaling.
• Chinese AI Efficiency: Chinese models are demonstrating rapid advancement that outpaces their relative capital expenditure, signaling a shift toward highly optimized architectural engineering.
• Agentic Benchmarking: The Bash vs. SQL benchmark highlights that giving agents raw computer access is less effective than structured data interaction, necessitating a shift in how developers build autonomous systems.

Episode Description

Martin Casado speaks with Ankur Goyal, founder and CEO of Braintrust, about where engineering actually matters in AI and where it doesn't. They cover the open-source vs. closed-source model cycle, why Chinese models are gaining ground faster than spending suggests, whether AI demand will eventually saturate, and the Bash vs. SQL benchmark that challenges the "just give it a computer" approach to agents.
