FEB 17, 2026

Evals, Feedback Loops, and the Engineering That Makes AI Work


Key Takeaways

  • Model Convergence: The performance gap between proprietary and open-source models is narrowing as engineering efficiencies begin to rival the advantages of raw compute scaling.

  • Chinese AI Efficiency: Chinese models are advancing faster than their relative capital expenditure would suggest, signaling a shift toward highly optimized architectural engineering.

  • Agentic Benchmarking: The Bash vs. SQL benchmark suggests that giving agents raw computer access is less effective than structured data interaction, which calls for a shift in how developers build autonomous systems.

Episode Description

Martin Casado speaks with Ankur Goyal, founder and CEO of Braintrust, about where engineering actually matters in AI and where it doesn't. They cover the open-source vs. closed-source model cycle, why Chinese models are gaining ground faster than their spending suggests, whether AI demand will eventually saturate, and the Bash vs. SQL benchmark that challenges the "just give it a computer" approach to agents.
