
Interfacing with AI, with Linus Lee of Notion AI
Quotes & Clips
10 clips

Edit text in meaning-space, not word-space
"The kind of interface that I'm eventually building towards is a tool that lets you edit text or work through ideas, not in the native space of words and characters and tokens, but in the space of actual meaning or features, where features can be anything from, is this a question, is this a statement, is this uncertain or certain, to topical things like, is this about computers versus plans, or to probably other kinds of features that we don't really even have words for."
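A minimal sketch of what editing in meaning-space could look like, assuming feature directions have already been learned (say, by a sparse autoencoder). Everything here, `FEATURE_DIRECTIONS`, `edit_feature`, the toy embedding, is hypothetical illustration, not anyone's actual system:

```python
# Hypothetical sketch: edit a text embedding along interpretable
# feature directions instead of editing words directly.
import numpy as np

DIM = 8  # toy embedding dimensionality

# Pretend these unit vectors were learned (e.g., by a sparse autoencoder)
# and each corresponds to an interpretable feature of the text.
rng = np.random.default_rng(0)
FEATURE_DIRECTIONS = {
    "is_question": rng.standard_normal(DIM),
    "uncertainty": rng.standard_normal(DIM),
}
for name, vec in FEATURE_DIRECTIONS.items():
    FEATURE_DIRECTIONS[name] = vec / np.linalg.norm(vec)

def edit_feature(embedding: np.ndarray, feature: str, target: float) -> np.ndarray:
    """Move the embedding along one feature direction so its projection
    onto that direction equals `target`, leaving the rest of the vector alone."""
    direction = FEATURE_DIRECTIONS[feature]
    current = embedding @ direction
    return embedding + (target - current) * direction

# Usage: dial "uncertainty" up on a sentence embedding; a decoder
# (not shown) would then map the edited vector back into words.
sentence_vec = rng.standard_normal(DIM)
edited = edit_feature(sentence_vec, "uncertainty", target=2.0)
```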
Spectrograms inspire latent-space text editing interfaces
"The closest analogy that I have is spectrograms, when people are dealing with audio. Normally, sound is like a wave in space; imagine a single string vibrating back and forth over time. If you work with audio, that's the base thing you work with. But if you work professionally with audio, then most of the time you actually work in a different representation space, where you don't look at vibrations over time but at frequencies over time, what's called a spectrogram."
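The analogy is easy to make concrete: the snippet below re-represents one second of audio as frequency content over time using `scipy.signal.spectrogram`. The chirp signal is just an illustration:

```python
# Same audio, two representations: amplitude over time (the raw wave)
# versus power at each (frequency, time) cell (the spectrogram).
import numpy as np
from scipy import signal

fs = 16_000                    # sample rate in Hz
t = np.arange(0, 1.0, 1 / fs)  # one second of audio
# A chirp: a tone whose pitch rises over time.
wave = np.sin(2 * np.pi * (200 + 400 * t) * t)

# f: frequency bins, times: time bins, Sxx: power at each (frequency, time)
f, times, Sxx = signal.spectrogram(wave, fs=fs, nperseg=512)
print(Sxx.shape)  # (frequencies, time windows): the space audio pros edit in
```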
Build your own tools to bottleneck-bust research
"The quality of the tools, and how much you can iterate on the tools, I think bottlenecks how much you can iterate on the thing that you're working on with the tools. And so it pays to be able to quickly tweak the tool or add the functionality that you need to see something new, whether that's a tool for evaluating models, running models, or visualizing things, either in the outputs or in the training behavior. Because of that, I've mostly defaulted to building my own little tools whenever I needed them."
Copy-paste freely in research code without guilt
"One of the things that I've learned in doing more research things over building product is that in research land, I just do not feel guilty about copy-pasting code, because you have no idea how the thing is going to change. And it may be that copy-pasting is just going to save you from having to overgeneralize anything."
Models are lazy and only learn when forced
"Models are very lazy about what they have to learn. A model only learns the thing that you want it to learn when it's run out of options; it's exhausted all the other options it has to try to minimize its loss, and the only remaining option is to finally learn the thing you want it to learn. In language data broadly, I think it's so difficult to get to that point. Even if you think about the math proofs that occur naturally on the internet, for example, there are a bunch of proofs on the internet that are just incorrect."
Notion needs cheaper, faster, instruction-following models first
"The main ones that are always top of mind are: we want models that hallucinate less, we want models that are cheaper and faster, lower latency, and we want models that follow instructions better. There's a fourth one, which is a big one but a very hard one: we want models that are better at general reasoning."
Million-token context can't replace observable retrieval pipelines
"There's a lot of benefits to retrieving limited context rather than just putting everything in a model's window. Some of them include observability. If you give the model 10,000 inputs and it gives you the wrong answer, how do you debug that? Whereas if you have a pipeline that retrieves maybe the top 10 documents and has a language model answer from those, and it got it wrong, you can ask useful questions like: did the answer exist in the documents that it saw? Was it at the beginning or the end of the context?"
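As a rough illustration of the observability he describes, a retrieval pipeline can record exactly which documents the model saw, so the two debugging questions from the quote become one-liners. `retrieve` and `ask_llm` below are placeholders for whatever retriever and model you actually use:

```python
# Sketch of an observable retrieval pipeline: keep a trace of what was
# retrieved so failures can be attributed to retrieval vs. generation.
from dataclasses import dataclass, field

@dataclass
class Trace:
    query: str
    retrieved_docs: list[str] = field(default_factory=list)
    answer: str = ""

    def answer_in_context(self) -> bool:
        """Did the answer appear anywhere in the documents the model saw?"""
        return any(self.answer in doc for doc in self.retrieved_docs)

    def answer_position(self) -> int | None:
        """Index of the first document containing the answer (None if absent),
        useful for spotting beginning-vs-end-of-context effects."""
        for i, doc in enumerate(self.retrieved_docs):
            if self.answer in doc:
                return i
        return None

def answer_with_trace(query: str, corpus: list[str], retrieve, ask_llm) -> Trace:
    trace = Trace(query=query)
    trace.retrieved_docs = retrieve(query, corpus, k=10)  # top 10, not everything
    trace.answer = ask_llm(query, trace.retrieved_docs)
    return trace
```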
Schedule weekly meetings to stare at failure cases
"Eventually, what we've settled on for a lot of our features is instead that the engineers have scheduled time on our calendars every week, where we go into a meeting room and just stare at a Notion database of all the bad cases, individual outputs that were bad, that were reported by our users. And we ask ourselves, for each input: what is the exact step in the pipeline where this failed? What category does this belong in? We treat it like a software bug."
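A sketch of what treating bad outputs like software bugs might look like in code; the pipeline steps and categories below are illustrative guesses, not Notion's actual taxonomy:

```python
# Each reported failure gets tagged with the pipeline step where it broke
# and a free-form category, so a weekly review can count and prioritize.
from collections import Counter
from dataclasses import dataclass
from enum import Enum

class PipelineStep(Enum):
    RETRIEVAL = "retrieval"    # wrong documents fetched
    PROMPTING = "prompting"    # context assembled badly
    GENERATION = "generation"  # model answered wrong despite good context
    FORMATTING = "formatting"  # right answer, wrong presentation

@dataclass
class FailureCase:
    user_input: str
    bad_output: str
    failed_step: PipelineStep
    category: str  # e.g. "hallucinated date"

def triage_report(cases: list[FailureCase]) -> Counter:
    """Tally failures by pipeline step to see where to invest next."""
    return Counter(case.failed_step for case in cases)
```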
Package AI to amplify agency, not replace it
"I'm generally a pretty optimistic person about technology, as long as the way we package these things is more humanist, rather than just automating all of the things. You see companies situated at different points on the spectrum between wanting models to automate things in a way that takes away agency, i.e. replacement, and wanting models that amplify. I think OpenAI is very much on the replacement side; literally, their definition of AGI, I think, is something like a thing that can take over a single full human's job. Whereas if you look at a company like Runway, a lot of their framing of usefulness is about extending the agency of what you want to express."
Every AI model from now on is the worst it'll ever be
"Everything monotonically improves from here, right? I think that's the scary part. Omneky has this good video on Sora where he uses this phrase: this is the worst that this technology is going to be from here on out. I think that's a really succinct way of expressing the fact that, okay, maybe you think GPT-4 is not super smart. But if you look back at the history of smartphones, every phone, when it came out, was the worst that smartphones were ever going to be from that point on."