The following is a conversation with the founding members of the Cursor team: Michael Truell, Sualeh Asif, Arvid Lunnemark, and Aman Sanger. Cursor is a code editor based on VS Code that adds a lot of powerful features for AI-assisted coding. It has captivated the attention and excitement of the programming and AI communities, so I thought this was an excellent opportunity to dive deep into the role of AI in programming. This is a super technical conversation that is bigger than just one code editor; it's about the future of programming and, in general, the future of human-AI collaboration in designing and engineering complicated and powerful systems. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Michael, Sualeh, Arvid, and Aman.

All right, this is awesome. We have Michael, Aman, Sualeh, and Arvid here from the Cursor team. First up, big ridiculous question: what's the point of a code editor?

The code editor is largely the place where you build software, and today, or for a long time, that's meant the place where you text-edit a formal programming language. For people who aren't programmers, the way to think of a code editor is like a really souped-up word processor for programmers. The reason it's souped up is that code has a lot of structure, so the quote-unquote word processor, the code editor, can actually do a lot for you that word processors in the writing space haven't been able to do for people editing text. That's everything from giving you visual differentiation of the actual tokens in the code so you can scan it quickly, to letting you navigate around the codebase sort of like you're navigating around the internet with hyperlinks, going to definitions of things you're using, to error checking to catch rudimentary bugs. Traditionally, that's what a code editor has meant, and I think that what a code editor is is going to change a lot over the next 10 years, as what it means to build software maybe starts to look a bit different.

I think also a code editor should just be fun.

Yes, that is very important. That is very important, and it's actually an underrated aspect of how we decide what to build. A lot of the things that we build, we try out, we do an experiment, and then we actually throw them out because they're not fun. A big part of being fun is being fast a lot of the time. Fast is fun.

Yeah, that should be a t-shirt.

Fundamentally, I think one of the things that draws a lot of people to building stuff on computers is this insane iteration speed, where in other disciplines you might be gated by resources, or by the ability even to get a large group together, and coding is this amazing thing where it's you and the computer, and with that alone you can build really cool stuff really quickly.

So, for people who don't know, Cursor is this super cool new editor that's a fork of VS Code. It would be interesting to get your explanation of your own journey with editors. I think all of you were big fans of VS Code with Copilot. How did you arrive at
VS Code, and how did that lead to your journey with Cursor?

Yeah, so I think a lot of us, well, all of us, were originally Vim users.

Pure Vim, yeah. No Neovim, just pure Vim in a terminal. And, at least for myself, it was around the time that Copilot came out, so 2021, that I really wanted to try it. So I went into VS Code, the only code editor in which it was available, and even though I really enjoyed using Vim, the experience of Copilot with VS Code was more than good enough to convince me to switch. And that kind of was the default until we started working on Cursor.

Maybe we should explain what Copilot does. It's like a really nice autocomplete. As you start writing a thing, it suggests one or two or three lines of how to complete it. And there's a fun experience in that, you know, like when you have a close friendship and your friend completes your sentences: when it's done well, there's an intimate feeling. There's probably a better word than intimate, but there's a cool feeling of, wow, it gets me. And then there's an unpleasant feeling when it doesn't get you. So there's that kind of friction, but I would say for a lot of people the feeling that it gets you overpowers the feeling that it doesn't.

And I think actually one of the underrated aspects of GitHub Copilot is that even when it's wrong, it's a little bit annoying, but it's not that bad, because you just type another character and then maybe it gets you, or you type another character and then it gets you. So even when it's wrong, it's not that bad.

Yeah, you can sort of iterate and fix it. I mean, the other underrated part of Copilot for me was that it was just the first real AI product, the first language model consumer product.

So Copilot was kind of like the first killer app for LLMs.

Yeah, and the beta was out in 2021.

Right, okay. So what's the origin story of Cursor?

So around 2020, the scaling laws papers came out from OpenAI, and that was a moment where this looked like clear, predictable progress for the field, where even if we didn't have any more ideas, it looked like you could make these models a lot better if you had more compute and more data.

By the way, we'll probably talk for three to four hours on the topic of scaling laws, but just to summarize: it's a paper, and a set of papers and ideas, that say bigger might be better for model size and data size in the realm of machine learning.

It's bigger and better, but predictably better.

Okay, that's another topic of conversation, but anyway.

So around that time, for some of us, there were a lot of conceptual conversations about what this is going to look like, what the story is going to be for all these different knowledge-worker fields, about how they're going to be made better by this technology getting better. And then I think there were a couple of moments where the theoretical gains predicted in that paper started to feel really concrete, and it started to feel like a moment where you could actually go and, without doing a PhD, do useful work in AI. It actually felt like now there was this whole set of systems one could build
that were really useful. And I think the first moment, which we already talked about a little bit, was playing with the early bits of Copilot; that was awesome and magical. I think the next big moment, where everything kind of clicked together, was actually getting early access to GPT-4. So around the end of 2022 we were tinkering with that model, and the step up in capabilities felt enormous. Previous to that, we had been working on a couple of different projects. Because of Copilot, because of scaling laws, because of our prior interest in the technology, we had been tinkering around with tools for programmers, but things that were very specific. So, you know, we were building tools for financial professionals who have to work within a Jupyter notebook, or playing around with, can you do static analysis with these models? And then the step up to GPT-4 felt like, look, that really made concrete the theoretical gains we had predicted before. It felt like you could build a lot more, just immediately, at that point in time. And also, if we were being consistent, it really felt like this wasn't just going to be a point-solution thing: all of programming was going to flow through these models, and it felt like that demanded a different type of programming environment, a different type of programming. So we set off to build that larger vision around then.

There's one moment I distinctly remember. My roommate is an IMO gold winner, and there's a competition in the US called the Putnam, which is sort of the IMO for college students; it's a math competition he's exceptionally good at. So Shengtong and Aman, I remember, sort of June of 2022, had this bet on whether by, like, June or July of 2024 a model would win a gold medal in the IMO.

IMO is the International Math Olympiad.

Yeah, IMO is the International Math Olympiad. And Arvid and I both, you know, also competed in it, so it was sort of personal. And I remember thinking, this is just not going to happen. Even though I sort of believed in progress, I thought, IMO gold? Aman is just delusional. And, to be honest, I mean, I was very wrong, but that was maybe the most prescient bet in the group.

So with the new results from DeepMind, it turned out that you were correct.

Well, technically, not correct, but only one point away.

Aman was very enthusiastic about this stuff back then. Before, Aman had this scaling laws t-shirt that he would walk around with, where it had the charts and the formulas on it.

Oh, so you, like, felt the AI, or you felt the scaling?

Yeah. I distinctly remember there was this one conversation I had with Michael, where before, I hadn't thought super deeply and critically about scaling laws, and he kind of posed the question: why isn't scaling all you need, or why isn't scaling going to result in massive gains in progress? And I think I went through the stages of grief: there was anger, denial, and then finally, at the end, just thinking about it, acceptance. And I think I've been quite hopeful and optimistic about progress since then. I think
one thing I'll caveat is that it also depends on which domains you're going to see progress in. Math is a great domain, especially formal theorem proving, because you get this fantastic signal of actually verifying whether the thing was correct, and that means something like RL can work really, really well. I think you could have systems that are very superhuman in math and still not technically have AGI.

Okay, so can we take it all the way to Cursor? What is Cursor? It's a fork of VS Code, and VS Code has been one of the most popular editors for a long time. Everybody fell in love with it, everybody left Vim. I left Emacs for it, sorry. So it unified, in some fundamental way, the developer community. And then you look at the space of things, you look at the scaling laws, AI is becoming amazing, and you decided, okay, it's not enough to just write an extension for VS Code, because there are a lot of limitations to that. If AI is going to keep getting better and better, we need to really rethink how the AI is going to be part of the editing process. So you decided to fork VS Code and start to build a lot of the amazing features we'll be able to talk about. But what was that decision like? Because there are a lot of extensions for VS Code, including Copilot, that are doing AI-type stuff. What was the decision like to just fork VS Code?

So the decision to do an editor seemed kind of self-evident to us, for at least what we wanted to do and achieve. Because when we started working on the editor, the idea was: these models are going to get much better, their capabilities are going to improve, and it's going to entirely change how you build software, both in that you will have big productivity gains, and also in that it will be radical, the act of building software is going to change a lot. You're very limited in the control you have over a code editor if you're a plugin to an existing coding environment, and we didn't want to get locked in by those limitations. We wanted to be able to just build the most useful stuff.

Okay, well, then the natural question is, you know, VS Code with Copilot is kind of a competitor, so how do you win? Is it basically just the speed and the quality of the features?

Yeah, I mean, I think this is a space that is quite interesting, perhaps quite unique, where if you look at previous tech waves, maybe there's kind of one major thing that happened and unlocked a new wave of companies. But every single year, every jump in model capability, you now unlock this new wave of features, things that are possible, especially in programming. So I think in AI programming, being even just a few months ahead, let alone a year ahead, makes your product much, much more useful. I think the Cursor a year from now will need to make the Cursor of today look obsolete. And I think, you know, Microsoft has done a number of fantastic things, but I don't think they're in a great place to really keep innovating and pushing on this in the way that a startup can.

Just rapidly implementing features and pushing.

Yeah, and kind of doing the research and experimentation necessary to really push the ceiling.

I don't know if I
think of it in terms of features so much as in terms of capabilities for programmers. You know, as the new o1 model came out, and I'm sure there are going to be more models of different types, like longer context and maybe faster, there are all these crazy ideas that you can try, and hopefully 10% of the crazy ideas will make it into something kind of cool and useful, and we want people to have that sooner.

To rephrase: an underrated fact is that we're making it for ourselves. When we started Cursor, you really felt this frustration that, you know, you could see models getting better, but the Copilot experience had not changed. It was like, man, these guys, the ceiling is getting higher, why are they not making new things? They should be making new things. Where are all the alpha features? There were no alpha features. I'm sure it was selling well, I'm sure it was a great business, but I'm one of these people that really wants to try and use new things, and there was just no new thing for a very long while.

Yeah, it's interesting. I don't know how to put that into words, but when you compare Cursor with Copilot, Copilot pretty quickly started to feel stale for some reason.

Yeah, I think one thing that helps us is that we're sort of doing it all in one, where we're developing the UX and the way you interact with the model at the same time as we're developing how we actually make the model give better answers: how you build up the prompt, how you find the context, and, for Cursor Tab, how you train the model. So I think that helps us to have the same people working on the entire experience, end to end.

Yeah, it's like the person making the UI and the person training the model sit 18 feet away.

So often the same person, even. Yeah, often even the same person. So you can create things that are sort of not possible if you're not talking, if you're not experimenting.

And you're using, like you said, Cursor to write Cursor?

Of course, oh yeah.

Well, let's talk about some of these features. Let's talk about the all-knowing, the all-powerful, praise be to the Tab: the autocomplete on steroids, basically. So how does Tab work? What is Tab?

To highlight and summarize it at a high level, I'd say that there are two things that Cursor is pretty good at right now. There are other things that it does, but two things it helps programmers with. One is this idea of looking over your shoulder and being like a really fast colleague who can kind of jump ahead of you and type and figure out what you're going to do next. That was the original idea, that was kind of the kernel of the idea behind a good autocomplete: predicting what you're going to do next. And you can make that concept even more ambitious by not just predicting the characters after your cursor but actually predicting the next entire change you're going to make, the next diff, the next place you're going to jump to. And the second thing Cursor is pretty good at right now is helping you
sometimes jump ahead of the AI and tell it what to do, and go from instructions to code. And on both of those we've done a lot of work on making the editing experience ergonomic, and also making those things smart and fast.

One of the things we really wanted was for the model to be able to edit code for us. That was kind of a wish, and we had multiple attempts at it before we had a good model that could edit code for you. Then, after we had a good model, there was a lot of effort to make the inference fast, to have a good experience. And we've been starting to incorporate, I mean, Michael sort of mentioned this, the ability to jump to different places. That jump to different places, I think, came from a feeling of: once you accept an edit, man, it should be really obvious where to go next. It's like, I made this change, the model should just know that the next place to go is 18 lines down. If you're a Vim user, you could press 18jj or whatever, but why am I even doing this? The model should just know it. So the idea was, you just press Tab, it would go 18 lines down and show you the next edit, and you would press Tab again, so as long as you could just keep pressing Tab. The internal competition was: how many tabs can we make someone press?

Once you have the idea, more abstractly, the thing to think about is how the edits are zero-entropy. Once you've expressed your intent and the edit is determined, there are no new bits of information left to finish your thought, but you still have to type some characters to make the computer understand what you're actually thinking. Then maybe the model should just sort of read your mind, and all the zero-entropy bits should just be tabbed away. That was the abstract version of it.

There's this interesting thing where, if you look at language model loss on different domains, I believe the bits per byte, which is a kind of character-normalized loss, is lower for code than for language. That means in general there are a lot of tokens in code that are super predictable, a lot of characters that are super predictable. And this is, I think, even magnified when you're not just trying to autocomplete code but predicting what the user is going to do next in their editing of existing code. So the goal of Cursor Tab is: let's eliminate all the low-entropy actions you take inside of the editor. When the intent is effectively determined, let's just jump you forward in time, skip you forward.
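As a quick aside on the bits-per-byte metric mentioned above: it is just the model's average next-token loss rescaled so it can be compared across tokenizers and domains. A minimal sketch of the conversion, with illustrative numbers rather than measurements:

```python
import math

def bits_per_byte(nll_nats_per_token: float, bytes_per_token: float) -> float:
    """Convert average next-token loss (in nats) to bits per byte, so losses are
    comparable across tokenizers and across domains such as code vs. prose."""
    bits_per_token = nll_nats_per_token / math.log(2)  # nats -> bits
    return bits_per_token / bytes_per_token

# Illustrative numbers only: code tends to be more predictable per byte than
# natural language, so its bits-per-byte comes out lower.
print(bits_per_byte(nll_nats_per_token=0.55, bytes_per_token=3.8))  # "code-like"
print(bits_per_byte(nll_nats_per_token=1.00, bytes_per_token=3.8))  # "prose-like"
```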
Well, what's the intuition, and what are the technical details, of how to do next-cursor prediction? That jump is not so intuitive, I think, to people.

Yeah, I think I can speak to a few of the details of how to make these things work. They're incredibly low-latency, so you need to train small models on this task. In particular, they're incredibly prefill-token-hungry. What that means is they have these really, really long prompts, where they see a lot of your code, but they're not actually generating that many tokens. So the perfect fit for that is using a sparse model, meaning an MoE model. That was one breakthrough we made that substantially improved its performance at longer context. The other was a variant of speculative decoding that we kind of built out, called speculative edits. These are two, I think, important pieces of what make it quite high quality and very fast.

Okay, so mixture of experts: the input is huge, the output is small. Okay. So what else can you say about how to make it fast? Does caching play a role in this?

Caching plays a huge role. Because you're dealing with this many input tokens, if on every single keystroke you're typing in a given line you had to rerun the model on all those tokens passed in, you're going to, one, significantly degrade latency, and two, kill your GPUs with load. So you need to design the actual prompts you use for the model such that they're caching-aware, and then you need to reuse the KV cache across requests, so that you're spending less work, less compute.

Again, what are the things that Tab is supposed to be able to do, kind of in the near term, just to linger on that? Generate code, fill empty space, also edit code across multiple lines?

Yeah, and then jump to different locations inside the same file.

And then, hopefully, jump to different files also. So if you make an edit in one file, and maybe you have to go to another file to finish your thought, it should go to the second file also.

Yeah, and the full generalization is next-action prediction. Sometimes you need to run a command in the terminal, and it should be able to suggest the command based on the code that you wrote too. Or sometimes it suggests something, but it's hard for you to know if it's correct, because you actually need some more information. Like, you need to know the type to be able to verify that it's correct. So maybe it should actually take you to a place like the definition of something, and then take you back, so that you have all the requisite knowledge to be able to accept the next completion.

Also providing the human the knowledge.

Yes, right. Can you integrate... I just got to know a guy named Primeagen who, I believe, lets you order coffee via SSH.

Oh yeah, we did that.

So can the model do that too, feed you and provide you with caffeine? Okay, so that's the general framework.

Yeah, and the magic moment would be... programming is this weird discipline where sometimes the next five minutes, not always, but sometimes the next five minutes of what you're going to do is actually predictable from the stuff you've done recently. So can you get to a world where that next five minutes either happens by you disengaging and it taking you through, or maybe a little bit more of just you seeing the next step, what it's going to do, and you're like, okay, that's good, that's good, that's good, and you can just sort of tap, tap, tap through these big changes.
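A minimal sketch of the caching-aware prompt idea mentioned above, assuming a serving stack with prefix (KV) caching: keep the bulk of the prompt byte-stable across keystrokes so its prefill can be reused, and only re-prefill a short, changing suffix. The names here are hypothetical, not Cursor's internals:

```python
# Hypothetical sketch: a stable prefix lets the server reuse its KV cache across
# keystrokes; only the small, volatile suffix changes request to request.

def build_tab_prompt(file_header: str, context_snippets: list[str], live_suffix: str) -> tuple[str, str]:
    """Split the prompt into (stable_prefix, volatile_suffix)."""
    stable_prefix = file_header + "\n" + "\n".join(context_snippets) + "\n"
    return stable_prefix, live_suffix

prefix_cache: set[str] = set()  # stand-in for server-side KV caches keyed by prefix

def request_completion(prefix: str, suffix: str) -> None:
    if prefix in prefix_cache:
        print(f"cache hit: prefill only {len(suffix)} chars")
    else:
        prefix_cache.add(prefix)
        print(f"cache miss: prefill {len(prefix) + len(suffix)} chars")

p, s = build_tab_prompt("# utils.py", ["def add(a, b):", "    return a + b"], "def mul(a")
request_completion(p, s)
p, s = build_tab_prompt("# utils.py", ["def add(a, b):", "    return a + b"], "def mul(a,")
request_completion(p, s)  # next keystroke: identical prefix, so the prefill is reused
```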
As we're talking about this, I should mention one of the really cool and noticeable things about Cursor: there's this whole diff interface situation going on. The model suggests, with the red and the green, here's how we're going to modify the code, and in the chat window you can apply it, and it shows you the diff, and you can accept the diff. So maybe you can speak to whatever direction that's going.

We'll probably have four or five different kinds of diffs. We have optimized the diff for the autocomplete, so that has a different diff interface than when you're reviewing larger blocks of code, and then we're trying to optimize another diff thing for when you're doing multiple different files. And at a high level, the difference is: for autocomplete, it should be really, really fast to read. Actually, it should be really fast to read in all situations, but in autocomplete your eyes are focused in one area; humans can't look in too many different places.

You're talking about the interface side?

On the interface side. It currently has this box on the side. We have the current box, and if it tries to delete code in some place and tries to add other code, it tries to show you a box on the side. You can maybe show it if we pull it up on cursor.com. This is what we're talking about.

There were three or four different attempts at trying to make this thing work. First the attempt was these blue crossed-out lines. Before it was a box on the side, it used to show you the code to delete by showing you, Google Docs style, a line through it, and then you would see the new code. That was super distracting. Then we tried many different things: there were deletions, there was trying to red-highlight. Then the next iteration of it, which is sort of funny: you would hold, on Mac, the Option button, and it would highlight a region of code to show you that there might be something coming. So maybe in this example the input and the value would all get blue, and the blue would highlight that the AI had a suggestion for you. So instead of directly showing you the thing, it would just hint that the AI had a suggestion, and if you really wanted to see it, you would hold the Option button and you would see the new suggestion. If you released the Option button, you would then see your original code.

That's pretty nice, by the way, but you have to know to hold the Option button. By the way, I'm not a Mac user, but I got it. It's a button, I guess, you people have.

You know, again, it's just non-intuitive. I think that's the key thing.

And there's a chance this is also not the final version of it. I am personally very excited for making a lot of improvements in this area. We often talk about it as the verification problem, where these diffs are great for small edits, but for large edits, or when it's multiple files or something, it's actually a little bit prohibitive to review these diffs. So there are a couple of different ideas here. One idea
that we have is, okay, parts of the diff are important, they have a lot of information, and then parts of the diff are just very low entropy, the same thing over and over again. So maybe you can highlight the important pieces and then gray out the not-so-important pieces. Or maybe you can have a model that looks at the diff and sees, oh, there's a likely bug here; I will mark this with a little red squiggly and say you should probably review this part of the diff. Ideas in that vein, I think, are exciting.

Yeah, that's a really fascinating space of UX design engineering. So you're basically trying to guide the human programmer through all the things they need to read, and nothing more.

Yeah, optimally. And you want an intelligent model to do it. Currently, diff algorithms are just normal algorithms. There's no intelligence. There's intelligence that went into designing the algorithm, but then there's no sense of whether it's about this thing or that thing, so you want a model to do this.

So I think the general question is: these models are going to get much smarter. As the models get much smarter, the changes they will be able to propose are much bigger. And as the changes get bigger and bigger, the humans have to do more and more verification work, and it gets harder and harder. You need to help them out. I don't want to spend all my time reviewing code.

Can you say a little more about diffs across multiple files?

Yeah, I mean, GitHub tries to solve this with code review. When you're doing code review, you're reviewing multiple diffs across multiple files. But like Arvid said earlier, I think you can do much better than code review. Code review kind of sucks: you spend a lot of time trying to grok code that's often quite unfamiliar to you, and it often doesn't even actually catch that many bugs. And I think you can significantly improve that review experience using language models, for example using the kinds of tricks that Arvid described, of maybe pointing you toward the regions that matter.

I think also, if the code is produced by these language models and not produced by someone else: the code review experience is designed for both the reviewer and the person that produced the code. In the case where the person that produced the code is a language model, you don't have to care that much about their experience, and you can design the entire thing around the reviewer, such that the reviewer's job is as fun, as easy, as productive as possible. I think that's the issue with just kind of naively trying to make these things look like code review; I think you can be a lot more creative and push the boundary of what's possible.

Just one idea there: I think ordering matters. Generally, when you review a PR, you have this list of files and you're reviewing them from top to bottom, but actually you want to understand this part first, because it came logically first, and then you want to understand the next part, and you don't want to have to figure that out yourself; you want a model to guide you through the thing.
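A hypothetical sketch of the review ideas above: score each hunk of a diff, gray out the repetitive low-entropy parts, and flag the parts that a bug-likelihood model is worried about. The scoring models themselves are assumed and not shown; the names are illustrative, not Cursor's:

```python
from dataclasses import dataclass

@dataclass
class Hunk:
    text: str
    surprisal: float  # assumed: mean per-token loss of the hunk under a code LM
    bug_risk: float   # assumed: output of a "likely bug?" classifier

def annotate(hunks: list[Hunk]) -> list[tuple[str, Hunk]]:
    labeled = []
    for h in hunks:
        if h.bug_risk > 0.5:
            labeled.append(("flag", h))    # red squiggly: probably review this carefully
        elif h.surprisal < 1.0:
            labeled.append(("gray", h))    # low-entropy, same-thing-over-and-over change
        else:
            labeled.append(("normal", h))
    return labeled
```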
And is the step of creation going to be more and more natural language? Is that the goal, versus actual code?

I don't think it's going to be the case that all of programming will be natural language, and the reason for that is, you know, if I'm pair programming with Sualeh, and Sualeh is at the computer and the keyboard, sometimes, if I'm driving, I want to say to Sualeh, hey, implement this function, and that works. But then sometimes it's just so annoying to explain to Sualeh what I want him to do, so I actually take over the keyboard and show him. I write part of the example, and then it makes sense, and that's the easiest way to communicate. And I think that's also the case for AI. Sometimes the easiest way to communicate with the AI will be to show an example, and then it goes and does the thing everywhere else. Or sometimes, if you're making a website, for example, the easiest way to show the AI what you want is not to tell it what to do but to drag things around or draw things. And maybe eventually we will get to brain-machine interfaces or whatever, and it can kind of understand what you're thinking. So I think natural language will have a place, but it will definitely not be the way most people program most of the time.

I'm really feeling the AGI with this editor. It feels like there's a lot of machine learning going on underneath. Tell me about some of the ML stuff that makes it all work.

Cursor really works via this ensemble of custom models that we've trained, alongside the frontier models that are fantastic at the reasoning-intense things. So Cursor Tab, for example, is a great example of where you can specialize this model to be even better than frontier models, if you look at evals on the task we set it at. The other domain, where it's kind of surprising that it requires custom models but it's kind of necessary and works quite well, is in Apply.

So I think the frontier models are quite good at sketching out plans for code and generating rough sketches of the change, but actually creating diffs is quite hard for frontier models. You try to do this with Sonnet, with o1, with any frontier model, and it really messes up stupid things like counting line numbers, especially in super large files. So what we've done to alleviate this is we let the model kind of sketch out this rough code block that indicates what the change will be, and we train a model to then apply that change to the file.

And we should say that Apply is: the model looks at your code, it gives you a really damn good suggestion of what new things to do, and the seemingly, for humans, trivial step of combining the two, you're saying, is not so trivial.

Contrary to popular perception, it is not a deterministic algorithm. Yeah, I think you see shallow copies of Apply elsewhere, and it just breaks most of the time, because you think you can kind of try to do some deterministic matching, and then it fails, you know, at least 40% of the time, and that just results in a terrible product experience.
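A minimal sketch of the sketch-then-apply split described above. The `generate` interface and the prompts are hypothetical stand-ins, not Cursor's actual API; the point is just the division of labor, where a frontier model plans the change and a smaller trained model rewrites the whole file rather than relying on deterministic patch matching:

```python
def plan_edit(frontier_model, file_text: str, instruction: str) -> str:
    """The strong model produces a rough code block; it may elide unchanged regions
    and get line numbers wrong, which is fine because apply never relies on them."""
    return frontier_model.generate(
        f"{instruction}\n\nCurrent file:\n{file_text}\n\nSketch the change as a code block:"
    )

def apply_edit(apply_model, file_text: str, sketch: str) -> str:
    """The small apply model emits the full updated file given the rough sketch."""
    return apply_model.generate(
        f"Original file:\n{file_text}\n\nProposed change:\n{sketch}\n\nFull updated file:"
    )

def edit_file(frontier_model, apply_model, file_text: str, instruction: str) -> str:
    sketch = plan_edit(frontier_model, file_text, instruction)
    return apply_edit(apply_model, file_text, sketch)
```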
I think in general, in this regime where you are going to get smarter models, one other thing that Apply lets you do is use fewer tokens with the most intelligent models. That's expensive both in terms of latency, for generating all these tokens, and in cost. So you can give this very, very rough sketch and then have your smaller models go and implement it, because it's a much easier task to implement this very sketched-out code. And I think this regime will continue, where you can use smarter and smarter models to do the planning, and then maybe the implementation details can be handled by the less intelligent ones. Perhaps you'll have, you know, maybe o1, maybe even more capable models, given an even higher-level plan that is kind of recursively applied by Sonnet and then the Apply model.

Maybe we should talk about how to make it fast.

Yeah, I feel like fast is always an interesting detail. Fast is good. How do you make it fast?

Yeah, so one big component of making it fast is speculative edits. Speculative edits are a variant of speculative decoding, and maybe it's helpful to briefly describe speculative decoding. With speculative decoding, you take advantage of the fact that, most of the time, and I'll add the caveat that it's when you're memory-bound in language model generation, if you process multiple tokens at once, it is faster than generating one token at a time. This is the same reason why, if you look at tokens per second for prompt tokens versus generated tokens, it's much, much faster for prompt tokens.

So what we do is, instead of using what speculative decoding normally does, which is using a really small model to predict draft tokens that your larger model then goes in and verifies, with code edits we have a very strong prior on what the existing code will look like, and that prior is literally the same exact code. So what you can do is just feed chunks of the original code back into the model, and the model will just pretty much agree most of the time: okay, I'm just going to spit this code back out. You can process all of those lines in parallel, and you do this with sufficiently many chunks. Then eventually you reach a point of disagreement, where the model will predict text that is different from the ground-truth original code. It'll generate those tokens, and then we decide, after enough tokens match the original code, to restart speculating in chunks of code.

What this actually ends up looking like is just a much faster version of the model rewriting all the code. So we can use the same exact interface that we use for diffs, but it will just stream down a lot faster.

And then the advantage is that, while it's streaming, you can also start reviewing the code before it's done, so there's no big loading screen. Maybe that is part of the advantage: the human can start reading before the thing is done.
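A minimal sketch of the speculative-edits idea just described, under a hypothetical model interface: `model.greedy_next(prefix, draft)` is assumed to score every position of `draft` in one parallel forward pass and return the model's greedy token at each position. The original code itself serves as the draft, so agreement is cheap and disagreement marks where the edit actually happens:

```python
def speculative_edit(model, prompt, original_code, chunk_size=16):
    """Rewrite `original_code` (a list of tokens), using the original code as the draft."""
    out, i = [], 0
    while i < len(original_code):
        draft = original_code[i:i + chunk_size]
        preds = model.greedy_next(prompt + out, draft)   # verify the whole chunk in parallel
        agreed = 0
        while agreed < len(draft) and preds[agreed] == draft[agreed]:
            agreed += 1
        out.extend(draft[:agreed])        # accepted tokens stream out almost for free
        i += agreed
        if agreed < len(draft):           # first disagreement: this is where the edit is
            out.append(preds[agreed])
            # The real system now decodes token by token until the output re-aligns
            # with the original code, then resumes speculating in chunks (omitted here).
            break
    return out
```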
I think the interesting riff here is that speculation is a fairly common idea nowadays, and not only in language models. There's obviously speculation in CPUs, there's speculation for databases, there's speculation all over the place.

Let me ask the sort of ridiculous question: which LLM is better at coding? GPT, Claude, who wins in the context of programming? And I'm sure the answer is much more nuanced, because it sounds like every single part of this involves a different model.

Yeah, I think there's no model that Pareto-dominates the others, meaning it is better in all the categories that we think matter, the categories being speed, ability to edit code, ability to process lots of code, long context, you know, a couple of other things, and kind of coding capabilities. The one that I'd say right now is just kind of net best is Sonnet; I think this is a consensus opinion. o1 is really interesting, and it's really good at reasoning, so if you give it really hard programming-interview-style problems or LeetCode problems, it can do quite well on them, but it doesn't feel like it understands your rough intent as well as Sonnet does.

If you look at a lot of the other frontier models, one qualm I have is, I'm not saying they train on benchmarks, but they perform really well on benchmarks relative to kind of everything that's in the middle. So if you try them on all these benchmarks and on things that are in the distribution of the benchmarks they're evaluated on, they'll do really well, but when you push them a little bit outside of that, Sonnet is, I think, the one that does best at maintaining that same capability. You kind of have the same capability on the benchmark as when you try to instruct it to do anything with coding.

Another ridiculous question: what's the difference between the normal programming experience and what benchmarks represent? Where do benchmarks fall short, do you think, when we're evaluating these models?

By the way, that's a really, really hard, critically important detail: how different benchmarks are from real coding. Real coding is not interview-style coding. Humans are saying half-broken English sometimes, and sometimes you're saying, oh, do what I did before. Sometimes you're saying, go add this thing, and then do this other thing for me, and then make this UI element. A lot of things are context-dependent. You really want to understand the human and then do what the human wants, as opposed to, maybe the way to put it abstractly is: the interview problems are very well specified. They lean a lot on specification, while the human stuff is less specified.

Yeah, I think this benchmark question is both complicated by what Sualeh just mentioned, and also, to what Aman was getting at, there's this problem of the skew between what you can actually model in a benchmark versus real programming. And that can sometimes be hard to encapsulate, because real programming is very messy, and sometimes things aren't
super well specified as to what's correct or what isn't. But then it's also doubly hard because of this public benchmark problem. That's both because public benchmarks are sometimes kind of hill-climbed on, and because it's really, really hard to get the data from the public benchmarks out of the models. For instance, one of the most popular agent benchmarks, SWE-bench, is really, really contaminated in the training data of these foundation models. So if you ask these foundation models to do a SWE-bench problem, but you actually don't give them the context of a codebase, they can hallucinate the right file paths, they can hallucinate the right function names. The public aspect of these things is tricky.

Yeah, in that case, it could be trained on the literal issues or pull requests themselves, and maybe the labs will start to do a better job, or they've already done a good job, at decontaminating those things, but they're not going to omit the actual training data of the repository itself. These are some of the most popular Python repositories; SymPy is one example. I don't think they're going to handicap their models on SymPy and all these popular Python repositories in order to get true evaluation scores on these benchmarks.

I think that, given the limitations of benchmarks, there have been a few interesting crutches that places that build systems with these models, or build these models, actually use to get a sense of whether they're going in the right direction or not. In a lot of places, people will actually just have humans play with the things and give qualitative feedback on them. One or two of the foundation model companies have people for whom that's a big part of their role, and internally we also qualitatively assess these models and actually lean on that a lot, in addition to the private evals that we have.

It's like the vibe.

Yeah, the vibe benchmark, the human benchmark. You pull in the humans to do a vibe check.

Yeah, okay. I mean, that's kind of what I do, just reading online forums and Reddit and X. Well, I don't know how to properly load in people's opinions, because they'll say things like, I feel like Claude or GPT has gotten dumber or something. And I sometimes feel like that too, but I wonder if it's the model's problem or mine.

Yeah, with Claude there's an interesting take I heard, where I think AWS has different chips, and I suspect they have slightly different numerics than Nvidia GPUs, and someone speculated that Claude's degraded performance had to do with maybe using the quantized version that existed on AWS Bedrock versus whatever was running on Anthropic's GPUs.

I interview a bunch of people who have conspiracy theories, so I'm glad you spoke to this conspiracy.

Well, it's not a conspiracy theory so much as, you know, humans are humans, and there are these details, and you're doing this crazy amount of flops, and chips are messy, and, man, you can just have bugs. It's hard to overstate how hard bugs are to
avoid.

What's the role of a good prompt in all this? You mentioned that benchmarks have really structured, well-formulated prompts. What should a human be doing to maximize success, and what's the importance of what the human writes? You wrote a blog post on this; you called it prompt design.

Yeah, I think it depends on which model you're using, and all of them are slightly different, and they respond differently to different prompts. But I think the original GPT-4, and the original sort of crop of models last year, were quite sensitive to the prompts, and they also had a very small context window. And we have all of these pieces of information around the codebase that would maybe be relevant in the prompt: you have the docs, you have the files that you add, you have the conversation history. And then there's a problem: how do you decide what you actually put in the prompt when you have limited space? Even for today's models, even when you have long context, filling out the entire context window means that it's slower, and it means that sometimes the model actually gets confused, and some models get more confused than others.

We have this one system internally that we call Preempt, which helps us with that a little bit. I think it was built for the era before, when we had 8,000-token context windows. And it's a little bit similar to when you're making a website: you want it to work on mobile, you want it to work on a desktop screen, and you have this dynamic information, which you don't have if you're, for example, designing a print magazine, where you know exactly where you can put stuff. But when you have a website, or when you have a prompt, you have these inputs, and you need to format them to always work, even if the input is really big; then you might have to cut something down. So the idea was, okay, let's take some inspiration: what's the best way to design websites? Well, the thing we really like is React and the declarative approach, where you use JSX in JavaScript and you declare, this is what I want, and I think this has higher priority, or this has a higher z-index, than something else. And then you have this rendering engine. In web design it's Chrome, and in our case it's a Preempt renderer, which then fits everything onto the page. You declaratively decide what you want, and then it figures out how to fit it for you.

So we have found that to be quite helpful, and I think the role of it has shifted over time, where initially it was to fit things into these small context windows, and now it's really useful because it helps us split up the data that goes into the prompt from the actual rendering of it. So it's easier to debug, because you can change the rendering of the prompt and then try it on old prompts, since you have the raw data that went into the prompt, and then you can see: did my change actually improve it across this entire eval set?

So do you literally prompt with JSX?

Yes, yes. It kind of looks like React. There are components. We have one component that's a file component, and it takes in the cursor position. Usually there's one line where the cursor is
in your file and that's<br>like probably the most important line<br>because that's the one you're looking at<br>and so then you can give priorities so<br>like that line has the highest priority<br>and then you subtract one for every line<br>that uh is farther away and then<br>eventually when it's render it to figure<br>out how many lines can I actually fit<br>and it centers around that thing that's<br>amazing yeah and you can do like other<br>fancy things where if you have lots of<br>code blocks from the entire code base<br>you could use uh retrieval um and things<br>like embedding and reranking scores to<br>add priorities for each of these<br>components so should humans when they<br>ask questions also use try to use<br>something like that like would it be<br>beneficial to write jsx in the in the<br>problem where the whole idea is should<br>be loose and messy I I think our goal is<br>kind of that you should just uh do<br>whatever is the most natural thing for<br>you and then we are job is to figure out<br>how do we actually like retrieve the<br>relative EV things so that your thing<br>actually makes sense well this is sort<br>of the discussion I had with uh Arvin of<br>perplexity is like his whole idea is<br>like you should let the person be as<br>lazy as he want but like yeah that's a<br>beautiful thing but I feel like you're<br>allowed to ask more of programmers right<br>so like if you say just do what you want<br>I mean humans are lazy there's a kind of<br>tension between just being lazy versus<br>like provide more is uh be prompted<br>almost like the system<br>pressuring you or inspiring you to be<br>articulate not in terms of the grammar<br>of the sentences but in terms of the<br>depth of thoughts that you convey inside<br>the uh the problems I think even as a<br>system gets closer to some level of<br>perfection often when you ask the model<br>for something you just are not not<br>enough intent is conveyed to know what<br>to do and there are like a few ways to<br>resolve that intent one is the simple<br>thing of having model just ask you I'm<br>not sure how to do these parts based in<br>your query could you clarify that um I<br>think the other could be<br>maybe if you there are five or six<br>possible Generations given the<br>uncertainty present in your query so far<br>why don't we just actually show you all<br>of those and let you pick<br>them how hard is it to for the model to<br>choose to speak talk back sort of versus<br>gener that's a that's hard sort of like<br>how to deal with the<br>uncertainty do I do I choose to ask for<br>more information to reduce the ambiguity<br>so I mean one of the things we we do is<br>um it's like a recent addition is try to<br>suggest files that you can add so and<br>while you're typing uh one can guess<br>what the uncertainty is and maybe<br>suggest that like you know maybe maybe<br>you're writing your API<br>and uh we can guess using the<br>commits uh that you've made previously<br>in the same file that the client and the<br>server is super useful and uh there's<br>like a hard technical problem of how do<br>you resolve it across all commits which<br>files are the most important given your<br>current prompt and we still sort of uh<br>initial version is ruled out and I'm<br>sure we can make it much more<br>accurate uh it's it's it's very<br>experimental but then the ideaas we show<br>you like do you just want to add this<br>file this file this file also to tell<br>you know the model to edit those files<br>for you uh because if if you're 
To what degree do you use agentic approaches? How useful are agents?

We think agents are really cool. I think agents resemble sort of like a human. You can kind of feel that you're getting closer to AGI, because you see a demo where it acts as a human would, and it's really, really cool. I think agents are not yet super useful for many things, though I think we're getting close to where they will actually be useful. So there are certain types of tasks where having an agent would be really nice. For example, we have a bug where you sometimes can't Command-C and Command-V inside our chat input box, and that's a task that's super well specified. I just want to say, in two sentences, this does not work, please fix it, and I would love to have an agent that just goes off and does it, and then a day later I come back and review the thing.

You mean it goes and finds the right file?

Yeah, it finds the right files, it tries to reproduce the bug, it fixes the bug, and then it verifies that it's correct. And this could be a process that takes a long time. So I would love to have that.

And then I think with a lot of programming, there is often this belief that agents will take over all of programming. I don't think we think that's the case, because with a lot of programming, a lot of the value is in iterating, or you don't actually want to specify something upfront, because you don't really know what you want until you've seen an initial version, and then you want to iterate on that, and then you provide more information. So for a lot of programming, I think you actually want a system that's instant, that gives you an initial version instantly back, and then you can iterate super, super quickly.

What about something that recently came out, Replit Agent, which also does things like setting up the development environment, installing software packages, configuring everything, configuring the databases, and actually deploying the app? Is that also in the set of things you dream about?

I think so. I think that would be really cool for certain types of programming.

Is that within the scope of Cursor?

Yeah, we aren't actively working on it right now, but it's definitely the case that we want to make the programmer's life easier and more fun, and some things are just really tedious, and you need to go through a bunch of steps, and you want to delegate that to an agent. And then some things you can actually have an agent do in the background while you're working. Let's say you have a PR that's both backend and frontend, and you're working on the frontend; you can have a background agent that does some work and figures out kind of what you're doing, and then, when you get to the backend part of your PR, you have some initial piece of code that you can
iterate on. And so that would also be really cool.

One of the things we already talked about is speed, but I wonder if we can just linger on that some more: the various technical details involved in making this thing really fast. Every single aspect of Cursor, most aspects of Cursor, feel really fast. Like I mentioned, the Apply is probably the slowest thing, and for me, sorry, the pain.

I know. It's a pain that we're feeling, and we're working on fixing it.

Yeah, I mean, it says something that something that takes, I don't know, one second or two seconds, feels slow; that actually shows that everything else is just really, really fast. So are there some technical details about how to make some of these models fast, how to make the chat fast, how to make the diffs fast? Is there something that jumps to mind?

Yeah, so we can go over a lot of the strategies that we use. One interesting thing is cache warming. What you can do is, as the user is typing, you know you're probably going to use some piece of context, and you can know that before the user is done typing. So, as we discussed before, reusing the KV cache results in lower latency and lower cost across requests. As the user starts typing, you can immediately warm the cache with, let's say, the current file contents, and then, when they've pressed enter, there are very few tokens it actually has to prefill and compute before starting the generation. This will significantly lower the time to first token.

Can you explain how the KV cache works?

Yeah. The way transformers work, one of the mechanisms that allows transformers to not just independently look at each token, but to see previous tokens, is the keys and values of attention. Generally, the way attention works is, you have, at your current token, some query, and then you have the keys and values of all your previous tokens, which are some kind of representation that the model stores internally of all the previous tokens in the prompt. By default, when you're doing a chat, the model has to, for every single token, do this forward pass through the entire model. That's a lot of matrix multiplies, and that is really, really slow. Instead, if you have already done that, and you've stored the keys and values, and you keep them on the GPU, then, let's say I have stored them for the last N tokens: if I now want to compute the output for the N+1th token, I don't need to pass those first N tokens through the entire model, because I already have all those keys and values. So you just need to do the forward pass for that last token, and then, when you're doing attention, you're reusing those keys and values that have already been computed, which is the only sequentially dependent part of the transformer.
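A minimal single-head attention sketch of what the KV cache saves, as just described: each new token computes only its own query, key, and value, appends the key and value to the cache, and attends over everything cached, so earlier tokens never need another forward pass. Dimensions and weights are illustrative:

```python
import numpy as np

d = 64
Wq, Wk, Wv = (np.random.randn(d, d) * 0.02 for _ in range(3))
K_cache, V_cache = [], []   # grows by one entry per token; lives on the GPU in practice

def attend(x_new: np.ndarray) -> np.ndarray:
    """x_new: hidden state of the latest token only, shape (d,)."""
    q = x_new @ Wq
    K_cache.append(x_new @ Wk)           # no need to re-run earlier tokens through the model
    V_cache.append(x_new @ Wv)
    K, V = np.stack(K_cache), np.stack(V_cache)
    scores = K @ q / np.sqrt(d)          # attend over all previous tokens via cached keys
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                   # weighted sum of cached values
```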
Is there higher-level caching — caching of prompts, that kind of stuff — that could help?

Yeah, there are other types of caching you can do. One interesting thing you can do for Cursor Tab is basically predict ahead, as if the user had accepted the suggestion, and then trigger another request. It's a mix of speculation and caching: you're speculating on what would happen if they accepted it, and then you have this value that is cached — this suggestion — so that when they press tab, the next one is already waiting for them. It's a kind of clever heuristic slash trick that uses higher-level caching, and it feels fast despite there not actually being any changes in the model.

And if you can make the KV cache smaller, one of the advantages you get is that maybe you can speculate even more — maybe you can predict the next ten things that could be useful. It's a much higher chance that the user hits one of the ten than that they hit the exact one you showed them; maybe they type another character and hit something else in the cache. There's a general phenomenon here — which I think is also super useful for RL — that maybe a single sample from the model isn't very good, but if you predict ten different things, it turns out the probability that one of the ten is right is much higher. There are these pass@k curves, and part of what RL does is exploit this pass@k phenomenon by making many different predictions. One way to think about it is that the model internally has some uncertainty over which of the k things is correct, or which of the k things the human wants. When we RL our Cursor Tab model, one of the things we're doing is predicting which of the hundred different suggestions the model produces is more amenable to humans — which of them humans like more than others. Maybe the model can predict very far ahead, versus a little bit, versus somewhere in the middle, and then you give a reward to the things humans would like more, punish the things they wouldn't, and train the model to output the suggestions that humans would like more. You have these RL loops that are very useful and that exploit these pass@k curves.
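A hypothetical sketch of that speculative caching idea for a tab-completion feature — not Cursor's actual implementation — is below: predict several continuations ahead of time, cache them keyed by the text prefix, and serve instantly if the user's typing lands on one of them. `predict_k_completions` is a stand-in for a real model call.

```python
# Hypothetical speculative cache for tab completions.
cache: dict[str, list[str]] = {}

def prefetch(prefix: str, k: int = 10) -> None:
    # Speculate: compute k candidate continuations before the user asks.
    cache[prefix] = predict_k_completions(prefix, k)   # stand-in model call

def suggest(prefix: str) -> str | None:
    # Exact hit: the user ended up right where we speculated.
    if prefix in cache:
        return cache[prefix][0]
    # Otherwise, check whether the user typed into one of the cached candidates;
    # with k candidates this is far more likely than with a single one (pass@k).
    for old_prefix, candidates in cache.items():
        for cand in candidates:
            full = old_prefix + cand
            if full.startswith(prefix):
                return full[len(prefix):]
    return None  # fall back to a fresh (slower) model request
```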
Aman, maybe you can go into even more detail.

Yeah, it's a little different from speed, but you can tie it back in, because you can get away with a smaller model if you RL your smaller model and it gets the same performance as the bigger one. And while Sualeh was mentioning stuff about reducing the size of your KV cache, there are other techniques there as well that are really helpful for speed. Back in the day — all the way two years ago — people mainly used multi-head attention, and I think there's been a migration toward more efficient attention schemes, like group-query or multi-query attention. This is really helpful, with larger batch sizes, for generating the tokens much faster. The interesting thing here is that this has no effect on the time-to-first-token prefill speed; the thing it matters for is generating tokens. Why is that? When you're generating tokens, instead of being bottlenecked by doing highly parallelizable matrix multiplies across all your tokens, you're bottlenecked — for long context, with large batch sizes — by how quickly you can read those cached keys and values. That's memory bandwidth, and how can we make it faster? We can try to compress the size of these keys and values.

Multi-query attention is the most aggressive of these. Normally, with multi-head attention, you have some number of attention heads and some number of query heads; multi-query just preserves the query heads and gets rid of all the key-value heads, so there's only one key-value head and all the remaining query heads. With group-query, you instead preserve all the query heads, and there are fewer heads for the keys and values, but you're not reducing it to just one. Either way, the whole point is that you're reducing the size of your KV cache.

And then there is MLA.

Yeah, multi-head latent attention. That's a little more complicated, and the way it works is that it turns the entirety of your keys and values, across all your heads, into this one latent vector that is then expanded at inference time.

MLA is from this company called DeepSeek. It's quite an interesting algorithm. Maybe the key idea is that in both MQA and elsewhere, what you're doing is reducing the number of KV heads. The advantage you get from that is that there are fewer of them, but maybe the theory is that you actually want each of the keys and values to be different. So one way to reduce the size is that you keep one big shared vector for all the keys and values, and then you have smaller vectors for every single token, so you can store only the smaller thing, as some sort of low-rank reduction. At the end, when you eventually want to compute the final thing, remember that you're memory bound, which means you still have some compute left that you can use. So if you can expand the latent vector back out, that's far more efficient, because you're reducing — for example, maybe by a factor of 32 or something — the size of the vector you're keeping.

Yeah, there's perhaps some richness in having a separate set of keys and values and queries that pairwise match up, versus compressing it all into one — and that interaction, at least.

Okay, and all of that is dealing with being memory bound.
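A back-of-the-envelope comparison makes the "reduce the KV cache" point concrete. The dimensions below are made up for illustration — they are not the numbers for any particular model — but they show how GQA, MQA, and an MLA-style latent shrink the cache relative to full multi-head attention.

```python
# Illustrative KV-cache sizes for different attention schemes (invented dims).
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per=2):
    # 2x for keys + values; bytes_per=2 assumes fp16/bf16 storage
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per

layers, q_heads, head_dim, seq, batch = 32, 32, 128, 8192, 16

mha = kv_cache_bytes(layers, q_heads, head_dim, seq, batch)   # one KV head per query head
gqa = kv_cache_bytes(layers, 8, head_dim, seq, batch)         # grouped: e.g. 8 KV heads
mqa = kv_cache_bytes(layers, 1, head_dim, seq, batch)         # single shared KV head
mla = layers * 512 * seq * batch * 2                          # MLA-ish: one assumed 512-dim latent per token

for name, size in [("MHA", mha), ("GQA", gqa), ("MQA", mqa), ("MLA-ish", mla)]:
    print(f"{name}: {size / 2**30:.1f} GiB")
```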
Ultimately, how does that map to the user experience?

The two things it maps to are: first, you can now make your cache a lot larger, because you have less space allocated for the KV cache, so you can cache a lot more aggressively and cache a lot more things, and you get more cache hits, which is helpful for reducing the time to first token for the reasons described earlier. And second, when you start doing inference with more and more requests and larger and larger batch sizes, you don't see much of a slowdown in the speed of generating the tokens.

It also allows you to make your prompt bigger.

Yeah — the size of your KV cache is the size of all your prompts multiplied by the number of prompts being processed in parallel, so you could increase either of those dimensions — the batch size or the size of your prompts — without degrading the latency of generating tokens.

Arvid, you wrote a blog post, "Shadow Workspace: Iterating on Code in the Background." What's going on there?

To be clear, we want there to be a lot of stuff happening in the background, and we're experimenting with a lot of things. Right now we don't have much of that happening, other than the cache warming, or figuring out the right context that goes into your Command+K prompts, for example. But the idea is that if you can actually spend computation in the background, then you can help the user at a slightly longer time horizon than just predicting the next few lines you're going to make — actually, what are you going to make in the next ten minutes — and by doing it in the background, you can spend more computation doing that.

The idea of the shadow workspace that we implemented — and we use it internally for experiments — is that to actually get an advantage from doing stuff in the background, you want some kind of feedback signal to give back to the model. Otherwise, you can get higher performance just by letting the model think for longer — o1 is a good example of that — but another way to improve performance is by letting the model iterate and get feedback. One very important piece of feedback when you're a programmer is the language server, which is this thing that exists for most languages — there's a separate language server per language — and it can tell you, "you're using the wrong type here," and give you an error, or it can let you go to definition, and it understands the structure of your code. Language servers are extensions developed by, for example, the TypeScript people for TypeScript and the Rust people for Rust, and they all interface over the Language Server Protocol to VS Code, so that VS Code doesn't need to have all of the different languages built in; rather, you can use the existing compiler infrastructure.

For linting purposes?

It's for linting, for going to definition, and for seeing the right types that you're using.

So it's doing type checking also?

Yes — type checking, and going to references. When you're working in a big project, you kind of need that; if you don't have it, it's really hard to code in a big project.
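As a hypothetical sketch of using that kind of compiler or linter feedback as the signal for a background iteration loop: here the diagnostics come from running a type checker in a subprocess (pyright is just an example tool; a real language-server integration would speak LSP over JSON-RPC instead), and `llm()` is an invented stand-in for a model call.

```python
# Sketch: background iteration driven by type-checker diagnostics.
import subprocess, tempfile, pathlib

def diagnostics(code: str) -> str:
    # Write the AI-edited (unsaved) code somewhere hidden and ask the checker
    # for errors, without touching the user's actual files.
    with tempfile.TemporaryDirectory() as d:
        p = pathlib.Path(d) / "shadow.py"
        p.write_text(code)
        out = subprocess.run(["pyright", str(p)], capture_output=True, text=True)
        return out.stdout

def iterate_in_background(task: str, code: str, rounds: int = 3) -> str:
    for _ in range(rounds):
        errors = diagnostics(code)
        if "error" not in errors.lower():
            break
        # llm() is a stand-in: ask the model to revise given the diagnostics.
        code = llm(f"Task: {task}\nCurrent code:\n{code}\nDiagnostics:\n{errors}")
    return code
```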
Can you say again how that's being used inside Cursor — the Language Server Protocol communication?

It's being used in Cursor to show things to the programmer, just like in VS Code, but then the idea is that you want to show that same information to the models — the AI models — and you want to do it in a way that doesn't affect the user, because you want to do it in the background. So the idea behind the shadow workspace was: one way we can do this is to spawn a separate window of Cursor that's hidden. You can set this flag in Electron so it's hidden — there is a window, but you don't actually see it — and inside that window the AI agents can modify code however they want, as long as they don't save it, because it's still the same folder, and then they can get feedback from the linters, go to definition, and iterate on their code.

So literally run everything in the background — maybe even run the code?

That's the eventual version. That's what you want, and a lot of the blog post is actually about how you make that happen, because it's a little bit tricky. You want it to be on the user's machine so that it exactly mirrors the user's environment. On Linux you can do this cool thing where you can actually mirror the file system, and have the AI make changes to the files — it thinks it's operating at the file level, but actually that's stored in memory — and you can create a kernel extension to make it work. On Mac and Windows it's a little bit more difficult, but it's a fun technical problem.

One maybe hacky but interesting idea that I like is holding a lock on saving. You can have the language model hold the lock on saving to disk, and then instead of operating on the ground-truth version of the files that are saved to disk, you're actually operating on what was the shadow workspace before — these unsaved things that only exist in memory, that you still get lint errors for and can code in. And then when you try to, say, run code, there's just a small warning that there's a lock, and you take the lock back from the language server, or from the shadow workspace, if you're trying to do things concurrently.

That's such an exciting feature, by the way. It's a bit of a tangent, but letting a model change files is scary for people — yet it's really cool to be able to just let the agent do a set of tasks, and you come back the next day and observe it, like it's a colleague or something like that.

Yeah, and I think there may be different versions of runnability. For the simple things, where you're doing things on the span of a few minutes on behalf of the user as they're programming, it makes sense to make it work locally on their machine. For the more aggressive things, where you're making larger changes that take longer periods of time, you'll probably want to do this in some sandboxed remote environment, and that's another incredibly tricky problem: how do you exactly reproduce, or mostly reproduce — to the point of it being effectively equivalent for running code — the user's environment, in a remote sandbox?

I'm curious what kind of agents you want for coding. Do you want them to find bugs? Do you want them to implement new features? What agents do you want?
So, by the way, when I think about agents, I don't think just about coding. For this particular podcast, there's video editing, and if you look in Adobe, there's code behind a lot of it. It's very poorly documented code, but you can interact with Premiere, for example, using code, and basically all the uploading — everything I do on YouTube, everything, as you could probably imagine — I do through code, including translation and overdubbing, all of it. So I envision all those kinds of tasks: automating many of the tasks that don't have to do directly with the editing. Okay, that's what I was thinking about. But in terms of coding, I would fundamentally be thinking about bug finding — many levels of bug finding — and also finding logical bugs, not, like, spiritual bugs or something.

Ones like big wrong directions of implementation, that kind of stuff?

That, and bug finding, yeah.

I mean, it's really interesting that these models are so bad at bug finding when just naively prompted to find a bug. They're incredibly poorly calibrated.

Even the smartest models?

Exactly. Even o1.

How do you explain that? Is there a good intuition?

I think these models are a really strong reflection of the pre-training distribution. I do think they generalize as the loss gets lower and lower, but I don't think the loss and the scale are quite there — the loss isn't low enough such that they're really fully generalizing on code. The things that we use the frontier models for, that they're quite good at, are really code generation and question answering, and those exist in massive quantities in pre-training: all of the code on GitHub, on the scale of many, many trillions of tokens, and questions and answers on things like Stack Overflow and maybe GitHub issues. So when you try to push them toward things that really don't exist very much online — for example, the Cursor Tab objective of predicting the next edit given the edits done so far — the brittleness kind of shows. And bug detection is another great example, where there aren't really that many examples of actually detecting real bugs and then proposing fixes, and the models just really struggle at it. But I think it's a question of transferring the model: in the same way that you get this fantastic transfer from models pre-trained just on code in general to the Cursor Tab objective, you'll see a very similar thing with generalized models that are really good at code transferring to bug detection. It just takes a little bit of nudging in that direction.

To be clear, I think they understand code really well. While they're being pre-trained, the representation that's being built up — almost certainly, somewhere in the stream, the model knows that maybe there's something sketchy going on. It senses some sketchiness, but actually eliciting that sketchiness — part of it is that humans are really calibrated on which bugs are really important. It's not just saying "there's something sketchy" —
it's: is it sketchy-but-trivial, or is it sketchy-you're-going-to-take-the-server-down? Part of it is maybe the cultural knowledge of why a staff engineer is a staff engineer. A staff engineer is good because they know that three years ago someone wrote a really sketchy piece of code that took the server down — as opposed to, maybe this thing is just an experiment, so a few bugs are fine; you're just trying to experiment and get the feel of the thing. So if the model gets really annoying when you're writing an experiment, that's really bad. But if you're writing something for serious production — you're writing a database, you're writing code in Postgres or Linux or whatever, you're Linus Torvalds — it's unacceptable to have even an edge case. It's about having the calibration of: how paranoid is the user?

But even then, if you put it in maximum paranoia mode, it still just doesn't quite get it.

Yeah, yeah. I mean, this is hard for humans too — understanding which line of code is important and which is not. I think one of the principles on your website says: if code can do a lot of damage, one should add a comment that says "this line of code is dangerous."

And all caps, repeated ten times.

No — you say it for every single line of code inside the function. And that's quite profound; it says something about human beings, because the engineers move on, and even the same person might just forget how a single function can sink the Titanic. You might not intuit that quite clearly just by looking at the single piece of code.

Yeah, and I think that one is also partially for today's AI models, where if you actually write "dangerous, dangerous, dangerous" in every single line, the models will pay more attention to that and will be more likely to find bugs in that region.

That's actually just straight up a really good practice: labeling code for how much damage it can do.

Yeah, I mean, it's controversial — some people think it's ugly. Sualeh?

Well, in fact this is one of the things I learned from Arvid: aesthetically I don't like it, but there's certainly something where it's useful for the models, and humans just forget a lot, and it's really easy to make a small mistake and bring down the server. Of course we test a lot and whatever, but there are always these things you have to be very careful about.

Yeah, and with just normal docstrings, I think people will often skim them when making a change and think, "oh, I know how to do this," and you really need to point it out to them so that it doesn't slip through.

You have to be reminded that you could do a lot of damage. We don't really think about that — you think, "how do I figure out how this works so I can improve it," and you don't think about the other direction.

Until we have formal verification for everything. Then you can do whatever you want, and you know for certain that you have not introduced a bug, if the proof passes.
But concretely, what do you think that future would look like?

I think people will just not write tests anymore. You write a function, the model suggests a spec, you review the spec, and in the meantime a smart reasoning model computes a proof that the implementation follows the spec. I think that happens for most functions.

Don't you think this gets a little bit at some of the stuff you were talking about earlier, with the difficulty of specifying intent for what you want with software? Where sometimes, because the intent is really hard to specify, it's also going to be really hard to prove that it actually matches whatever your intent is?

You think the spec is hard to generate?

Yeah, or just — for a given spec, I think there's a question of whether you can actually do the formal verification at all. Is that even possible? I think there's more to dig into there. But then also, even if you have this spec — is the spec written in natural language?

No, the spec would be formal.

But then how easy would that be? I think you'd care about things that are not going to be easily well specified in the spec language.

I see, I see.

Yeah — maybe an argument against "formal verification is all you need."

The worry is that there's this massive document —

Replacing something like unit tests, sure.

Yeah. I think you can probably also evolve the spec languages to capture some of the things they don't really capture right now. I don't know — I think it's very exciting.

And you're speaking not just about single functions — you're speaking about entire code bases.

I think entire code bases are harder, but that is what I would love to have, and I think it should be possible. There's a lot of work recently where you can formally verify down to the hardware: you formally verify the C code, then you formally verify through the GCC compiler, and then through the Verilog, down to the hardware. That's an incredibly big system, but it actually works, and I think big code bases are sort of similar in that they're multi-layered systems. If you can decompose it and formally verify each part, then I think it should be possible. I think the specification problem is a real problem, though.

But how do you handle side effects, or how do you handle, I guess, external dependencies — like calling the Stripe API?

Maybe Stripe would write a spec for their API.

But you can't do this for everything. Can you do it for everything you use? How do you do it if there's a language model involved — maybe people will use language models as primitives in the programs they write, and there's a dependence on it — how do you include that?

I think you might be able to prove that still.

Prove what, about language models?

I think it feels possible that you could actually prove that a language model is aligned, for example, or prove that it actually gives the right answer.

That's the dream.
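As a toy illustration of the "spec plus machine-checked proof" idea — nothing like the scale the guests describe, and the function and spec here are invented for the example — a theorem prover such as Lean can already check that a tiny implementation satisfies a formal spec:

```lean
-- Toy example: an implementation, a formal spec a model might propose,
-- and a machine-checked proof that the implementation satisfies it.

def double (n : Nat) : Nat := n + n

-- Spec: double n is always 2 * n.
theorem double_spec (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```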
Yeah, that is — if it's possible, that's your "I Have a Dream" speech. If it's possible, it will certainly help with making sure your code doesn't have bugs, and with making sure AI doesn't destroy all of human civilization — so, the full spectrum from AI safety down to just bug finding. You said the models struggle with bug finding. What's the hope?

My hope initially — and I can let Michael chime in on it too — is that it should first help with the stupid bugs. It should very quickly catch the stupid bugs, like off-by-one errors, or where you write something in a comment and do it the other way. That's very common; I do this — I write "less than" in a comment and then maybe write "greater than," or something like that — and the model should say, "that looks sketchy, are you sure you want to do that?" But eventually it should be able to catch harder bugs too.

Yeah, and I think it's also important to note that having good bug-finding models feels necessary to get to the highest reaches of having AI do more and more programming for you. If the AI is building more and more of the system for you, you need to not just generate but also verify, and without that, some of the problems we've talked about before with programming with these models will just become untenable. So it's not just for humans — "you write a bug, I write a bug, find the bug for me" — it's also being able to verify the AI's code and check it. That's really important.

Yeah, and then how do you actually do this? We've had a lot of contentious dinner discussions about how you actually train a bug model, but one very popular idea is that it's potentially easier to introduce a bug than to actually find one. So you can train a model to introduce bugs in existing code, and then you can train a reverse bug model that can find bugs using that synthetic data. That's one example. But there are lots of ideas, and you can also do a bunch of work not even at the model level: take the biggest models and give them access to a lot of information that's not just the code. It's kind of a hard problem to stare at a file and say "where's the bug" — that's hard for humans too, often — so often you have to run the code, and being able to see things like traces and step through a debugger is another whole direction that it kind of tends toward. It could also be that there are two different product form factors here: it could be that you have a really specialized model that's quite fast, running in the background and trying to spot bugs, and it might be that sometimes — to Arvid's earlier example about the nefarious input-box bug — there's a bug you're not just checking hypothesis-free, you know "this is a problem, I really want to solve it," and you zap it with tons and tons of compute, and you're willing to put in, like, $50 to solve that bug, or something even more.
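A minimal hypothetical sketch of the synthetic-data idea mentioned above — it is easier to introduce bugs than to find them, so generate (buggy, fixed) pairs from clean code and train a detector on the reverse direction. The operator-flipping mutation here is a trivially simple stand-in for what a "bug-introducing" model would actually do.

```python
# Sketch: build synthetic training pairs by injecting bugs into clean code.
import random

MUTATIONS = [("<=", "<"), (">=", ">"), ("+ 1", "- 1"), ("==", "!=")]

def introduce_bug(clean_code: str) -> str | None:
    applicable = [(a, b) for a, b in MUTATIONS if a in clean_code]
    if not applicable:
        return None
    a, b = random.choice(applicable)
    return clean_code.replace(a, b, 1)   # flip one operator to create a plausible bug

def make_training_pairs(snippets: list[str]) -> list[dict]:
    pairs = []
    for code in snippets:
        buggy = introduce_bug(code)
        if buggy:
            # The reverse "bug model" is then trained to map buggy -> (location, fix).
            pairs.append({"input": buggy, "target": code})
    return pairs
```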
Have you thought about integrating money into this whole thing? I would probably pay a large amount of money if you found a bug, or even generated code that I really appreciated. I had a moment a few days ago when I started using Cursor where it generated three perfect functions for interacting with the YouTube API to update captions, and for localization in different languages. The API documentation is not very good, and if I Googled it for a while I couldn't find exactly what I needed — there's a lot of confusing information — and Cursor generated it perfectly. I just sat back, I read the code, I was like, "this is correct," I tested it, it was correct. I was like, I want a tip button that goes, "here's $5." One, that's really good, just to support the company and what the interface is; and the other is that it probably sends a strong signal, like "good job" — a much stronger signal than just accepting the code. And for bug finding, obviously there are a lot of people who would pay a huge amount of money for a bug — like a bug bounty thing, right? Do you guys think about that?

Yeah, it's a controversial idea inside the company. I think it sort of depends on how much you believe in humanity, almost. I think it would be really cool if you spend nothing to try to find a bug, and if it doesn't find a bug, you spend zero, and then if it does find a bug and you click accept, it also shows, in parentheses, "$1," and you spend $1 to accept the bug. Then of course there's a worry like, okay, we spent a lot of computation, maybe people will just copy-paste — I think that's a worry. And then there's also the worry that introducing money into the product makes it not feel as fun anymore: you have to think about money, and all you want to think about is the code. So maybe it actually makes more sense to separate it out — you pay some fee every month and you get all of these things for free — but there could be a tipping component, which still has that dollar symbol. I think it's fine, but I also see the point where maybe you don't want to introduce it.

Yeah, I was going to say: the moment that feels like people would do this is when they share it — when they have this fantastic example, they kind of share it with their friends. There's also a potential world where there's a technical solution to this honor-system problem, where if we can get to a place where we understand the output of the system more — the stuff we were talking about with error checking with the LSP, and then also running the code — if you could actually somehow verify "oh, I have fixed the bug," then maybe the bounty system doesn't need to rely on the honor system.

How much interaction is there between the terminal and the code? How much information is gained from running the code in the terminal? Can you do a loop where it runs the code and suggests how to change the code if the code gives an error at runtime?

Right now, they're completely separate worlds.
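The guests say this loop doesn't exist in the product yet, so the following is only a hypothetical sketch of what "run the code, feed the runtime error back, suggest a change" could look like; `llm()` and `apply_edit()` are invented stand-ins.

```python
# Sketch of a run-and-fix loop driven by runtime errors.
import subprocess

def run_and_fix(entrypoint: str, source: str, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        result = subprocess.run(["python", entrypoint], capture_output=True, text=True)
        if result.returncode == 0:
            break                                  # ran cleanly; stop iterating
        # Feed the traceback back to the model and ask for a revised file.
        source = llm(
            f"This program failed.\nSource:\n{source}\nError:\n{result.stderr}\n"
            "Return a corrected version of the file."
        )
        apply_edit(entrypoint, source)             # stand-in for writing the new version
    return source
```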
I know you can do Ctrl+K inside the terminal to help you write the code.

You can use terminal context as well, inside of Command+K — kind of everything. We don't have the looping part yet, though we suspect something like this could make a lot of sense. There's a question of whether it happens in the foreground, or whether it happens in the background, like what we've been discussing.

Sure, the background is pretty cool — running the code in different ways. Plus there's a database side to this: how do you protect it from modifying the database?

I mean, there are certainly cool solutions there. There's this new API that's being developed — it's not in AWS, but I think it's in PlanetScale; I don't know if PlanetScale was the first one to add it — the ability to add branches to a database. If you're working on a feature and you want to test against the prod database, but you don't actually want to test against the prod database, you could add a branch to the database, and the way to do that is to add a branch to the write-ahead log. There's obviously a lot of technical complexity in doing it correctly. I guess database companies need new things to do, because they have good databases now. Turbopuffer, which is one of the databases we use, is hopefully going to add branching to the write-ahead log, and so maybe the AI agents will use branching — they'll test against some branch — and it's sort of going to become a requirement for the database to support branching or something.

It would be really interesting if you could branch a file system, right?

Yeah. I feel like everything needs branching.

Yeah — that's the problem with the multiverse, right? If you branch on everything, that's a lot.

I mean, there are obviously these super clever algorithms to make sure you don't actually use a lot of space or CPU or whatever.

Okay, this is a good place to ask about infrastructure. You guys mostly use AWS — what are some interesting details, what are some interesting challenges? Why did you choose AWS? Why is AWS still winning? Hashtag.

AWS is just really, really good. Whenever you use an AWS product, you just know it's going to work. It might be absolute hell to go through the steps to set it up.

Why is the interface so horrible?

Because it's just so good, it doesn't need to —

The nature of winning.

Yeah, it's just the nature of winning. But AWS you can always trust — it will always work, and if there is a problem, it's probably your problem.

Okay. Are there some interesting challenges — you guys are a pretty new startup — in scaling to so many people?

Yeah, I think it has been an interesting journey, adding each extra zero to the requests per second. You run into all of these issues where the general components you're using for caching and databases run into problems as you make things bigger and bigger, and now we're at the scale where we get integer overflows on our tables, and things like that.
And then there have also been some custom systems that we've built — for instance, our retrieval system for computing a semantic index of your codebase and answering questions about a codebase — that have continually been, I feel, one of the trickier things to scale.

I have a few friends who are super senior engineers, and one of their lines is that it's very hard to predict where systems will break when you scale them. You can try to predict in advance, but there's always something weird that's going to happen when you add that extra zero. You thought you thought through everything, but you didn't actually think through everything. For that particular system — for concrete details — what we do is: we chunk up all of your code, and then we send up the code for embedding, and we embed the code, and then we store the embeddings in a database, but we don't actually store any of the code. Then there are reasons around making sure that we don't introduce client bugs, because we're very, very paranoid about client bugs, so we store much of the details on the server, and everything is encrypted.

One of the technical challenges is always making sure that the local index — the local codebase state — is the same as the state on the server. The way we technically ended up doing that is: for every single file you can keep a hash, and then for every folder you can keep a hash which is the hash of all of its children, and you can recursively do that up to the top. Why do something that complicated? One thing you could do is keep a hash for every file, and then every minute try to download the hashes that are on the server, figure out which files don't exist on the server — maybe you just created a new file, maybe you just deleted a file, maybe you checked out a new branch — and try to reconcile the state between the client and the server. But that introduces absolutely ginormous network overhead, both on the client side — nobody really wants us to hammer their Wi-Fi all the time if they're using Cursor — and also ginormous overhead in the database: it would be reading this tens-of-terabytes database, approaching 20 terabytes or something, every second. That's just kind of crazy; you definitely don't want to do that. So what you do is you just try to reconcile the single hash, which is at the root of the project, and if something mismatches, then you go and find where the things disagree — maybe you look at the children and see if the hashes match, and if the hashes don't match, go look at their children, and so on — but you only do that in the scenario where things don't match, and for most people, most of the time, the hashes match.

So it's a kind of hierarchical reconciliation.

Yeah, something like that. It's called a Merkle tree.

Yeah, Merkle. It's cool to see that you have to think through all these problems.
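A minimal sketch of that Merkle-tree-style reconciliation — hash every file, hash every folder as the hash of its children, and only descend into subtrees whose hashes disagree with the server's — follows; the flat path-to-hash map and the naive comparison are simplifications for illustration, not Cursor's actual implementation.

```python
# Sketch of Merkle-style state reconciliation between a local tree and a server.
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def tree_hash(root: Path) -> dict[str, str]:
    """Return {relative_path: hash} for every file and folder under `root`."""
    hashes: dict[str, str] = {}
    def walk(p: Path) -> str:
        if p.is_file():
            h = file_hash(p)
        else:
            child_hashes = [walk(c) for c in sorted(p.iterdir())]
            h = hashlib.sha256("".join(child_hashes).encode()).hexdigest()
        hashes[str(p.relative_to(root))] = h
        return h
    walk(root)
    return hashes

def needs_reconciliation(local: dict[str, str], remote: dict[str, str]) -> list[str]:
    # Compare the root hashes first; only when they differ do you walk down
    # and collect the paths whose hashes disagree (done naively here).
    if local.get(".") == remote.get("."):
        return []
    return [p for p, h in local.items() if remote.get(p) != h]
```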
The reason it's gotten hard is just the number of people using it, and some of our customers have really large code bases. We originally indexed our own code base, which is big, but it's just not the size of some company that's been around for 20 years and has an enormous number of files, and you want to scale that across programmers. There are all these details where building the simple thing is easy, but scaling it to a lot of people and a lot of companies is obviously a difficult problem — which is somewhat independent of, so part of this is scaling our current solution, and part is coming up with new ideas, which we're obviously working on, and then scaling all of that over the last few weeks.

Yeah, and there are a lot of clever additional things that go into this indexing system. For example, the bottleneck in terms of cost is not storing things in the vector database — it's actually embedding the code. And you don't want to re-embed the codebase for every single person in a company who is using the same exact code, except maybe they're on a different branch with a few different files, or they've made a few local changes. Because embeddings are the bottleneck, you can do this one clever trick and not have to worry about the complexity of dealing with branches and the other databases: you just have a cache on the actual vectors, computed from the hash of a given chunk. This means that when the nth person at a company indexes their codebase, it's really, really fast, and you do all this without actually storing any code on our servers at all. No code data is stored; we just store the vectors in the vector database and the vector cache.

What's the biggest gain, at this time, that users get from indexing the codebase? Just out of curiosity — it seems like longer term there'll be more and more benefit, but in the short term, just asking questions of the codebase — what's the usefulness of that?

I think the most obvious one is: you want to find out where something is happening in your large codebase, and you have a fuzzy memory of "okay, I want to find the place where we do X," but you don't exactly know what to search for in a normal text search. So you hit Command+Enter to ask the codebase chat, and very often it finds the right place you were thinking of. And like you mentioned, in the future I think this is only going to get more and more powerful — we're working a lot on improving the quality of our retrieval, and I think the ceiling for that is really much higher than people give it credit for.
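A sketch of the content-hash-keyed vector cache described above: the embedding for a code chunk is cached under the hash of the chunk's text, so the nth person at a company indexing the same code never pays the embedding cost again. `embed()` is a stand-in for a real embedding-model call, and the in-memory dict stands in for a shared server-side cache.

```python
# Sketch: cache embeddings by content hash so identical chunks are embedded once.
import hashlib

vector_cache: dict[str, list[float]] = {}

def chunk_key(chunk: str) -> str:
    return hashlib.sha256(chunk.encode()).hexdigest()

def embed_chunk(chunk: str) -> list[float]:
    key = chunk_key(chunk)
    if key not in vector_cache:            # only pay the embedding cost once per unique chunk
        vector_cache[key] = embed(chunk)   # hypothetical embedding call
    return vector_cache[key]

def index_codebase(chunks: list[str]) -> list[list[float]]:
    # No raw code needs to be stored server-side: only hashes and vectors.
    return [embed_chunk(c) for c in chunks]
```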
One question that's good to ask here: have you considered — and why haven't you done much — local stuff? It seems like everything we just discussed is exceptionally difficult to do in the cloud: you have to think about all these things with the caching, and a large codebase with a large number of programmers using the same code base — you have to figure out that puzzle — when most software just does this heavy computational stuff locally. Have you considered doing the embeddings locally?

Yeah, we've thought about it, and I think it would be cool to do it locally. I think it's just really hard. One thing to keep in mind is that some of our users use the latest MacBook Pro, but more than 80% of our users are on Windows machines, and many of those are not very powerful. So local models really only work on the latest computers, and it's also a big overhead to build that in. Even if we would like to do it, it's currently not something we're able to focus on. There are some people who do that, and I think that's great, but especially as models get bigger and bigger and you want to do fancier things with bigger models, it becomes even harder to do locally.

Yeah, and it's not just a problem of weaker computers. If you're some big company, with a big-company codebase, it's just really hard to process that codebase even on the beefiest MacBook Pros. It's not even a matter of whether you're just a student or something — I think if you're the best programmer at a big company, you're still going to have a horrible experience if you do everything locally. You could do it and sort of scrape by, but again, it wouldn't be fun anymore.

Yeah, like approximate nearest neighbors on this massive codebase is going to just eat up your memory and your CPU — and that's just that. Let's talk about the modeling side too, where there are these massive headwinds against local models. One: things seem to be moving toward MoEs, and one benefit is maybe they're more memory-bandwidth bound, which plays in favor of local versus using GPUs — or using Nvidia GPUs — but the downside is these models are just bigger in total, and they're going to need to fit often not even on a single node but on multiple nodes. There's no way that's going to fit inside even a really good MacBook. And I think especially for coding, it's not a question of whether it clears some bar of "the model's good enough to do these things and we're satisfied" — which may be the case for other problems, and maybe where local models shine — people are always going to want the best, the most intelligent, the most capable thing, and that's going to be really hard to run locally for almost all people.

Don't you want the most capable model — you want Sonnet?

And also with o1 — I like how you're pitching me. Would you be satisfied with an inferior model? Listen, yes, I'm one of those, but there are some people who like to do stuff locally. There's a whole open-source movement that kind of resists, and it's good that they exist, actually, because you want to resist the power centers that are growing.

There's actually an alternative to local models that I'm particularly fond of.
I think it's still very much at the research stage, but you could imagine doing homomorphic encryption for language model inference: you encrypt your input on your local machine, then you send that up, and the server can use lots of computation — they can run models you cannot run locally — on this encrypted data, but they cannot see what the data is. Then they send back the answer, you decrypt it, and only you can see the answer. I think that's still very much research, and all of it is about trying to make the overhead lower, because right now the overhead is really big. But if you can make that happen, I think it would be really, really cool, and really impactful, because I think one thing that's actually kind of worrisome is that as these models get better and better, they're going to become more and more economically useful, and so more and more of the world's information and data will flow through one or two centralized actors. Then there are worries about traditional hacker attempts, but it also creates this kind of scary situation where, if all of the world's information is flowing through one node in plaintext, you can have surveillance in very bad ways. Sometimes that will happen initially for good reasons — people will want to protect against bad actors using AI models in bad ways — and then you'll add in some surveillance code, and then someone else will come in, and you're on a slippery slope, and then you start doing bad things with a lot of the world's data. So I'm very hopeful that we can solve homomorphic encryption for privacy-preserving machine learning.

I would say that's the challenge we have with all software these days. There are so many features that can be provided from the cloud, and all of us increasingly rely on it, and it makes our lives awesome, but there are downsides, and that's why you rely on really good security to protect from basic attacks. But there's also only a small set of companies controlling that data, and they obviously have leverage, and they could be infiltrated in all kinds of ways. That's the world we live in.

Yeah, the thing I'm actually quite worried about is the world where — Anthropic has this responsible scaling policy, and we're on the low ASLs, which are the Anthropic safety levels, for the models, but as we get to, quote-unquote, ASL-3, ASL-4, whatever models, which are very powerful — for mostly reasonable security reasons, you would want to monitor all the prompts. I think that's reasonable and understandable where everyone is coming from, but man, it'd be really horrible if all the world's information is monitored that heavily. It's way too centralized. It's this really fine line you're walking, where on one side you don't want the models to go rogue, and on the other side — man, humans — I don't know if I trust all the world's information to pass through three model providers.

Yeah. Why do you think it's different from cloud providers?
Because I think a lot of this data would never have gone to the cloud providers in the first place. You often want to give more data to the AI models — personal data that you would never have put online in the first place — to these companies, or to these models. It also centralizes control: right now, for cloud, you can often use your own encryption keys, and it can't really do much, but here it's just centralized actors that see the exact plaintext of everything.

On the topic of context, that's actually been a friction for me. When I'm writing code, in Python, there's a bunch of stuff imported — you could probably intuit the kind of stuff I would like to include in the context. How hard is it to automatically figure out the context?

It's tricky. I think we can do a lot better at computing the context automatically in the future. One thing that's important to note is that there are trade-offs with including automatic context. The more context you include for these models, first of all, the slower they are, and the more expensive those requests are, which means you can then do fewer model calls and do less fancy stuff in the background. Also, for a lot of these models, they get confused if you have a lot of information in the prompt, so the bar for accuracy and relevance of the context you include should be quite high. We already do some automatic context in some places within the product, and it's definitely something we want to get a lot better at. I think there are a lot of cool ideas to try there, both on learning better retrieval systems — better embedding models, better rankers — and also cool academic ideas, stuff we've tried out internally, but also what the field writ large is grappling with: can you get language models to a place where the model itself can actually just understand a new corpus of information? The most popular talked-about version of this is: can you make the context window infinite? Then, if you make the context window infinite, can you make the model actually pay attention to the infinite context? And after you can make it pay attention to the infinite context, to make it somewhat feasible to actually do, can you then do caching for that infinite context, so you don't have to recompute it all the time? But there are other cool ideas being tried that are a little bit more analogous to fine-tuning — actually learning this information in the weights of the model — and it might be that you actually get a qualitatively different type of understanding if you do it at the weight level than if you do it at the in-context learning level. I think the jury's still a little bit out on how this is all going to work in the end, but in the interim, we as a company are really excited about better retrieval systems and picking the parts of the codebase that are most relevant to what you're doing — we could do that a lot better.
One interesting proof of concept for learning this knowledge directly in the weights is with VS Code. We're in a VS Code fork, and the VS Code code is all public, so these models in pre-training have seen all the code. They've probably also seen questions and answers about it, and then they've been fine-tuned and RLHF'd to be able to answer questions about code in general. So when you ask it a question about VS Code, sometimes it'll hallucinate, but sometimes it actually does a pretty good job of answering the question. I think this is just — it happens to be okay at it — but what if you could actually specifically train or post-train a model such that it really was built to understand this codebase? It's an open research question, one that we're quite interested in. And then there's also the uncertainty of: do you want the model to be the thing that, end to end, is doing everything — i.e., it's doing the retrieval in its internals and then answering your question, creating the code — or do you want to separate the retrieval from the frontier model? Maybe you'll get some really capable models that are much better than the best open-source ones in a handful of months, and then you'll want to separately train a really good open-source model to be the retriever — to be the thing that feeds in the context to these larger models.

Can you speak a little more to post-training a model to understand the codebase? What do you mean by that? Is this a synthetic-data direction?

Yeah, I mean, there are many possible ways you could try doing it. There's certainly no shortage of ideas; it's just a question of going in, trying all of them, and being empirical about which one works best. One very naive thing is to try to replicate what's done with VS Code and these frontier models. So let's continue pre-training — some kind of continued pre-training that includes general code data but also throws in a lot of the data of some particular repository that you care about — and then in post-training, meaning, let's just start with instruction fine-tuning: you have a normal instruction fine-tuning dataset about code, and then you throw in a lot of questions about code in that repository. You could either get ground-truth ones, which might be difficult, or you could do what you kind of hinted at, or suggested, using synthetic data — i.e., having the model ask questions about various pieces of the code. So you take the pieces of the code, then prompt the model, or have a model propose a question for that piece of code, and then add those as instruction fine-tuning data points. In theory, this might unlock the model's ability to answer questions about that codebase.
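A hypothetical sketch of that synthetic-data recipe: take chunks of a repository, have a model propose a question about each chunk, answer it grounded in the chunk, and turn the pairs into instruction fine-tuning examples. `llm()` and `chunk_repository()` are invented stand-ins, not real APIs.

```python
# Sketch: generate synthetic instruction fine-tuning data for one repository.
def build_finetune_examples(repo_path: str) -> list[dict]:
    examples = []
    for chunk in chunk_repository(repo_path):   # e.g. function- or file-level chunks
        question = llm(f"Ask one question a developer might have about this code:\n{chunk}")
        answer = llm(f"Answer the question using only this code.\nCode:\n{chunk}\nQ: {question}")
        examples.append({"instruction": question, "response": answer})
    return examples
```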
Let me ask you about OpenAI o1. What do you think is the role of that kind of test-time compute system in programming?

I think test-time compute is really interesting. There's been the pre-training regime, which, as you scale up the amount of data and the size of your model, gets you better and better performance, both on loss and on downstream benchmarks — just general performance when we use it for coding or other tasks. We're starting to hit a bit of a data wall, meaning it's going to be hard to continue scaling up this regime, and so scaling up test-time compute is an interesting way of increasing the number of inference-time flops we use while still getting corresponding improvements in the performance of these models. Traditionally, we just had to train a bigger model that always used that many more flops, but now we can perhaps use the same-size model and run it for longer, and get an answer at the quality of a much larger model. The really interesting thing I like about this is that there are some problems that perhaps require 100-trillion-parameter-model intelligence, trained on 100 trillion tokens, but that's maybe 1% — maybe 0.1% — of all queries. So are you going to spend all of this effort, all this compute, training a model that costs that much, and then run it so infrequently? It feels completely wasteful, when instead you train the model that's capable of doing the 99.9% of queries, and then you have a way of running it for longer at inference time for those few people who really, really want max intelligence.

How do you figure out which problem requires what level of intelligence? Is it possible to dynamically figure out when to use GPT-4, when to use a small model, and when you need o1?

Yeah, that's an open research problem, certainly. I don't think anyone's actually cracked this model-routing problem quite well. We have initial implementations of this for something like Cursor Tab, but at the level of going between 4o, Sonnet, and o1, it's a bit trickier. There's also the question of what level of intelligence you need to determine whether the thing is too hard for the GPT-4-level model — maybe you need the o1-level model. It's really unclear.

But you mentioned there's a pre-training process, then there's post-training, and then there's test-time compute. Is that fair — they're sort of separate? Where are the biggest gains?

Well, it's weird, because test-time compute — there's a whole training strategy needed to get test-time compute to work. And the other really weird thing about this is that no one outside of the big labs, and maybe even just OpenAI, really knows how it works. There have been some really interesting papers that show hints of what they might be doing, so perhaps they're doing something with tree search using process reward models. But I just think the issue is we don't quite know exactly what it looks like, so it would be hard to comment on where it fits in. I would put it in post-training, but maybe the compute spent on getting test-time compute to work for a model is eventually going to dwarf pre-training.

So we don't even know if o1 is using just chain-of-thought RL — we don't know how they're doing any of this — we don't know anything. It's fun to speculate: if you were to build a competing model, what would you do?

Yeah, so one thing to do would be — I think you probably need to train a process reward model. Maybe we can get into reward models, and outcome reward models versus process reward models.
But you mentioned there's a pre-training process, then there's post-training, and then there's test-time compute. Is that a fair separation? Where are the biggest gains?

Well, it's weird, because for test-time compute there's a whole training strategy needed to get it to work, and the other really weird thing about this is that no one outside of the big labs, and maybe even just OpenAI, really knows how it works. There have been some really interesting papers that show hints of what they might be doing, so perhaps they're doing something with tree search using process reward models. But the issue is we don't quite know exactly what it looks like, so it would be hard to comment on where it fits in. I would put it in post-training, but maybe the compute spent on getting test-time compute to work for a model will eventually dwarf pre-training.

So we don't even know if o1 is using just chain-of-thought RL. We don't know how they're using any of these. We don't know anything. It's fun to speculate: if you were to build a competing model, what would you do?

Yeah, so one thing to do would be, I think you probably need to train a process reward model. So maybe we can get into reward models, and outcome reward models versus process reward models. Outcome reward models are the traditional reward models that people train for language modeling, and they just look at the final thing. So if you're doing some math problem, look at the final result after you've done everything and assign a grade to it: what's the reward for this outcome? Process reward models instead try to grade the chain of thought. OpenAI had a preliminary paper on this, I think last summer, where they used human labelers to get a pretty large, several-hundred-thousand-example data set grading chains of thought. Ultimately, I haven't seen anything interesting in the ways people use process reward models beyond using them as a means of choosing between a bunch of samples. What people do in all these papers is sample a bunch of outputs from the language model, use the process reward model to grade all those generations, alongside maybe some other heuristics, and then use that to choose the best answer. The really interesting thing that people think might work, and want to work, is tree search with these process reward models, because if you really can grade every single step of the chain of thought, then you can branch out, explore multiple paths of the chain of thought, and then use the process reward model to evaluate how good each branch is.

Yeah, when the quality of the branch is somehow strongly correlated with the quality of the outcome at the very end, so you have a good model of knowing which branch to take, not just in the short term but in the long term.

Yeah, and the interesting work that has been open-sourced, and that people talk about, is how to train the process reward models, maybe in a more automated way. I could be wrong here and not be mentioning some papers, but I haven't seen anything that seems to work really well for using process reward models creatively to do tree search and code.
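To make the "choose between a bunch of samples" use of a process reward model concrete, here's a rough sketch of best-of-n reranking: sample several chains of thought, score each reasoning step with a PRM, and keep the candidate whose steps the PRM liked best. The `generate` and `prm_score_step` functions are assumed stand-ins for a policy model and a trained PRM.

```python
from statistics import fmean
from typing import List

def generate(prompt: str, n: int) -> List[List[str]]:
    """Placeholder: sample n chains of thought, each a list of reasoning steps."""
    raise NotImplementedError

def prm_score_step(prompt: str, steps_so_far: List[str]) -> float:
    """Placeholder: a process reward model scoring the latest step given the prefix, in [0, 1]."""
    raise NotImplementedError

def best_of_n(prompt: str, n: int = 8) -> List[str]:
    candidates = generate(prompt, n)
    scored = []
    for chain in candidates:
        # Score every step of the chain, not just the final answer; that is the
        # difference from an outcome reward model.
        step_scores = [prm_score_step(prompt, chain[:i + 1]) for i in range(len(chain))]
        scored.append((fmean(step_scores), chain))
    # Keep the chain whose reasoning the PRM rated highest on average.
    return max(scored, key=lambda pair: pair[0])[1]
```

Tree search would extend this by expanding and pruning partial chains using the same per-step scores, rather than only reranking complete samples.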
This is kind of an AI safety, maybe a bit of a philosophy, question. OpenAI says that they're hiding the chain of thought from the user, and they've said that was a difficult decision to make. Instead of showing the chain of thought, they're asking the model to summarize it. They're also, in the background, saying they're going to monitor the chain of thought to make sure the model is not trying to manipulate the user, which is a fascinating possibility. But anyway, what do you think about hiding the chain of thought?

One consideration for OpenAI, and this is completely speculative, could be that they want to make it hard for people to distill these capabilities out of their model. It might actually be easier, if you had access to that hidden chain of thought, to replicate the technology, because that's pretty important data: seeing the steps the model took to get to the final result, which you could probably train on. And there was sort of a mirror situation with some of the large language model providers. This is also speculation, but some of these APIs used to offer easy access to log probabilities for the tokens they're generating, and also log probabilities over the prompt tokens, and then some of these APIs took those away. Again, complete speculation, but one of the thoughts is that the reason those were taken away is that, if you have access to log probabilities, similar to this hidden chain of thought, that can give you even more information to try and distill these capabilities out of the APIs, out of these biggest models, into models you control.

As an asterisk on the previous discussion about us integrating o1: I think we're still learning how to use this model. We made o1 available in Cursor because when we got the model we were really interested in trying it out, and I think a lot of programmers are going to be interested in trying it out. But o1 is not part of the default Cursor experience in any way, and we still haven't found a way to integrate it into the editor in a way that we reach for every hour, maybe even every day. So I think the jury's still out on how to use the model, and we haven't seen examples yet of people releasing things where it seems really clear, oh, that's now the use case. The obvious one to turn to is that maybe this can make it easier for you to have these background things running, to have these models in loops, to have these models be agentic. But we're still discovering. To be clear, we have ideas; we just need to get something incredibly useful before we put it out there.

But it has significant limitations. Even barring capabilities, it does not stream, and that means it's really painful to use for things where you want to supervise the output; instead, you're just waiting for the wall of text to show up. It also feels like the early innings of test-time compute and search, very much a v0, and there are so many things that don't feel quite right. And I suspect that, in parallel to people increasing the amount of pre-training data and the size of the models and finding tricks there, you'll now have this other thread of getting search to work better and better.

So let me ask you about Strawberry. It looks like GitHub Copilot might be integrating o1 in some kind of way, and I think some of the comments are saying this means Cursor is done. I saw one comment saying it's time to shut down Cursor.

Time to shut down Cursor.

So is it time to shut down Cursor?

I think this space is a little bit different from past software spaces over the 2010s, where I think the ceiling here is really, really, really incredibly high. So I think the best product in three to four years will just be so much more useful than the best product today. You can wax poetic about moats this and brand that, and this is our advantage, but in the end, if you stop innovating on the product, you will lose. And that's also great for startups, that's great for people trying to enter this market, because it means you have
an opportunity to win against people who already have lots of users, just by building something better. So I think, over the next few years, it's just about building the best product, building the best system, and that comes down both to the modeling engine side of things and to the editing experience.

Yeah, I think most of the additional value from Cursor versus everything else out there is not just integrating the new model fast, like o1. It comes from all the depth that goes into these custom models that you don't realize are working for you in every facet of the product, as well as the really thoughtful UX of every single feature.

All right, from that profound answer, let's descend back down to the technical. You mentioned you have a taxonomy of synthetic data.

Oh, yeah.

Can you please explain?

Yeah, I think there are three main kinds of synthetic data. First, what is synthetic data? There's normal, non-synthetic data, which is just data that's naturally created, i.e. usually from humans having done things; from some human process you get this data. The first kind of synthetic data would be distillation: having a language model output tokens or probability distributions over tokens, and then training some less capable model on this. This approach is not going to get you a model more capable than the original one that produced the tokens, but it's really useful if there's some capability you want to elicit from some really expensive, high-latency model and distill down into a smaller, task-specific model.

The second kind is when one direction of the problem is easier than the reverse. A great example is bug detection, like we mentioned earlier, where it's a lot easier to introduce reasonable-looking bugs than it is to actually detect them, and this is probably the case for humans too. So what you can do is get a model that's not trained on that much data, that's not that smart, to introduce a bunch of bugs into code, and then use that synthetic data to train a model that can be really good at detecting bugs.

The last category, I think, is the main one it feels like the big labs are doing for synthetic data, which is producing text with language models that can then be verified easily. An extreme example of this is: if you have a verification system that can detect whether language is Shakespeare-level, and then you have a bunch of monkeys typing on typewriters, you can eventually get enough training data to train a Shakespeare-level language model. This is very much the case for math, where verification is actually really easy for formal languages: you can have an okay model generate a ton of rollouts, choose the ones that actually proved the ground-truth theorems, and train on those further. There are similar things you can do for code with LeetCode-like problems, where if you have some set of tests such that passing them means something has actually solved the problem, you do the same thing: verify that the output passed the tests, and then train the model on the outputs that passed.
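A stripped-down sketch of that third category for code is rejection sampling against a test suite: generate candidate solutions, keep only the ones that pass, and add those to the training set. The `sample_solutions` helper is an assumption, and a real pipeline would need proper sandboxing and deduplication rather than a temp file and a timeout.

```python
import json
import subprocess
import tempfile
from typing import List

def sample_solutions(problem: str, n: int) -> List[str]:
    """Placeholder: sample n candidate Python solutions from whatever model you're using."""
    raise NotImplementedError

def passes_tests(solution: str, test_code: str, timeout: int = 10) -> bool:
    # Run the candidate plus its tests in a subprocess; exit code 0 counts as passing.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def build_verified_dataset(problems, out_path: str, n_samples: int = 16):
    """problems: iterable of (prompt, test_code) pairs where the tests are trusted."""
    with open(out_path, "w") as out:
        for prompt, test_code in problems:
            for candidate in sample_solutions(prompt, n_samples):
                if passes_tests(candidate, test_code):
                    # Only verified generations make it into the training set.
                    out.write(json.dumps({"prompt": prompt, "completion": candidate}) + "\n")
```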
I think it's going to be a little tricky getting this to work in all domains, or just in general. Having the perfect verifier feels really hard to do for open-ended, miscellaneous tasks you give the model, or for more long-horizon tasks, even in coding.

That's because you're not as optimistic as Arvid. But yeah. So that third category requires having a verifier.

Yeah. Verification feels like it's best when you know for a fact that it's correct, and then it wouldn't be using a language model to verify; it would be using tests or formal systems.

Or running the thing. Doing the human form of verification, where you just do manual quality control.

Yeah, but the language model version of that, where it's running the thing and it actually understands the output.

Yeah, that's sort of somewhere in between.

Yeah. I think that's the category that is most likely to result in massive gains.

What about RL with feedback? RLHF versus RLAIF: what's the role of that in getting better performance out of the models?

Yeah. RLHF is when the reward model you use is trained from labels you've collected from humans giving feedback. I think this works if you have the ability to get a ton of human feedback for the kind of task you care about. RLAIF is interesting because it depends on the constraint that verification is actually a decent bit easier than generation. It feels like, okay, what are you doing? You're using this language model to look at the language model's outputs and then improve the language model. But no, it actually may work: if the language model has a much easier time verifying some solution than it does generating it, then you could perhaps get this kind of recursive loop. I don't think it's going to look exactly like that, though. The other thing you could do, which we kind of do, is a little bit of a mix of RLAIF and RLHF, where usually the model is actually quite correct, and this is the case with Cursor Tab, at picking which of two possible generations is better, and then it just needs a little bit of human nudging, with only on the order of 50 to 100 examples, to align the prior the model has with exactly what you want. That looks different from normal RLHF, where you're usually training these reward models on tons of examples.
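As an illustration of that mix of RLAIF with a small amount of human nudging (a sketch under assumptions, not Cursor's actual pipeline), you could have a verifier model produce preference labels over pairs of candidate generations and reserve a small human-labeled set just to check how well the judge agrees with people. The `judge` helper is an assumed stand-in for whatever model does the verification.

```python
from typing import Dict, List, Tuple

def judge(prompt: str, a: str, b: str) -> str:
    """Placeholder: ask a verifier model which candidate is better; returns 'a' or 'b'."""
    raise NotImplementedError

def ai_preferences(pairs: List[Tuple[str, str, str]]) -> List[Dict[str, str]]:
    """pairs: (prompt, candidate_a, candidate_b). Produces RLAIF-style preference data."""
    prefs = []
    for prompt, a, b in pairs:
        winner = judge(prompt, a, b)
        prefs.append({"prompt": prompt,
                      "chosen": a if winner == "a" else b,
                      "rejected": b if winner == "a" else a})
    return prefs

def agreement_with_humans(prefs: List[Dict[str, str]], human_choices: List[str]) -> float:
    """Compare the AI judge against a small human-labeled set (on the order of
    50-100 examples); low agreement is the signal to nudge or retrain the judge."""
    hits = sum(1 for p, human in zip(prefs, human_choices) if p["chosen"] == human)
    return hits / max(1, len(human_choices))
```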
What's your intuition when you compare generation and verification, or generation and ranking? Is ranking way easier than generation?

My intuition would just say yeah, it should be. This is kind of going back to: if you believe P does not equal NP, then there's this massive class of problems that are much, much easier to verify given a proof than to actually prove.

I wonder if the same thing will prove P not equal to NP, or P equal to NP.

That would be really cool.

That'd be a, whatever, Fields Medal by AI. Who gets the credit? Another open philosophical question.

I'm actually surprisingly curious what a good bet for when AI will get the Fields Medal would be. Isn't this Aman's specialty?

I don't know what Aman's bet here is.

Oh, sorry, Nobel Prize or Fields Medal first?

Fields Medal.

Fields Medal level.

Fields Medal, I think, comes first.

Well, you would say that, of course. But it's also this isolated system you can verify.

No, sure. I don't even know if I need to... I felt like the path to get to IMO was a little bit more clear, because it already could get a few IMO problems, and there was a bunch of low-hanging fruit, given the literature at the time, of what tactics people could take. I think I'm much less versed in the space of theorem proving now, and have less intuition about how close we are to solving these really, really hard open problems.

So you think it'll be Fields Medal first? It won't be in, say, physics or something?

Oh, 100%, I think that's probably more likely. It's probably much more likely that it'll get there. Well, I think it goes to, I don't know, BSD, which is the Birch and Swinnerton-Dyer conjecture, or the Riemann hypothesis, or any one of these really hard math problems, which are just actually really hard. It's sort of unclear what the path to even a solution looks like. We don't even know what a path looks like, let alone...

And you don't buy the idea that this is an isolated system, you have a good reward system, and it feels like it's easier to train for that?

I think we might get the Fields Medal before AGI.

I mean, I'd be very happy. I'd be very happy. But I don't know if I... I think 2028, 2030?

For the Fields Medal?

Fields Medal.

All right. It feels like forever from now, given how fast things have been going. Speaking of how fast things have been going, let's talk about scaling laws. For people who don't know, maybe it's good to talk about this whole idea of scaling laws. What are they, where do things stand, and where do you think things are going?

I think it was interesting that the original scaling laws paper by OpenAI was slightly wrong, because of some issues they had with learning rate schedules, and then Chinchilla showed a more correct version. And then from there, people have again deviated from doing the compute-optimal thing, because people now optimize more for making the thing work really well given an inference budget. And I think there are a lot more dimensions to these curves than what we originally used, which was just compute, number of parameters, and data. Inference compute is the obvious one, and I think context length is another obvious one. So let's say you care about the two things of inference compute and context window: maybe the thing you want to train is some kind of SSM, because they're much, much cheaper and faster at super long context, and even if it maybe has 10x worse scaling properties during training, meaning you have to spend 10x more compute to train the thing to get the same level of capability, it's worth it, because you care most about that inference budget for really long context windows. So it'll be interesting to see how people play with all these dimensions.
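For reference, the Chinchilla-style compute-optimal recipe that people then deviate from can be summarized with the rough approximation that training compute C is about 6·N·D (N parameters, D tokens) and that the optimal token-to-parameter ratio is roughly 20:1. The sketch below just solves for N and D from a compute budget; the constants are the commonly quoted approximations, not exact values from any particular paper.

```python
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Given a training compute budget C (FLOPs), return an approximately
    compute-optimal (parameters, tokens) split using C ~= 6 * N * D and D ~= 20 * N."""
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a 1e24 FLOP training budget
n, d = chinchilla_optimal(1e24)
print(f"~{n / 1e9:.0f}B parameters trained on ~{d / 1e12:.1f}T tokens")
```

Optimizing for an inference budget instead, as described above, pushes you toward a smaller N trained on far more than 20 tokens per parameter.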
So yeah, you speak to the multiple dimensions. Obviously, the original conception was just looking at the variables of the size of the model, as measured by parameters, and the size of the data, as measured by the number of tokens, and looking at the ratio of the two. It's kind of a compelling notion that there is a number, or at least a minimum, and it seems like one was emerging. Do you still believe there is a kind of "bigger is better"?

I mean, I think bigger is certainly better for just raw performance.

And raw intelligence.

And raw intelligence. But the path that people might take is, I'm particularly bullish on distillation. How many knobs can you turn, if we spend a ton of money on training, to get the most capable cheap model? Really caring as much as you can about inference-time compute, because the naive version of that is what people have already done with the Llama models: just overtraining 7B models on way, way more tokens than is Chinchilla-optimal. But if you really care about it, maybe the thing to do is what Gemma did, which is: let's not just train on tokens, let's literally train on minimizing the KL divergence with the distribution of Gemma 27B, so knowledge distillation. And you're spending the compute of literally running this 27-billion-parameter model on all these tokens just to get out this smaller model.

And the distillation just gives you a faster model? Smaller means faster?

Yeah. Distillation, in theory, is getting more signal out of the data that you're training on. It's perhaps another way of getting over, not completely over, but partially helping with the data wall, where you only have so much data to train on: let's train this really big model on all these tokens, and we'll distill it into a smaller one, and maybe we can get more signal per token for that much smaller model than we would have originally if we'd trained it directly.
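A minimal sketch of that kind of logit-level distillation objective, training a small student to match a big teacher's token distribution rather than just the hard next-token labels (generic PyTorch-style code for illustration, not the actual Gemma recipe):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) over the vocabulary at each token position.
    Both logit tensors have shape [batch, seq_len, vocab]."""
    t = temperature
    vocab = student_logits.size(-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1).reshape(-1, vocab)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1).reshape(-1, vocab)
    # Sum the KL over the vocabulary, average over all token positions.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Inside a training loop (teacher frozen, student trainable):
# with torch.no_grad():
#     teacher_logits = teacher(input_ids).logits
# student_logits = student(input_ids).logits
# loss = distillation_loss(student_logits, teacher_logits)
# loss.backward()
```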
So if I gave you $10 trillion, how would you spend it? You can't buy an island or whatever. How would you allocate it in terms of improving the big model versus maybe paying for the HF in RLHF?

Yeah, I think there are a lot of secrets and details about training these large models that I just don't know and that are only privy to the large labs, and the issue is I would waste a lot of that money if I even attempted this, because I wouldn't know those things.

Suspending a lot of disbelief and assuming you had the know-how. Or are you saying you have to operate with the limited information you have now?

No, no, actually, I would say you swoop in and you get all the information, all the little characteristics, all the little parameters that define how the thing is trained. If we look at how to invest money for the next five years in terms of maximizing what you called raw intelligence...

I mean, isn't the answer really simple? You just try to get as much compute as possible. At the end of the day, all you need to buy is the GPUs, and then the researchers can find everything else; they can choose whether you want to train a big model or a small model.

Well, this gets into the question of whether you're really limited by compute and money, or limited by these other things. I'm more partial to Arvid's belief that we're sort of idea-limited, but there's always the counterargument that if you have a lot of compute, you can run a lot of experiments.

So you would run a lot of experiments, versus using that compute to train a gigantic model?

I would, but I do believe we are limited in terms of the ideas that we have.

I think yeah, because even with all this compute and all the data you could collect in the world, you really are ultimately limited by, not even ideas, but just really good engineering. Even with all the capital in the world, would you really be able to assemble the team? There aren't that many people in the world who can really make the difference here, and there's so much work that goes into research that is just pure, really hard engineering work. As a very handwavy example, if you look at the original Transformer paper, how much work was joining together a lot of these really interesting concepts embedded in the literature, versus then going in and writing all the code, maybe the CUDA kernels, maybe whatever else, I don't know if it ran on GPUs or TPUs originally, such that it actually saturated the GPU performance? Getting Noam Shazeer to go in and do all of that, and Noam is probably one of the best engineers in the world. Or maybe going a step further, the next generation of models: getting model parallelism to work and scaling it on thousands, or maybe tens of thousands, of V100s, which I think GPT-3 may have been. There's just so much engineering effort that has to go into all of these things to make them work. If you really brought that cost down, to maybe not zero, but just made it 10x easier, made it super easy for someone with really fantastic ideas to immediately get to the version of the new architecture they dreamed up, one that's getting 50, 40% utilization on the GPUs, I think that would just speed up research by a ton.

I mean, I think if you see a clear path to improvement, you should always take the low-hanging fruit first, right? I think OpenAI and all the other labs did the right thing to pick off the low-hanging fruit, where the low-hanging fruit is: you could scale up to GPT-4.25 scale, and you just keep scaling, and things keep getting better. There's no point in experimenting with new ideas when everything is working; you should bang on that and try to get as much juice out of it as possible. And then maybe when you really need new ideas... I think if you're spending $10 trillion, you probably want to spend some of it actually reevaluating your
ideas; you're probably idea-limited at that point. I think all of us believe new ideas are probably needed to get all the way there, to AGI, and all of us also probably believe there exist ways of testing out those ideas at smaller scales and being fairly confident that they'll play out. It's just quite difficult for the labs, in their current position, to dedicate their very limited research and engineering talent to exploring all these other ideas when there's this core thing that will probably keep improving performance for some decent amount of time.

Yeah, but also these big labs are winning, so they're just going wild. Okay, so, big question looking out into the future: you're now at the center of the programming world. How do you think the nature of programming changes in the next few months, in the next year, in the next two years, the next five years, ten years?

I think we're really excited about a future where the programmer is in the driver's seat for a long time. You've heard us talk about this a little bit, but one that emphasizes speed and agency for the programmer, and control: the ability to modify anything you want to modify, the ability to iterate really fast on what you're building. This is a little different, I think, from where some people are jumping to in the space, where one idea that's captivated people is: can you talk to your computer, can you have it build software for you as if you're talking to an engineering department or an engineer over Slack, and can it just be this sort of isolated text box? Part of the reason we're not excited about that is some of the stuff we've talked about with latency, but a big reason we're not excited about it is that it comes with giving up a lot of control. It's much harder to be really specific when you're talking into a text box, and if you're necessarily just going to communicate with the thing like you would with an engineering department, you're actually abdicating tons of really important decisions to the bot. This gets at, fundamentally, what engineering is. I think some people who are a little bit more removed from engineering might think of it as: the spec is completely written out, and then the engineers just come and implement it, and it's just about making the thing happen in code, making the thing exist. But I think a lot of the best engineering, the engineering we enjoy, involves tons of tiny micro-decisions about what exactly you're building, and really hard trade-offs between speed and cost and all the other things involved in a system. And as long as humans are the ones actually designing the software and specifying what they want built, and it's not just a company run entirely by AIs, we think you'll really want the human in the driver's seat, dictating these decisions. So the jury's still out on what that looks like. I think one weird idea for what that could look like is that you can control the level of abstraction you view a codebase at, and you can point at specific
parts of a codebase. Maybe you digest a codebase by looking at it in the form of pseudocode, and you can actually edit that pseudocode too, and then have changes get made down at the formal programming level. You can still edit any piece of logic in your software; you keep the in-flow text-editing component of programming, you keep the control, you can even go down into the code or go up to higher levels of abstraction, while also getting these big productivity gains.

It would be nice if you could go up and down the abstraction stack.

Yeah. And there are a lot of details to figure out there; that's sort of a fuzzy idea. Time will tell if it actually works, but these principles of control and speed and the human in the driver's seat, we think, are really important. We think for some things, like Arvid mentioned before, for some styles of programming, you can hand it off chatbot-style, if you have a bug that's really well specified. But that's not most of programming, and that's also not most of the programming we think a lot of people value.

What about the fundamental skill of programming? There are a lot of people, like young people right now, who are kind of scared, because they love programming but they're worried about whether they'll be able to have a future if they pursue this career path. Do you think the very skill of programming will change fundamentally?

I actually think this is a really, really exciting time to be building software. We remember what programming was like in, you know, 2013, 2012, whatever it was, and there was just so much more cruft and boilerplate and looking up something really gnarly. That stuff still exists, it's definitely not at zero, but programming today is way more fun than back then. We're really getting down to the delight concentration, and all the things that really draw people to programming, like this element of being able to build things really fast, and speed, and also individual control, all of those are just being turned up a ton. So I think it's just going to be a really, really fun time for people who build software. I think the skills will probably change too. I think people's taste and creative ideas will be magnified, and it will be a little bit less about boilerplate text editing, maybe even a little bit less about carefulness, which I think is really important today if you're a programmer. I think it'll be a lot more fun. What do you guys think?

I agree. I'm very excited to be able to change things. One thing that happened recently was that we wanted to do a relatively big migration to our codebase. We were using async local storage in Node.js, which is known to not be very performant, and we wanted to migrate to our context object. This is a big migration that affects the entire codebase, and Sualeh and I spent, I don't know, five days working through it, even with today's AI tools. I am really excited for a future where I can just show a couple of examples, and then the AI applies that to all of the locations, and then it highlights, "oh, this is a new case, what should I do?", and I show exactly what to do there, and then that can be done in ten minutes. Then you can iterate much, much faster; you don't have to think as much up front and stand at the blackboard and think about exactly how we're going to do this, because the cost is so high. You can just try something first, and you realize, oh, this is not actually exactly what I want, and then you can change it instantly again. So yeah, I think being a programmer in the future is going to be a lot of fun.
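A rough sketch of what that "show a couple of examples and apply it everywhere" workflow could look like as a prompt-construction loop (purely illustrative; the `edit_with_model` call and the acceptance flow are assumptions, not a Cursor feature):

```python
from typing import Dict, List, Tuple

def edit_with_model(prompt: str) -> str:
    """Placeholder: ask a code-editing model to rewrite one snippet; assumed helper."""
    raise NotImplementedError

def apply_migration(examples: List[Tuple[str, str]], sites: Dict[str, str]) -> Dict[str, str]:
    """examples: (before, after) pairs a human migrated by hand.
    sites: {location: code} for every remaining place the old pattern appears."""
    shots = "\n\n".join(f"BEFORE:\n{before}\nAFTER:\n{after}" for before, after in examples)
    migrated = {}
    for location, code in sites.items():
        prompt = (f"Apply the same migration shown in these examples.\n{shots}\n\n"
                  f"BEFORE:\n{code}\nAFTER:")
        migrated[location] = edit_with_model(prompt)
        # In the workflow described above, anything the model is unsure about would be
        # surfaced back to the human as a new example rather than silently guessed.
    return migrated
```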
Yeah, I really like that point. It feels like a lot of the time with programming, there are two ways you can go about it. One is you think really hard, carefully, up front about the best possible way to do it, and then you spend your limited engineering time actually implementing it. But I much prefer just getting into the code, taking a crack at it, seeing how it lays out, and then iterating really quickly on that. That feels more fun.

Yeah, and just speaking to that, generating the boilerplate is great, so you can focus on the difficult, nuanced design decisions. Migration, I feel, is a cool one: it seems like large language models are able to basically translate from one programming language to another, or migrate in the general sense of what migration means. But that's the current moment. So the fear has to do with: okay, as these models get better and better, you're making fewer and fewer creative decisions, and is it going to move to a place where you're operating in the design space of natural language, where natural language is the main programming language? I guess I could ask that by way of advice: if somebody's interested in programming now, what do you think they should learn? You guys started in some Java, and, I forget, oh, some PHP.

PHP.

Objective-C.

Objective-C, there you go. I mean, in the end, we all know JavaScript is going to win, and not TypeScript. It's just going to be vanilla JavaScript; it's just going to eat the world, and maybe a little bit of PHP. And it also brings up the question of, I think Don Knuth has this idea that some percentage of the population is geeks, and there's a particular kind of psychology and mind required for programming, and it feels like more and more the kind of person who can do great programming might expand.

I think different people do programming for different reasons, but I think the true, maybe the best, programmers are the ones that really love, just absolutely love, programming. For example, there are folks on our team who literally, when they get back from work, boot up Cursor and start coding
on their side projects for the entire night, and they stay up till 3:00 a.m. doing that. And when they're sad, they say, "I just really need to code." And I think there's that level of programmer, where this obsession and love of programming, I think, makes really the best programmers, and I think these types of people will really get into the details of how things work.

I guess the question I'm asking is about that exact programmer. I think about that person when the super awesome praise-be-the-Tab succeeds and you keep pressing Tab.

That person on the team probably loves Cursor Tab more than anybody else, right?

Yeah, and it's also not just pressing Tab. Pressing Tab is just the easy way to say it, the catchphrase. What you're actually doing when you're pressing Tab is injecting intent all the time: sometimes you're rejecting it, sometimes you're typing a few more characters, and that's the way you're shaping the thing that's being created. I think programming will change a lot toward "what is it that you want to make?" It's sort of higher bandwidth: the communication to the computer becomes higher and higher bandwidth, as opposed to just typing, which is much lower bandwidth than communicating intent.

I mean, this goes to your manifesto, titled "Engineering Genius": "We are an applied research lab building extraordinarily productive human-AI systems." So, speaking to this hybrid element: "To start, we're building the engineer of the future: a human-AI programmer that's an order of magnitude more effective than any one engineer. This hybrid engineer will have effortless control over their code base and no low-entropy keystrokes. They will iterate at the speed of their judgment, even in the most complex systems. Using a combination of AI and human ingenuity, they will outsmart and out-engineer the best pure AI systems. We are a group of researchers and engineers. We build software and models to invent at the edge of what's useful and what's possible. Our work has already improved the lives of hundreds of thousands of programmers." And on the way to that, we'll at least make programming more fun. So, thank you for talking today.

Thank you.

Thanks for having us.

Thank you.

Thank you.

Thanks for listening to this conversation with Michael, Sualeh, Arvid, and Aman. To support this podcast, please check out our sponsors in the description. And now, let me leave you with a random, funny, and perhaps profound programming quote I saw on Reddit: "Nothing is as permanent as a temporary solution that works." Thank you for listening, and hope to see you next time.
editor.<br>It's about the future of programming,<br>and in general, the future<br>of human AI collaboration<br>in designing and engineering complicated<br>and powerful systems.<br>This is the "Lex Fridman Podcast."<br>To support it,<br>please check out our<br>sponsors in the description.<br>And now, dear friends,<br>here's Michael, Sualeh, Arvid and Aman.<br>All right, this is awesome.<br>We have Michael, Aman, Sualeh, Arvid here<br>from the Cursor Team.<br>First up, big ridiculous question.<br>What's the point of a code editor?<br>- So the code editor is largely the place<br>where you build software.<br>And today or for a long<br>time, that's meant the place<br>where you text edit a<br>formal programming language.<br>And for people who aren't programmers,<br>the way to think of a code editor<br>is a really souped up word<br>processor for programmers,<br>where the reason it's souped up<br>is code has a lot of structure.<br>And so the, quote,<br>unquote, "word processor,"<br>the code editor can<br>actually do a lot for you<br>that word processors in the writing space<br>haven't been able to do for<br>people editing texts there.<br>And so that's everything<br>from giving you visual differentiation<br>of the actual tokens in the<br>code so you can scan it quickly,<br>to letting you navigate<br>around the code base,<br>like you're navigating around<br>the internet with hyperlinks,<br>you're going to definitions<br>of things you're using<br>to error checking to<br>catch rudimentary bugs.<br>And so traditionally, that's<br>what a code editor has meant.<br>And I think that what a code editor is,<br>is going to change a lot<br>over the next 10 years<br>as what it means to build software<br>maybe starts to look a bit different.<br>- I think also a code<br>editor should just be fun.<br>- Yes, that is very important,<br>that is very important.<br>And it's actually an underrated aspect<br>of how we decide what to build.<br>A lot of the things that we<br>build and then we try them out,<br>we do an experiment and then<br>we actually throw them out<br>because they're not fun.<br>And so, a big part of being fun<br>is being fast a lot of the time.<br>Fast is fun.<br>- Yeah, fast is. 
(chuckles)<br>Yeah, that should be a T-shirt.<br>(group chuckling)<br>- Fundamentally, I think one of the things<br>that draws a lot of people to<br>building stuff on computers<br>is this insane iteration speed,<br>where in other disciplines<br>you might be gate capped<br>by resources or the ability.<br>Even the ability to get<br>a large group together<br>and coding is this amazing thing<br>where it's you and the<br>computer and that alone,<br>you can build really cool<br>stuff really quickly.<br>- So for people who don't know,<br>Cursor is this super cool new editor<br>that's a fork of VS Code.<br>It would be interesting<br>to get your explanation<br>of your own journey of editors.<br>I think all of you were big<br>fans of VS Code with Copilot.<br>How did you arrive to VS Code<br>and how did that lead to<br>your journey with Cursor?<br>- Yeah, so I think a lot of us,<br>well, all of us were originally Vim users.<br>- Pure Vim.<br>- Pure Vim, yeah.<br>No Neovim, just pure Vim and a terminal.<br>And at least for myself,<br>it was around the time<br>that Copilot came out,<br>so 2021 that I really wanted to try it.<br>So, I went into VS Code,<br>the only code editor in<br>which it was available,<br>and even though I really<br>enjoyed using Vim,<br>just the experience of<br>Copilot with VS Code<br>was more than good enough<br>to convince me to switch.<br>And so that kind of was the default<br>until we started working on Cursor.<br>- And maybe we should<br>explain what Copilot does.<br>It's a really nice auto complete.<br>As you start writing a thing,<br>it suggests one or two or three lines<br>how to complete the thing.<br>And there's a fun experience in that.<br>You know like when you<br>have a close friendship<br>and your friend completes your sentences?<br>(group chuckles)<br>When it's done well,<br>there's an intimate feeling.<br>There's probably a better<br>word than intimate,<br>but there's a cool feeling of<br>like, "Holy shit, it gets me."<br>(all chuckles)<br>And then, there's an unpleasant feeling<br>when it doesn't get you.<br>And so, there's that kind of friction.<br>But I would say for a lot of people,<br>the feeling that it gets me<br>overpowers that it doesn't.<br>- And, I think, actually one<br>of the underrated aspects<br>of Github Copilot is that<br>even when it's wrong,<br>it's a little bit annoying,<br>but it's not that bad<br>because you just type another character,<br>and then maybe then it gets you,<br>or you type another character<br>and then it gets you.<br>So even when it's wrong,<br>it's not that bad.<br>- Yeah, you can iterate and fix it.<br>I mean, the other underrated<br>part of Copilot for me<br>was just the first real AI product.<br>So the first language<br>model consumer product.<br>- So, Copilot was like<br>the first killer app<br>for LLMs.<br>- Yeah.<br>- Yeah, and the beta was out in 2021.<br>- Right, okay.<br>So, what's the origin story of Cursor?<br>- So around 2020,<br>the scaling loss papers<br>came out from OpenAI,<br>and that was a moment<br>where this looked like<br>clear predictable progress<br>for the field where even if<br>we didn't have any more ideas,<br>it looked like you could make<br>these models a lot better<br>if you had more compute and more data.<br>- By the way, we'll probably<br>talk for three to four hours<br>on the topic of scaling loss.<br>(group chuckling)<br>But just to summarize,<br>it's a paper in a set of<br>papers in a set of ideas<br>that say bigger might be better<br>for model size and data size<br>in the realm of machine 
learning.<br>- It's bigger and better,<br>but predictably better.<br>- Okay, that's another<br>topic of conversation.<br>- Yes.<br>- Yeah.<br>- So around that time for some of us,<br>there were a lot of<br>conceptual conversations<br>about what's this gonna look like?<br>What's the story gonna be<br>for all these different<br>knowledge worker fields<br>about how they're gonna be made better<br>by this technology getting better?<br>And then, I think, there<br>were a couple of moments<br>where the theoretical gains predicted<br>in that paper started<br>to feel really concrete,<br>and it started to feel like a moment<br>where you could actually<br>go and not do a PhD<br>if you wanted to do useful work in AI.<br>It actually felt like now<br>there was this whole set<br>of systems one could build<br>that were really useful.<br>And I think that the first moment<br>we already talked about a little bit,<br>which was playing with<br>the early beta of Copilot,<br>that was awesome and magical.<br>I think that the next big moment<br>where everything kind of clicked together<br>was actually getting<br>early access to GPT-4.<br>So, it was sort of end of 2022<br>was when we were<br>tinkering with that model,<br>and the step-upping<br>capabilities felt enormous.<br>And previous to that,<br>we had been working on a<br>couple of different projects.<br>Because of Copilot,<br>because of scaling odds,<br>because of our prior<br>interest in the technology,<br>we had been tinkering around<br>with tools for programmers,<br>but things that are very specific.<br>So, we were building tools<br>for financial professionals<br>who have to work within a Jupyter Notebook<br>or playing around with<br>can you do static analysis<br>with these models?<br>And then, the step-up in GPT-4 felt like,<br>look, that really made<br>concrete the theoretical gains<br>that we had predicted before.<br>It felt like you could build<br>a lot more just immediately<br>at that point in time.<br>And also, if we were being<br>consistent, it really felt like<br>this wasn't just gonna be<br>a point solution thing.<br>This was gonna be all of<br>programming was gonna flow<br>through these models.<br>And it felt like that<br>demanded a different type<br>of programming environment, a<br>different type of programming.<br>And so, we set off to build<br>that larger vision around then.<br>- There's one that I distinctly remember.<br>So, my roommate is an IMO Gold winner<br>and there's a competition<br>in the US called the PUTNAM,<br>which is the IMO for college people,<br>and it's this math competition.<br>It's exceptionally good.<br>So, Shengtong and Aman I<br>remember, sort of June of 2022,<br>had this bet on whether<br>the 2024 June or July,<br>you were going to win a gold<br>medal in the IMO with models.<br>- IMO is the International Math Olympiad.<br>- Yeah, IMO is<br>International Math Olympiad.<br>And so, Arvid and I are<br>both also competing in it.<br>So, it was sort of personal.<br>(group chuckling)<br>And I remember thinking, "Matt,<br>this is not gonna happen."<br>Even though I sort of<br>believed in progress,<br>I thought IMO Gold, Aman is delusional.<br>And to be honest, I mean, I<br>was, to be clear, very wrong.<br>But that was maybe the most<br>prescient bet in the group.<br>- So the new results from DeepMind,<br>it turned out that you were correct.<br>(group chattering)<br>- [Arvid] Technically not.<br>- Technically incorrect<br>but one point away.<br>- Aman was very enthusiastic<br>about this stuff back then.<br>And before, Aman 
had<br>this scaling loss T-shirt<br>that he would wear around<br>where it had the charts<br>and the formulas on it.<br>- So, you felt the AGI or<br>you felt the scaling loss.<br>- Yeah, I distinctly remember<br>there was this one<br>conversation I had with Michael<br>before I hadn't thought super deeply<br>and critically about scaling laws.<br>And he kind of posed the question,<br>why isn't scaling all you need,<br>or why isn't scaling gonna result<br>in massive gains in progress?<br>And I think I went through<br>the stages of grief.<br>There is anger, denial,<br>and then finally at the end<br>just thinking about it, acceptance.<br>And I think I've been quite hopeful<br>and optimistic about progress since.<br>I think one thing I'll caveat<br>is, I think, it also depends<br>on which domains you're<br>gonna see progress.<br>Math is a great domain<br>especially formal theorem proving<br>because you get this fantastic<br>signal of actually verifying<br>if the thing was correct.<br>And so this means<br>something like RL can<br>work really, really well,<br>and I think you could have systems<br>that are perhaps very superhuman in math<br>and still not technically have AGI.<br>- Okay, so can we take<br>it all the way to Cursor.<br>And what is Cursor?<br>It's a fork of VS Code,<br>and VS Code is one of<br>the most popular editors<br>for a long time.<br>Everybody fell in love with it.<br>Everybody left Vim, I left DMAX for it.<br>Sorry.<br>(all laughing)<br>So, unified in some fundamental<br>way the developer community.<br>And then, you look at the space of things,<br>you look at the scaling<br>laws, AI is becoming amazing.<br>And you decided, okay, it's not enough<br>to just write an extension via VS Code<br>because there's a lot<br>of limitations to that.<br>If AI is gonna keep getting<br>better and better and better,<br>we need to really rethink<br>how the AI is gonna be part<br>of the editing process.<br>And so, you decided to fork VS Code,<br>and start to build a lot<br>of the amazing features<br>we'll be able to talk about.<br>But what was that decision like?<br>Because there's a lot of<br>extensions, including Copilot,<br>of VS Code that are doing<br>sort of AI type stuff.<br>What was the decision<br>like to just fork VS Code?<br>- So the decision to do an<br>editor seemed self-evident to us<br>for at least what we<br>wanted to do and achieve,<br>because when we started<br>working on the editor,<br>the idea was these models<br>are gonna get much better,<br>their capabilities are gonna improve,<br>and it's gonna entirely<br>change how you build software,<br>both in a you will have<br>big productivity gains<br>but also radical and now<br>the active building software<br>is gonna change a lot.<br>And so, you're very limited<br>in the control you have over a code editor<br>if you're a plugin to an<br>existing coding environment,<br>and we didn't wanna get locked<br>in by those limitations.<br>We wanted to be able to just<br>build the most useful stuff.<br>- Okay.<br>Well then, the natural question is,<br>VS Code is kind of with<br>Copilot a competitor,<br>so how do you win?<br>Is it basically just the<br>speed and the quality<br>of the features?<br>- Yeah, I mean, I think this is a space<br>that is quite interesting,<br>perhaps quite unique<br>where if you look at previous tech waves,<br>maybe there's kind of one<br>major thing that happened<br>and it unlocked a new wave of companies,<br>but every single year, every<br>single model capability<br>or jump you get in model capabilities,<br>you now 
unlock this new wave of features,<br>things that are possible,<br>especially in programming.<br>And so, I think, in AI programming,<br>being even just a few months<br>ahead, let alone a year ahead,<br>makes your product much,<br>much, much more useful.<br>I think the Cursor a year from now<br>will need to make the Cursor<br>of today look obsolete.<br>And I think Microsoft has done<br>a number of fantastic things,<br>but I don't think they're in a great place<br>to really keep innovating<br>and pushing on this in the<br>way that a startup can.<br>- Just rapidly implementing features.<br>- Yeah, and doing the research<br>experimentation necessary<br>to really push the ceiling.<br>- I don't know if I think<br>of it in terms of features<br>as I think of it in terms of<br>capabilities for programmers.<br>As the new o1 model came out,<br>and I'm sure there are<br>gonna be more models<br>of different types, like longer<br>context and maybe faster,<br>there's all these crazy<br>ideas that you can try,<br>and hopefully 10% of the crazy ideas<br>will make it into something<br>kind of cool and useful<br>and we want people to have that sooner.<br>To rephrase, an underrated fact<br>is we're making it for ourself.<br>When we started Cursor,<br>you really felt this<br>frustration that models,<br>you could see models getting better,<br>but the Copilot experience<br>had not changed.<br>It was like, "Man, the<br>ceiling is getting higher,<br>why are they not making new things?<br>They should be making new things.<br>Where's all the alpha features?<br>There were no alpha features."<br>I'm sure it was selling well.<br>I'm sure it was a great business,<br>I'm one of these people<br>that really want to<br>try and use new things,<br>and there was no new thing<br>for a very long while.<br>- Yeah, it's interesting.<br>I don't know how you put that into words,<br>but when you compare<br>a Cursor with Copilot,<br>Copilot pretty quickly<br>started to feel stale<br>for some reason.<br>- Yeah, I think one thing<br>that I think helps us<br>is that we're doing it all in one<br>where we're developing the UX<br>and the way you interact with the model<br>at the same time as we're developing<br>how we actually make the<br>model give better answers.<br>So, how you build up the prompt<br>or how do you find the<br>context and for a Cursor Tab,<br>how do you train the model?<br>So, I think that helps<br>us to have all of it<br>the same people working on the<br>entire experience end to end.<br>- Yeah, it's like the person making the UI<br>and the person training the<br>model sit like 18 feet away.<br>- [Aman] Often the same person even.<br>- Yeah, often even the same person.<br>You can create things that<br>are sort of not possible<br>if you're not talking,<br>you're not experimenting.<br>- And you're using, like you<br>said, Cursor to write Cursor?<br>- Of course.<br>- Oh, yeah.<br>- Yeah.<br>- Well, let's talk about<br>some of these features.<br>Let's talk about the all-knowing,<br>the all-powerful praise be to the Tab,<br>(group chuckles)<br>auto complete on steroids basically.<br>So how does Tab work?<br>What is Tab?<br>- To highlight and<br>summarize at a high level,<br>I'd say that there are two things<br>that Cursor is pretty good at right now.<br>There are other things that it does,<br>but two things that it<br>helps programmers with.<br>One is this idea of<br>looking over your shoulder,<br>and being a really fast colleague<br>who can kind of jump<br>ahead of you, and type,<br>and figure out what you're gonna do 
next.<br>That was the kernel of the idea<br>behind a good auto complete<br>was predicting what you're gonna do next,<br>but you can make that<br>concept even more ambitious<br>by not just predicting the<br>characters after your Cursor<br>but actually predicting<br>the next entire change<br>you're gonna make, the next diff,<br>next place you're gonna jump to.<br>And the second thing Cursor is<br>pretty good at right now too<br>is helping you sometimes<br>jump ahead of the AI<br>and tell it what to do and<br>go from instructions to code.<br>And on both of those,<br>we've done a lot of work<br>on making the editing experience<br>for those things ergonomic<br>and also making those<br>things smart and fast.<br>- One of the things we really wanted,<br>was we wanted the model to<br>be able to edit code for us.<br>That was kind of a wish and<br>we had multiple attempts at it<br>before we had a good model<br>that could edit code for you.<br>Then after we had a good model,<br>I think there've been a lot of effort<br>to make the inference fast<br>for having a good experience,<br>and we've been starting to incorporate,<br>I mean, Michael sort of<br>mentioned this ability<br>to jump to different places,<br>and that jump to different<br>places I think came<br>from a feeling of once you accept an edit,<br>it's like, "Man, it should<br>be just really obvious<br>where to go next."<br>It's like, "I'd made this change,<br>the model should just know<br>that the next place to<br>go to is 18 lines down."<br>If you're a WIM user, you<br>could press 18JJ or whatever,<br>but why am I doing this?<br>The model should just know it.<br>So the idea was you just press Tab,<br>it would go 18 lines down,<br>and then show you the next<br>edit and you would press Tab,<br>so as long as you could keep pressing Tab.<br>And so the internal competition was,<br>how many Tabs can we make someone press?<br>Once you have the idea, more abstractly,<br>the thing to think about is<br>how are the edits zero entropy?<br>There's no new bits of information<br>to finish your thought,<br>but you still have to type some characters<br>to make the computer understand<br>what you're actually thinking,<br>then maybe the model<br>should just read your mind<br>and all the zero entropy bits<br>should just be tabbed away.<br>That was sort of the abstract version.<br>- There's this interesting thing<br>where if you look at language model loss<br>on different domains, I<br>believe the bits per byte,<br>which is a kind of<br>character normalize loss<br>for code is lower than language,<br>which means in general, there<br>are a lot of tokens in code<br>that are super predictable,<br>a lot of characters that<br>are super predictable.<br>And this is I think even magnified<br>when you're not just trying<br>to auto complete code,<br>but predicting what the<br>user's going to do next<br>in their editing of existing code.<br>And so, the goal of Cursor<br>Tab is let's eliminate<br>all the low entropy actions<br>you take inside of the editor.<br>When the intent is effectively determined,<br>let's just jump you forward<br>in time, skip you forward.<br>- Well, what's the intuition<br>and what's the technical details<br>of how to do next Cursor prediction?<br>That jump, that's not so<br>intuitive I think to people.<br>- Yeah.<br>I think I can speak to<br>a few of the details<br>on how to make these things work.<br>They're incredibly low latency,<br>so you need to train<br>small models on this task.<br>In particular, they're<br>incredibly pre-fill token 
- Well, what's the intuition, and what are the technical details, of how to do next-cursor prediction? That jump is not so intuitive, I think, to people.<br>- Yeah, I think I can speak to a few of the details on how to make these things work. They're incredibly low latency, so you need to train small models on this task. In particular, they're incredibly pre-fill token hungry. What that means is they have these really, really long prompts where they see a lot of your code, and they're not actually generating that many tokens. And so the perfect fit for that is using a sparse model, meaning an MoE model. That was one breakthrough we made that substantially improved its performance at longer context. The other being a variant of speculative decoding that we built out called speculative edits. These are two, I think, important pieces of what make it quite high quality and very fast.<br>- Okay, so MoE, Mixture of Experts, the input is huge, the output is small.<br>- Yeah.<br>- Okay. Does caching play a role-<br>- Oh, caching plays a huge role. Because you're dealing with this many input tokens, if on every single keystroke that you're typing in a given line you had to rerun the model on all of those tokens passed in, you're just going to, one, significantly degrade latency, and two, kill your GPUs with load. So you need to design the actual prompts you use for the model such that they're caching aware. And then, yeah, you need to reuse the KV cache across requests so that you're spending less work, less compute.<br>- Again, what are the things that Tab is supposed to be able to do in the near term, just to linger on that? Generate code, fill empty space, also edit code across multiple lines, and then jump to different locations inside the same file, and then-<br>- Hopefully, jump to different files also. So if you make an edit in one file, and maybe you have to go to another file to finish your thought, it should go to the second file also.<br>- The full generalization is next action prediction. Sometimes you need to run a command in the terminal, and it should be able to suggest the command based on the code that you wrote too. Or it suggests something, but it's hard for you to know if it's correct because you actually need some more information to learn; you need to know the type to be able to verify that it's correct. And so maybe it should actually take you to a place that's the definition of something, and then take you back so that you have all the requisite knowledge to be able to accept the next completion.<br>- So providing the human the knowledge.<br>- [Arvid] Yes.<br>- Right.<br>- Mm-hmm, yeah.<br>- I've just gotten to know a guy named Primeagen. You can order coffee via SSH.<br>- (chuckles) Oh, yeah.<br>- We did that.<br>- We did that.<br>- So, can the model also do that and provide you with caffeine? Okay, so that's the general framework.<br>- Yeah.<br>- Programming is this weird discipline where sometimes the next five minutes, not always, but sometimes the next five minutes of what you're gonna do is actually predictable from the stuff you've done recently. And so, can you get to a world where that next five minutes either happens by you disengaging and it taking you through? Or maybe a little bit more of just you seeing the next step of what it's gonna do and you're like, "Okay, that's good, that's good, that's good, that's good," and you can just tap, tap through these big changes.<br>- As we're talking about this, I should mention one of the really cool and noticeable things about Cursor is that there's this whole diff interface situation going on.
So, the model suggests, with the red and the green, here's how we're gonna modify the code, and in the chat window you can apply and it shows you the diff and you can accept the diff. So maybe you can speak to that, in whatever direction you like.<br>- We'll probably have four or five different kinds of diffs. So we have optimized the diff for the autocomplete, so that has a different diff interface than when you're reviewing larger blocks of code. And then we're trying to optimize another diff thing for when you're doing multiple different files. And at a high level, the difference is, when you're doing autocomplete, it should be really, really fast to read. Actually, it should be really fast to read in all situations, but in autocomplete your eyes are focused in one area. Humans can't look in too many different places.<br>- So, you're talking about on the interface side?<br>- On the interface side. So it currently has this box on the side. So we have the current box, and if it tries to delete code in some place and add other code, it tries to show you a box on the side.<br>- You can maybe show it if we pull it up on cursor.com. This is what we're talking about.<br>- So that box-<br>- Exactly here.<br>- There were like three or four different attempts at trying to make this thing work, where first the attempt was this blue crossed-out line. So before it was a box on the side, it used to show you the code to delete by showing you, Google Docs style, a line through it, and then you would see the new code. That was super distracting. There were deletions, there was trying the red highlight. Then the next iteration of it, which is sort of funny, you would hold, on Mac, the Option button. It would highlight a region of code to show you that there might be something coming. So maybe in this example, the input and the value would all get blue, and the blue was to highlight that the AI had a suggestion for you. So instead of directly showing you the thing, it would just hint that the AI had a suggestion, and if you really wanted to see it, you would hold the Option button, and then you would see the new suggestion. And if you released the Option button, you would then see your original code.<br>- Mm-hmm, by the way, that's pretty nice, but you have to know to hold the Option button.<br>- Yeah.<br>- And by the way, I'm not a Mac user, but I got it, Option. It's a button I guess you people have.<br>- Again, it's just not intuitive. I think that's the key thing.<br>- And there's a chance this is also not the final version of it.<br>- I am personally very excited for making a lot of improvements in this area. We often talk about it as the verification problem, where these diffs are great for small edits. For large edits, or when it's multiple files or something, it's actually a little bit prohibitive to review these diffs. So there are a couple of different ideas here. One idea that we have is, okay, parts of the diff are important; they have a lot of information. And then parts of the diff are just very low entropy; they're the same thing over and over again. And so maybe you can highlight the important pieces and then gray out the not-so-important pieces.
Or maybe you can have a model that looks at the diff and sees, oh, there's a likely bug here. I will mark this with a little red squiggly and say, "You should probably review this part of the diff." Ideas in that vein I think are exciting.<br>- Yeah, that's a really fascinating space of UX design engineering. So you're basically trying to guide the human programmer through all the things they need to read, and nothing more, optimally.<br>- Yeah, and you want an intelligent model to do it. Currently, diff algorithms are just normal algorithms. There's no intelligence. There's intelligence that went into designing the algorithm, but then it doesn't care whether it's about this thing or that thing, whereas you want a model to do this.<br>- So I think the general question is, man, these models are going to get much smarter. As the models get much smarter, the changes they will be able to propose are much bigger. So as the changes get bigger and bigger and bigger, the humans have to do more and more and more verification work. You need to help them out. I don't wanna spend all my time reviewing code.<br>- Can you say a little more about diffs across multiple files?<br>- Yeah, I mean, so GitHub tries to solve this, right, with code review. When you're doing code review, you're reviewing multiple diffs across multiple files. But like Arvid said earlier, I think you can do much better than code review. Code review kind of sucks. You spend a lot of time trying to grok this code that's often quite unfamiliar to you, and it often doesn't even actually catch that many bugs. And I think you can significantly improve that review experience using language models, for example, using the kinds of tricks that Arvid described of maybe pointing you towards the regions that actually matter. I think also, if the code is produced by these language models and not by someone else: the code review experience is designed for both the reviewer and the person that produced the code. In the case where the person that produced the code is a language model, you don't have to care that much about their experience, and you can design the entire thing around the reviewer, such that the reviewer's job is as fun, as easy, as productive as possible. That feels like the issue with just naively trying to make these things look like code review. I think you can be a lot more creative and push the boundary on what's possible.<br>- And just one idea there is, I think ordering matters. Generally, when you review a PR, you have this list of files and you're reviewing them from top to bottom, but you actually wanna understand this part first because that came logically first, and then you want to understand the next part. And you don't want to have to figure that out yourself; you want a model to guide you through the thing.<br>- And is the step of creation going to be more and more natural language, is that the goal, versus actually writing the code?<br>- I think sometimes.
I don't think it's going to be the case that all of programming will be natural language, and the reason for that is, if I'm pair programming with Sualeh, and Sualeh is at the computer and the keyboard, and sometimes if I'm driving, I want to say to Sualeh, "Hey, implement this function," and that works. And then sometimes it's just so annoying to explain to Sualeh what I want him to do, and so I actually take over the keyboard and I show him. I write part of the example and then it makes sense, and that's the easiest way to communicate. And so I think that's also the case for AI. Sometimes the easiest way to communicate with the AI will be to show an example, and then it goes and does the thing everywhere else. Or sometimes, if you're making a website, for example, the easiest way to show the AI what you want is not to tell it what to do but to drag things around or draw things. And maybe eventually we will get to brain-machine interfaces or whatever, and it can understand what you're thinking. And so I think natural language will have a place. I think it will definitely not be the way most people program most of the time.<br>- I'm really feeling the AGI with this editor. (group chuckling) It feels like there's a lot of machine learning going on underneath. Tell me about some of the ML stuff that makes it all work?<br>- Well, Cursor really works via this ensemble of custom models that we've trained alongside the frontier models that are fantastic at the reasoning-intense things. And so Cursor Tab, for example, is a great example of where you can specialize this model to be even better than frontier models, if you look at evals on the task we set it at. The other domain where it's surprising that it requires custom models, but it's necessary and works quite well, is in Apply. The frontier models are quite good at sketching out plans for code and generating rough sketches of the change, but actually creating diffs is quite hard for frontier models. You try to do this with Sonnet, with o1, any frontier model, and it really messes up stupid things like counting line numbers, especially in super, super large files. And so what we've done to alleviate this is we let the model sketch out this rough code block that indicates what the change will be, and we train a model to then Apply that change to the file.<br>- And we should say that with Apply, the model looks at your code and gives you a really damn good suggestion of what new things to do. And the seemingly trivial step, for humans, of combining the two, you're saying, is not so trivial.<br>- Contrary to popular perception, it is not a deterministic algorithm.<br>- Yeah, I think you see shallow copies of Apply elsewhere, and it just breaks most of the time, because you think you can try to do some deterministic matching, and then it fails at least 40% of the time, and that just results in a terrible product experience. I think in general, we're in this regime where you are going to get smarter and smarter models. So one other thing that Apply lets you do is use fewer tokens with the most intelligent models; generating all those tokens is expensive both in terms of latency and cost. So you can give this very, very rough sketch and then have your smaller models go and implement it, because it's a much easier task to implement this very, very sketched-out code. And I think that this regime will continue, where you can use smarter and smarter models to do the planning, and then maybe the implementation details can be handled by the less intelligent ones. Perhaps you'll have o1, or maybe even more capable models, given an even higher-level plan that is recursively applied by Sonnet and then the Apply model.
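A minimal sketch of the "plan with a smart model, apply with a smaller model" split described above; the function names and the `call_model` helper are hypothetical stand-ins, not Cursor's actual API:

```python
def call_model(model: str, prompt: str) -> str:
    """Placeholder for a chat/completion call to the named model."""
    raise NotImplementedError

def plan_change(request: str, file_contents: str) -> str:
    # The expensive, smart model only emits a rough sketch of the edit
    # (e.g. a partial code block with "... existing code ..." markers),
    # which keeps its output short and cheap.
    return call_model(
        model="frontier-model",
        prompt=f"Sketch the minimal code change for: {request}\n\n{file_contents}",
    )

def apply_change(sketch: str, file_contents: str) -> str:
    # A smaller, cheaper model does the mechanical work of merging the
    # sketch into the full file and emitting the complete rewritten file.
    return call_model(
        model="small-apply-model",
        prompt=f"Rewrite the file with this change applied:\n{sketch}\n\nFILE:\n{file_contents}",
    )

def edit_file(request: str, file_contents: str) -> str:
    return apply_change(plan_change(request, file_contents), file_contents)
```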
- Maybe we should talk about how to make it fast, if you like. Fast is always an interesting detail.<br>- [Arvid] Fast is good.<br>- Yeah, how do you make it fast?<br>- Yeah, so one big component of making it fast is speculative edits. Speculative edits are a variant of speculative decoding, and maybe it'd be helpful to briefly describe speculative decoding. With speculative decoding, what you do is take advantage of the fact that, most of the time, and I'll add the caveat that this holds when you're memory bound, in language model generation, processing multiple tokens at once is faster than generating one token at a time. This is the same reason why, if you look at tokens per second for prompt tokens versus generated tokens, it's much, much faster for prompt tokens. So, instead of doing what speculative decoding normally does, which is using a really small model to predict draft tokens that your larger model will then go in and verify, with code edits we have a very strong prior on what the existing code will look like, and that prior is literally the same exact code. So what you can do is just feed chunks of the original code back into the model, and then the model will just pretty much agree most of the time: "Okay, I'm just gonna spit this code back out." And so you can process all of those lines in parallel, and you just do this with sufficiently many chunks. And then eventually you'll reach a point of disagreement, where the model will now predict text that is different from the ground-truth original code. It'll generate those tokens, and then, after enough tokens match the original code, we decide to restart speculating in chunks of code. What this actually ends up looking like is just a much faster version of normal code editing: it looks like a much faster version of the model rewriting all the code. So we can use the same exact interface that we use for diffs, but it will just stream down a lot faster.<br>- And then the advantage is that while it's streaming, you can also start reviewing the code before it's done, so there's no big loading screen. Maybe that is part of the advantage.<br>- So the human can start reading before the thing is done.<br>- I think the interesting riff here is something like, I feel like speculation is a fairly common idea nowadays. It's not only in language models. There's obviously speculation in CPUs, and there's speculation for databases, and there's speculation all over the place.
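A toy sketch of the speculative-edits idea just described: the draft tokens are the original file itself, verified in a parallel pass, so unchanged regions stream out at prompt-processing speed; `model.verify` and `model.generate` are hypothetical stand-ins for a real inference engine, and the re-sync logic is deliberately simplified:

```python
def speculative_edit(model, context, original_tokens, chunk=64, max_gen=256):
    out, i = [], 0
    while i < len(original_tokens):
        draft = original_tokens[i:i + chunk]
        # One parallel pass over the draft chunk: what token would the
        # model have produced at each position?
        predicted = model.verify(context + out, draft)
        n_agree = 0
        while n_agree < len(draft) and predicted[n_agree] == draft[n_agree]:
            n_agree += 1
        out += draft[:n_agree]        # accepted draft tokens, verified cheaply
        i += n_agree
        if n_agree < len(draft):
            # Disagreement: generate one token at a time until the output
            # re-joins the original code, then resume speculating in chunks.
            for _ in range(max_gen):
                tok = model.generate(context + out)
                out.append(tok)
                if i < len(original_tokens) and tok == original_tokens[i]:
                    i += 1
                    break
            else:
                break  # never re-synced; a real implementation is more careful
    return out
```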
- Well, let me ask the ridiculous question of which LLM is better at coding? GPT, Claude, who wins in the context of programming? And I'm sure the answer is much more nuanced, because it sounds like every single part of this involves a different model.<br>- Yeah, I think there's no model that Pareto dominates the others, meaning it is better in all the categories that we think matter, the categories being speed, ability to edit code, ability to process lots of code, long context, a couple of other things, and coding capabilities. The one that I'd say right now is just net best is Sonnet. I think this is a consensus opinion. o1 is really interesting, and it's really good at reasoning. So if you give it really hard programming-interview-style problems or LeetCode problems, it can do quite well on them, but it doesn't feel like it understands your rough intent as well as Sonnet does. If you look at a lot of the other frontier models, one qualm I have is, I'm not saying they train on benchmarks, but they perform really well on benchmarks relative to everything that's in the middle. So if you try them on all these benchmarks, and things that are in the distribution of the benchmarks they're evaluated on, they'll do really well. But when you push them a little bit outside of that, Sonnet is, I think, the one that does best at maintaining that same capability: you get the same capability as in the benchmark when you try to instruct it to do anything with coding.<br>- Another ridiculous question is the difference between the normal programming experience versus what benchmarks represent. Where do benchmarks fall short, do you think, when we're evaluating these models?<br>- By the way, that's a really, really hard, critically important detail, how different benchmarks are versus real coding, where real coding is not interview-style coding. Humans are saying half-broken English sometimes, and sometimes you're saying, "Oh, do what I did before." Sometimes you're saying, "Go add this thing and then do this other thing for me and then make this UI element." And then a lot of things are context dependent. You really want to understand the human and then do what the human wants, as opposed to, maybe the way to put it abstractly is, the interview problems are very well specified. They lean a lot on specification, while the human stuff is less specified.<br>- Yeah, I think this benchmark question is both complicated by what Sualeh just mentioned, and then also, to what Aman was getting into, there's this problem of the skew between what you can actually model in a benchmark versus real programming, and that can sometimes be hard to encapsulate, because real programming is very messy, and sometimes it isn't super well specified what's correct or what isn't. But then it's also doubly hard because of this public benchmark problem. And that's both because public benchmarks are sometimes hill-climbed on, and because it's really, really hard to get the data from the public benchmarks out of the models. And so, for instance, one of the most popular agent benchmarks, SWE-Bench, is really, really contaminated in the training data of these foundation models. And so if you ask these foundation models to do a SWE-Bench problem, but you actually don't give them the context of a code base, they can hallucinate the right file paths, they can hallucinate the right function names.
And so the public aspect of these things is also just tricky.<br>- Yeah, in that case, it could be trained on the literal issues or pull requests themselves, and maybe the labs will start to do a better job, or they've already done a good job, at decontaminating those things, but they're not going to omit the actual training data of the repository itself. These are all some of the most popular Python repositories; SymPy is one example. I don't think they're going to handicap their models on SymPy and all these popular Python repositories in order to get true evaluation scores on these benchmarks.<br>- I think that, given the dirtiness of benchmarks, there have been a few interesting crutches that places that build systems with these models, or build these models, actually use to get a sense of whether they're going in the right direction or not. And in a lot of places, people will actually just have humans play with the things and give qualitative feedback on them. At one or two of the foundation model companies, they have people for whom that's a big part of their role. And internally, we also qualitatively assess these models and actually lean on that a lot, in addition to private evals that we have.<br>- [Arvid] It's like the vibe.<br>- The vibe, yeah, the vibe.<br>- It's like the vibe.<br>- The vibe benchmark, human benchmark, the humans. You pull in the humans to do a vibe check.<br>- Yeah. (chuckles)<br>- Okay. That's what I do, just reading online forums and Reddit and X. Well, I don't know how to properly load in people's opinions, 'cause they'll say things like, "I feel like Claude or GPT has gotten dumber," or something. They'll say, "I feel like." And then I sometimes feel like that too, but I wonder if it's the model's problem or mine.<br>- With Claude, there's an interesting take I heard, where I think AWS has different chips, and I suspect they have slightly different numerics than Nvidia GPUs, and someone speculated that Claude's degraded performance had to do with maybe using the quantized version that existed on AWS Bedrock versus whatever was running on Anthropic's GPUs.<br>- I interview a bunch of people that have conspiracy theories, so I'm glad you spoke to this conspiracy.<br>- Well, it's not so much a conspiracy theory as: humans are humans, and there are these details.<br>- [Lex] Yes.<br>- And you're doing this crazy amount of flops, and chips are messy, and man, you can just have bugs. It's hard to overstate how hard bugs are to avoid.<br>- What's the role of a good prompt in all of this? We mentioned that benchmarks have really structured, well-formulated prompts. What should a human be doing to maximize success, and what's the importance of what the human writes? You wrote a blog post, you called it Prompt Design.<br>- Yeah, I think it depends on which model you're using, and all of them are slightly different and they respond differently to different prompts, but I think the original GPT-4 and the original (indistinct) models last year, they were quite sensitive to the prompts, and they also had a very small context window. And so we have all of these pieces of information around the code base that would maybe be relevant in the prompt. You have the docs,
you have the files that you add, you have the conversation history, and then there's the problem of how you decide what you actually put in the prompt when you have limited space. And even for today's models, even when you have long context, filling out the entire context window means that it's slower. It means that sometimes the model actually gets confused, and some models get more confused than others. And we have this one system internally that we call Preempt, which helps us with that a little bit. And I think it was built for the era before, when we had 8,000-token context windows. And it's a little bit similar to when you're making a website. You want it to work on mobile, you want it to work on a desktop screen, and you have this dynamic information, which you don't have if, for example, you're designing a print magazine, where you know exactly where you can put stuff. But when you have a website, or when you have a prompt, you have these inputs, and then you need to format them to always work; even if the input is really big, you might have to cut something down. And so the idea was, okay, let's take some inspiration: what's the best way to design websites? Well, the thing that we really like is React and the declarative approach, where you use JSX in JavaScript, and then you declare, "This is what I want, and I think this has higher priority or this has higher z-index than something else." And then you have this rendering engine; in web design it's like Chrome, and in our case it's the Preempt renderer, which then fits everything onto the page. You just declare what you want, and then it figures out how to fit it all in. And so we have found that to be quite helpful, and I think the role of it has shifted over time, where initially it was to fit to these small context windows. Now it's really useful because it helps us with splitting up the data that goes into the prompt and the actual rendering of it. And so it's easier to debug, because you can change the rendering of the prompt and then try it on old prompts, because you have the raw data that went into the prompt, and then you can see, "Did my change actually improve it for this entire eval set?"<br>- So, do you literally prompt with JSX?<br>- Yes, yes.<br>- Yeah.<br>- So it looks like React; there are components. We have one component that's a file component, and it takes in the cursor position. Usually there's one line where the cursor is in your file, and that's probably the most important line, because that's the one you're looking at. And so then you can give priorities: that line has the highest priority, and then you subtract one for every line that is farther away. And then eventually, when it's rendered, it figures out how many lines can actually fit, and it centers around that thing.<br>- That's amazing.<br>- Yeah.<br>- And you can do other fancy things, where if you have lots of code blocks from the entire code base, you could use retrieval and things like embedding and re-ranking scores to add priorities for you through these components.
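A minimal sketch (not Cursor's Preempt) of the declarative idea above: each candidate line gets a priority that falls off with distance from the cursor line, and a simple "renderer" keeps the highest-priority pieces that fit a token budget. The `count_tokens=len` default uses character count as a stand-in for a real tokenizer:

```python
def render_file_component(lines, cursor_line, token_budget, count_tokens=len):
    # Lines closest to the cursor get the highest priority.
    prioritized = sorted(range(len(lines)), key=lambda i: abs(i - cursor_line))
    kept, used = set(), 0
    for i in prioritized:
        cost = count_tokens(lines[i])
        if used + cost > token_budget:
            break
        kept.add(i)
        used += cost
    # Re-emit the surviving lines in their original order, centered on the cursor.
    return "\n".join(lines[i] for i in sorted(kept))
```

Retrieval or re-ranking scores, as mentioned above, could be folded into the same priority function instead of raw distance.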
- So should humans, when they ask questions, also try to use something like that? Would it be beneficial to write JSX in the prompt, or is the whole idea that it should be loose and messy?<br>- I think our goal is that you should just do whatever is the most natural thing for you, and then our job is to figure out how we actually retrieve the relevant things so that your thinking actually makes sense.<br>- Well, this is the discussion I had with Aravind of Perplexity; his whole idea is that you should let the person be as lazy as they want.<br>- Yeah.<br>- Mm-hmm.<br>- Yeah, that's a beautiful thing, but I feel like you're allowed to ask more of programmers, right?<br>- Yes.<br>- So if you say, "Just do what you want," I mean, humans are lazy. There's a tension between just being lazy versus providing more, almost like the system pressuring you, or inspiring you, to be articulate. Not in terms of the grammar of the sentences, but in terms of the depth of the thoughts that you convey inside the prompts.<br>- I think even as the system gets closer to some level of perfection, often when you ask the model for something, not enough intent is conveyed to know what to do. And there are a few ways to resolve that intent. One is the simple thing of having the model just ask you, "I'm not sure how to do these parts based on your query. Could you clarify that?" I think the other could be, maybe if there are five or six possible generations, "Given the uncertainty present in your query so far, why don't we just actually show you all of those and let you pick?"<br>- How hard is it for the model to choose to talk back, to deal with the uncertainty? Does it choose to ask for more information to reduce the ambiguity?<br>- So, I mean, one of the things we do, it's a recent addition, is try to suggest files that you can add. And while you're typing, one can guess what the uncertainty is, and maybe suggest that, say, you're writing your API, and we can guess, using the commits that you've made previously in the same file, that the client and the server are super useful, and there's a hard technical problem of how you resolve it across all commits: which files are the most important given your current prompt? Our initial version is rolled out, and I'm sure we can make it much more accurate. It's very experimental, but the idea is we show you, do you just want to add this file, this file, and this file also, to tell the model to edit those files for you? Because maybe if you're making the API, you should also edit the client and the server that are using the API. So that would be cool, in that there's the phase where you're writing the prompt, and before you even hit Enter, maybe we can help resolve some of the uncertainty.<br>- To what degree do you use agentic approaches? How useful are agents?<br>- We think agents are really, really cool.<br>- [Lex] (chuckles) Okay.<br>- I think agents resemble a human. You can feel that you're getting closer to AGI, because you see a demo where it acts as a human would, and it's really, really cool. I think agents are not yet super useful for many things, though I think we're getting close to where they will actually be useful. And so I think there are certain types of tasks where having an agent would be really nice. I would love to have an agent.
For example, if we have a bug where you sometimes can't Command+C and Command+V inside our chat input box, that's a task that's super well specified. I just want to say in two sentences, "This does not work, please fix it," and then I would love to have an agent that just goes off, does it, and then a day later I come back and I review the thing.<br>- You mean it goes, finds the right file?<br>- Yeah, it finds the right files, it tries to reproduce the bug, it fixes the bug, and then it verifies that it's correct. And this could be a process that takes a long time. And so I think I would love to have that. And then for a lot of programming, there is often this belief that agents will take over all of programming. I don't think we think that's the case, because for a lot of programming, a lot of the value is in iterating: you don't actually want to specify something upfront, because you don't really know what you want until you've seen an initial version, and then you want to iterate on that, and then you provide more information. And so, for a lot of programming, I think you actually want a system that's instant, that gives you an initial version instantly back, and then you can iterate super, super quickly.<br>- What about something like the Replit Agent that recently came out, that also does setting up the development environment, installing software packages, configuring everything, configuring the databases, and actually deploying the app? Is that also in the set of things you dream about?<br>- I think so. I think that would be really cool. For certain types of programming, it would be really cool.<br>- Is that within scope of Cursor?<br>- Yeah, we aren't actively working on it right now, but we want to make the programmer's life easier and more fun, and some things are just really tedious and you need to go through a bunch of steps, and you want to delegate that to an agent. And then some things, you can actually have an agent in the background while you're working. Let's say you have a PR that's both backend and frontend, and you're working in the frontend, and then you can have a background agent that does some work and figures out what you're doing. And then, when you get to the backend part of your PR, you have some initial piece of code that you can iterate on. And so that would also be really cool.<br>- One of the things we already talked about is speed, but I wonder if we can just linger on that some more and the various technical details involved in making this thing really fast. So every single aspect of Cursor, most aspects of Cursor, feel really fast. Like I mentioned, the Apply is probably the slowest thing. I'm sorry, the pain on Arvid's face as I say that.<br>- I know. It's a pain, it's a pain that we're feeling, and we're working on fixing it. (Arvid and Lex chuckling)<br>- Yeah, something that takes, I don't know, one second or two seconds, that feels slow. That actually shows that everything else is just really, really fast. So, are there some technical details about how to make some of these models, how to make the chat fast, how to make the diffs fast? Is there something that just jumps to mind?<br>- Yeah. So, we can go over a lot of the strategies that we use.
One interesting thing is cache warming. You're probably going to use some piece of context, and you can know that before the user's done typing. So, as we discussed before, reusing the KV cache results in lower latency and lower cost across requests. So as the user starts typing, you can immediately warm the cache with, let's say, the current file contents, and then when they press Enter, there are very few tokens it actually has to pre-fill and compute before starting the generation. This will significantly lower TTFT, the time to first token.<br>- Can you explain how the KV cache works?<br>- [Aman] Yeah, so, the way transformers work. (group chuckling)<br>- I like it. (group chuckling)<br>- The mechanism that allows transformers to not just independently look at each token, but see previous tokens, is the keys and values in attention. And generally, the way attention works is, you have, at your current token, some query, and then you have all the keys and values of all your previous tokens, which are some kind of representation that the model stores internally of all the previous tokens in the prompt. And by default, when you're doing a chat, the model has to, for every single token, do this forward pass through the entire model. That's a lot of matrix multiplies that happen, and that is really, really slow. Instead, if you have already done that, and you stored the keys and values and you keep them in the GPU, let's say I have them stored for the last N tokens: if I now wanna compute the output for the (N+1)th token, I don't need to pass those first N tokens through the entire model, because I already have all those keys and values. And so you just need to do the forward pass through that last token. And then when you're doing attention, you're reusing those keys and values that have been computed, which is the only sequential, or sequentially dependent, part of the transformer.<br>- Is there higher-level caching, caching of the prompts or that kind of stuff, that could help?<br>- I see. Yeah, there are other types of caching you can do. One interesting thing that you can do for Cursor Tab is you can basically predict ahead, as if the user had accepted the suggestion, and then trigger another request. And so then you've cached, you've done the speculative work. It's a mix of speculation and caching, right? Because you're speculating what would happen if they accepted it, and then you have this cached value, this suggestion. And then when they press Tab, the next one would be waiting for them immediately. It's a clever heuristic/trick that uses higher-level caching, and it feels fast despite there not actually being any changes in the model.<br>- And if you can make the KV cache smaller, one of the advantages you get is that maybe you can speculate even more. Maybe you can guess, "Here are the 10 things that could be useful, predict the next 10," and then it's possible the user hits one of the 10; it's a much higher chance than the user hitting the exact one that you showed them. Maybe they type another character and hit something else in the cache.
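A toy illustration of the KV cache reuse described above: keys and values for tokens the model has already seen are kept around, so each new token needs only one forward pass instead of re-processing the whole prompt. The `project` function here is a fake stand-in for real transformer projections:

```python
class ToyAttentionCache:
    def __init__(self):
        self.keys, self.values = [], []

    def project(self, token):
        # Stand-in for the per-token key/value projections of a real layer.
        return hash(("k", token)), hash(("v", token))

    def prefill(self, prompt_tokens):
        # Process the whole prompt once (in parallel in a real model)
        # and keep its keys/values around.
        for t in prompt_tokens:
            k, v = self.project(t)
            self.keys.append(k)
            self.values.append(v)

    def decode_step(self, new_token):
        # Only the newest token gets projected; attention then reads the
        # cached keys/values of everything that came before.
        k, v = self.project(new_token)
        self.keys.append(k)
        self.values.append(v)
        return len(self.keys)  # stand-in for "attend over all cached positions"

cache = ToyAttentionCache()
cache.prefill(["def", "foo", "(", ")", ":"])
cache.decode_step("return")  # one cheap step, no re-prefill of the prompt
```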
- The general phenomenon here, which I think is also super useful for RL, is that maybe a single sample from the model isn't very good, but if you predict 10 different things, it turns out that the probability that one of the 10 is right is much higher. There are these pass@k curves, and part of what RL does is let you exploit this pass@k phenomenon by making many different predictions. And one way to think about this is that the model internally has some uncertainty over which of the k things is correct, or which of the k things the human wants. When we RL our Cursor Tab model, one of the things we're doing is predicting which of the 100 different suggestions the model produces is more amenable to humans. Which of them do humans like more than others? Maybe there's something where the model can predict very far ahead versus a little bit, or maybe somewhere in the middle. And then you can give a reward to the things that humans would like more, and punish the things that they wouldn't, and then train the model to output the suggestions that humans would like more. You have these RL loops that are very useful and that exploit these pass@k curves. Aman maybe can go into even more detail.<br>- Yeah, it is a little different than speed, but technically you tie it back in, because you can get away with the smaller model if you RL your smaller model and it gets the same performance as the bigger one. So, while I was mentioning stuff about reducing the size of your KV cache, there are other techniques there as well that are really helpful for speed. So, kind of back in the day, all the way two years ago, people mainly used multi-head attention, and I think there's been a migration towards more efficient attention schemes like grouped-query or multi-query attention, and this is really helpful, with larger batch sizes, for being able to generate the tokens much faster. The interesting thing here is this has no effect on the time-to-first-token pre-fill speed. The thing this matters for is generating tokens. And why is that? Because when you're generating tokens, instead of being bottlenecked by doing these super parallelizable matrix multiplies across all your tokens, you're bottlenecked, for long context with large batch sizes, by how quickly you can read those cached keys and values. So that's memory bandwidth, and how can we make this faster? We can try to compress the size of these keys and values. Multi-query attention is the most aggressive of these. Normally, with multi-head attention, you have some number of, quote unquote, "attention heads" and some number of query heads. Multi-query just preserves the query heads and gets rid of all the key-value heads, so there's only one kind of key-value head, and there are all the remaining query heads. With grouped-query, you instead preserve all the query heads, and there are fewer heads for the keys and values, but you're not reducing it to just one. But anyway, the whole point here is you're just reducing the size of your KV cache.
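A back-of-the-envelope sketch of how much shrinking the number of KV heads shrinks the cache, per the discussion above; the model dimensions here are illustrative, not any particular model's:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # 2x for keys and values; fp16 means 2 bytes per element.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

cfg = dict(layers=32, head_dim=128, seq_len=16_000, batch=8)
print("multi-head   (32 KV heads):", kv_cache_bytes(kv_heads=32, **cfg) / 1e9, "GB")
print("grouped-query (8 KV heads):", kv_cache_bytes(kv_heads=8, **cfg) / 1e9, "GB")
print("multi-query   (1 KV head): ", kv_cache_bytes(kv_heads=1, **cfg) / 1e9, "GB")
```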
- And then there is MLA.<br>- Yeah, multi-latent. That's a little more complicated. And the way that this works is it kind of turns the entirety of your keys and values, across all your heads, into this one latent vector that is then expanded back out at inference time.<br>- But MLA is from this company called DeepSeek. It's quite an interesting algorithm. Maybe the key idea is that in both MQA and in other places, what you're doing is reducing the number of KV heads. The advantage you get from that is that there are fewer of them, but you want each of the keys and values to actually be different. So one way to reduce the size is you keep one big shared vector for all the keys and values, and then you have smaller vectors for every single token, so that you can store only the smaller thing, as some sort of low-rank reduction. At the end, when you eventually wanna compute the final thing, remember that you're memory bound, which means that you still have some compute left that you can use for these things. So if you can expand the latent vector back out, this is somehow far more efficient, because you're reducing, for example, maybe by 32 times or something, the size of the vector that you're keeping.<br>- Yeah, there's perhaps some richness in having a separate set of keys and values and queries that kind of pairwise match up, versus compressing that all into one, in that interaction at least.<br>- Okay, and all of that is dealing with being memory bound.<br>- Yeah.<br>- I mean, ultimately, how does that map to the user experience? Trying to get the-<br>- Yeah, the two things that it maps to is, you can now make your cache a lot larger, because you have less space allocated for the KV cache. You can maybe cache a lot more aggressively, in a lot more things, so you get more cache hits, which are helpful for reducing the time to first token, for the reasons that were described earlier. And then the second being, when you start doing inference with more and more requests and larger and larger batch sizes, you don't see much of a slowdown in how fast it's generating the tokens.<br>- Well, it also allows you to make your prompt bigger for certain-<br>- Yeah, yeah. So, the size of your KV cache is the size of all your prompts multiplied by the number of prompts being processed in parallel, so you could increase either of those dimensions, right? The batch size, or the size of your prompts, without degrading the latency of generating tokens.<br>- Arvid, you wrote a blog post, "Shadow Workspace: Iterating on Code in the Background." So, what's going on?<br>- So, to be clear, we want there to be a lot of stuff happening in the background, and we're experimenting with a lot of things. Right now, we don't have much stuff happening other than the cache warming, or figuring out the right context that goes into your Command+K prompts, for example. But the idea is, if you can actually spend computation in the background, then you can help the user maybe at a slightly longer time horizon than just predicting the next few lines that you're gonna make. But actually, in the next 10 minutes, what are you going to make? And by doing it in the background, you can spend more computation doing that. And so the idea of the Shadow Workspace, that we implemented and use internally for experiments, is that to actually get an advantage from doing stuff in the background, you want some kind of feedback signal to give back to the model, because otherwise you can get higher performance by just letting the model think for longer, and o1 is a good example of that.
But another way you can improve performance is by letting the model iterate and get feedback. And so one very important piece of feedback, when you're a programmer, is the language server, which is this thing that exists for most different languages, and there's a separate language server per language. And it can tell you, "You're using the wrong type here," and give you an error, or it can allow you to go to definition, and it understands the structure of your code. There is a TypeScript language server developed by the TypeScript people, a Rust language server developed by the Rust people, and then they all interface over the Language Server Protocol to VS Code, so that VS Code doesn't need to have all of the different languages built into it, but rather you can use the existing compiler infrastructure.<br>- For linting purposes, what-<br>- It's for linting. It's for going to definition, and for seeing the right types that you're using.<br>- So it's doing type checking also?<br>- Yes, type checking and going to references. And when you're working in a big project, you kind of need that. If you don't have that, it's really hard to code in a big project.<br>- Can you say, again, how that's being used inside Cursor, the Language Server Protocol communication thing?<br>- So it's being used in Cursor to show to the programmer, just like in VS Code, but then the idea is, you want to show that same information to the models, the AI models, and you want to do that in a way that doesn't affect the user, because you want to do it in the background. And so the idea behind the Shadow Workspace was, okay, one way we can do this is we spawn a separate window of Cursor that's hidden, so you can set this flag so it's hidden. There is a window, but you don't actually see it. And inside of this window, the AI agents can modify code however they want, as long as they don't save it, because it's still the same folder, and then they can get feedback from the linters, go to definition, and iterate on their code.
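A rough sketch of that feedback loop: let a model edit an in-memory copy of a file, run a checker on it, and feed the diagnostics back, without ever writing the user's real file. This uses a command-line linter (pyflakes, assumed installed) as a simplified stand-in for a real language server, and `propose_edit` is a hypothetical AI call:

```python
import os
import subprocess
import tempfile

def lint(source: str) -> str:
    # Write to a throwaway temp file and run a real linter on it.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(["pyflakes", path], capture_output=True, text=True)
        return result.stdout + result.stderr
    finally:
        os.unlink(path)

def iterate_in_background(source: str, goal: str, propose_edit, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        source = propose_edit(goal=goal, source=source, feedback=lint(source))
        if not lint(source):   # no diagnostics left
            break
    return source              # still unsaved; the user decides what to keep
```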
- So literally run everything in the background, right, maybe even run the code?<br>- So that's the eventual version, and that's what you want. And a lot of the blog post is actually about how you make that happen, because it's a little bit tricky. You want it to be on the user's machine so that it exactly mirrors the user's environment. And then on Linux, you can do this cool thing where you can actually mirror the file system and have the AI make changes to the files, and it thinks that it's operating on the file level, but actually that's stored in memory, and you can create this kernel-like extension to make it work. Whereas on Mac and Windows it's a little bit more difficult, but it's a fun technical problem, so that's why.<br>- One maybe hacky but interesting idea that I like is holding a lock on saving. And so basically, you can then have the language model kind of hold the lock on saving to disk, and then instead of operating in the ground-truth version of the files that are saved to disk, you actually are operating in what was the Shadow Workspace before, on these unsaved things that only exist in memory, which you still get linter errors for, and which you can code in. And then when you try to maybe run code, there's just a small warning that there's a lock, and then you kind of take back the lock from the language server, or from the Shadow Workspace, if you're trying to do things concurrently.<br>- That's such an exciting future, by the way. It's a bit of a tangent, but to allow a model to change files, it's scary for people, but it's really cool, to be able to just let the agent do a set of tasks and you come back the next day and kind of observe, like it's a colleague or something like that.<br>- And I think there may be different versions of runnability. For the simple things, where you're doing things in the span of a few minutes on behalf of the user as they're programming, it makes sense to make something work locally on their machine. I think for the more aggressive things, where you're making larger changes that take longer periods of time, you'll probably wanna do this in some sandboxed remote environment, and that's another incredibly tricky problem: how do you exactly reproduce, or mostly reproduce to the point of it being effectively equivalent for running code, the user's environment, with this remote sandbox?<br>- I'm curious what kind of agents you want for coding. Do you want them to find bugs? Do you want them to implement new features? What agents do you want?<br>- So by the way, when I think about agents, I don't think just about coding. For this particular podcast, there's video editing, and if you look at Adobe, there's code behind it. It's very poorly documented code, but you can interact with Premiere, for example, using code, and basically all the uploading, everything I do on YouTube, everything you could probably imagine, I do all of that through code, including translation and overdubbing, all of this. So I envision all of those kinds of tasks, automating many of the tasks that don't have to do directly with the editing. Okay, that's what I was thinking about. But in terms of coding, I would be fundamentally thinking about bug finding, many levels of bug finding, and bug finding for logical bugs, not like spiritual bugs or something. (group chuckling) Ones like big directions of implementation, that kind of stuff.<br>- Magical (indistinct) and bug finding.<br>- Yeah, I mean, it's really interesting that these models are so bad at bug finding when just naively prompted to find a bug. They're incredibly poorly calibrated.<br>- Even the smartest models.<br>- Exactly, even o1.<br>- How do you explain that? Is there a good intuition?<br>- I think these models are a really strong reflection of the pre-training distribution, and I do think they generalize as the loss gets lower and lower, but I don't think the loss is low enough such that they're really fully generalizing on code. The things that we use these models for, that the frontier models are quite good at, are really code generation and question answering. And these things exist in massive quantities in pre-training, with all of the code on GitHub on the scale of many,
many trillions of<br>tokens and questions and answers<br>on things like stack overflow<br>and maybe GitHub issues.<br>And so, when you try to<br>push one of these things<br>that really don't exist very much online,<br>for example, the Cursor Tab objective<br>of predicting the next edit<br>given the edits done so far,<br>the brittleness kind of shows.<br>And then bug detection<br>is another great example,<br>where there aren't<br>really that many examples<br>of actually detecting real<br>bugs and then proposing fixes<br>and the models just kind<br>of really struggle at it.<br>But I think it's a question<br>of transferring the model<br>in the same way that you<br>get this fantastic transfer<br>from pre-trained models<br>just on code in general<br>to the Cursor Tab objective.<br>You'll see a very, very similar thing<br>with generalized models<br>that are really good at code<br>to bug detection.<br>It just takes a little bit of kind nudging<br>in that direction.<br>- Look, to be clear,<br>I think, they understand code really well.<br>While they're being pre-trained,<br>the representation that's<br>being built up almost certainly<br>like somewhere in the<br>stream, the model knows<br>that maybe there's<br>something sketchy going on.<br>Part of it is that humans<br>are really calibrated<br>on which bugs are really important.<br>It's not just actually saying<br>there's something sketchy.<br>It's like it's this sketchy trivial,<br>it's this sketchy like you're<br>gonna take the server down.<br>Part of it is maybe the cultural knowledge<br>of why is a staff engineer is good<br>because they know that three years ago<br>someone wrote a really<br>sketchy piece of code<br>that took the server down.<br>(group chuckling)<br>This thing is an experiment.<br>So, a few bugs are fine,<br>you're just trying to experiment<br>and get the feel of the thing.<br>And so if the model gets really annoying<br>when you're writing an<br>experiment, that's really bad,<br>but if you're writing<br>something for super production,<br>you're writing a database.<br>You're writing code in<br>Postgres or Linux or whatever.<br>You're Linus Torvalds.<br>It's sort of unacceptable<br>to have even an edge case<br>and just having the calibration<br>of how paranoid is the user.<br>- But even then if you're<br>putting in a maximum paranoia,<br>it still just doesn't quite get it.<br>- Yeah, yeah, yeah.<br>- I mean, but this is hard<br>for humans too to understand<br>which line of code is<br>important, which is not.<br>I think one of your<br>principles on a website says<br>if a code can do a lot of damage,<br>one should add a comment that say,<br>"This line of code is dangerous."<br>- And all caps, repeated 10 times.<br>(group chuckling)<br>- No, you say for every<br>single line of code<br>inside the function you have<br>to, and that's quite profound,<br>that says something about human beings<br>because the engineers move on,<br>even the same person might just forget<br>how it can sink the<br>Titanic a single function.<br>You might not intuit that quite clearly<br>by looking at the single piece of code.<br>- Yeah, and I think that<br>one is partially also<br>for today's AI models<br>where if you actually write<br>dangerous, dangerous, dangerous<br>in every single line,<br>the models will pay more attention to that<br>and will be more likely to<br>find bugs in that region.<br>- That's actually just straight<br>up a really good practice<br>of labeling code of how<br>much damages can do.<br>- Yeah, I mean, it's controversial.<br>Some people 
think it's ugly.<br>Sualeh does not like it.<br>- In fact, I actually think<br>this is one of the things<br>I learned from Arvid.<br>Aesthetically, I don't like it,<br>but I think there's certainly something<br>where it's useful for the models<br>and humans just forget a lot,<br>and it's really easy to<br>make a small mistake.<br>Just bring down the server.<br>Of course, we test a lot and whatever,<br>but there's always these things<br>that you have to be very careful.<br>- Yeah, like with just normal docstrings,<br>I think people will often just skim it<br>when making a change and think,<br>"Oh, I know how to do this,"<br>and you really need to<br>point it out to them<br>so that doesn't slip through.<br>- Yeah, you have to be reminded<br>that you could do a lot of damage,<br>that's like we don't<br>really think about that.<br>You think about, "Okay, how<br>do I figure out how this works<br>so I can improve it?"<br>You don't think about the<br>other direction that it could-<br>- Until we have formal<br>verification for everything,<br>then you can do whatever you want<br>and you know for certain that<br>you have not introduced a bug<br>if the proof pass.<br>- Well, concretely, what<br>do you think that future<br>would look like?<br>- I think people will just<br>not write to tests anymore.<br>You write a function, the<br>model will suggest a spec,<br>and you review the spec.<br>And in the meantime, smart<br>reasoning model computes a proof<br>that the implementation follows the spec,<br>and I think that happens<br>for most functions.<br>- Do you think this gets at a little bit<br>some of the stuff you<br>were talking about earlier<br>with the difficulty of specifying intent<br>for what you want with software,<br>where sometimes it might be<br>because the intent is<br>really hard to specify,<br>it's also then going to<br>be really hard to prove<br>that it's actually matching<br>whatever your intent is?<br>- You think that spec is hard to generate?<br>- Yeah, or just for a given spec.<br>I think there is a question of,<br>can you actually do the<br>formal verification?<br>Is that possible?<br>I think that there's more to<br>dig into there, but then also-<br>- Even if you have the spec?<br>- If you have the spec-<br>- Even if you have the spec,<br>is the spec written in natural language?<br>Or is it-<br>- No, the spec would be formal.<br>- But how easier would<br>that be (indistinct).<br>- Okay, so then I think<br>that you care about things<br>that are not going to<br>be easily well specified<br>in the spec language.<br>- I see, I see, yeah, yeah.<br>- Would be maybe an argument<br>against formal verification<br>is all you need.<br>- The worry is there's<br>this massive document-<br>- Replacing something<br>like unit tests, sure.<br>- Yeah, yeah.<br>I think you can probably also<br>evolve the spec languages<br>to capture some of the things<br>that they don't really capture right now.<br>I don't know, I think it's very exciting.<br>- And you're speaking not<br>just about single functions,<br>you're speaking about entire code bases.<br>- I think entire code bases is harder,<br>but that is what I would love to have<br>and I think it should be possible.<br>There's a lot of work recently<br>where you can prove formally<br>verified down to the hardware.<br>You formally verify the C code,<br>and then you formally verify<br>through the GCC compiler,<br>and then through the Verilog<br>down to the hardware.<br>And that's incredibly big<br>system, but it actually works.<br>And I think big code 
bases are sort of similar, in that they're like a multi-layered system.<br>And if you can decompose it and formally verify each part, then I think it should be possible.<br>I think this specification problem is a real problem.<br>- How do you handle side effects, or how do you handle, I guess, external dependencies like calling the Stripe API?<br>- Maybe Stripe would write a spec for their API.<br>- But you can't do this for everything. Can you do this for everything you use?<br>Maybe people will use language models as primitives in the programs they write, and there's a dependence on it, and how do you now include that?<br>- I think you might be able to prove that still.<br>- Prove what about language models?<br>- I think it feels possible that you could actually prove that a language model is aligned, for example, or you can prove that it actually gives the right answer.<br>- That's the dream.<br>- Yeah, I mean, if it's possible.<br>- That's your "I have a dream" speech.<br>If it's possible, that will certainly help with making sure your code doesn't have bugs and making sure AI doesn't destroy all human civilization.<br>So, the full spectrum from AI safety to just bug finding.<br>So, you said the models struggle with bug finding. What's the hope?<br>- My hope initially is, and I can let Michael chime in too, but it was like, it should first help with the stupid bugs.<br>It should very quickly catch the stupid bugs, like off-by-one errors.<br>Sometimes you write something in a comment and do it the other way. It's very common. I do this.<br>I write "less than" in a comment and maybe I write the "greater than" or something like that.<br>And the model is like, "Yeah, that looks sketchy. You sure you wanna do that?"<br>But eventually, it should be able to catch harder bugs too.<br>- Yeah, and I think that it's also important to note that having good bug-finding models feels necessary to get to the highest reaches of having AI do more and more programming for you.<br>If AI is building more and more of the system for you, you need to not just generate but also verify.<br>And without that, some of the problems that we've talked about before with programming with these models will just become untenable.<br>So it's not just for humans, like you write a bug, I write a bug, find the bug for me; being able to verify the AI's code and check it is really important.<br>- Yeah, and then how do you actually do this?<br>We have had a lot of contentious dinner discussions of how you actually train a bug model, but one very popular idea is that it's potentially easier to introduce a bug than to actually find one.<br>And so, you can train a model to introduce bugs in existing code, and then you can train a reverse bug model that can find bugs using this synthetic data.<br>So that's one example, but there are lots of ideas for how to (indistinct).
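As a rough sketch of the bug-injection idea just mentioned, and very much an assumption about how such a pipeline might look rather than Cursor's actual training setup, the `generate` helper below is a placeholder for any code-capable model call:

```python
# Minimal sketch of "introduce bugs, then learn to find them":
# corrupt known-good code with a generator model, and use the (buggy, original)
# pairs as synthetic training data for a reverse, bug-finding / bug-fixing model.
import random


def generate(prompt: str) -> str:
    """Placeholder for a call to a code-capable language model."""
    raise NotImplementedError


def make_synthetic_bug_examples(snippets: list[str], n_per_snippet: int = 3) -> list[dict]:
    examples = []
    for code in snippets:
        for _ in range(n_per_snippet):
            buggy = generate(
                "Introduce one subtle, realistic bug into this code. "
                "Return only the modified code.\n\n" + code
            )
            if buggy.strip() != code.strip():
                examples.append({
                    "input": buggy,   # what the reverse (bug-finding) model sees
                    "target": code,   # the known-good version, i.e. the fix
                })
    random.shuffle(examples)
    return examples
```

The asymmetry is the whole point: a weak model can plausibly corrupt code, and the resulting pairs give you supervised data for the harder direction.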
- You can also do a bunch of work, not even at the model level, of taking the biggest models and then maybe giving them access to a lot of information that's not just the code.<br>It's a hard problem to stare at a file and be like, "Where's the bug?" And that's hard for humans often, right?<br>And so often, you have to run the code, and being able to see things like traces and step through a debugger, there's a whole other direction where it tends toward that.<br>- It could also be that there are two different product form factors here.<br>It could be that you have a really specialized model that's quite fast, running in the background and trying to spot bugs.<br>And it might be that sometimes, to Arvid's earlier example about some nefarious input box bug, you know there's a bug; you're not just checking hypothesis-free, you're like, "This is a problem, I really wanna solve it," and you zap that with tons and tons and tons of compute, and you're willing to put in $50 to solve that bug, or something even more.<br>- Have you thought about integrating money into this whole thing?<br>I would pay probably a large amount of money if you found a bug, or even generated code that I really appreciated.<br>I had a moment a few days ago, when I started using Cursor, where it generated three perfect functions for interacting with the YouTube API to update captions for localization in different languages.<br>The API documentation is not very good. I googled it for a while and couldn't find exactly what I needed; there's a lot of confusing information, and Cursor generated it perfectly.<br>I just sat back, I read the code, I was like, "This is correct, I tested it, it's correct."<br>I was like, "I wanna tip." I want a button that goes, "Here's $5."<br>One, that's really good just to support the company and support what the interface is.<br>And the other is that it probably sends a strong signal, like, "Good job."<br>(all chuckling)<br>So, there's this much stronger signal than just accepting the code, right? You just actually send a strong "good job."<br>That, and for bug finding, obviously, there's a lot of people that would pay a huge amount of money for a bug bounty thing, right? You guys think about that?<br>- Yeah, it's a controversial idea inside the company.<br>I think it depends on how much you believe in humanity, almost.<br>I think it would be really cool if you spend nothing to try to find a bug, and if it doesn't find a bug, you spend $0.<br>And then if it does find a bug and you click accept, then it also shows in parentheses, like, $1, and so you spend $1 to accept the bug.<br>And then, of course, there's a worry like, "Okay, we spent a lot of computation, maybe people will just copy paste." I think that's a worry.<br>Then there is also the worry that introducing money into the product makes it not feel as fun anymore.<br>You have to think about money, and all you want to think about is the code, and so maybe it actually makes more sense to separate it out, and you pay some fee every month, and then you get all of these things for free.<br>- But there could be a tipping component, which is not like it cost this-<br>- Yes, but it still has that dollar symbol.<br>I think it's fine, but I also see the point where maybe you don't want to introduce it.<br>- Yeah, I was gonna say, the moment that it feels like people do this is when they share it.<br>When they have this fantastic example, they just share it with their friends.<br>- There is also a potential world where there's a technical solution to this, like, honor-system problem too, where if we can get to a place where we understand the output of the system more, I mean, to the stuff we were talking about with error
checking with the LSP and then also running the code.<br>But if you could get to a place where you could actually somehow verify, "Oh, I have fixed the bug," maybe then the bounty system doesn't need to rely on the honor system too.<br>- How much interaction is there between the terminal and the code?<br>How much information is gained from running the code in the terminal?<br>Can you do a loop where it runs the code and suggests how to change the code if the code at runtime gets an error?<br>Or right now, are these completely separate worlds?<br>I know you can do Ctrl+K inside the terminal to help you write the code.<br>- You can use terminal context as well, inside of Command+K, kind of everything.<br>We don't have the looping part yet, though we suspect something like this could make a lot of sense.<br>There's a question of whether it happens in the foreground too, or if it happens in the background, like what we've been discussing.<br>- Sure, the background's pretty cool. It could be running the code in different ways.<br>Plus there's a database side to this, which is, how do you protect it from modifying the database? But okay.<br>(group chuckling)<br>- I mean, there are certainly cool solutions there.<br>There's this new API that is being developed. It's not in AWS, but, I think, it's in PlanetScale.<br>I don't know if PlanetScale was the first one to add it.<br>It's this ability to sort of add branches to a database, which is, like, if you're working on a feature and you wanna test against the prod database, but you don't actually want to test against the prod database, you could add a branch to the database.<br>And the way they do that is they add a branch to the write-ahead log.<br>And there's obviously a lot of technical complexity in doing it correctly.<br>I guess database companies need new things to do.<br>(group chuckling)<br>They have good databases now.<br>And I think turbopuffer, which is one of the databases we use, is going to maybe add branching to the write-ahead log.<br>So maybe the AI agents will use branching; they'll test against some branch, and it's gonna be a requirement for the database to support branching or something.<br>- It would be really interesting if you could branch a file system, right?<br>- Yeah.<br>I feel like everything needs branching.<br>- [Aman] Yeah.<br>- Yeah.<br>The problem with the multiverse, right?<br>(group chuckling)<br>If you branch on everything, that's like a lot.<br>- There are obviously these super clever algorithms to make sure that you don't actually use a lot of space or CPU or whatever.<br>- Okay, this is a good place to ask about infrastructure.<br>So, you guys mostly use AWS. What are some interesting details? What are some interesting challenges? Why'd you choose AWS? Why is AWS still winning? Hashtag.<br>- AWS is just really, really good. It is really good.<br>Whenever you use an AWS product, you just know that it's going to work.<br>It might be absolute hell to go through the steps to set it up.<br>- Why is the interface so horrible?<br>- Because it's.
(chuckles)<br>- It's just so good. It doesn't need to-<br>- It's the nature of winning.<br>(group chuckling)<br>- I think it's exactly that, it's just the nature of winning.<br>- Yeah, yeah.<br>But AWS we can always trust, it will always work. And if there is a problem, it's probably your problem.<br>(Lex chuckles)<br>Yeah.<br>- Okay, are there some interesting challenges? You guys are a pretty new startup, scaling to so many people.<br>- Yeah, I think that it has been an interesting journey adding each extra zero to the requests per second.<br>(Lex chuckles)<br>You run into all of these issues with the general components you're using for caching and databases; they run into issues as you make things bigger and bigger, and now we're at the scale where we get into overflows on our tables and things like that.<br>And then, also there have been some custom systems that we've built. For instance, our retrieval system for computing a semantic index of your code base and answering questions about a code base has continually, I feel like, been one of the trickier things to scale.<br>- I have a few friends who are super senior engineers, and one of their lines is, it's very hard to predict where systems will break when you scale them.<br>You can try to predict in advance, but there's always something weird that's gonna happen when you add these extra zeros. You thought you'd thought through everything, but you didn't actually think through everything.<br>But I think for that particular system, we chunk up all of your code, and then we send up the code for embedding, and we embed the code.<br>And then, we store the embeddings in a database, but we don't actually store any of the code.<br>And then there are reasons around making sure that we don't introduce client bugs, because we're very, very paranoid about client bugs.<br>We store much of the details on the server. Everything is encrypted.<br>So, one of the technical challenges is always making sure that the local index, the local code base state, is the same as the state that is on the server.<br>The way, technically, we ended up doing that is, for every single file you can keep a hash, and then for every folder you can keep a hash, which is the hash of all of its children. You can recursively do that up to the top.<br>Why do something complicated? One thing you could do is you could keep a hash for every file, and every minute you could try to download the hashes that are on the server, figure out what are the files that don't exist on the server. Maybe you just created a new file, maybe you just deleted a file, maybe you checked out a new branch, and try to reconcile the state between the client and the server.<br>But that introduces absolutely ginormous network overhead, both on the client side (nobody really wants us to hammer their WiFi all the time if you're using Cursor), but also, it would introduce ginormous overhead on the database.<br>It would be reading this tens-of-terabytes database, approaching 20 terabytes or something of data, every second. That's just crazy. You definitely don't wanna do that.<br>So what you do is you just try to reconcile the single hash, which is at the root of the project.<br>And then if something mismatches, then you go and find where all the things disagree.<br>Maybe you look at the children and see if the hashes match. If the hashes don't match, go look at their children, and so on. But you only do that in the scenario where things don't match.<br>For most people, most of the time, the hashes match.<br>- So it's like a hierarchical reconciliation of hashes.<br>- Yeah, something like that.<br>- Yeah, it's called a Merkle tree.<br>- Yeah, Merkle.<br>- Yeah.<br>- Yeah.
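As a rough, simplified sketch of the hierarchical hashing just described (the paths, hashing choices, and client/server split here are illustrative assumptions, not Cursor's actual implementation): hash every file, give every folder the hash of its children, compare roots first, and only walk down where something disagrees.

```python
# Minimal sketch of Merkle-style reconciliation between a local code base
# and a remote index: compare the root hash first, and only descend into
# subtrees whose hashes disagree.
import hashlib
from pathlib import Path


def build_tree(root: Path) -> dict[str, str]:
    """Map every file/folder (relative path) to a hash; a folder's hash
    is the hash of its children's hashes."""
    tree: dict[str, str] = {}

    def visit(p: Path) -> str:
        if p.is_file():
            h = hashlib.sha256(p.read_bytes()).hexdigest()
        else:
            children = [visit(c) for c in sorted(p.iterdir())]
            h = hashlib.sha256("".join(children).encode()).hexdigest()
        tree[str(p.relative_to(root))] = h
        return h

    visit(root)
    return tree


def changed_paths(local: dict[str, str], remote: dict[str, str],
                  root: Path, rel: str = ".") -> list[str]:
    """Return paths whose hashes disagree, descending only on mismatch.
    (Remote-only deletions would need extra handling; omitted for brevity.)"""
    if local.get(rel) == remote.get(rel):
        return []  # common case: one comparison at the root and we're done
    p = root if rel == "." else root / rel
    if p.is_file():
        return [rel]
    out: list[str] = []
    for child in sorted(p.iterdir()):
        out += changed_paths(local, remote, root, str(child.relative_to(root)))
    return out
```

In the common case the two root hashes match and the sync costs a single comparison; only on a mismatch does the walk touch deeper folders and files.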
- This is cool to see that you have to think through all these problems.<br>- The reason it's gotten hard is just because of the number of people using it, and some of your customers have really, really large code bases.<br>We originally tried this on our own code base, which is big, but it's just not the size of some company that's been there for 20 years and has a ginormous number of files, and you wanna scale that across programmers.<br>There are all these details where building the simple thing is easy, but scaling it to a lot of people, a lot of companies, is obviously a difficult problem, which is somewhat independent of coming up with new ideas, which, obviously, we're also working on, and then scaling all of that in the last few weeks, months.<br>- Yeah.<br>And there are a lot of clever things, additional things, that go into this indexing system.<br>For example, the bottleneck in terms of costs is not storing things in the vector database or the database, it's actually embedding the code.<br>You don't wanna re-embed the code base for every single person in a company that is using the same exact code, except for maybe they're on a different branch with a few different files, or they've made a few local changes.<br>Because, again, embeddings are the bottleneck, you can do this one clever trick and not have to worry about the complexity of dealing with branches and the other databases, where you just have some cache on the actual vectors, computed from the hash of a given chunk.<br>- Mm-hmm.<br>- So this means that when the nth person at a company goes and embeds their code base, it's really, really fast.<br>You do all this without actually storing any code on our servers at all. No code data is stored. We just store the vectors in the vector database and the vector cache.<br>- What's the biggest gain, at this time, that you get from indexing the code base? Just out of curiosity, what benefit do users have?<br>It seems like longer term, there'll be more and more benefit, but in the short term, just asking questions of the code base, what's the usefulness of that?<br>- I think the most obvious one is just, you want to find out where something is happening in your large code base, and you have a fuzzy memory of, "Okay, I want to find the place where we do X," but you don't exactly know what to search for in a normal text search.<br>So you ask the chat, you hit Command+Enter to ask with the code base chat, and then very often, it finds the right place that you were thinking of.<br>- Like you mentioned, in the future, I think this is only going to get more and more powerful, where we're working a lot on improving the quality of our retrieval.<br>I think the ceiling for that is really, really much higher than people give it credit for.<br>- One question that's good to ask
here, have you considered<br>and why haven't you much done local stuff,<br>it seems like everything<br>was just discussed<br>as exceptionally difficult to do.<br>To go to the cloud, you have<br>to think about all these things<br>with the caching and the large code base<br>where a large number of<br>programmers are using<br>the same code base.<br>You have to figure out the puzzle of that.<br>A lot of it, most software<br>just does this heavy<br>computational stuff locally.<br>So, have you considered<br>doing embeddings locally?<br>- Yeah, we thought about it,<br>and I think it would be<br>cool to do it locally.<br>I think it's just really hard.<br>One thing to keep in mind<br>is that some of our users<br>use the latest MacBook Pro,<br>but most of our users,<br>more than 80% of our users<br>are in Windows machines,<br>which many of them are not very powerful.<br>So, local models really only<br>works on the latest computers,<br>and it's also a big<br>overhead to build that in.<br>So even if we would like to do that,<br>it's currently not something<br>that we are able to focus on.<br>I think there are some<br>people that do that,<br>and I think that's great,<br>but especially as models<br>get bigger and bigger<br>and you want to do fancier<br>things with bigger models,<br>it becomes even harder to do it locally.<br>- Yeah, it's not a problem<br>of weaker computers.<br>It's just that for example,<br>if you're some big company,<br>you have big company code base.<br>It's just really hard to<br>process big company code base<br>even on the beefiest MacBook Pros.<br>It's not even a matter of<br>if you're just a student<br>or something.<br>I think, if you're the best<br>programmer at a big company,<br>you're still gonna have<br>a horrible experience.<br>If you do everything locally<br>where you could do it<br>and scrape by, but again,<br>it wouldn't be fun anymore.<br>- Yeah, like at approximate<br>nearest neighbors<br>and this massive code base is<br>gonna just eat up your memory<br>and your CPU, and it's based off of that.<br>That's just that.<br>Let's talk about also<br>the modeling side where,<br>as Arvid said, there are<br>these massive headwinds<br>against local models where one,<br>things that seem to move<br>towards MOEs, which one benefit<br>is maybe their more<br>memory bandwidth bound,<br>which plays in favor of<br>local versus using GPUs<br>or using Nvidia GPUs.<br>But the downside is, these<br>models are just bigger in total,<br>and they're gonna need to fit,<br>often not even on a single<br>node but multiple nodes.<br>There's no way that's gonna fit inside<br>of even really good MacBooks.<br>I think especially for coding,<br>it's not a question as much of,<br>does it clear some bar of<br>the model's good enough<br>to do these things and<br>then we're satisfied?<br>Which may be the case for other problems<br>and maybe where local models shine,<br>but people are always gonna want the best,<br>the most intelligent,<br>the most capable things,<br>and that's gonna be<br>really, really hard to run<br>for almost all people, locally.<br>- Don't you want the most capable model?<br>You want Sonnet too?<br>- And also o1-<br>(Lex chuckling)<br>- I like how you're pitching me.<br>(group chuckling)<br>- O1 is another-<br>- Would you be satisfied<br>with an inferior model?<br>Listen, yes, I'm one of those,<br>but there's some people that<br>like to do stuff locally,<br>really, there's a whole<br>obviously open source movement<br>that resists.<br>It's good that they exist actually<br>because you wanna 
resist the power centers<br>that are growing our-<br>- There's actually an<br>alternative to local models<br>that I am particularly fond of.<br>I think it's still very<br>much in the research stage,<br>but you could imagine to<br>do homomorphic encryption<br>for language model inference.<br>So you encrypt your input<br>on your local machine,<br>then you send that up,<br>and then the server can<br>use loss of computation.<br>They can run models that<br>you cannot run locally<br>on this encrypted data,<br>but they cannot see what the data is,<br>and then they send back the answer<br>and you decrypt the answer and<br>only you can see the answer.<br>So I think that's still very much research<br>and all of it is about trying<br>to make the overhead lower<br>because right now, the<br>overhead is really big,<br>but if you can make that happen,<br>I think that would be really, really cool,<br>and I think it would be<br>really, really impactful<br>because I think one thing that's<br>actually worrisome is that,<br>as these models get better and better,<br>they're going to become more<br>and more economically useful.<br>And so, more and more of the<br>world's information and data<br>will flow through one or<br>two centralized actors.<br>And then there are worries about,<br>there can be traditional hacker attempts,<br>but it also creates this scary part<br>where if all of the world's<br>information is flowing<br>through one node in plaintext,<br>you can have surveillance<br>in very bad ways.<br>Initially, will be good reasons.<br>People will want to try to protect<br>against bad actors using<br>AI models in bad ways,<br>and then you will add in<br>some surveillance code.<br>And then, someone else will come in<br>and you're on a slippery slope,<br>and then you start doing bad things<br>with a lot of the world's data.<br>So, I am very hopeful<br>that we can solve homomorphic encryption<br>for language model inference.<br>- Yeah, and doing privacy,<br>preserving machine learning.<br>But I would say, that's<br>the challenge we have<br>with all software these days.<br>It's like, there's so many<br>features that can be provided<br>from the cloud and all us<br>increasingly rely on it<br>and make our life awesome.<br>But there's downsides,<br>and that's why you rely<br>on really good security<br>to protect from basic attacks.<br>But there's also only a<br>small set of companies<br>that are controlling that data,<br>and they obviously have leverage<br>and they could be infiltrated<br>in all kinds of ways.<br>That's the world we live in.<br>- Yeah, the thing I'm just<br>actually quite worried about<br>is Anthropic has this<br>responsible scaling policy<br>where we're the low ASLs,<br>which is the Anthropic<br>security level or whatever<br>of the models.<br>But as we get to, quote,<br>unquote, "ASL-3, ASL-4,"<br>whatever models which are very powerful.<br>But for mostly reasonable<br>security reasons,<br>you would wanna monitor all the prompts.<br>But I think that's<br>reasonable and understandable<br>where everyone is coming from.<br>But man, it'd be really horrible<br>if all the world's information<br>is monitored that heavily,<br>it's way too centralized.<br>It's like this really<br>fine line you're walking<br>where on the one side,<br>you don't want the models to go rogue.<br>On the other side, humans like,<br>I don't know if I trust<br>all the world's information<br>to pass through three model providers.<br>- Yeah.<br>- Why do you think it's<br>different than cloud providers?<br>- Because I think a lot 
of<br>this data would never have gone<br>to the cloud providers in the first place.<br>You want to give more<br>data to the AI models,<br>you want to give personal data<br>that you would never have<br>put online in the first place<br>to these companies or to these models.<br>It also centralizes control<br>where right now, for cloud,<br>you can often use your<br>own encryption keys,<br>and AWS can't really do much.<br>But here, it's just centralized actors<br>that see the exact plain<br>text of everything.<br>- On the topic of a context,<br>that's actually been a friction for me.<br>When I'm writing code in Python,<br>there's a bunch of stuff imported.<br>You could probably<br>intuit the kind of stuff<br>I would like to include in the context.<br>How hard is it to auto<br>figure out the context?<br>- It's tricky.<br>I think we can do a lot better<br>at computing the context<br>automatically in the future.<br>One thing that's important to note is,<br>there are trade-offs with<br>including automatic context.<br>So, the more context you<br>include for these models,<br>first of all, the slower they are<br>and the more expensive those requests are,<br>which means you can<br>then do less model calls<br>and do less fancy stuff in the background.<br>Also, for a lot of these<br>models, they get confused<br>if you have a lot of<br>information in the prompt.<br>So the bar for accuracy and for relevance<br>of the context you include<br>should be quite high.<br>Already, we do some automatic context<br>in some places within the product.<br>It's definitely something we<br>wanna get a lot better at.<br>I think that there are a lot<br>of cool ideas to try there,<br>both on the learning<br>better retrieval systems,<br>like better embedding<br>models, better rerankers.<br>I think that there are<br>also cool academic ideas,<br>stuff we've tried out internally,<br>but also the field is grappling<br>with writ large about,<br>can you get language models to a place<br>where you can actually<br>just have the model itself<br>understand a new corpus of information?<br>The most popular talked<br>about version of this is<br>can you make the context windows infinite?<br>Then if you make the<br>context windows infinite,<br>can you make the model<br>actually pay attention<br>to the infinite context?<br>And then, after you can<br>make it pay attention<br>to the infinite context to<br>make it somewhat feasible<br>to actually do it, can you then do caching<br>for that infinite context?<br>You don't have to recompute<br>that all the time.<br>But there are other cool<br>ideas that are being tried,<br>that are a little bit more<br>analogous to fine-tuning<br>of actually learning this information<br>in the weights of the model.<br>It might be that you<br>actually get a qualitative<br>lead different type of understanding<br>if you do it more at the weight level<br>than if you do it at the<br>in-context learning level.<br>I think the jury's still a little bit out<br>on how this is all gonna work in the end?<br>But in the interim, us as a company,<br>we are really excited about<br>better retrieval systems<br>and picking the parts of the code base<br>that are most relevant<br>to what you're doing,<br>and we could do that a lot better.<br>- One interesting proof of concept<br>for the learning this knowledge<br>directly in the weights<br>is with VS Code.<br>So, we're in a VS Code fork and VS Code.<br>The code is all public.<br>So these models in pre-training<br>have seen all the code.<br>They've probably also seen<br>questions and 
answers about it.<br>And then, they've been<br>fine-tuned and RLHFed<br>to be able to answer questions<br>about code in general.<br>So when you ask it a<br>question about VS Code,<br>sometimes it'll hallucinate,<br>but sometimes it actually<br>does a pretty good job<br>at answering the question.<br>It happens to be okay,<br>but what if you could<br>actually specifically train<br>or post-train a model such<br>that it really was built<br>to understand this code base?<br>It's an open research question,<br>one that we're quite interested in.<br>And then there's also uncertainty of,<br>do you want the model to be the thing<br>that end-to-end is doing everything,<br>i.e., it's doing the<br>retrieval in its internals,<br>and then answering a<br>question, creating the code,<br>or do you want to separate the retrieval<br>from the frontier model,<br>where maybe you'll get<br>some really capable models<br>that are much better than<br>the best open source ones<br>in a handful of months?<br>And then, you'll want to separately train<br>a really good open source<br>model to be the retriever,<br>to be the thing that feeds in the context<br>to these larger models.<br>- Can you speak a little<br>more to post-training a model<br>to understand the code base?<br>What do you mean by that?<br>Is this a synthetic data direction?<br>Is this-<br>- Yeah, there are many possible<br>ways you could try doing it.<br>There's certainly no shortage of ideas.<br>It's just a question of going<br>in and trying all of them<br>and being empirical about<br>which one works best.<br>One very naive thing is to<br>try to replicate what's done<br>with VS Code and these frontier models.<br>So, let's continue pre-training.<br>Some kind of continued pre-training<br>that includes general code data<br>but also throws in of the data<br>of some particular repository<br>that you care about.<br>And then in post-training, meaning,<br>let's just start with<br>instruction fine-tuning.<br>You have a normal instruction<br>fine-tuning data set<br>about code.<br>Then you throw in a lot<br>of questions about code<br>in that repository.<br>So, you could either<br>get ground truth ones,<br>which might be difficult or<br>you could do what you hinted at<br>or suggested using synthetic data,<br>i.e., having the model ask questions<br>about various recent pieces of the code.<br>So you take the pieces of the code,<br>then prompt the model or have<br>a model propose a question<br>for that piece of code,<br>and then add those as instruction<br>fine-tuning data points.<br>And then in theory, this might<br>unlock the model's ability<br>to answer questions about that code base.<br>- Let me ask you about OpenAI o1.<br>What do you think is the role<br>of that kind of test time<br>compute system in programming?<br>- I think test time compute<br>is really, really interesting.<br>So, there's been the pre-training regime<br>as you scale up the amount of data<br>and the size of your model,<br>get you better and better<br>performance both on loss,<br>and then on downstream benchmarks<br>and just general performance,<br>so we use it for coding or other tasks.<br>We're starting to hit<br>a bit of a data wall,<br>meaning, it's going to be hard to continue<br>scaling up this regime.<br>So, scaling up test time<br>compute is an interesting way,<br>if now increasing the number<br>of inference time flops.<br>Yeah, as you increase the number<br>of flops you use inference<br>time getting corresponding<br>improvements in the<br>performance of these models.<br>Traditionally, we 
just had to<br>literally train a bigger model<br>that always used that many more flops,<br>but now, we could perhaps<br>use the same size model<br>and run it for longer to<br>be able to get an answer<br>at the quality of a much larger model.<br>And so, the really interesting<br>thing I like about this<br>is there are some problems<br>that perhaps require<br>100 trillion parameter<br>model intelligence trained<br>on 100 trillion tokens.<br>But that's maybe 1%,<br>maybe .1% of all queries.<br>So are you going to<br>spend all of this effort,<br>all of this compute training<br>a model that costs that much<br>and then run it so infrequently?<br>You train the model that is capable<br>of doing the 99.9% of queries,<br>then you have a way of<br>inference time running it longer<br>for those few people<br>that really, really want max intelligence.<br>- How do you figure out<br>which problem requires<br>what level of intelligence?<br>Is that possible to dynamically figure out<br>when to use GPT-4, when<br>to use a small model<br>and when you need the o1?<br>(group chuckles)<br>- Yeah, that's an open<br>research problem, certainly.<br>I don't think anyone's actually cracked<br>this model routing problem quite well.<br>We have initial implementations of this<br>for something like Cursor Tab,<br>but at the level of going<br>between 4o Sonnet to o1,<br>it's a bit trickier.<br>There's also a question like,<br>what level of intelligence<br>do you need to determine<br>if the thing is too hard<br>for the four level model?<br>Maybe you need the o1 level model.<br>It's really unclear.<br>- But you mentioned this.<br>So, there's a pre-training process<br>then there's post-training,<br>and then there's test time compute.<br>Is that fair to separate?<br>Where's the biggest gains?<br>- Well, it's weird<br>because test time compute,<br>there's a whole training strategy needed<br>to get test time compute to work.<br>The other really weird thing about this<br>is outside of the big labs<br>and maybe even just OpenAI,<br>no one really knows how it works.<br>There've been some<br>really interesting papers<br>that show hints of what<br>they might be doing.<br>So, perhaps they're doing something<br>with tree search using<br>process reward models.<br>But yeah, I think the issue<br>is we don't quite know<br>exactly what it looks like,<br>so it would be hard to<br>comment on where it fits in.<br>I would put it in post-training,<br>but maybe the compute spent<br>for this forgetting test time<br>compute to work for a model<br>is going to dwarf pre-training eventually.<br>- So we don't even know if o1<br>is using just chain of thought<br>or we don't know how<br>they're using any of these?<br>We don't know anything?<br>- It's fun to speculate.<br>(group chuckling)<br>- If you were to build a competing<br>model, what would you do?<br>- Yeah, so one thing to do would be,<br>I think you probably need to<br>train a process reward model.<br>So maybe we can get into reward models<br>and outcome reward models<br>versus process reward models.<br>Outcome reward models are<br>the traditional reward models<br>that people are trained<br>for language modeling,<br>and it's just looking at the final thing.<br>So if you're doing some math problem,<br>let's look at that final thing.<br>You've done everything and<br>let's assign a grade to it,<br>how likely we think.<br>What's the reward for this outcome?<br>Process reward models<br>instead try to grade the chain of thought.<br>And so OpenAI had preliminary<br>paper on this, I think,<br>last 
summer, where they use human labelers to get this pretty large, several-hundred-thousand data set of creating chains of thought.<br>Ultimately, it feels like I haven't seen anything interesting in the ways that people use process reward models outside of just using them as a means of affecting how we choose between a bunch of samples.<br>So, what people do in all these papers is they sample a bunch of outputs from the language model, and then use the process reward models to grade all those generations, alongside maybe some other heuristics, and then use that to choose the best answer.<br>The really interesting thing that people think might work, and want to work, is tree search with these process reward models.<br>Because if you really can grade every single step of the chain of thought, then you can branch out and explore multiple paths of this chain of thought, and then use these process reward models to evaluate how good is this branch that you're taking.<br>- Yeah, when the quality of the branch is somehow strongly correlated with the quality of the outcome at the very end, so you have a good model of knowing which branch to take.<br>So not just in the short term, in the long term?<br>- Yeah.<br>The interesting work that I think has been done, or the interesting work that has been open sourced and that people I think talk about, is how to train the process reward models, maybe in a more automated way.<br>I could be wrong here, I could be failing to mention some papers.<br>But I haven't seen anything super compelling that seems to work really well for using the process reward models creatively to do tree search in code.
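As a rough sketch of the best-of-n selection described above (the `generate_chain_of_thought` and `score_step` helpers are placeholders for a sampler and a process reward model; nothing here is any specific lab's recipe):

```python
# Minimal sketch of best-of-n selection with a process reward model (PRM):
# sample several chains of thought, score each step, keep the best-scoring sample.
def generate_chain_of_thought(prompt: str) -> list[str]:
    """Placeholder: sample one chain of thought as a list of reasoning steps."""
    raise NotImplementedError


def score_step(prompt: str, steps_so_far: list[str]) -> float:
    """Placeholder: PRM score for the latest step, given the steps before it."""
    raise NotImplementedError


def best_of_n(prompt: str, n: int = 16) -> list[str]:
    best_steps, best_score = None, float("-inf")
    for _ in range(n):
        steps = generate_chain_of_thought(prompt)
        # Grade every step, then aggregate (mean step score here; other heuristics work too).
        step_scores = [score_step(prompt, steps[: i + 1]) for i in range(len(steps))]
        total = sum(step_scores) / max(len(step_scores), 1)
        if total > best_score:
            best_steps, best_score = steps, total
    return best_steps
```

Tree search would replace the independent full rollouts with step-by-step branching, using the same per-step scores to decide which partial chains are worth expanding.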
- This is an AI safety, maybe a bit of a philosophy, question.<br>So OpenAI says that they're hiding the chain of thought from the user, and they've said that that was a difficult decision to make.<br>Instead of showing the chain of thought, they're asking the model to summarize the chain of thought.<br>They're also in the background saying they're going to monitor the chain of thought to make sure the model is not trying to manipulate the user, which is a fascinating possibility.<br>But anyway, what do you think about hiding the chain of thought?<br>- One consideration for OpenAI, and this is completely speculative, could be that they wanna make it hard for people to distill these capabilities out of their model.<br>It might actually be easier, if you had access to that hidden chain of thought, to replicate the technology, because that's pretty important data, like seeing the steps that the model took to get to the final result.<br>- So, you could probably train on that also?<br>- And there was a mirrored situation with this with some of the large language model providers, and also this is speculation, but some of these APIs used to offer easy access to log probabilities for all the tokens that they're generating, and also log probabilities over the prompt tokens.<br>And then some of these APIs took those away.<br>Again, complete speculation, but one of the thoughts is that the reason those were taken away is, if you have access to log probabilities, similar to this hidden chain of thought, that can give you even more information to try and distill these capabilities out of the APIs, out of these biggest models, and into models you control.<br>As an asterisk on also the previous discussion about us integrating o1: I think that we're still learning how to use this model.<br>So, we made o1 available in Cursor because when we got the model, we were really interested in trying it out. I think a lot of programmers are gonna be interested in trying it out.<br>o1 is not part of the default Cursor experience in any way, and we still haven't found a way yet to integrate it into the editor in a way that we reach for every hour, maybe even every day.<br>So, I think that the jury's still out on how to use the model, and we haven't seen examples yet of people releasing things where it seems really clear, like, "Oh, that's now the use case."<br>The obvious one to turn to is maybe this can make it easier for you to have these background things running, to have these models in loops, to have these models be agentic.<br>But we're still discovering.<br>- To be clear, we have ideas. We just need to try and get something incredibly useful before we put it out there.<br>- But it has these significant limitations.<br>Even barring capabilities, it does not stream. That means it's really, really painful to use for things where you want to supervise the output. Instead, you're just waiting for the wall of text to show up.<br>Also, it does feel like the early innings of test-time compute and search, where it's just very, very much a v0, and there's so many things that don't feel quite right.<br>I suspect, in parallel to people increasing the amount of pre-training data and the size of the models and pre-training and finding tricks there, you'll now have this other thread of getting search to work better and better.<br>- So, let me ask you about strawberry tomorrow eyes.<br>(group chuckles)<br>So, it looks like GitHub Copilot might be integrating o1 in some kind of way, and I think some of the comments are saying, does this mean Cursor is done?<br>(group chuckles)<br>I think I saw one comment saying that.<br>- It's time to shut down Cursor, yeah.<br>- Time to shut down Cursor, thank you.<br>(group chuckling)<br>So, is it time to shut down Cursor?<br>- I think this space is a little bit different from past software spaces over the 2010s, where I think that the ceiling here is really, really, really incredibly high.<br>So, I think that the best product in three to four years will just be so much more useful than the best product today.<br>You can wax poetic about moats this and brand that and this is our advantage, but I think in the end, just, if you stop innovating on the product, you will lose.<br>That's also great for startups, that's great for people trying to enter this market, because it means you have an opportunity to win against people who have lots of users already by just building something better.<br>And so, I think over the next few years, it's just about building the best product, building the best system, and that both comes down to the modeling engine side of things, and it also comes down to the editing experience.<br>- Yeah, I think most of the additional value from Cursor versus everything else out there is not just integrating the new model fast, like o1.<br>It comes from all of the depth that goes into these custom
models that you don't realize are working for you in every facet of the product, as well as the really thoughtful UX with every single feature.<br>- All right, from that profound answer, let's descend back down to the technical.<br>You mentioned you have a taxonomy of synthetic data.<br>- (chuckles) Oh, yeah.<br>- Can you please explain?<br>- Yeah, I think there are three main kinds of synthetic data.<br>So what is synthetic data, first? There's normal data, like non-synthetic data, which is just data that's naturally created, i.e., usually it'll be from humans having done things. So, from some human process you get this data.<br>The first kind of synthetic data would be distillation.<br>So, having a language model output tokens or probability distributions over tokens, you can then train some less capable model on this.<br>This approach is not gonna get you a more capable model than the original one that produced the tokens, but it's really useful if there's some capability you wanna elicit from some really expensive, high-latency model: you can then distill that down into some smaller, task-specific model.<br>The second kind is when one direction of the problem is easier than the reverse.<br>So, a great example of this is bug detection, like we mentioned earlier, where it's a lot easier to introduce reasonable-looking bugs than it is to actually detect them, and this is probably the case for humans too.<br>And so what you can do is you can get a model that's not trained on that much data, that's not that smart, to introduce a bunch of bugs in code.<br>And then, you can use that synthetic data to train a model that can be really good at detecting bugs.<br>The last category, I think, is, I guess, the main one that it feels like the big labs are doing for synthetic data, which is producing text with language models that can then be verified easily.<br>So, an extreme example of this is, if you have a verification system that can detect whether language is Shakespeare-level, and then you have a bunch of monkeys typing on typewriters, you can eventually get enough training data to train a Shakespeare-level language model.<br>And I mean, this is very much the case for math, where verification is actually really, really easy for formal languages.<br>And then what you can do is you can have an okay model generate a ton of rollouts, and then choose the ones that you know have actually proved the ground truth theorems, and train on that further.<br>There are similar things you can do for code with LeetCode-like problems, where if you have some set of tests where you know that if something passes these tests, it has actually solved the problem, you could do the same thing: verify that it's passed the tests, and then train the model on the outputs that have passed the tests.<br>I think it's gonna be a little tricky getting this to work in all domains, or just in general. Having the perfect verifier feels really, really hard to do for just open-ended, miscellaneous tasks you give the model, or for more long-horizon tasks, even in coding.<br>- [Lex] That's 'cause you're not as optimistic as Arvid. But yeah, so yeah,<br>(Aman chuckles)<br>that third category requires having a verifier.<br>- Yeah.
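As a rough sketch of that third category (the `generate` helper is a placeholder for sampling from a model, and the pytest-based harness is just one illustrative choice of verifier; a production setup would sandbox execution and handle timeouts):

```python
# Minimal sketch of verifier-filtered synthetic data:
# sample many candidate solutions, keep only the ones the verifier accepts.
import subprocess
import tempfile
from pathlib import Path


def generate(prompt: str) -> str:
    """Placeholder for sampling one candidate solution from a language model."""
    raise NotImplementedError


def passes_tests(candidate: str, test_code: str) -> bool:
    """Cheap, trusted verifier: run the candidate against a small test file."""
    with tempfile.TemporaryDirectory() as d:
        Path(d, "solution.py").write_text(candidate)
        Path(d, "test_solution.py").write_text(test_code)
        result = subprocess.run(
            ["python", "-m", "pytest", "-q", "test_solution.py"],
            cwd=d, capture_output=True, timeout=60,
        )
    return result.returncode == 0


def verified_samples(problem: str, test_code: str, n: int = 64) -> list[str]:
    """Keep only rollouts the verifier accepts; these become training targets."""
    kept = []
    for _ in range(n):
        candidate = generate(f"Write solution.py for this problem:\n{problem}")
        if passes_tests(candidate, test_code):
            kept.append(candidate)
    return kept
```

The verifier here is a test suite precisely because, as discussed, tests and formal systems are the cases where you can trust the check.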
Verification, it feels like it's best when you know for a fact that it's correct.<br>And then it wouldn't be like using a language model to verify; it would be using tests or formal systems.<br>- Or running the thing too. Doing the human form of verification, where you just do manual quality control.<br>- Yeah.<br>- Yeah.<br>- But the language model version of that, where it's running the thing and it actually understands the output.<br>- Yeah, no, that's-<br>- I'm sure it's somewhere in between.<br>- Yeah.<br>I think that's the category that is most likely to result in massive gains.<br>- What about RL with the feedback side, RLHF versus RLAIF? What's the role of that in getting better performance on the models?<br>- Yeah.<br>So, RLHF is when the reward model you use is trained from some labels you've collected from humans giving feedback.<br>I think this works if you have the ability to get a ton of human feedback for this kind of task that you care about.<br>RLAIF is interesting because it depends on the constraint that verification is actually a decent bit easier than generation.<br>Because it feels like, okay, what are you doing? Are you using this language model to look at the language model outputs and then improve the language model?<br>But no, it actually may work: if the language model has a much easier time verifying some solution than it does generating it, then you actually could perhaps get this recursive loop. But I don't think it's gonna look exactly like that.<br>The other thing you could do, that we kind of do, is a little bit of a mix of RLAIF and RLHF, where usually the model is actually quite correct, and this is the case for Cursor Tab, at picking between two possible generations of what is the better one.<br>And then, it just needs a little bit of human nudging, with only on the order of 50 to 100 examples, to align the prior the model has with exactly what you want.<br>That looks different than, I think, normal RLHF, where you're usually training these reward models on tons of examples.<br>- What's your intuition when you compare generation and verification, or generation and ranking? Is ranking way easier than generation?<br>- My intuition would just say, yeah, it should be.<br>Like, if you believe P does not equal NP, then there's this massive class of problems that are much, much easier to verify given a proof than to actually prove.<br>- I wonder if the same thing will prove P not equal to NP, or P equal to NP.<br>- (chuckles) That would be really cool.<br>- That'd be, whatever, a Fields Medal<br>(group giggling)<br>by AI. Who gets the credit? Another open philosophical question.<br>(group chuckling)<br>- Whoever prompted it.<br>(group chuckling)<br>- I'm actually surprisingly curious what a good bet for when AI will get the Fields Medal will be.<br>I actually don't have-<br>- Isn't this Aman's specialty?<br>- I don't know what Aman's bet here is.<br>- Oh, sorry, Nobel Prize or Fields Medal first?<br>- Fields Medal-<br>- Oh, Fields Medal level?<br>- Fields Medal comes first, I think.<br>- Fields Medal comes first. Well, you would say that, of course.<br>(group chuckling)<br>- But it's also this isolated system you can verify.<br>- Sure.<br>- Yeah.<br>- I don't even know if I-<br>- You don't need to do (indistinct).<br>- I feel like I have much more to do there. It felt
like the path<br>to get to IMO was a little bit more clear.<br>Because it already could<br>get a few IMO problems<br>and there was a bunch<br>of low-hanging fruit,<br>given the literature at the time,<br>of what tactics people could take.<br>I think I'm, one, much less versed<br>in the space of theorem proving now.<br>And two, less intuition about<br>how close we are to solving<br>these really, really hard open problems.<br>- So you think you'll<br>be Field's Medal first?<br>It won't be in physics or in-<br>- Oh, 100%, I think that's<br>probably more likely.<br>It is probably much more<br>likely that it'll get in.<br>Yeah, yeah, yeah.<br>Well, I think it both<br>to, I don't know, BSD,<br>which is a Birch and<br>Swinnerton-Dyer conjecture,<br>or (indistinct) iPods,<br>or any one of these hard math problems<br>are just actually really hard.<br>It's unclear what the path<br>to get even a solution looks like.<br>We don't even know what a path looks like,<br>let alone (indistinct).<br>- And you don't buy the idea<br>this is just like an isolated system<br>and you can actually have<br>a good reward system,<br>and it feels like it's<br>easier to train for that.<br>- I think we might get<br>Field's Medal before AGI.<br>- I mean, I'd be very happy.<br>I'd be very happy.<br>But I don't know if I think 2028, 2030.<br>(Aman chuckles)<br>- For Field's Medal?<br>- Field's Medal.<br>- All right.<br>It feels like forever from now,<br>given how fast things have been going.<br>Speaking of how fast<br>things have been going,<br>let's talk about scaling laws.<br>So, for people who don't know,<br>maybe it's good to talk<br>about this whole idea<br>of scaling laws.<br>What are they, where'd you think stand,<br>and where do you think things are going?<br>- I think it was interesting.<br>The original scaling laws paper<br>by OpenAI was slightly wrong.<br>'Cause I think of some issues they did<br>with learning right schedules.<br>And then, Chinchilla showed<br>a more correct version.<br>And then, from then<br>people have again deviated<br>from doing the compute optimal thing.<br>'Cause people start now optimizing more so<br>for making the thing work really well<br>given an inference budget.<br>And I think there are a lot<br>more dimensions to these curves<br>than what we originally used,<br>of just compute number<br>of parameters and data.<br>Like inference compute is the obvious one.<br>I think context length<br>is another obvious one.<br>Let's say you care about the two things<br>of inference compute<br>and then context window,<br>maybe the thing you wanna<br>train is some kind of SSM.<br>Because they're much,<br>much cheaper and faster<br>at super, super long context.<br>And even if, maybe it was<br>10 X more scaling properties<br>during training, meaning,<br>you spend 10 X more compute<br>to train the thing to get the<br>same level of capabilities,<br>it's worth it<br>because you care most<br>about that inference budget<br>for really long context windows.<br>So, it'll be interesting to see<br>how people play with all these dimensions.<br>- So, yeah, I mean, you speak<br>to the multiple dimensions, obviously.<br>The original conception was<br>just looking at the variables<br>of the size of the model<br>as measured by parameters,<br>and the size of the data<br>as measured by the number of tokens,<br>and looking at the ratio of the two.<br>- Yeah.<br>- And it's kind of a compelling notion<br>that there is a number,<br>or at least a minimum.<br>And it seems like one was emerging.<br>Do you still believe<br>that 
there is a kind of bigger is better?<br>- I mean, I think bigger is certainly better for just raw performance.<br>- And raw intelligence.<br>- And raw intelligence.<br>I think the path that people might take, I'm particularly bullish on distillation, and, like, how many knobs can you turn to, if we spend a ton, ton of money on training, get the most capable cheap model, really, really caring about that as much as you can.<br>'Cause the naive version of caring as much as you can about inference-time compute is what people have already done with the Llama models, just over-training the shit out of 7B models on way, way, way more tokens than is essentially optimal.<br>But if you really care about it, maybe the thing to do is what Gemma did, which is, let's not just train on tokens, let's literally train on minimizing the KL divergence with the distribution of Gemma 27B, right? So, knowledge distillation there.<br>And you're spending the compute of literally training this 27-billion-parameter model on all these tokens, just to get out this, I don't know, smaller model.<br>- And the distillation gives you just a faster model; smaller means faster.<br>- Yeah, distillation in theory is, I think, getting out more signal from the data that you're training on.<br>And it's perhaps another way of getting over, not completely over, but partially helping with, the data wall, where you only have so much data to train on: let's train this really, really big model on all these tokens, and we'll distill it into this smaller one, and maybe we can get more signal per token for this much smaller model than we would've originally if we trained it.
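As a rough sketch of that distillation objective (PyTorch-style; the temperature, the teacher/student naming, and the commented training step are illustrative assumptions, not how Gemma was actually trained):

```python
# Minimal sketch of token-level knowledge distillation: the student is trained
# to match the teacher's output distribution (KL divergence) instead of, or in
# addition to, the one-hot next-token labels.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) per token position, averaged over batch and sequence.
    Both tensors have shape (batch, seq_len, vocab_size)."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # kl_div expects log-probabilities as input and probabilities as target.
    kl_per_pos = F.kl_div(student_log_probs, teacher_probs, reduction="none").sum(-1)
    return kl_per_pos.mean() * (t ** 2)


# Sketch of a training step (teacher frozen, student updated):
# with torch.no_grad():
#     teacher_logits = teacher(input_ids).logits
# student_logits = student(input_ids).logits
# loss = distillation_loss(student_logits, teacher_logits)
# loss.backward(); optimizer.step()
```

The point is that every position carries a full distribution from the teacher rather than a single hard label, which is one way of squeezing more signal out of the same tokens.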
- So if I gave you $10 trillion, how would you spend it?<br>(Aman chuckles)<br>I mean, you can't buy an island or whatever. How would you allocate it in terms of improving the big model versus maybe paying for the HF in RLHF?<br>- Yeah, yeah. I think there are a lot of these secrets and details about training these large models that I just don't know and that only the large labs are privy to, and the issue is, I would waste a lot of that money if I even attempted this, because I wouldn't know those things.<br>Suspending a lot of disbelief and assuming you had the know-how, or if you're saying you have to operate with the limited information you have now.<br>- No, no, no, actually, I would say, you swoop in and you get all the information, all the little heuristics, all the little parameters, all the parameters that define how the thing is trained.<br>- Mm-hmm.<br>- If we look at how to invest money for the next five years in terms of maximizing what you called raw intelligence.<br>- I mean, isn't the answer really simple? You just try to get as much compute as possible.<br>At the end of the day, all you need to buy is the GPUs, and you can tune whether you want to pre-train a big model or a small model.<br>- Well, this gets into the question of, are you really limited by compute and money, or are you limited by these other things?<br>- I'm more partial to Arvid's belief that we're idea-limited, but there's always that like-<br>- But if you have a lot of compute, you can run a lot of experiments.<br>- So you would run a lot of experiments versus use that compute to train a gigantic model?<br>- I would, but I do believe that we are limited in terms of ideas that we have.<br>- I think, yeah, 'cause even with all this compute, and all the data you could collect in the world, I think you really are ultimately limited by not even ideas, but just really good engineering.<br>There aren't that many people in the world who really can make the difference here, and there's so much work that goes into research that is just pure, really, really hard engineering work.<br>As a very hand-wavy example, if you look at the original Transformer paper, how much work was joining together a lot of these really interesting concepts embedded in the literature, versus then going in and writing all the code, maybe the CUDA kernels, maybe whatever else, I don't know if it originally ran on GPUs or TPUs, such that it actually saturated the GPU performance?<br>Getting Noam Shazeer to go in and do all this code. And Noam is probably one of the best engineers in the world.<br>Or maybe going a step further, like the next generation of models, having these things, like getting model parallelism to work and scaling it on thousands of, or maybe tens of thousands, of V100s, which I think GPT-3 may have been.<br>There's just so much engineering effort that has to go into all of these things to make it work.<br>If you really brought that cost down, to maybe not zero, but just made it 10X easier, made it super easy for someone with really fantastic ideas to immediately get to the version of the new architecture they dreamed up, that is getting 40, 50% utilization on their GPUs, I think that would just speed up research by a ton.<br>- I mean, I think if you see a clear path to improvement, you should always take the low-hanging fruit first, right?<br>I think probably OpenAI and all the other labs did the right thing to pick off the low-hanging fruit, where the low-hanging fruit is, like, you could scale up to GPT-4.25 scale, and you just keep scaling, and things keep getting better.<br>There's no point in experimenting with new ideas when everything is working; you should just bang on it and try to get as much juice out of it as possible.<br>And then I think if you're spending $10 trillion, you probably wanna spend some of it on actually reevaluating your ideas, 'cause you're probably a little bit idea-limited at that point.<br>- I think all of us believe new ideas are probably needed to get all the way there to AGI.<br>And all of us also probably believe there exist ways of testing out those ideas at smaller scales and being fairly confident that they'll play out.<br>It's just quite difficult for the labs, in their current position, to dedicate their very limited research and engineering talent to exploring all these other ideas, when there's this core thing that will probably improve performance for some decent amount of time.<br>- Yeah, but also, these big labs like winning.<br>(Lex chuckles)<br>So, they're just going wild. Okay.<br>(all chuckling)<br>So, big question, looking out into the future. You're now at the center of the programming world.<br>How do you think programming, the nature of programming, changes in the next few months, in the next year, in the next two years, and the next five years, 10 years?<br>- I think we're really excited about a future where the programmer is in the driver's seat for a long
And you've heard us talk about this a little bit, but one that emphasizes speed and agency for the programmer, and control.<br>The ability to modify anything you wanna modify, the ability to iterate really fast on what you're building.<br>And this is a little different, I think, than where some people are jumping to in the space, where I think one idea that's captivated people is: can you talk to your computer?<br>Can you have it build software for you, as if you're talking to an engineering department or an engineer over Slack?<br>And can it just be this sort of isolated text box?<br>And part of the reason we're not excited about that is some of the stuff we've talked about with latency, but then a big piece, a reason we're not excited about that, is because that comes with giving up a lot of control.<br>It's much harder to be really specific when you're talking in the text box.<br>And if you're necessarily just going to communicate with a thing like you would be communicating with an engineering department, you're actually abdicating tons of really important decisions to this bot.<br>And this kind of gets at, fundamentally, what engineering is.<br>I think that some people who are a little bit more removed from engineering might think of it as: the spec is completely written out, and then the engineers just come and they just implement.<br>And it's just about making the thing happen in code and making the thing exist.<br>But I think a lot of the best engineering, the engineering we enjoy, involves tons of tiny micro-decisions about what exactly you're building, and about really hard trade-offs between speed and cost and just all the other things involved in a system.<br>As long as humans are actually the ones designing the software, and the ones specifying what they want to be built, and it's not just, like, a company run by all AIs, we think you'll really want the human in the driver's seat dictating these decisions.<br>And so the jury's still out on what that looks like.<br>I think that one weird idea for what that could look like, is it could look like you can control the level of abstraction you view a code base at.<br>And you can point at specific parts of a code base, like, maybe you digest a code base by looking at it in the form of pseudocode.<br>And you can actually edit that pseudocode too, and then have changes get made down at the formal programming level.<br>And you can gesture at any piece of logic in your software.<br>So you keep the in-flow text editing component of programming, you keep the control, you can even go down into the code, or you can go to higher levels of abstraction, while also getting these big productivity gains.<br>- It'd be nice if you can go up and down the abstraction stack.<br>- Yeah.<br>And there are a lot of details to figure out there; it's sort of a fuzzy idea.<br>Time will tell if it actually works.<br>But these principles of control and speed, with the human in the driver's seat, we think are really important.<br>We think, for some things, like Arvid mentioned before, for some styles of programming, you can hand it off chatbot-style, if you have a bug that's really well specified.<br>But that's not most of programming, and that's also not most of the programming we think a lot of people value.
- What about the fundamental skill of programming?<br>There's a lot of people, like young people right now, who are scared, 'cause they love programming, but they're scared about, "Will I be able to have a future if I pursue this career path?"<br>Do you think the very skill of programming will change fundamentally?<br>- I actually think this is a really, really exciting time to be building software.<br>We remember what programming was like in 2013, 2012, whatever it was.<br>And there was just so much more cruft and boilerplate, and looking up something really gnarly.<br>And that stuff still exists, it's definitely not at zero.<br>But programming today is way more fun than back then.<br>It's like we're really getting down to the delight concentration.<br>And all the things that really draw people to programming, for instance, this element of being able to build things really fast, the speed, and also the individual control, all those are just being turned up a ton.<br>And so I think it's gonna be a really, really fun time for people who build software.<br>I think that the skills will probably change too.<br>I think that people's taste and creative ideas will be magnified.<br>And it will be maybe a little bit less about boilerplate text editing.<br>Maybe even a little bit less about carefulness, which I think is really important today if you're a programmer.<br>I think it'll be a lot more fun.<br>- What do you guys think?<br>- I agree.<br>I'm very excited for that change.<br>One thing that happened recently was we wanted to do a relatively big migration of our code base.<br>We were using AsyncLocalStorage in Node.js, which is known to be not very performant, and we wanted to migrate to a context object.<br>And this is a big migration that affects the entire code base.<br>Sualeh and I spent, I don't know, five days working through this, even with today's AI tools.<br>And I am really excited for a future where I can just show a couple of examples, and then the AI applies that to all of the locations.<br>And then it highlights, "Oh, this is a new example, what should I do?"<br>And then I show exactly what to do there.<br>And then that can be done in 10 minutes.<br>And then you can iterate much, much faster.<br>Then you don't have to think as much upfront and stand at the blackboard and think, "Exactly, how are we gonna do this, because the cost is so high?"<br>But you can just try something first, and you realize, "Oh, this is not actually exactly what I want."<br>And then you can change it instantly again after.<br>And so, yeah, I think being a programmer in the future is going to be a lot of fun.<br>- Yeah, I really like that point.<br>It feels like a lot of the time with programming, there are two ways you can go about it.<br>One is you think really hard, carefully, upfront about the best possible way to do it, and then you spend your limited engineering time to actually implement it.<br>But I much prefer just getting in the code and taking a crack at it, seeing how it lays out, and then iterating really quickly on that.<br>That feels more fun.
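To make the migration Arvid describes concrete, here is a minimal sketch, not Cursor's actual code, of what moving from Node.js AsyncLocalStorage to an explicitly passed context object can look like. The RequestContext shape and all function names are hypothetical; only the AsyncLocalStorage API itself is real.

```typescript
// Hypothetical sketch of the migration described above: replacing ambient
// AsyncLocalStorage state with a context object passed explicitly.
import { AsyncLocalStorage } from "node:async_hooks";

interface RequestContext {
  userId: string;
  traceId: string;
}

// Before: request state lives in async-local storage and is read implicitly.
const storage = new AsyncLocalStorage<RequestContext>();

function logWithImplicitContext(message: string): void {
  const ctx = storage.getStore();
  console.log(`[${ctx?.traceId}] user=${ctx?.userId} ${message}`);
}

function handleRequestBefore(userId: string, traceId: string): void {
  storage.run({ userId, traceId }, () => {
    logWithImplicitContext("handling request");
  });
}

// After: the same state is threaded through as an explicit parameter.
// This signature change has to be repeated at every function and call site,
// which is what makes the migration touch the whole code base.
function logWithExplicitContext(ctx: RequestContext, message: string): void {
  console.log(`[${ctx.traceId}] user=${ctx.userId} ${message}`);
}

function handleRequestAfter(ctx: RequestContext): void {
  logWithExplicitContext(ctx, "handling request");
}

handleRequestBefore("user-123", "trace-abc");
handleRequestAfter({ userId: "user-123", traceId: "trace-abc" });
```

The edit itself is mechanical, which is exactly why showing a model a couple of examples, letting it apply the same change everywhere, and having it flag the call sites that don't fit the pattern is so appealing.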
- Yeah, just being able to generate the boilerplate is great.<br>So you just focus on the nuanced, difficult design decisions.<br>Migration, I feel like this is a cool one.<br>It seems like larger language models are able to basically translate from one programming language to another.<br>Or translate, migrate in the general sense of what migrate is.<br>But that's in the current moment.<br>So I mean, the fear has to do with: okay, as these models get better and better, you're making fewer and fewer of the creative decisions.<br>And is it going to kind of move to a place where you're operating in the design space of natural language, where natural language is the main programming language?<br>And, I guess, I could ask that by way of advice.<br>If somebody's interested in programming now, what do you think they should learn?<br>You guys started in some Java.<br>(group chuckling)<br>And I forget, oh, some PHP.<br>- PHP.<br>- Objective-C.<br>- Objective-C, there you go.<br>I mean, in the end, we all know JavaScript was going to win,<br>(group chuckling)<br>and not TypeScript.<br>It's going to be like vanilla JavaScript.<br>It's just going to eat the world, and maybe live with PHP.<br>And I mean, it also brings up the question of, I think Don Knuth has this idea that some percent of the population is geeks, and there's a particular kind of psychology and mind required for programming.<br>And it feels like, more and more, the kind of person that can do great programming might expand.<br>- I think different people do programming for different reasons.<br>But I think the true, maybe the best programmers, are the ones that really love, just absolutely love programming.<br>For example, there are folks on our team who literally, when they get back from work, they go and then they boot up Cursor, and then they start coding on their side projects for the entire night, and they stay up until 3:00 am doing that.<br>And when they're sad, they say, "I just really need to code."<br>(group chuckling)<br>And I think there's that level of programmer, where this obsession and love of programming, I think, makes, really, the best programmers.<br>And I think these types of people will really get into the details of how things work.<br>- I guess the question I'm asking, that exact programmer, let's think about that person.<br>When the super Tab, the super awesome, praise-be-to-the-Tab, succeeds, and you keep pressing Tab.<br>- That person on the team loves Cursor Tab more than anybody else, right?<br>- Yeah.<br>Pressing Tab is just pressing Tab.<br>That's the easy way to say it, the catchphrase.<br>But what you're actually doing when you're pressing Tab, is that you're injecting intent all the time while you're doing it.<br>Sometimes you're rejecting it, sometimes you're typing a few more characters.<br>And that's the way that you're shaping the thing that's being created.<br>And I think programming will change a lot to just, "What is it that you want to make?"<br>- It's sort of higher bandwidth.<br>The communication to the computer just becomes higher and higher bandwidth, as opposed to just typing, which is much lower bandwidth than communicating intent.<br>- I mean, this goes to your manifesto, titled Engineering Genius.<br>"We are an applied research lab building extraordinarily productive human-AI systems."<br>So, speaking to this hybrid element.<br>"To start, we're building the engineer of the future, a human-AI programmer that's an order of magnitude more effective than any one engineer.
This hybrid engineer will have effortless control over their code base and no low-entropy keystrokes.<br>They will iterate at the speed of their judgment, even in the most complex systems.<br>Using a combination of AI and human ingenuity, they will outsmart and out-engineer the best pure AI systems.<br>We are a group of researchers and engineers.<br>We build software and models to invent at the edge of what's useful and what's possible.<br>Our work has already improved the lives of hundreds of thousands of programmers."<br>And on the way to that, we'll at least make programming more fun.<br>So, thank you for talking today.<br>- Thank you.<br>- Thanks for having us.<br>- Thank you.<br>- Thank you.<br>- Thanks for listening to this conversation with Michael, Sualeh, Arvid and Aman.<br>To support this podcast, please check out our sponsors in the description.<br>And now, let me leave you with a random, funny, and perhaps profound programming quote I saw on Reddit:<br>"Nothing is as permanent as a temporary solution that works."<br>Thank you for listening, and hope to see you next time.