<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://www.ivanbercovich.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.ivanbercovich.com/" rel="alternate" type="text/html" /><updated>2026-06-12T21:26:16+00:00</updated><id>https://www.ivanbercovich.com/feed.xml</id><title type="html">neversupervised</title><subtitle></subtitle><author><name>Ivan Bercovich</name></author><entry><title type="html">The Verifier Is the Hard Problem</title><link href="https://www.ivanbercovich.com/2026/the-verifier-is-the-hard-problem" rel="alternate" type="text/html" title="The Verifier Is the Hard Problem" /><published>2026-06-11T00:00:00+00:00</published><updated>2026-06-11T00:00:00+00:00</updated><id>https://www.ivanbercovich.com/2026/the-verifier-is-the-hard-problem</id><content type="html" xml:base="https://www.ivanbercovich.com/2026/the-verifier-is-the-hard-problem"><![CDATA[<p>I’ve been contributing to Terminal Bench since close to the beginning. A lot of my interest lately has been with reward hacking, specifically how elusive it is. I thought it was an obvious, important issue, and when you talk to people they say, sure, reward hacking is a big deal. But it keeps getting bigger. I think it’s a bigger problem than most people thought.</p>

<p>The common take is that higher capabilities will solve it. My take is the opposite: the more capable the models get, the worse the reward hacking gets.</p>

<p>Here’s why. We’re trying to come up with tasks that are very hard. But what does hard mean? It’s a very open-ended question. If you push toward hard, which means a benchmark that no AI can do yet, so you can test where the AIs can get to, then as you approach that frontier you are also approaching a space that is less well defined. And the less well defined the task, the more reward-hackable it is. The feedback loop breaks.</p>

<p>You see it in what people actually submit. A lot of what we get is someone who starts with a prompt that works, and then removes tokens from the prompt until it fails. That’s tricky for the AI, but it isn’t fundamentally hard. Hard and ill-defined start to look like the same thing, and that’s where the reward hacking lives.</p>

<h2 id="weve-run-out-of-the-low-hanging-fruit">We’ve run out of the low-hanging fruit</h2>

<p>SWE-bench was a really clever way of producing a benchmark. And everybody is kind of hoping you can synthesize benchmarks from here: get a good task creator, brainstorm some ideas with a domain expert, and you can make good tasks. My sense is that, for the time being, you just can’t. The best tasks require that a person who really understands the domain is at the epicenter. The open question is how we learn to produce that, at high quality and at volume, reliably.</p>

<p>And the hardest part isn’t the task, it’s the verifier. When you have something like a C++ compiler with its whole test suite, or SQLite, and you build the task around that, that’s great. But there are only so many applications that come with that degree of testing. Most verifiers are too married to a particular solution. The solution someone provides and the verifier they write are linked to each other, because the same person made both. So you’re not really testing the result, you’re testing against the author’s own answer. The SQLite test suite actually tests the output of the queries. That’s the difference, and it’s hard to produce.</p>

<p>I’ve stayed mostly in the terminal because it seems to me the native space where agents are going to operate. You should be able to port pretty much any concept into the terminal.</p>

<h2 id="confront-the-real-world">Confront the real world</h2>

<p>For now we still get a lot of mileage from really high-quality a priori thinking about how to test something. But that runs out too, and the best verifier is the one that confronts the real world.</p>

<p>Take the extreme. Say I want to teach my AI about aerodynamics. The ultimate setup is I raise some money and I buy a wind tunnel from a company going out of business. I get big 3D printers. Anybody can request the shape of a wing, and I print it precisely, put it in the wind tunnel, and give you back precise measurements. Maybe a simulator is the first-order approximation, but eventually the task, the verifier, is a wind tunnel somewhere in physical space. When you run my thing, I give you that data back.</p>

<p>There are softer versions of the same idea. In finance, here’s five thousand dollars, turn it into ten thousand. At the end of the day you have to trade real money, in real time, to know the thing actually works. That’s about the easiest way to confront the real world. A verifier that touches reality can’t be gamed the way a verifier that’s married to its author’s solution can.</p>

<h2 id="who-makes-the-tasks">Who makes the tasks</h2>

<p>So if good tasks require a domain expert at the epicenter, and good verifiers are this hard to build, the real question is who makes them, and how you get them at quality and volume.</p>

<p>I don’t think the current model is the right one. The work mostly runs through a couple of very large data vendors, and it’s hard to know if they’re not experts on the tasks they’re making. They’re worth tens of billions of dollars and they’re still not doing an amazing job. The people actually making the tasks are anonymous. Nobody knows who they are, the names don’t even get shared. All the reputation gets stripped out.</p>

<p>I think there’s a better structure, closer to a marketplace, or at least a network. Imagine I make a task for Terminal Bench, and it has my name on it, and you can count how many times it’s been a rolled out. Maybe I made a really good task and there have been a billion rollouts. Who is capturing that reputation? Right now it’s some other entity that is effectively teaching AIs how to do things. That reputation, captured properly, might be incentive enough on its own.</p>

<p>And a lot of the incentive is already there if you set it up right. Think about SEO. Google wants high-quality content, people want to rank in Google, and there’s some gaming along the way, but it mostly works. The same shape applies here. Every open-source project should have a library, documentation, and a number of tasks. Documentation is only good to a point; what you really want is tasks, because you want agents to use your project, and the way agents learn to use it is for a lab to train on good tasks. If I’m Salesforce and I want to be the system of record behind agents, I’m incentivized to publish high-quality tasks. If I’m a voice API and I want to be the one picked every time someone builds a voice app, that’s a huge win, so I’m very incentivized to put that stuff out there. The labs are incentivized to use it, because there’s generalization that comes from it. Both sides benefit, and nobody has to pay.</p>

<p>Building a high-quality task that passes a lot of checks is also much harder to game than stuffing keywords to rank a library. For the domain-specific cases where free incentives aren’t enough, you can pay, more like a 99designs for environments: a lab says it has a gap, people submit environments, and the ones that create the most back-propagation benefit get paid.</p>

<p>There is a real open question about whether inserting a transaction into all of this lowers quality. The open-source community works partly because it isn’t transactional. The first version of Terminal Bench got built on coauthorship, people a hop or two out got their name on the paper, which felt native to the open-source ethos. I don’t have this fully worked out. But evals and benchmarks are how every one of these companies targets success, and when you peel back the layers it all comes down to a set of tasks made by a community of people. That makes the question of who makes them, and how they’re rewarded, one of the more important problems in AI right now, and one nobody has really figured out.</p>]]></content><author><name>Ivan Bercovich</name></author><summary type="html"><![CDATA[I’ve been contributing to Terminal Bench since close to the beginning. A lot of my interest lately has been with reward hacking, specifically how elusive it is. I thought it was an obvious, important issue, and when you talk to people they say, sure, reward hacking is a big deal. But it keeps getting bigger. I think it’s a bigger problem than most people thought.]]></summary></entry><entry><title type="html">Why I Invested in ChipAgents</title><link href="https://www.ivanbercovich.com/2026/why-i-invested-in-chipagents" rel="alternate" type="text/html" title="Why I Invested in ChipAgents" /><published>2026-06-08T00:00:00+00:00</published><updated>2026-06-08T00:00:00+00:00</updated><id>https://www.ivanbercovich.com/2026/why-i-invested-in-chipagents</id><content type="html" xml:base="https://www.ivanbercovich.com/2026/why-i-invested-in-chipagents"><![CDATA[<p>I invest in companies where AI is going to change how things get done. There are businesses that look like businesses you could have founded five years ago, and some of them will still do great, but that’s not my area. My area is businesses that couldn’t have existed before.</p>

<p>Today the best manifestation of that is the agent. You assign it a task, it can use tools, use a computer, do whatever it needs to do, for a long period of time. Right now the median interaction with an agent is maybe ten or twenty minutes. In a year it’s going to be two or three hours. After that, four days at a time. As long as you’re willing to pay for the tokens, it keeps going.</p>

<p>On one end you have an agent like Claude Code which can write software, and that’s going to keep getting better and take more adjacencies around it. On the other end you can try to build an agent that’s too far from what AI does well today: AI can do video generation, but ask it to make you a Oscar-winning film and we still don’t know how that plays out. The companies that are really interesting in 2026 are close enough to software that we know it’s going to work, but far enough that they won’t get eaten up by the expansion of the labs. Every engineering discipline sits in that band, mechanical, aerospace, chemical. A lot of the thinking looks like software, but a lot of it is domain specific, with tools that are very unique to the field. Chip design is the hardest one I know, and that’s why I like it.</p>

<h2 id="what-the-company-does">What the company does</h2>

<p>The development of a semiconductor starts with something that looks like software. There’s a language called Verilog, register transfer language, RTL, that describes the logic of the chip. That gets converted to gates, the gates get converted to physical design, and so on. But the stakes are nothing like software. When the first wafer comes out of the fab, depending on how advanced the process is, it can cost anywhere from thirty million to a billion dollars. There can’t be any mistakes. In software you make a mistake, you lose some data, you deal with it. You can’t operate that way with hardware. Which is why the whole thing starts from a blueprint, a very detailed spec.</p>

<p>ChipAgents operates at the front end. We grab that spec, turn it into tests, write the code, and verify the code works. Verifying isn’t like normal code, where you execute it and if it ran you know it worked. You have to simulate it, because all of this is going to run on a clock, and everything has to happen in the physical window between clock cycles for the registers to capture the data. If it doesn’t fit, you move the logic around so it does. There’s a lot of that kind of simulation the agent has to interact with to make decisions.</p>

<p>So the picture is simple. We have an agent, the agent is the front end, and it’s what the engineers interact with. Around the agent we’re bolting a harness of tools. Some tools simulate the logic. Some make predictions about power consumption. Some, later, will be about layout, how the transistors are physically distributed over the wafer.</p>

<h2 id="the-goal-is-not-to-beat-claude">The goal is not to beat Claude</h2>

<p>It would be a bad strategy to fight hand to hand with a big lab on raw reasoning and coding. If anybody claims they can build a better model than Claude Opus at what Claude does, they’re lying. The advantage is in the harness, and in the interactions between the tools and the harness.</p>

<p>There is room to build your own models, but not general-purpose ones. You build domain-specific models that act like sensors on a very particular kind of data. Take power. Instead of running a full simulation overnight on the cluster, you can train a model that takes a synthesis artifact as input and gives you a guess of the power consumption as output. It won’t be exact, but it tells you immediately that some corner of the chip looks like it’s drawing too much power, take another look. That’s AI, but domain-specific AI, and nobody at a general-purpose lab is going to train it for you.</p>

<h2 id="become-the-front-end">Become the front end</h2>

<p>This is the part I find most interesting. Verilog is free. Everybody uses Verilog, nobody cares, Verilog is free. That’s the front end. The incumbents make their money on the back end: the simulators, the waveform analyzers. The agent becomes the front end of the front end. The heavy tools get called by the agent in the background, and we control the agent.</p>

<p>It will get to a point where a particular engineer won’t know if the waveform simulation is coming from a Cadence tool, a Synopsys tool, or our tool. It doesn’t matter, as long as the agent can solve the bug. And that gives us a ton of pricing power. If we can give something that’s ninety percent of the way there but iterates much faster, we capture a lot of that power. We can go from using a vendor’s simulator ninety percent of the time to ten percent of the time, on our own schedule.</p>

<h2 id="innovators-dilemma">Innovator’s dilemma</h2>

<p>We’re a complement to the EDA tools, not a competitor. The engineers used to write code by themselves in a normal IDE. We help them write the code and the tests, they send it off to be simulated, and our agent reads the results, looks for errors, and traces them back to the spec.</p>

<p>The physics simulators are extraordinarily precise, and they took decades and billions of dollars to build. That’s fine, we’ll keep using them. But they’re slow and expensive, so today you run them constantly just for sanity checks. Our goal is to use them less, to lean on fast approximations during the design loop and save the full overnight run for the end. Instead of running a tool once a day and waiting overnight, you run something lower fidelity in five minutes. When that happens, EDA doesn’t get displaced, it just becomes less important. The value moves to the front end, to the people actually designing the thing.</p>

<p>And speeding this up isn’t only about cost. Demand for chips is bottlenecked on bandwidth, not just dollars. There’s real revenue the big buyers can’t capture because designs take too long, and a multi-year lag they would pay almost anything to shorten. So anything that accelerates design isn’t just savings, it’s new top-line revenue. The market as a whole gets larger.</p>

<h2 id="why-cant-the-incumbents-do-it">Why can’t the incumbents do it?</h2>

<p>Two reasons. The first is incentives. They charge by the compute-hour, so a tool that runs faster is a tool that bills less. Nobody builds the thing that cannibalizes their own meter.</p>

<p>The second is talent. Domain expertise is valuable, and it always will be. But AI is also its own domain, a very new one, and very few people have it at that level. AI is very deceptive, because everybody talks about it, and maybe twenty percent of them think they’re experts. When you think about the people who can actually build a company like this, you’re talking about a thousand people in the world. There’s not that many. And finding people like that who also know chip design is very unlikely. They command very high salaries and they will go to a startup and take equity. There’s no way the comp structure of a thirty-year-old EDA company can digest that, and those people don’t want to work there anyway. You go to an AI conference and everyone wants to be at a startup or at one of the big labs. EDA isn’t cool.</p>

<h2 id="why-the-market-keeps-growing">Why the market keeps growing</h2>

<p>Moore’s law is tapped out, which means the only way to keep getting performance is domain-specific silicon. People are going to keep turning algorithms into chips, even high-frequency-trading firms burn their strategies into hardware. More chip diversity means more designs, and more designs means more demand for the front end they all run through.</p>

<p>Think about the IDE, the interface a developer uses to write code. For thirty years it was a free text editor nobody paid for. Then in the last few years it became very valuable; Cursor is a tens-of-billions-dollar company for what is, underneath, a text editor. What changed isn’t the editor, it’s that far more people are writing software. The same thing is starting in silicon. Companies that used to just buy chips are designing their own, Amazon has Trainium, and that moves down-market from here.</p>

<p>There’s also a lot of leakage between the steps. Arm sells you intellectual property, blocks of compute you drop into your chip, but they’re encrypted, so there isn’t much of that on the internet. When you manufacture at TSMC, you have to hand it over in a very particular format, and TSMC only exposes that interface to a small number of companies. Every one of those bridges, from one step of the process to the next, is a place for an agent to play.</p>

<h2 id="what-it-comes-down-to">What it comes down to</h2>

<p>If you want to oversimplify the vertical AI industry, you’re really reselling tokens at a premium. You just need an excuse to sell tokens at a premium. We package a few tools and a harness that makes using Claude through us better than using it directly, and for the privilege we get to charge more. It’s a variation on fintech.</p>

<p>The most important question for anyone building in this space is whether there’s enough durable differentiation for a ChipAgents to exist at the same time a Claude Code exists. I think the answer is yes, and it comes down to the sophistication of the tools. That’s the whole bet.</p>

<p>This is one of the biggest industries in the world, and one of the fastest growing. The incumbents are hundred-billion-dollar companies. I think you can build something on that scale here, and you do it by owning the front end and letting the agent quietly take over everything behind it.</p>]]></content><author><name>Ivan Bercovich</name></author><summary type="html"><![CDATA[I invest in companies where AI is going to change how things get done. There are businesses that look like businesses you could have founded five years ago, and some of them will still do great, but that’s not my area. My area is businesses that couldn’t have existed before.]]></summary></entry><entry><title type="html">Goals, Motivation, and Shipping</title><link href="https://www.ivanbercovich.com/2026/goals-motivation-and-shipping" rel="alternate" type="text/html" title="Goals, Motivation, and Shipping" /><published>2026-06-03T00:00:00+00:00</published><updated>2026-06-03T00:00:00+00:00</updated><id>https://www.ivanbercovich.com/2026/goals-motivation-and-shipping</id><content type="html" xml:base="https://www.ivanbercovich.com/2026/goals-motivation-and-shipping"><![CDATA[<p><em>This is a transcript of a mentorship conversation I had with an AI Safety founder. It has only been lightly edited.</em></p>

<p>The way to manage an organization is through goals. Everybody knows that. But those goals tend to be boring and not actionable. People say we need this much in sales, we need to hire this many people. They’re boring. People get detached. Most of these goals don’t increase the energy and the motivation in a team. They decrease it.</p>

<p>There is marketing you use for the outside world, where you’re marketing your product. But there is also marketing you have to use internally. What are you telling yourself and your team? What is bringing you to work every day? What are the big exciting goals? There’s a lot of art to that.</p>

<p><strong>The best ideas are in the room.</strong> Not my ideas. So when we used to plan on a quarterly basis, I would do a request for proposals. Everybody sends suggestions for how we could work. Anyone. Sales, the janitorial service, it didn’t matter. Anybody can send one.</p>

<p>But then you end up with hundreds of ideas, and you can’t just hand that to your engineering team, because it’s distracting. They’re all different. So the big part of the exercise was clustering them in my head so that I could grab twenty five ideas and give them a single name. These people are all talking about how to visualize data on the website. These other ones are talking about something almost like visualization, so I’m just going to put it there. You create the cluster, and at that point you throw away the contents of the cluster. Now you have a shell. You say: visualizations. And you give it to the right person to work on.</p>

<p>Then you go back to your team and say: because of all your suggestions, we’re going to work on this. And from that you develop very operational goals. I’ve always been good at figuring out the singular goal that, if you go after it, unlocks a lot of side quests along the way.</p>

<p><strong>The impactful decisions are few.</strong> They tend to be not that many. As the organization grows you have fewer of them, because you have more people. When you’re small you have impactful decisions every day. But once you have ten million dollars in investment and twenty or thirty people, your impactful decisions are fewer. Call it one per month. But they’re very important, and they happen fast. You meet someone at a conference and decide you need to hire them, and then you work really hard to hire that person. You bring in an investment. You shift the strategy: we’ve been doing this, it’s not going anywhere, let’s convince the team to do this other thing, and everybody leans onto that.</p>

<p>There’s a reason you can only do so many of these per year. They’re sudden to the organization. If you do too much of it, you’re the kind of leader who feels like he’s distracting everybody. If you don’t do it at all, you’re a passive leader.</p>

<p>Elon Musk has this concept of a surge. When they’re behind on some goal, he calls a surge and moves into the factory. He’ll move into the Tesla factory, sleep on the floor, and make everybody stay. They go for a week. A very intense period to catch up to some goal. The key to good leadership is knowing the right balance of how much of that you can do. Most people don’t do it enough. Most people are very passive about their organization. They assume everybody in the org has agency, everybody is equally motivated, everybody is doing the right things.</p>

<p><strong>People want to be on a mission.</strong> If you’re a kind, caring person, there’s this feeling that pushing people to work more, to do more, means you’re exploiting them. So you back off from that behavior. But people, especially when they’re young, want to go through intense periods. Nobody would volunteer for the World War Two trenches. But when they’re forced to be there and then they come out, they remember it as a very important period of their life. People want to go to war on a mission together. Your job is to constantly make people feel they’re on a mission.</p>

<p>My prime as an operator was at Amazon, where I had a hundred and fifty people. I operated like a conductor of an orchestra. There were people on my team who were very structured: have a plan, work the plan. That was never me. I was feeling the room, trying to figure out whether I could get more value out of it. If I came into the office a few days in a row and felt things were slow, I’d do something like call a hackathon for tomorrow, right away, just to change the momentum. Or I’d do one on ones with almost everybody, and I’d always talk to people about what they could do to maximize their impact.</p>

<p><strong>Use the world as a processor.</strong> You can think of the world as working for you. You put something out there, the world processes it, and it comes back to you. There’s a version of this that’s just bias for action. You can spend cycles thinking about whether to do something, when in fact you can just do it and then decide.</p>

<p>Take a conference. You can sign up to give a talk three months out, on a topic you’re not prepared for at all. If the organizers say no, fine, you don’t have to worry about it. But if they accept you, now you have to do it. That tends to work. I put a lot of irons on the fire and see what comes back to me. I don’t need to keep thinking about it. I do the application quickly, and if it comes back, suddenly I need to deal with it, and I go through three or four crazy days of work and prepare myself.</p>

<p>This week I was working on a benchmark, and a lot of data vendors were giving us tasks that were shitty. So I started writing notes about what makes a good task. It was an internal document, just for me, for future reference. Then I thought I should write it as a manifesto on a domain name. Buy the domain, call it goodbenchmarks.ai, put the thing up. If people are into it and it gets distribution, I’ll make it better. If they’re not, fine. I turned something boring and lame with no consequences into something I could externalize and see what happens.</p>

<p>Think of your company as a black box that churns out artifacts. You’re sending signal to the world and getting signal back. So the question is always: what can you put out?</p>

<p><strong>When in doubt, ship the closest thing.</strong> If you find yourself without a good way to prioritize, prioritize whatever is closest to being packaged and sent out. Do that over the 401k plan. People care about the 401k, but nobody on the outside is going to look at your company and say, oh, you guys have a 401k.</p>

<p>A lot of management is just not doing things. It’s saying no, and getting obsessed with one thing. So push the admin off your plate. A part time accountant, remote, in another country, can do your books, and then you ask them to do a few more things and pay them by the hour. It’s useful to learn about something like a 401k once, but it can eat a huge amount of time.</p>

<p>There are the big level goals, and then there’s every week, where you’re feeling the pulse of the company. If by Wednesday you feel like you’re not putting anything out, you haven’t written a blog post, you haven’t shipped anything, then you pause. You talk to your teammates: what’s our next milestone, what’s our next most obvious thing? And you clutch onto that.</p>

<p><strong>Start with the smallest thing you can ship, then escalate.</strong> Say you want to demonstrate something. What’s the easiest way to show it? Do the cheapest, most obviously artificial version, the one nobody would ever do in real life, and put it out there. Then you escalate from there. You go from obviously a demo to something more real, and if attention is being paid, you keep escalating. Each step pulls more people in. You can get others bidding into your goal, providing the thing you need, until they’re aware of you and you have momentum.</p>

<p><strong>The best teams are mission-agnostic.</strong> At the end of the day it’s about an effective organization. You could switch a great team’s mission and they’d still do well. Take a team building one thing, drop them on a deserted island where they need to build survival systems, and they’ll do it. The best teams can do anything. The intrinsic capability of the team is that it can take any objective, operationalize it, and deliver it. Because you’re an early company, that practice matters more than your strategy.</p>

<p>A company with five hundred thousand dollars and two employees is not going to have a huge impact. What you need is fifty employees and ten million dollars, and to get there you go through the hunger games, the trials and tribulations, and you show everybody who could support you that you’ve actually gone through the process. You have to show people, hire people.</p>

<p>A lot of it is performative. The mission is an excuse to show that you can accumulate resources and control and power, that you can be a good steward of some social power, that more people should trust you with their careers and more funding should be trusted to you. Once you have some momentum, the goals start becoming more and more impactful.</p>

<p>Some people get this operational intuition because they played sports in college or went to the military. Some are just born with it: they get anxious if they’re not delivering results. But you have to get there. It’s not something you can put in a goal. It’s something you wake up every day thinking about. What can we put out? How do you make it exciting and interesting for the team?</p>]]></content><author><name>Ivan Bercovich</name></author><summary type="html"><![CDATA[This is a transcript of a mentorship conversation I had with an AI Safety founder. It has only been lightly edited.]]></summary></entry><entry><title type="html">The Harness</title><link href="https://www.ivanbercovich.com/2026/the-harness" rel="alternate" type="text/html" title="The Harness" /><published>2026-06-03T00:00:00+00:00</published><updated>2026-06-03T00:00:00+00:00</updated><id>https://www.ivanbercovich.com/2026/the-harness</id><content type="html" xml:base="https://www.ivanbercovich.com/2026/the-harness"><![CDATA[<p>I have rather high certainty that the way software is built is about to change dramatically, and that most vertical software will shift to general-purpose agents. The vertical SaaS economy is an economy of automating workflows that are the same every day; you look for the people who do the same thing fifty times a day, and you automate that. Software, in that world, is a workflow: a predetermined set of operations with a little flexibility in the middle, this is an NDA so go this way, this is another kind of contract so go that way. There’s an inverted way to do it. An agent is something that has complete discretion and maneuverability to accomplish a task you require. You give it a computer, and with that computer it accomplishes your goal. Go build it from scratch: it writes code, runs the code, finds the bugs, fixes them. There’s no path. You give an instruction, you get the answer. You don’t know or care how it did it, so long as you can trust the result. And as long as you’re willing to pay for the tokens, it keeps going, like an employee that never sleeps and always has more to do. Best of all, it keeps getting better on its own: the labs ship a new model every six months, and you get that for free.</p>

<p>If that’s where things are heading, the question for any vertical AI company is what to own. Agents can use tools, but the tools they use need to exist: first you build the tool, then you teach the agent how to use it. The agent has the typical tools, everything you can do on a normal computer, but some tools are special. Take Plaid; your agent by itself can’t connect to your bank account, so it needs something like Plaid. So the moat is whatever the agent needs to do its job. If the agent uses Plaid, then Plaid is good. If the agent uses LLMs, then LLMs are good. If the agent uses GPUs, then GPUs are good. Value accrues to whoever is serving the agent. The old SaaS moat was lock-in: a vendor holds all your data, and leaving is so expensive you never do. Agents erode that. If something just does the job, swapping one provider for another is cheap, so the durable position is no longer lock-in but ownership: own the tools the agent can’t do without, or own the industry outright. And not the trivial tools, book a meeting or get the weather, but the ones that are either proprietary and expensive, like Cadence and Synopsys, or built for human use and therefore bad for agents, or that have no open-source alternative and can’t be trivially coded. Those are the ones that involve large data, heavy compute, and real simulation. We’re not going to build a model that, pound for pound, beats Claude Code; that’s not the differentiation. The differentiation is on the tools. Whoever owns those tools owns the harness. Claude Code is a harness for coding and general-purpose work. The harness is the thing.</p>

<p>On one end, you have an agent like Claude Code which can write software. That’s going to keep getting better and take more adjacencies around it. On the other end, you can try to build an agent that is too far from what AI does well today: AI can already do video generation, but ask it to produce an Oscar-winning film and we still don’t know how that plays out. The companies that are really interesting in 2026 are close enough to software that we know it’s going to work, but far enough that we’re not going to be eaten up by the expansion of the labs. Semiconductors is a great space, because really every engineering discipline, mechanical, aerospace, chemical, is a really good adjacency. In some ways there’s a lot of software-like thought that goes into it, but in many other ways there is a lot of domain specificity, tools that are very unique to the domain.</p>

<p>Take chip design, the hardest vertical I know.</p>

<p>There’s a bottleneck in semiconductors. We keep designing more ambitious chips every year (more transistors, more complexity, tighter timelines), and at the same time the supply of people who actually know how to design and verify them is shrinking. Verification engineering, the job of proving that a chip does exactly what it’s supposed to do before you commit it to silicon, is a growing-in-demand role in the US. But the average age of the people doing it is over 40, and fewer people are going to school for it. Even the ones who do aren’t job-ready: curricula run years behind industry, and in some specialties a PhD is just the starting point for another five years of training. So you have demand going up and the talent pipeline going down. That’s a bottleneck, and bottlenecks like that are exactly where I want to be investing.</p>

<h2 id="what-designing-a-chip-actually-looks-like">What designing a chip actually looks like</h2>

<p>In semiconductors there are still blueprints. Even though an engineer writes code, someone writes a document first, a multi-hundred-page document that specifies, in detail, exactly what the chip is supposed to do. Going from that document to a testbench that verifies everything you built is the heart of the job, and it’s enormous.</p>

<p>Then there’s the physics. A chip runs on a clock. The clock ticks, and everything in between has to happen in time. If you don’t organize things correctly, you miss the clock cycle and everything is messed up. To know whether you got it right, you simulate, and those simulations produce waveforms, traces that show you the values of every register over time. These waveforms are, for lack of a better phrase, from here to the moon. They’re gigantic. So if you want an agent to help here, it has to observe and reason over that kind of data, a kind of tool use that is simply not typical for your common coding agent.</p>

<p>After the logical design, you go into physical design: figuring out how the thing actually maps onto the surface of the silicon, so that the wires don’t cross each other and the timing still holds. Then you simulate again, but now you’re looking at things like power, how much each part of the chip is going to consume. And there’s a whole other world beyond the digital part. A lot of what we picture when we think of semiconductors is digital logic, just gates. But there are also analog circuits (resistors, capacitors, the continuous analog parts of a chip that don’t reduce to gates), and there’s far less good tooling there. A lot of what the incumbents sell doesn’t really work well once you’re thinking in analog terms.</p>

<p>The other thing to understand is how slow all of this is. The simulators are extraordinarily complex, expensive physics engines. To run a real one you use a supercomputer. You kick off a job, you go home, and you come back the next day to your results. And because the supercomputer is a shared resource, everyone in the building is waiting in the same line. The entire design loop is gated by these long, expensive, batched simulation runs.</p>

<h2 id="the-harness-in-silicon">The harness, in silicon</h2>

<p>ChipAgents is a harness, specifically for chip design.</p>

<p>The goal of a company like this should not be to win a hand-to-hand fight with the general-purpose models. There’s no point trying to beat the big labs on a benchmark for writing code. You will lose that fight, and even if you won it this quarter you’d lose it next quarter. Where you win is on complex tasks that require specialized tools. Furthermore, these tools need to be optimized for agentic use. In semiconductor design there are plenty of simulation tools that are GUI-based and slow, both suboptimal characteristics in an agent world. That’s the moat. It has very little to do with whose language model is slightly better.</p>

<p>Besides the harness, a vertical AI company can build domain-specific models. LLMs are the brains of agents, and it’s unlikely anyone outside of a big lab will build a “smarter” model, but there is room to build custom models that act as sophisticated sensors on a very particular type of data. Think about the protein-folding models. They may use transformers under the hood, but they’re trained specifically to do one hard thing extremely well. You can do the same thing in chip design. For example, instead of running a full-precision power simulation overnight on the computer cluster, you can train a domain-specific model that anticipates power consumption directly. It won’t be exact, it’s an estimate. But it can tell you, immediately, “this particular section of the chip looks like it’s drawing too much power, take another look. You get that answer in seconds instead of the next morning, you iterate, and you delay the big, expensive simulation until you actually need it. That’s the kind of capability a general-purpose model will never give you, because nobody at a general-purpose lab is training a model to estimate power dissipation on a circuit.</p>

<p>The tools and the models compound. Once you have the tools, you can leverage feedback loops to enhance models continually. You begin by using reinforcement learning to teach a model to use your own tools better, once you have the testing data, and eventually you train models from the ground up. With the tools, the feedback loop, and proprietary models, a company’s market position becomes virtually unassailable. No one has reached that stage yet, in chips or anywhere else; the work is to climb toward it.</p>

<p>There’s one more reason the harness is the right place to stand. The incumbent tools, the ones that have defined this industry for decades, were built for humans, not agents. This isn’t unique to chips. The costly desktop CAD software used by engineers in architecture, mechanical design, and circuit design is typically proprietary, lacks APIs, and wasn’t built with LLMs in mind. Historically, to use a simulator, you would type a command into a terminal, or open a tool with a big vendor logo on top and click around. You can script them, but the surface is fragmented and awkward, and a lot of the real work still happens by hand. Learning to drive all of that reliably is its own hard problem, a very specific kind of computer use. Once an agent can do it, the agent becomes the thing the engineer interacts with, and the old tools recede into the background.</p>

<h2 id="become-the-front-end">Become the front end</h2>

<p>This is the part I find most strategically interesting about vertical AI companies.</p>

<p>Think about how the incumbents are structured. The hardware description language itself, Verilog, is free. Everybody uses Verilog, nobody cares, Verilog is free. That’s the front end. They make their money on the back end: the simulators, the waveform analyzers. The cool thing about agents (whether it’s an agent for finance, for law, or for chips) is that the agent becomes the front end of the front end. The heavy tools get called by the agent in the background to get the job done, and we control the agent. So before, an engineer interacted directly with a vendor’s simulator, with the vendor’s logo on the screen. Now that’s not happening. Our agent turns the simulator on in the background, pulls the data, and the engineer is only ever interacting with us. As long as the results are good, the user does not care what’s running underneath.</p>

<p>That changes the balance of power. If we ever decide that having our own simulator is a good idea, we just point the agent at our own simulator instead, and nothing about the user’s experience changes. If there’s a new back-end procedure that needs to be done somewhere in the design flow, it’s easy for us to build it and upsell it, because we own the interface. We are the UI. We are the front end. Whoever controls the agent gets to delegate sub-actions to whatever they want, and that’s a lot of power. We can intentionally go from using a vendor’s simulator ninety percent of the time to ten percent of the time. That gives the customer more pricing power against the incumbent, and it gives us more pricing power against the customer. Owning the front end is what lets you slowly take over the back end on your own schedule.</p>

<h2 id="innovators-dilemma">Innovator’s dilemma</h2>

<p>Let me be precise about what this company is and isn’t trying to do, because this is where a lot of people get the strategy wrong. Right now we’re a complement to the EDA tools, not a direct competitor. Before us, the engineers were just writing code by themselves in a normal IDE, something like VS Code. So in a sense we’re the Cursor for chip design: we help them write the code and write the tests, they send it off to Cadence or Synopsys to be simulated, the simulation comes back, and our agent reads the results, looks for errors, and traces them back to the spec. We’re selling to the same people doing design, testing, and verification.</p>

<p>Those physics simulators are extraordinarily precise, and they took decades and billions of dollars to build, and that’s fine, we’ll keep using them.. But they’re slow and expensive, so today you run them constantly just to get sanity checks. Our goal is to use them less, to lean on fast approximations during the design loop and save the full supercomputer run for once or twice at the end. When that happens, the incumbent’s pricing power goes down. It’s not that we’re displacing EDAs; we’re not doing what Synopsys does. But we accelerate the lifecycle and do more and more before we ever have to reach for their tools, and that makes those tools relatively less important. That’s the innovator’s dilemma applied to chip design: AI lets us take a lot of shortcuts. Over time I think EDA, the simulation layer, becomes a smaller part of the market, and the value moves to the front end, to the people actually designing and making the thing. The old tools are sold per seat. If you’re accelerating the engineering work itself, you eventually charge a percentage of R&amp;D instead, and when a customer is spending a billion dollars designing chips, a few points of that is an enormous business.</p>

<p>That’s the opposite of the other camp. There’s a well-funded effort that takes the maximalist position: you won’t need anybody, you’ll just type a prompt and a finished chip comes out the other side. They claimed this years ago, before AI got really good. I’m an AI maximalist too, I genuinely think AI is going to change everything, but you have to play with the market as it actually exists. The iterative approach, fitting into the workflow as it runs today and accelerating each step, is more likely to get adopted.</p>

<h2 id="why-cant-the-incumbents-just-do-it">Why can’t the incumbents just do it?</h2>

<p>The obvious question is why EDA companies don’t simply take the AI route themselves. Part of the answer is incentives. They charge by the compute-hour, so a tool that runs 100x faster is a tool that bills 100x less. Nobody builds the thing that cannibalizes their own meter. The other part is talent. Domain expertise is valuable, and it always will be. The problem is that AI is also its own domain, a very new one, and very few people have it at that level. You’re really talking about a few thousand people in the world. And finding people like that who also have the domain expertise is very unlikely; these are all new PhDs, finishing their programs right now. They command very high salaries, and they will go to a startup and take equity. There’s no way the compensation structure of a thirty-year-old EDA company can digest this personnel, and those people don’t want to work there. You go to an AI conference and everyone wants to be at a startup, or at one of the big labs. EDA isn’t cool. So there’s no realistic way for the incumbents to assemble a team of fifty or a hundred of these people to go build this, even though they can see it coming. That’s the opening.</p>

<h2 id="how-is-venture-capital-changing">How is venture capital changing?</h2>

<p>For a long time the instinct in venture was that good software was enough. I don’t believe that anymore. AI is eroding the last standing barriers to entry. Software has eaten the world, and the world of software now lives in the boring plane of compressed margins. A company that just sells software is not differentiable; software is a commodity now, or it’s going to become one soon.</p>

<p>So the real question I keep asking is: where do you draw the line? Software generation keeps getting better, for free, every six months, so where do you draw a line you can still defend? For me, the answer is to go toward harder technology, where more things have to work out, where there’s real capex, where you have to train your own models or earn access to data and relationships that other people simply don’t have. That’s the opposite of vertical SaaS, which just gets hammered on margins. It’s vertical integration: the Tesla or SpaceX or Apple model, less like selling software to construction companies and more like building the construction company itself, fully digitized, fully agentic, end to end. It’s harder, and it’s higher capex, which is exactly the point. When you’re that vertically integrated, software stops being the product and becomes an advantage layered on top of something hard.</p>

<p>Chip design is about as hard as it gets. The bottleneck is real, the talent is aging out, the tools are decades old and were built for humans, not agents, and the feedback loops are measured in overnight supercomputer runs. It’s close enough to software that I’m confident AI will work well, but specialized enough that nobody is going to eat our lunch overnight. There are only ever going to be a handful of serious companies going after a problem like this, five or six, not five hundred. The winning move is to own the front end and let the agent quietly take over everything behind it.</p>]]></content><author><name>Ivan Bercovich</name></author><summary type="html"><![CDATA[I have rather high certainty that the way software is built is about to change dramatically, and that most vertical software will shift to general-purpose agents. The vertical SaaS economy is an economy of automating workflows that are the same every day; you look for the people who do the same thing fifty times a day, and you automate that. Software, in that world, is a workflow: a predetermined set of operations with a little flexibility in the middle, this is an NDA so go this way, this is another kind of contract so go that way. There’s an inverted way to do it. An agent is something that has complete discretion and maneuverability to accomplish a task you require. You give it a computer, and with that computer it accomplishes your goal. Go build it from scratch: it writes code, runs the code, finds the bugs, fixes them. There’s no path. You give an instruction, you get the answer. You don’t know or care how it did it, so long as you can trust the result. And as long as you’re willing to pay for the tokens, it keeps going, like an employee that never sleeps and always has more to do. Best of all, it keeps getting better on its own: the labs ship a new model every six months, and you get that for free.]]></summary></entry><entry><title type="html">The Exhaust Is the Product</title><link href="https://www.ivanbercovich.com/2026/the-exhaust-is-the-product" rel="alternate" type="text/html" title="The Exhaust Is the Product" /><published>2026-04-14T00:00:00+00:00</published><updated>2026-04-14T00:00:00+00:00</updated><id>https://www.ivanbercovich.com/2026/the-exhaust-is-the-product</id><content type="html" xml:base="https://www.ivanbercovich.com/2026/the-exhaust-is-the-product"><![CDATA[<p>I recently spent some time at Constellation, exploring ways in which I can contribute to safety. Given my experience, I naturally look for commercial applications. I asked several people about their thoughts on whether commercial approaches to safety are viable. Nobody denied that an intersection probably exists, but I detected skepticism.</p>

<p>One example that I think would benefit from being more commercial is monitorability. In my work with Terminal Bench, I’ve reviewed more trajectories than I can count. In the process, I found an ever-growing graveyard of well-intended trajectory analysis tools that never gained traction. Everyone involved knows there is something missing, and so every so often a new tool is shared in our Discord, and then interest dissipates.</p>

<p>Which is it then? Is there a need for monitorability tools and we just keep missing the mark, or are we in some sort of collective delusion? I think it’s the former. And I think it’s in part because nobody has put the level of attention and intensity a great product deserves. One benefit of building a for-profit company is that when someone is paying you, they are going to have a lot of expectations. If you pick the right first set of customers, that pressure will lead to better solutions. Every single AI trajectory should be monitored. Not just at the labs. Every person using models should be aware of the trajectories they generate. All the way down to what your kids’ trajectories are about. Everybody should have awareness of what the models are doing on their behalf.</p>

<p>If you get really good processing and archiving in place, you can do all sorts of stuff on top. This is a recap of a recent conversation where I argued for some commercially viable motivations to incentivize monitorability adoption. These intentionally sound less like safety pitches, because the intention is to make a case to a broad audience. I acknowledge some of these ideas come close to surveillance. Rather than advocating for any one of these, I’m just wearing my VC hat to startupify some concepts.</p>

<p><strong>Understanding labor productivity.</strong> You want everybody in your company using AI. If they’re not using AI, they’re wasting time. But some people are going to use AI so well that they’re basically doing nothing. They’re going to automate their job, and instead of moving on to the next thing, every morning they push two buttons, pretend they’re doing work. The distinction between that person and a highly productive one is going to be very difficult to make. You need to actually look at the trajectories. Do all of someone’s trajectories look the same every day, or are they doing different things?</p>

<p>The quality of reporting on labor productivity right now is really low. Everything I’ve seen from consulting firms and universities is crap. It’s hopeful, or it’s looking through the wrong lens. People are wishing that the impact is less than it is. It’s not written by people who really understand where to look for productivity, because obviously the KPIs are going to change. It’s not about hours in the office. It’s not about Slack messages sent. Text is free, so you can’t measure productivity by text volume. So what do you measure it by? You measure it by what people actually did with their AI, and what came out of it.</p>

<p><strong>Organizational alignment.</strong> Here’s something you get at large companies. The CEO and the executive team come up with a strategy. The whole company has to do this thing. But the company has a million employees, so it doesn’t quite work like that. The strategy document is vague. Every team underneath makes their own interpretation. Sometimes you get surveys, once a week, asking what employees think is most important for their job. You try to get a sense of whether people are actually aligned with the goal. But all of this is self-reported. It’s like a personality test. There is what people say and what people do.</p>

<p>Now imagine you can see what people are actually doing. Not because you’re surveilling them, but because the AI is already the intermediary for their work. Claude is the witness. Claude is the confidante. Claude has all the information. Are employees actually doing what the master plan says they should do? Are they doing something different? And if they’re doing something different, it’s not necessarily bad. It could be that the executives are unaware of what’s actually important. In fact, I’d argue that’s half the opportunity or more: seeing what the distributed system of your employees are actually doing naturally, extrapolating the implied strategy that hasn’t been made explicit, and seeing if you can reinforce that.</p>

<p><strong>Real-time strategic nudges.</strong> When I’m interacting with Claude, building things or writing an email, can I be reminded of things that are relevant to the company’s strategy at that moment? Not after the fact, but in the moment. Imagine how powerful that is. We already do it with legal stuff. You’re about to do something that violates a policy, and you get flagged. That’s the basic kind of email monitoring that companies have. But this is different. This is: you’re about to do something that is very aligned with the strategy, and you should let people know. CC this person. Or: you’re going off track and should reconsider. And then flip it around: do the executives themselves actually behave according to their own strategy? They go to a golf retreat, come up with a plan, and then go back to work the next day. Is there a discrepancy between how they spend their time and what they said the company should do? They may not even be aware.</p>

<p><strong>Retroactive security analysis.</strong> If a zero-day exploit is found, I want to go back to all my trajectories and see: did I use this anywhere? Was I exposed? This is the equivalent of an audit trail, but for every interaction you’ve ever had with an AI agent. If you have good archiving, you can replay, search, and assess exposure at any point in the future for threats that didn’t exist at the time.</p>

<p><strong>Self-understanding.</strong> At the individual level, trajectory monitoring is about understanding yourself and where you’re at. How risky are my interactions with my agent? What patterns do I fall into? What am I actually spending my time on versus what I think I’m spending my time on?</p>

<p>Whoever controls the exhaust controls the insight. And if you’re really thoughtful about this kind of analysis, I think you can go really far.</p>]]></content><author><name>Ivan Bercovich</name></author><summary type="html"><![CDATA[I recently spent some time at Constellation, exploring ways in which I can contribute to safety. Given my experience, I naturally look for commercial applications. I asked several people about their thoughts on whether commercial approaches to safety are viable. Nobody denied that an intersection probably exists, but I detected skepticism.]]></summary></entry><entry><title type="html">How Software Will Be Made</title><link href="https://www.ivanbercovich.com/2026/how-software-will-be-made" rel="alternate" type="text/html" title="How Software Will Be Made" /><published>2026-04-13T00:00:00+00:00</published><updated>2026-04-13T00:00:00+00:00</updated><id>https://www.ivanbercovich.com/2026/how-software-will-be-made</id><content type="html" xml:base="https://www.ivanbercovich.com/2026/how-software-will-be-made"><![CDATA[<p>Lately, I only build things where I’m the sole author, and I 100% vibe code it. I don’t really look at code. I look at commits a little bit, but not really. I look at the app. I look at the behavior.</p>

<p>First, I build the thing. I give a lot of feedback. If it uses data, I can see the data, check if it makes sense. Then I use it. I record usage, clicks, inputs, outputs. I get analytics on everything. Something that could have been a thousand lines of code is now three thousand. Messy, bloated, but functional.</p>

<p>Then I tell the AI: grab all my usage data, assume the app behaves as it’s supposed to, and create a massive number of tests. Every input and output from my actual usage becomes the test set. Then I put it in a loop: keep making this smaller. The tests have to pass. Every time you change something, test, test, test.</p>

<p>It never gets back to a thousand lines. The AI isn’t good enough yet at coming up with really clever new abstractions. But it cuts it by about 60%. The idea is: design, throw away, but use the artifacts as the new requirements and the test set, then do it again.</p>

<p>If your analytics record exactly what everybody did, and you can replay that and get the exact same output, pixel perfect, and you can achieve that with half the lines of code or twice the performance, you’re in good shape.</p>

<p>The key insight is that you should be able to replay everything. Assume your application has no bugs, because what you’re really trying to do is build a better application that behaves the same. If there’s a bug, whatever, you can replicate the bug. That’s a feature, not a problem.</p>

<p>I think replay analytics is a product someone should build. It’s not trivial. You need the database in the right state. You need to replay everything end to end. But if you can do it, you get something powerful.</p>]]></content><author><name>Ivan Bercovich</name></author><summary type="html"><![CDATA[Lately, I only build things where I’m the sole author, and I 100% vibe code it. I don’t really look at code. I look at commits a little bit, but not really. I look at the app. I look at the behavior.]]></summary></entry><entry><title type="html">The Government Doesn’t Have the Roofing Permit AI</title><link href="https://www.ivanbercovich.com/2026/the-government-doesnt-have-the-roofing-permit-ai" rel="alternate" type="text/html" title="The Government Doesn’t Have the Roofing Permit AI" /><published>2026-04-06T00:00:00+00:00</published><updated>2026-04-06T00:00:00+00:00</updated><id>https://www.ivanbercovich.com/2026/the-government-doesnt-have-the-roofing-permit-ai</id><content type="html" xml:base="https://www.ivanbercovich.com/2026/the-government-doesnt-have-the-roofing-permit-ai"><![CDATA[<p>I’ve been collaborating with a county in Oregon that wants to adopt AI. They have zero AI right now. The problem is straightforward.</p>

<p>You need a roofing permit. You give it to your AI. Your AI fills out the paperwork, handles the back and forth, does it perfectly. But the government doesn’t have the roofing permit AI. You want to file a Freedom of Information request. Give it to your AI. But the government doesn’t have a way to handle the influx. So the government gets more and more inbound because people can. And they don’t have a way to manage it. So they’re ineffective.</p>

<p>No big deal when your roof permit is just taking longer than it should. But then something bad happens. A disaster, a crisis. And the government doesn’t have the right level of preparedness. It’s using a very antiquated approach to operations at the exact moment that citizens are supercharged with AI capabilities.</p>

<p>This is the asymmetry that worries me. The demand side is about to explode. Every citizen with an AI assistant can now generate paperwork, file requests, and navigate bureaucracy at a speed and volume that was previously impossible. The supply side, the government’s ability to process and respond, hasn’t moved at all.</p>

<p>Governments are slow by design. They dampen the influx of requests through burden. Construction permits, civil disputes, licensing. It’s generally not tasteful for the government to manage demand using price discrimination directly. So they achieve the same thing indirectly by making processes more complex, which means they often require professional assistance, which makes them more expensive. The complexity is the feature, not the bug.</p>

<p>But now my AI can do paperwork. The complexity barrier is gone. So the government gets slammed. And neither the government AIs, if they eventually get them, nor the citizen AIs will care about the length and complexity of a form. The whole notion of complexifying process stops having a purpose.</p>

<p>I want AI adoption in the government because if the government doesn’t adopt AI, it’s going to be much less capable of dealing with difficult circumstances. Politicians should be thinking about this. Not just for their constituents, but for themselves. The institutions they run are about to face a flood they are not equipped to handle.</p>

<p>And the stakes go beyond paperwork. If we end up with Great Depression-level unemployment, if class tensions rise, if social services get overwhelmed, the government needs to be operating at a level of competence that matches the moment. You can’t respond to an AI-accelerated crisis with a fax machine.</p>]]></content><author><name>Ivan Bercovich</name></author><summary type="html"><![CDATA[I’ve been collaborating with a county in Oregon that wants to adopt AI. They have zero AI right now. The problem is straightforward.]]></summary></entry><entry><title type="html">What Makes a Good Terminal Bench Task</title><link href="https://www.ivanbercovich.com/2026/writing-a-good-terminal-bench-task" rel="alternate" type="text/html" title="What Makes a Good Terminal Bench Task" /><published>2026-03-20T00:00:00+00:00</published><updated>2026-03-20T00:00:00+00:00</updated><id>https://www.ivanbercovich.com/2026/writing-a-good-terminal-bench-task</id><content type="html" xml:base="https://www.ivanbercovich.com/2026/writing-a-good-terminal-bench-task"><![CDATA[<p><em>Disclosure: this post is <a href="https://www.pangram.com/history/956bb536-1e28-4fc1-8bcb-89e26b738dd0">100% human</a>. I used an agent to compile statements I made across many platforms (Discord, Slack, meeting transcripts, PR comments) and assemble them into a post outline. Hence the flow might at times feel weird.</em></p>

<p>Most people write benchmark tasks the way they write prompts. They shouldn’t. A prompt is designed to help the agent succeed. A benchmark is designed to find out if it can.</p>

<p>I’ve been a contributor and reviewer for terminal bench since last August, and this post is about what I’ve learned designing and reviewing tasks. The guidance is broadly applicable to anyone building an agentic benchmark. We’re currently accepting tasks for <a href="https://github.com/harbor-framework/terminal-bench-3">Terminal Bench 3</a>.</p>

<p>I first got hooked by the challenge of making difficult tasks and seeing how long it would take for SOTA models to catch up. I spent the most time developing the <a href="https://github.com/harbor-framework/terminal-bench/tree/main/original-tasks/install-windows-xp">install-windows-xp</a> task. Here are the instructions the agent sees:</p>

<blockquote>
  <p>Install, and run Windows XP SP3 (32-bit) in a virtual machine using QEMU. Create the virtual NTFS hard disk as <code class="language-plaintext highlighter-rouge">/app/isos/xp.img</code> to install windows. Your Windows XP ISO is at <code class="language-plaintext highlighter-rouge">/app/isos/xp.iso</code>. The administrator account should be named <code class="language-plaintext highlighter-rouge">tb-admin</code>. For your convenience, we built an API that lets you read the installation key using OCR on the CD-ROM package. You can get the official key calling <code class="language-plaintext highlighter-rouge">/app/read_xp_key_from_cdrom_box</code>.</p>

  <p>VNC Configuration Requirements:</p>
  <ul>
    <li>Configure QEMU to use VNC display :1</li>
    <li>Ensure VNC server is listening on port 5901</li>
    <li>Set up a web interface (nginx) on port 80 for remote access</li>
  </ul>

  <p>The VM should be left running in the background once started. You will have completed your objective when qemu is at the windows xp login screen and the VNC interface is accessible for monitoring.</p>
</blockquote>

<p>The instructions are short. But the environment was highly complex: Windows needs to be installed inside QEMU inside Docker inside Linux. The solution is tricky, since Windows XP wasn’t trivially easy to install in unattended mode. The agent has to create a custom bootable ISO by extracting the original XP ISO, injecting an unattended answer file with dozens of configuration settings to suppress every possible GUI popup, adding OEM preinstallation scripts to create the required user accounts, and rebuilding the ISO with a proper El Torito boot sector. Any gaps in the unattended configuration would cause the installation to pop back into interactive GUI mode, at which point the task would have failed.</p>

<p>The task had a 2-hour agent timeout, one of the longest in the benchmark, because the installation alone takes 30-60 minutes under emulation. The agent has to monitor disk growth to know when the install is done. In practice this means checking every few minutes and waiting until the virtual disk exceeds 1GB and stops growing. Then the agent has to kill QEMU and reboot the VM in boot-only mode, without the CD-ROM attached, to reach the login screen.</p>

<p>It was hard for me as the task developer to be sure it was actually working, so I required the agent to set up VNC on port 5901 with an nginx web proxy so I could visually confirm the installation was progressing. For verification, I searched the virtual disk for the MD5 hashes of 12 specific Windows files: ntoskrnl.exe, kernel32.dll, explorer.exe, ntldr, and others. But I went further. The test takes a VNC screenshot and compares it against reference screenshots of the login screen using structural similarity (SSIM), requiring at least 85% match. It also verifies the tb-admin user was created by searching the virtual disk for the user account picture bitmap, which only exists if the OEM setup scripts ran successfully during installation.</p>

<p>This task didn’t end up in the official terminal bench 1.0 dataset because it took too long to run and made logistics painful. I think it’s still one of the coolest tbench tasks created to date. I went from knowing very little about benchmarks to being an official reviewer, and submitted many other tasks which did get merged, including <a href="https://www.tbench.ai/benchmarks/terminal-bench-2/install-windows-3.11">install-windows-3.11</a> and <a href="https://www.tbench.ai/benchmarks/terminal-bench-2/video-processing">video-processing</a>.</p>

<video controls="" preload="metadata" style="max-width: 100%; height: auto;">
  <source src="/assets/install-windows-xp-highlights-fast.mp4" type="video/mp4" />
</video>

<p>What follows are some personal opinions about what constitutes a good task. You can find <a href="https://github.com/harbor-framework/terminal-bench-3">official guidelines here</a>.</p>

<h2 id="benchmarks-should-be-adversarial-difficult-and-legible">Benchmarks should be Adversarial, Difficult, and Legible</h2>

<p>When you prompt an LLM, you want it to succeed. You repeat yourself, you emphasize, you add examples, you structure everything just right. That’s what works.</p>

<p>A benchmark is adversarial. We tend to write instructions in a way that we are trying to encourage the agent to get it right, but that’s not the point of the benchmark. The point of a benchmark with verifiable rewards is to state an unambiguous objective, which can be confidently verified, and which corresponds to a difficult task. Think of it as an interview question for a principal engineering role. Hints are for entry-level employees where you want to know if they can think well. Principal engineers need to deliver the answer on their own.</p>

<p>The ideal task can be described in two paragraphs. It’s difficult to solve. The agent will have to think before even doing anything. But then the actual solution doesn’t necessarily have to be itself very complex (e.g. a long program). It’s not hard because it requires a lot of resources, or the task is expansive. You can ask an agent to optimize a small program and that might be just as hard as optimizing a larger one if the requirements are aggressive enough. While not a requirement, the most elegant tasks have short, well-specified, self-explanatory (e.g. no README needed) instructions. Think literate programming.</p>

<p>A benchmark has to straddle a balance between being as realistic as possible while still remaining legible. As capabilities and time horizons expand, this becomes difficult. Running a benchmark of tasks <a href="https://www.anthropic.com/engineering/building-c-compiler">involving swarms replicating highly complex software</a> remains expensive, and relies on the credibility of the reviewer (in this case Nicholas Carlini). For the rest of us, if we want our benchmarks to be credible and gain traction, we benefit from making our tasks tractable. Likewise, if we want a busy leaderboard, we have to make the benchmark accessible to all model/agent developers, which means the cost and infrastructure complexity can’t be disproportionate.</p>

<p><em>Note: we are working on <a href="https://github.com/harbor-framework/terminal-bench-challenges">Terminal Bench Challenges</a>, where a single highly complex task, such as writing a c compiler or a web browser, is a full benchmark.</em></p>

<h2 id="a-taxonomy-of-bad-tasks">A Taxonomy of Bad Tasks</h2>

<p>To a large degree, the value gained by contributing to a project like Terminal Bench is access to concrete feedback. In that light, I want to share some specific examples of common issues.</p>

<h3 id="ai-generated-instructions">AI-Generated Instructions</h3>

<p>The most common and most obvious problem. Someone asks an LLM to write their task instructions and submits whatever comes out. It’s immediately recognizable: the tone is wrong, it’s verbose, it’s over-structured, and it reads like it was written to maximize the probability that the agent succeeds.</p>

<p>Great instructions are written by hand, or heavily edited from whatever an LLM suggests. They are not meant to be a forced prompt with emphasis and repetition to coerce the attention heads to listen to you. Direct and to the point. Specific and sufficient, but not redundant and attention grabbing.</p>

<h3 id="over-prescriptive-instructions">Over-Prescriptive Instructions</h3>

<p>Even when instructions are human-written, they’re often too prescriptive. Authors tell the agent how to solve the problem instead of what the end state should be.</p>

<p>I have an aversion to this sort of instruction. It’s too clerical, asking for a very specific set of steps, instead of articulating a goal. Couldn’t you just describe how the system should work and let the agent figure it all out? You don’t need to explain how things might fail. This is not some educational problem that a college student needs to learn from. Assuming an experienced engineer will understand what you mean, you can expect the agent to bring the same understanding.</p>

<p>Write the instruction as if it’s intended for a smart human. Clear, but not redundant. I don’t think you need to tell the agent how it will be tested. Just tell it what you expect the end result to be. But the instructions should be sufficiently specified so that meeting them implies passing the tests. Unlike when designing a prompt for high probability of success, here it’s enough to say things once, as long as it’s clear.</p>

<p>My opinion here is likely stronger than the median TBench reviewer. The reason is that I see every token as an opportunity to mistakenly add ambiguity or create a specification detail which the tests might miss. If I want the task to be unambiguous and perfectly verified, then brevity is an important KPI.</p>

<h3 id="clerical-difficulty">Clerical Difficulty</h3>

<p>I’ve been thinking about the distinction between tasks that fail because the structure of the expected output is convoluted — a long instruction explaining how to form a complex JSON object — versus tasks that fail for something intrinsically difficult. There’s something categorically different between the two. I call these clerical or administrative errors, and tasks that fail exclusively because of that usually are hard in an uninteresting way.</p>

<p>If an agent fails because it put a dollar sign in the amount field when you expected a float, or because it used a top-level key when you wanted a bare array, it’s measuring format compliance. Using so much of the instruction to detail the output format tends to make models fail not because of the complexity of the task, but because of some clerical error. Of course it’s important for models to get formatting right, every time, even when the distinction is nuanced. It’s just that this is not the same sort of capability that Terminal Bench is setting out to solve. If a task is challenging to SOTA models, it shouldn’t be because they can’t spell “strawberry”.</p>

<p>A related problem is tasks that are too wide, asking for a lot of little deliverables instead of solving a concrete large problem.</p>

<h3 id="solutions-that-assume-hidden-knowledge">Solutions That Assume Hidden Knowledge</h3>

<p>The submitted <code class="language-plaintext highlighter-rouge">solution.sh</code> should solve the problem as an agent would. Hardcoding an answer which implies knowledge not self-evident in the instructions is not helpful. The person creating the task is making assumptions that are not included in the instructions. The task is underspecified, and you don’t realize until you actually read the solution.</p>

<p>I watch for this carefully. I want to make sure the solution doesn’t display inside knowledge that the agent wouldn’t be able to know. Are there some commands you can include that would reveal the exact issue before you apply the patch? A proper oracle will actually ask questions of the system. It will go through a series of commands to figure out what’s wrong, and then upon figuring out what’s wrong, it will produce a solution. If the solution jumps straight to the answer without exploratory steps, it might be making unfair assumptions.</p>

<p>The problem is that the author knows the ground truth. So it might seem that you are being fair because your invoice processing solution is looking for levenshtein_distance and so on, but you could have just as easily come up with typos that didn’t pass your particular choice of data cleaning. The solution isn’t investigating the issue. It’s just writing down the answer to a known issue.</p>

<h3 id="tests-that-validate-the-wrong-things">Tests That Validate the Wrong Things</h3>

<p>Tests should verify outcomes, not implementations.</p>

<p>Why test that Pandas is installed? This was never requested in the task. Doing a lot of string comparisons to evaluate source code is going to be brittle. You should test examples beyond the one you tell the agent to test with.</p>

<p>I’ve seen this often: tests that check for specific libraries instead of whether the output is correct, tests so tightly coupled to the oracle solution that any alternative correct approach would fail. There was a task which required setting Linux permissions in such a way that only certain people could perform certain operations. There was one set of tests verifying functionally that the right people can do the right operations. There was another set of tests that looked at linux permissions directly. To what degree are these testing the same thing? Is it possible for permissions to look slightly different and still accomplish the goal?</p>

<p>For tasks with inherent variance, testing gets harder. I ran my own word2vec task 25 times. The queen - woman + man = king analogy returned king among the top 75 results 100% of the time, but the variance was wide, from #1 to #40. I could test a number of things about the resulting model, but I never felt satisfied with this canonical test.</p>

<p>Sometimes, the task includes dependencies that the agent has access to. For example, when the goal is to implement a missing feature in a larger application. In those cases, it’s important to test the outcome with a copy of the original code, to guarantee the task is not passing due to spurious changes.</p>

<p>What about LLM-as-a-judge? The <a href="https://github.com/harbor-framework/terminal-bench-3/blob/main/TASK_PROPOSAL_RUBRIC.md">TB3 rubric</a> allows it in rare and extraordinary circumstances, but only if it can be shown that the verifier essentially never makes a mistake, for example, by showing that multiple different LLMs always make the same determination.</p>

<h3 id="reward-hacking-and-environment-leakage">Reward Hacking and Environment Leakage</h3>

<p>Make sure the agent doesn’t have access to data that shouldn’t be available. Anything generated or copied in the Dockerfile will generally be accessible. If you place something in /tests, it will be copied later, right before tests run. If the agent does have access to the reference answer, what stops it from copying it?</p>

<p>Because it’s easier to reward hack as root, there have been discussions about root vs userland agents, and Tbench supports both. But limiting the agent permissions is one way to make a hard task in uninteresting ways. In my experience, it’s better to give the agent unrestricted access and avoid wasting time navigating permissions.</p>

<p>It’s essential to make sure the environment is not reward hackable. It doesn’t matter if the models you’re testing play by the book. Benchmarks have to be resistant to being gamed, or they are unworthy. When a popular benchmark is discovered to be hackable, the authors lose credibility, alongside many papers which might have used that benchmark as evidence of results.</p>

<p>One idea I think is helpful: modify the harness to include test signatures in the prompt and see if agents are more successful than expected. Run a “please hack” version and analyze those trajectories for hack success. This tells you which tasks are vulnerable before they go live. I recently implemented an automated version of this for all TBench PRs.</p>

<h2 id="what-difficulty-actually-means">What Difficulty Actually Means</h2>

<p>In general, difficulty should come from the problem, not from the environment. Not from resource pressure, not from verbose instructions, not from trick formatting. There is a true measure of difficulty, and we are sort of trying to come up with a way to estimate it. We may be off. Even our own judgment may be off.</p>

<p>How do you estimate the difficulty? Did you test it against various models? What were the results? How many SOTA agents did you try? Can you show a few failure examples and <code class="language-plaintext highlighter-rouge">harbor task debug</code> output for them? If you can’t show interesting failures from capable models, your task might be short of great.</p>

<p>What counts as “hard” is an ongoing discussion. The TB3 rubric anchors difficulty on what’s hard for a human expert, and that’s a reasonable starting point. But LLM capabilities are very jagged. The analogy of something being harder for humans as also being a challenge to LLMs does not always hold. LLMs can read and cross-reference a million lines of logs in seconds. That’s not hard for them even though it would take a person days. Conversely, things humans find straightforward, like navigating an interactive TUI, can completely stall an agent. So testing against models isn’t the definition of difficulty, but it’s the best diagnostic we have, and it often reveals that what you thought was hard is actually easy, or vice versa.</p>

<p>The difficulty bar keeps rising as SOTA improves. Many tasks I reviewed for TB2 that seemed reasonable would be too easy today. The <a href="https://github.com/harbor-framework/terminal-bench-3/blob/main/TASK_PROPOSAL_RUBRIC.md">TB3 rubric</a> is explicit:</p>

<blockquote>
  <p>As a general rule, any task that could be solved by an average undergraduate student in under a few days is too easy. This means typical problems that feel like course projects are unlikely to be accepted. These projects include tasks like writing simple compilers or operating systems; or implementing simple algorithms or protocols.</p>
</blockquote>

<p>There might be a natural timeout beyond which more reasoning is hopeless, but artificially low timeouts hide insights. I suspect a lot of useful information is going into the timeout black hole. It would be useful for tests to be progressive — giving a letter grade to the agent solution — to see a continuous tradeoff between time spent and quality.</p>

<p>We tried letting the agent know about its timeout: you have 10 minutes left, you have five minutes left. I thought that would work, but actually it makes the agents freak out towards the end, and they start doing irrational stuff. So that didn’t work.</p>

<p>Constraining resources creates a different kind of difficulty, and often an unfair one. If agents assume normal conditions and we constrain resources without putting it in the prompt, that’s unfair. It’s also hard to get exactly right. Tasks that pass in one infrastructure setting will fail in an almost identical counterpart when resources are intentionally a limiting factor.</p>

<p>Making a task harder by making it bigger — extracting more images, adding more of the same but longer — does not make it better. Real difficulty is conceptual. Can the agent figure out the approach? Can it debug when things go wrong? Can it reason about the problem before diving into execution?</p>

<p>A task that takes 30 minutes because the agent is wrestling with the problem is more interesting than one that takes 30 minutes because <code class="language-plaintext highlighter-rouge">make -j4</code> is running.</p>

<h2 id="building-tasks-that-work">Building Tasks That Work</h2>

<p>Run it yourself. When I get stuck debugging a task, I run it with the oracle, interact with the container, try to run the tests with <code class="language-plaintext highlighter-rouge">docker exec</code> and see what happens. If the failure is from an agent solution, convert the agent log into an alternative solution.sh and step through it. In our runtime for testing, long-running processes will sometimes cause the harness to wait until a timeout before running the tests. During development, it’s useful to be inside the container and avoid wasting time.</p>

<p>Watch agents fail. Something that is very helpful is looking at logs of agents trying to solve the task, particularly those that fail, and trying to understand why. When it fails, you have to figure out: did they fail because it’s hard, or did they fail because it’s unfair? Did it fail because the instructions were insufficient? Because the tests were overly aggressive? Or did they actually fail because they didn’t know what to do?</p>

<p>We need to convince ourselves that failures are not because of a deficiency in the task. Run in batches of 5, then use the debug command. You might be skeptical that GPT-5.4 is failing at some task. It seems easy and one-shot. Can you test it with 5 trials and then see what the failures have in common? There was an instance where GPT-5 liked to use nano for editing files, but then it opens it and can’t use it. It doesn’t know what to do. It gets stuck in interactive mode and fails. These are the kinds of insights you get from watching trajectories.</p>

<p>Solutions should be deterministic, but the problem can still be dynamic. One area where people tend to fork is should AI give you the answer, or should it give you a piece of software that gives you the answer? In my experience, the latter tends to be more verifiable. It’s much more testable. Tasks should be graded on outcomes, not process. As the <a href="https://github.com/harbor-framework/terminal-bench-3/blob/main/TASK_PROPOSAL_RUBRIC.md">TB3 rubric</a> puts it: <code class="language-plaintext highlighter-rouge">a task cannot say "use emacs to edit a file" — using vim is allowed. The only exception is verification that prevents cheating</code>.</p>

<h2 id="why-this-matters">Why This Matters</h2>

<p>AI is a very empirical field. Without hands-on experience it’s hard to develop the right intuition. When we started soliciting tasks for TB3, we rejected the first several proposals. Then, like the 4-minute mile, once a good PR came through, many more followed.</p>

<p>Benchmarks are where SOTA has to earn its name. This is where you develop your intuition for what we might see in the coming months and years. Anyone can get access to AI. Most people don’t have a good sense for where we are in the capability curve.</p>

<p>Tasks should be authentic engineering problems, not artificial constructs. The best ones come from real problems someone actually had to solve. It’s kind of crazy that major labs are outsourcing semi-artificial tasks when people are doing real tasks every day. People are doing things with LLMs that could be tasks without realizing so. Every week you spend a few hours solving a problem, that’s a task.</p>

<p>The best tasks describe a real problem that an experienced engineer would recognize, in language that an experienced engineer would use, with tests that verify the outcome rather than the approach. But I suspect we’ll start seeing difficult tasks emerge from the vibecoding world, particularly for complex functions like finance, where the problem is genuinely hard but the person who needs it solved isn’t a developer. The scope of what we should test is wider than we think.</p>]]></content><author><name>Ivan Bercovich</name></author><summary type="html"><![CDATA[Disclosure: this post is 100% human. I used an agent to compile statements I made across many platforms (Discord, Slack, meeting transcripts, PR comments) and assemble them into a post outline. Hence the flow might at times feel weird.]]></summary></entry><entry><title type="html">10 Years of AI Insights</title><link href="https://www.ivanbercovich.com/timeline/" rel="alternate" type="text/html" title="10 Years of AI Insights" /><published>2026-03-04T00:00:00+00:00</published><updated>2026-03-04T00:00:00+00:00</updated><id>https://www.ivanbercovich.com/timeline</id><content type="html" xml:base="https://www.ivanbercovich.com/timeline/"><![CDATA[<div class="timeline-header">
  <h1>10 Years of AI Insights</h1>
  <p>Evolving viewpoints through the lens of daily messages</p>
</div>

<div class="filter-bar" id="filterBar">
  <button class="filter-btn outstanding active" data-rating="outstanding" onclick="toggleFilter(this)">Outstanding</button>
  <button class="filter-btn good active" data-rating="good" onclick="toggleFilter(this)">Good</button>
  <button class="filter-btn neutral" data-rating="neutral" onclick="toggleFilter(this)">Neutral</button>
  <button class="filter-btn poor" data-rating="poor" onclick="toggleFilter(this)">Poor</button>
</div>

<div class="timeline" id="timeline"></div>

<script>
window.__TIMELINE_ENTRIES = [
  {
    "date": "2016-06-24 16:57",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "The idea would be to tailor segments of a letter or email to a particular person.\n\nThe question is: are there any tools that give you that much personalization, how much they cost, and how much people would be willing to pay?\n\nSo if you email someone in California who is pro gun control, you can say: \"As a California resident, you should be very concerned. Fortunately, background checks are required if you want to purchase a weapon in California, both for new gun and private gun purchases. However, Nevada and Arizona do not require them. It is no surprise that California has one of the country's lowest rates of accidental gun deaths (0.08 per million); Nevada has 9 times the number of fatalities. The only way to protect your loved ones is to push for federal regulation on gun control.\"",
    "comments": [
      {
        "date": "2020",
        "note": "A Georgetown University/Stanford study used GPT-3 to generate hundreds of personalized left-wing and right-wing advocacy letters, sending ~32,000 AI-written policy emails to over 7,000 state legislators, with officials barely able to distinguish them from human-written letters (less than 2% differential)"
      },
      {
        "date": "2024-02",
        "note": "A PNAS study demonstrated real-time integration of user demographic data into GPT-4 prompts to generate personalized political messages at scale, the exact location- and belief-based personalization described in this 2016 message"
      },
      {
        "date": "2024",
        "note": "The 2024 U.S. election became the first in which generative AI was widely used for political ad targeting and voter outreach, with startups like Battleground AI helping campaigns create AI-personalized text-based ads at scale"
      }
    ],
    "channelName": "Sean Duffy",
    "rating": "outstanding"
  },
  {
    "date": "2016-07-29 15:10",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "Furthermore, because those Alexa skill builders don't have expertise building Q&A systems, their solutions are limited.\n\nYou need to use the Fidelity skill to ask for the price of Apple stock, and they can only handle a very small set of questions within that vertical. For example, \"Apple 2013 earnings\" wouldn't work.\n\nGraphiq is not so much a vertical-specific skill as it is a brain replacement for Alexa. We don't need separate skills for each vertical because we already have a vast knowledge graph. Additionally, we can ingest new data, such as enterprise data, to allow people to ask their own questions -- without any engineering required.",
    "comments": [
      {
        "date": "2017-05",
        "note": "Amazon acquired Graphiq for an estimated $50 million, integrating its knowledge graph and semantic search technology into Alexa to address exactly the Q&A limitations described here"
      },
      {
        "date": "2022-11-30",
        "note": "ChatGPT launched, proving that a single general-purpose language model could answer questions across every domain without needing vertical-specific skills -- the same vision articulated in this message six years earlier"
      },
      {
        "date": "2023-09",
        "note": "OpenAI gave ChatGPT voice capabilities, putting it on a direct collision course with Alexa and Siri by offering the kind of general-purpose voice Q&A that Graphiq envisioned as a 'brain replacement' for Alexa"
      },
      {
        "date": "2024-06-10",
        "note": "Apple announced Apple Intelligence at WWDC 2024, integrating ChatGPT into Siri, effectively acknowledging that traditional command-and-control voice assistants needed LLM-based general knowledge to remain competitive"
      }
    ],
    "channelName": "Andrea Holland",
    "rating": "outstanding"
  },
  {
    "date": "2016-10-26 15:47",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "I think at least part of our unconscious is like a machine learning algorithm -- driving insights and conclusions without a clear way to reason about it. You get an answer, but you don't know where it came from: an epiphany. Yet it's not random; it's based on your knowledge and experiences, and it's a reflection of what you believe to be true. Just not easily decipherable through logic.\n\nThe whole can be explained, just not emulated or predicted. It's irreducible complexity, but it's not magic.",
    "comments": [
      {
        "date": "2020-04",
        "note": "A Nature paper titled 'Algorithmic unconscious: why psychoanalysis helps in understanding AI' drew the same parallel between unconscious cognition and neural networks, arguing both process information in opaque, pattern-driven ways that resist logical decomposition"
      },
      {
        "date": "2023-10",
        "note": "Anthropic published groundbreaking mechanistic interpretability research using sparse autoencoders to decompose neural network activations into interpretable features, directly tackling the 'you get an answer but don't know where it came from' problem described here"
      },
      {
        "date": "2024-05",
        "note": "Anthropic identified over 34 million interpretable features in Claude Sonnet, demonstrating that neural networks can be partially decomposed and understood -- supporting the claim that the whole 'can be explained' even if not easily predicted"
      }
    ],
    "channelName": "James Connolly",
    "rating": "good"
  },
  {
    "date": "2016-12-05 17:57",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/805833548087181312",
    "content": "Humans have a robust DB of prior knowledge. ML solely based on primary data will miss important insights. Intelligence requires experience.",
    "comments": [
      {
        "date": "2018-10",
        "note": "Google released BERT, a pre-trained language model that leveraged massive text corpora as a 'database of prior knowledge' and broke 11 NLP benchmarks — vindicating the claim that models trained only on task-specific primary data miss important insights"
      },
      {
        "date": "2020-05",
        "note": "OpenAI published the GPT-3 paper 'Language Models are Few-Shot Learners,' showing that a 175-billion-parameter model pre-trained on broad internet text could perform tasks with little to no task-specific data, precisely because it had internalized a robust store of prior knowledge"
      },
      {
        "date": "2021-08",
        "note": "Stanford researchers coined the term 'foundation models' in a landmark report, formalizing the paradigm that AI systems should be pre-trained on broad data to build a base of experience before being adapted — the exact insight this 2016 tweet articulated"
      }
    ],
    "rating": "outstanding"
  },
  {
    "date": "2020-05-18 01:52",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "I think the gig economy is past its prime and it's also a dehumanizing meat market. But the general principle of creating value by unlocking people's ability still stands, and we have experience doing that by enabling knowledge engineers.\n\nThe question is what are some of the trends or subsequent steps in this same thread. For example, the next level could be to enable not just individuals, but groups of people to create value. Hypothetically, you could make a movie by finding a special effects person, a director, actors -- all amateurs -- and produce a high-quality product.\n\nAnother way to up-level is to ask people to do difficult things that require learning, like law or personal support (AA, depression, etc.).\n\nYet another is to build a really specialized system that requires highly capable people who have an unfair advantage in using your system -- similar to what we did with knowledge engineers.\n\nThen there is your target segment. We discussed that consumer would be more fun, and that there are structures that might allow you to target both by offering to consumers for cheaper or free. So what is something that people around the world can do, which is next level compared to today's gigs, which also makes those that consume the product better off in a significant way?",
    "comments": [
      {
        "date": "2020-11",
        "note": "California's Proposition 22 passed after gig companies spent nearly $200 million to keep workers classified as independent contractors, crystallizing the 'dehumanizing meat market' critique as a mainstream political issue"
      },
      {
        "date": "2023-02",
        "note": "Runway released Gen-1 and Gen-2 video generation models, beginning to enable exactly the scenario described here: amateurs producing high-quality video content without professional crews"
      },
      {
        "date": "2024-02-15",
        "note": "OpenAI previewed Sora, its text-to-video model, further democratizing filmmaking by enabling individuals and small teams of amateurs to generate cinematic-quality footage from text prompts"
      },
      {
        "date": "2024",
        "note": "BetterHelp grew to over 4 million users and 30,000 licensed therapists providing remote mental health support, validating the prediction that 'difficult things that require learning, like personal support (depression, etc.)' would be the next frontier beyond basic gigs"
      }
    ],
    "channelName": "Tom Reno",
    "rating": "good"
  },
  {
    "date": "2020-06-07 18:32",
    "channel": "Inventing the future",
    "source": "imessage",
    "link": null,
    "content": "A question I really like, but for which I don't have an answer, is \"what is the next IT industry?\" -- meaning, in what field will we see higher demand than supply of personnel, who can do their job remotely, and where local regulations don't impede them from working abroad?\n\nPhase 1 of the dematerialization of work was Mechanical Turk and its copycats, more than a decade ago. Phase 2 has been more about specialized jobs, like technicians looking at X-rays in India, of which the most successful has been software engineers. I think there is still room in Phase 2, with more disciplines and tasks. And then there will be a third phase, where anyone intelligent can be made to do any job, given the right tools -- perhaps a gamification of work.",
    "comments": [
      {
        "date": "2021",
        "note": "Upwork and Fiverr's combined market capitalization peaked near $20 billion during the pandemic remote-work boom, validating the growth of Phase 2 specialized remote work platforms"
      },
      {
        "date": "2023",
        "note": "Scale AI's Remotasks subsidiary expanded data-labeling operations across the Philippines, Kenya, and India, creating a new global workforce category -- AI data annotation -- that fits the Phase 2 pattern of specialized remote work from developing countries"
      },
      {
        "date": "2024-03",
        "note": "Cognition Labs launched Devin, billed as the first autonomous AI software engineer, pointing toward Phase 3 where AI tools could enable anyone to perform tasks previously requiring deep specialization"
      },
      {
        "date": "2024",
        "note": "The World Economic Forum projected that AI would create ~97 million new jobs while displacing ~85 million, with 53% of AI-related jobs being remote or hybrid, reinforcing the trend toward dematerialized, location-independent work"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2020-07-03 19:47",
    "channel": "Inventing the future",
    "source": "imessage",
    "link": null,
    "content": "What are some jobs that require human intelligence and/or emotions, which you could semi-automate by using humans from other countries?",
    "comments": [
      {
        "date": "2023",
        "note": "OpenAI contracted Sama to employ Kenyan workers at under $2/hour to review and label toxic content for ChatGPT's safety filters -- a job requiring human judgment and emotional resilience, outsourced to a developing country and semi-automated through AI-assisted tooling"
      },
      {
        "date": "2024-02",
        "note": "Klarna announced its AI chatbot handled two-thirds of customer service chats (2.3 million conversations), but by 2025 reversed course and began rehiring human agents, demonstrating that emotional intelligence in customer service resists full automation and still requires the human-AI hybrid model described here"
      },
      {
        "date": "2025",
        "note": "According to McKinsey, nearly 60% of outsourced tasks could be partially automated by 2030, with offshore teams shifting from basic query handling to higher-value advisory roles that combine human empathy with AI assistance -- the semi-automation model this message anticipated"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2020-07-21 21:59",
    "channel": "Inventing the future",
    "source": "imessage",
    "link": null,
    "content": "More and more applications need a knowledge graph, as I've seen in most deals I've evaluated. And I'm not sure there is a modern tool for people to collaborate on building private ontologies.",
    "comments": [
      {
        "date": "2021-06",
        "note": "Neo4j raised $325M in the largest private database funding round ever, at a $2B+ valuation, reflecting surging enterprise demand for knowledge graphs"
      },
      {
        "date": "2023-10",
        "note": "Palantir's AIP platform, built on its enterprise Ontology, became the company's fastest-growing product, validating that ontology-driven architectures are central to enterprise AI"
      },
      {
        "date": "2024-07",
        "note": "Microsoft open-sourced GraphRAG, combining knowledge graphs with LLMs for retrieval, confirming that knowledge graphs became essential infrastructure for AI applications"
      },
      {
        "date": "2024",
        "note": "Gartner placed GraphRAG on its 2024 Hype Cycle for Generative AI, and knowledge graphs were recognized as critical for grounding LLMs with structured enterprise data"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2020-07-28 20:26",
    "channel": "Inventing the future",
    "source": "imessage",
    "link": null,
    "content": "APIs to create user accounts and all the basic stuff, as well as some more complicated capabilities and custom schemas. The data never leaves, including machine learning model training, etc. (eventually).",
    "comments": [
      {
        "date": "2021",
        "note": "Federated learning emerged as a major research focus, enabling ML model training where data never leaves the device, exactly the architecture described here"
      },
      {
        "date": "2022-05",
        "note": "Hugging Face raised $100M at a $2B valuation, with a core value proposition of letting organizations host and fine-tune models on their own infrastructure"
      },
      {
        "date": "2024-03",
        "note": "The EU AI Act was adopted by Parliament, mandating data sovereignty and transparency requirements for AI training data, codifying the data-never-leaves principle into law"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2020-08-23 02:08",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "A website to gather large datasets for the public good and freely available machine learning models. You would have campaigns to acquire a certain kind of data and have volunteers help out. It shouldn't be basic labeling like they do with Mechanical Turk. I imagine actual data gathering. Or it could be focused purely on the medical space, where you know a bunch of individual demographics and then you ask everyone to test something — eat a certain food, do a certain workout, etc. — and try the results. Learning from individual experimentation at scale.",
    "comments": [
      {
        "date": "2021-03",
        "note": "LAION (Large-scale Artificial Intelligence Open Network) was founded as a German nonprofit to create open-source AI datasets for the public good, closely matching the concept described here"
      },
      {
        "date": "2021-08",
        "note": "LAION released LAION-400M, a freely available dataset of 400 million image-caption pairs built through volunteer coordination, the exact campaign-based model proposed in this message"
      },
      {
        "date": "2022-03",
        "note": "NIH's All of Us Research Program released its first genomic dataset of nearly 100,000 whole genome sequences from diverse volunteers, matching the medical crowdsourcing vision described here"
      },
      {
        "date": "2022",
        "note": "ZOE's personalized nutrition program, built on crowdsourced individual health experiments (diet, microbiome, blood sugar responses), became the world's largest nutrition study — almost exactly the eat-a-certain-food-and-try-the-results concept described here"
      }
    ],
    "channelName": "Alejandro Koretzky",
    "rating": "outstanding"
  },
  {
    "date": "2020-08-24 22:31",
    "channel": "Inventing the future",
    "source": "imessage",
    "link": null,
    "content": "One example is all the cleaning of data you need to do for machine learning. The stuff you did classifying answers we understood but didn't answer might be a good example of a collaborative data project. All the work people need to do inside data warehouses to prepare data for training.",
    "comments": [
      {
        "date": "2021-08",
        "note": "Scale AI raised $325M at a $7.3B valuation, validating massive demand for collaborative data preparation for machine learning"
      },
      {
        "date": "2021-08",
        "note": "Snorkel AI, focused on programmatic data labeling as a collaborative workflow rather than brute-force annotation, reached a $1B valuation"
      },
      {
        "date": "2022-02",
        "note": "dbt Labs raised $222M at a $4.2B valuation for its data transformation tool, reflecting how data cleaning and preparation became a standalone billion-dollar category"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2020-10-10 00:30",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "One area that is interesting is learning — not machine learning in the typical sense, but being able to teach your system how to behave with natural language. For example, have smart home or IoT devices and through dialogue arrive at the right settings. Personal knowledge is another interesting area.\n\nDeepMind is an example of a company that got acquired purely off doing research. Rare, but from another perspective also countercultural these days, which can be differentiating. I would probably also consider a big company if it gave us a lab-like environment that is more experimental. For example, Meta has some work on assistants blended into AR which might be cool.",
    "comments": [
      {
        "date": "2022-12",
        "note": "Home Assistant declared 2023 its 'Year of Voice,' building a local, natural-language voice assistant for smart homes — the exact teach-your-system-through-dialogue concept described here"
      },
      {
        "date": "2021",
        "note": "Anthropic was founded by former OpenAI researchers as a pure research lab focused on AI safety, following the DeepMind model of research-first company building described here"
      },
      {
        "date": "2023-09",
        "note": "Meta launched Ray-Ban Meta smart glasses with an integrated AI assistant, exactly the assistants-blended-into-AR concept mentioned in this message"
      },
      {
        "date": "2025-03",
        "note": "Amazon launched Alexa+, a generative-AI-powered conversational assistant that learns preferences through natural dialogue, validating the teach-through-conversation vision"
      }
    ],
    "channelName": "Alex Yau",
    "rating": "good"
  },
  {
    "date": "2020-10-10 00:30",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "What do you think about building a layer that allows you to put rules on top of a machine learning model? The topmost level where you want some level of human oversight and be able to correct bugs in the model until it is reworked. We have a lot of experience with that and understand various techniques. It's also not fancy but necessary.\n\nAnother idea is getting into explainable and visualizable models. We would have to develop quite a bit of expertise, but we have some relevant experience and I'm sure we could be clever about helping people visualize how their models behave. It can be applied to medicine because more and more models will be used for diagnostics.",
    "comments": [
      {
        "date": "2022-08",
        "note": "OpenAI launched its Moderation API, a rule layer on top of GPT models to enforce content policies — a direct implementation of the rules-on-top-of-ML concept described here"
      },
      {
        "date": "2022-12",
        "note": "Anthropic published its Constitutional AI paper, describing a system of human-written rules that constrain model behavior, closely matching the 'rules on top of a model with human oversight' vision"
      },
      {
        "date": "2023-04",
        "note": "NVIDIA open-sourced NeMo Guardrails, a toolkit for adding programmable rules on top of LLMs — the exact architecture proposed in this message"
      },
      {
        "date": "2024-03",
        "note": "The EU AI Act was adopted by Parliament, mandating explainability and human oversight for high-risk AI systems including medical diagnostics, codifying the concerns raised here into law"
      },
      {
        "date": "2023",
        "note": "FDA approved 178 AI/ML-enabled medical devices in 2023 alone (up from ~100 in 2020), confirming the prediction that more and more models would be used for diagnostics"
      }
    ],
    "channelName": "Alex Yau",
    "rating": "outstanding"
  },
  {
    "date": "2020-10-10 00:35",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "Explainable AI is a problem. Not only for researchers but also for consumers. Imagine a feather-light experience you can attach to an AI model that walks any user through how it all works and the kind of decisions it makes, and when it succeeds or fails. It lets you play with it by tuning parameters so you can learn to trust the system.\n\nThere are a lot of interesting examples of people starting to explain AI.",
    "comments": [
      {
        "date": "2021",
        "note": "DARPA's four-year Explainable AI (XAI) program concluded, releasing the XAITK toolkit — a public library of explainability modules that can be attached to ML systems, similar to the 'attachable experience' concept described here"
      },
      {
        "date": "2024-03",
        "note": "The EU AI Act was adopted, requiring transparency and explainability for high-risk AI systems and granting users a 'right to explanation' — enshrining the consumer-facing explainability need identified here"
      },
      {
        "date": "2023",
        "note": "Anthropic pioneered interpretability research, publishing methods to visualize and understand how individual features inside Claude activate and make decisions, matching the 'walk users through how it works' vision"
      }
    ],
    "channelName": "Samantha Zhang",
    "rating": "neutral"
  },
  {
    "date": "2020-10-13 23:29",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "I keep talking to reasonably well-funded and modern companies about ontologies and knowledge graphs, and they say \"yes, exactly, that's what we need.\" I think it's a thing that nobody owns, just like nobody knows what knowledge engineers are. It's on someone to define and own that.\n\nI read a quote somewhere that some of the biggest disruptions come with new vocabulary and concepts because people don't get it with the available terminology.\n\nThis company has to work with the FDA, and they have to explain the nature of the data collection to them. All they have is a big schema, but it would be much more valuable if the FDA itself could inspect the ontology and understand how the data relates.\n\nPeople who work on machine learning need to be able to draw the boundary between what the machine sorts out and what requires human judgment.",
    "comments": [
      {
        "date": "2021-06",
        "note": "Neo4j's $325M raise at $2B+ valuation confirmed enterprise demand for knowledge graphs, yet no single company had defined and owned the ontology-building category as predicted here"
      },
      {
        "date": "2021-10",
        "note": "The FDA published its first list of nearly 350 approved AI/ML-enabled medical devices, increasingly requiring structured data schemas for regulatory review — validating the need for FDA-inspectable ontologies"
      },
      {
        "date": "2023",
        "note": "Palantir built its entire AIP (AI Platform) product around the concept of an enterprise Ontology, effectively becoming the company that 'defined and owned' the ontology layer for enterprises"
      },
      {
        "date": "2024-03",
        "note": "The EU AI Act mandated human oversight boundaries for high-risk AI systems, formally codifying the need to 'draw the boundary between what the machine sorts out' and what humans must control"
      }
    ],
    "channelName": "Tom Reno",
    "rating": "good"
  },
  {
    "date": "2020-10-24 16:23",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "I think we can apply a lot of what we learned with knowledge graphs and knowledge engineers, not to compete with machine learning, but to test it. A big problem is that a model is tested by scientists to succeed x% of the time, but they rarely look at the type of failures, which sometimes can be really problematic or silly. For example, errors on search-based Q&A tend to be more damaging (racist, etc.). I think beyond the testing of accuracy over a test set, one should be able to construct a battery of tests to make sure an algorithm behaves as expected. I think companies would pay for a third party that evaluates their algorithm, just like companies pay for penetration testing. I like it because it allows me to work on AI without being a phony that just puts AI on the company name; yet it allows me to enter from the side of skepticism and apply our experience with structured knowledge.",
    "comments": [
      {
        "date": "2022-08",
        "note": "Robust Intelligence raised $30M in Series B (December 2021) to stress-test AI models, validating the market for third-party AI evaluation as a service analogous to penetration testing"
      },
      {
        "date": "2023-01-26",
        "note": "NIST released the AI Risk Management Framework 1.0, formally establishing standards for testing AI systems beyond simple accuracy metrics, including bias, fairness, and robustness"
      },
      {
        "date": "2023-08-12",
        "note": "The White House co-organized the largest-ever public AI red-teaming exercise at DEF CON 31, with 2,200 sessions stress-testing models from OpenAI, Anthropic, Google, and Meta, institutionalizing third-party adversarial AI evaluation"
      },
      {
        "date": "2023-10-30",
        "note": "President Biden's Executive Order on AI required companies to share red-team testing results with the federal government before releasing high-risk systems, codifying the third-party AI evaluation concept described here"
      },
      {
        "date": "2024-10",
        "note": "Cisco acquired Robust Intelligence for its AI model validation and firewall technology, confirming that third-party AI testing became a major enterprise market"
      }
    ],
    "channelName": "Nick Larusso",
    "rating": "outstanding"
  },
  {
    "date": "2020-10-31 01:34",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "I can imagine there will be opportunities for AI robustness applications in many verticals. Self-driving and other machinery testing is obviously a very big deal.",
    "comments": [
      {
        "date": "2021-06",
        "note": "NHTSA issued a Standing General Order requiring all manufacturers of vehicles with Level 2+ automation to report crashes monthly, creating a formal AI robustness testing regime for autonomous vehicles"
      },
      {
        "date": "2023-12",
        "note": "Waymo published safety data over 7.14 million fully autonomous miles showing an 85% reduction in injury-causing crashes vs. human drivers, demonstrating the scale of investment in AI robustness validation for self-driving"
      },
      {
        "date": "2024-08",
        "note": "The EU AI Act entered into force, classifying AI systems in transportation, medical devices, and critical infrastructure as high-risk and requiring rigorous testing and certification across exactly the verticals predicted here"
      }
    ],
    "channelName": "Alex Yau",
    "rating": "neutral"
  },
  {
    "date": "2020-11-02 02:59",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "Imagine you're a consumer-facing company and you want your customers to trust your use of AI. It might be a car, medical diagnostics, etc. You could offer your users a playground to interact directly with the algorithm so they can understand what's going on. Maybe even explore it visually by layers and whatnot. That would be one way to take what I pitched and make it more consumer-facing.\n\nAnd there might be value for the company beyond earning trust, because having random people testing your algorithm brings data points.\n\nAnd you can further gamify it if it becomes an important source of data. Like in Alexa we have feedback elicitation, but we never ask customers to spend a few minutes trying to break the system. Imagine how much data you could collect like that.\n\nThe point is, architecture aside, that from a consumer point of view, there might be value in earning trust with AI. And from the company's perspective, you could turn that into a useful feedback loop.",
    "comments": [
      {
        "date": "2023-04",
        "note": "OpenAI launched its bug bounty program via Bugcrowd, paying up to $20,000 for users who find vulnerabilities, directly implementing the 'gamify people trying to break the system' concept described here"
      },
      {
        "date": "2023-08-12",
        "note": "DEF CON 31's AI Village hosted 2,200 public red-teaming sessions where attendees tried to break AI models from major companies, matching the idea of inviting random people to adversarially test algorithms for useful data"
      },
      {
        "date": "2024-07",
        "note": "The EU AI Act's transparency provisions require consumer-facing AI systems to disclose when users are interacting with AI, reflecting the predicted need for companies to earn consumer trust through AI transparency"
      }
    ],
    "channelName": "Alex Yau",
    "rating": "good"
  },
  {
    "date": "2020-11-24 17:34",
    "channel": "lame.ai",
    "source": "imessage",
    "link": null,
    "content": "One way to think about MLOps is that it will encompass the same size as the DevOps market as more applications go from pure software to having progressively more machine learning.",
    "comments": [
      {
        "date": "2022",
        "note": "The MLOps market was valued at approximately $1.1–1.4 billion in 2022, already establishing itself as a distinct and rapidly growing category parallel to DevOps"
      },
      {
        "date": "2023-04",
        "note": "MarketsandMarkets projected the MLOps market would reach $5.9 billion by 2027 at a 41% CAGR, confirming the trajectory toward DevOps-scale market size"
      },
      {
        "date": "2025",
        "note": "The MLOps market reached approximately $2.3–3.0 billion in 2025, with projections to $29.6 billion by 2032. Meanwhile, 58% of vendors prioritize integrating MLOps directly with DevOps pipelines, validating the convergence predicted here"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2020-11-25 23:01",
    "channel": "chat509597886779369007",
    "source": "imessage",
    "link": null,
    "content": "Suppose you have an AI that determines someone should get no bail. There are parameters deeper within that model. Suppose that it's unexplainable by itself, but the judge adds some remarks based on what he sees. That label or tag can later help others understand. Think of it as labeling neurons or layers by what they seem to be doing. It doesn't affect the performance but improves the understanding.",
    "comments": [
      {
        "date": "2023-10",
        "note": "Anthropic published 'Towards Monosemanticity,' decomposing 512 neurons into over 4,000 interpretable features and labeling them by what they represent (e.g., DNA sequences, legal language), directly implementing the concept of labeling neurons by what they seem to be doing"
      },
      {
        "date": "2024-05",
        "note": "Anthropic published 'Scaling Monosemanticity,' extracting millions of interpretable features from Claude 3 Sonnet and labeling them with human-readable concepts like 'The Golden Gate Bridge' or 'code bugs,' scaling the neuron-labeling idea described here to production models"
      },
      {
        "date": "2024-08",
        "note": "The EU AI Act entered into force, classifying AI used in criminal justice and bail decisions as high-risk and requiring explainability and human oversight, codifying the exact judicial AI scenario described in this message"
      }
    ],
    "rating": "outstanding"
  },
  {
    "date": "2020-12-04 03:51",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "This is where I'm interested in the MLOps space, in particular on the interface between product owners and machine learning models. We see a future in which the lifecycle of machine learning solutions is similar to what we see in software today, with players specialized in building, testing, deploying, and monitoring. We hypothesize testing and product management intersect strongly when it comes to machine learning. Models are typically evaluated by their performance against a test set, but this data might have poor coverage of all real-world scenarios, be inherently biased, and have highly damaging misses. In other words, it's not enough to say a model beat a human benchmark — companies will want to test for specific patterns, such as racism/sexism, use cases, landscape changes (e.g., a new regulation), etc. At Amazon, we \"invented\" the role of a knowledge engineer or \"KE\" (though this is not a new term, it has been out of fashion for quite some time). A knowledge engineer is essentially the product manager responsible for an AI solution. In the case of Alexa, there is a knowledge engineer for Sports, Politics, History, and so on, who are proficient at internal tools that enable direct control of the application. When teams build machine learning solutions, the capabilities of the product are defined and constrained by the dataset. Thus, if a PM wants to correct behaviors or expand functionality, they need to facilitate the acquisition of relevant training data. This indirect interface leads to a slow iteration cycle and non-deterministic results. Instead, we want knowledge engineers and PMs to define feature-specific tests at scale and to some extent act as an adversarial agent in machine learning development. Put another way, we want product owners to be able to understand and sculpt the model gradient as directly as possible.",
    "comments": [
      {
        "date": "2023-01-26",
        "note": "NIST released the AI Risk Management Framework 1.0, establishing formal processes for testing AI systems for bias, fairness, and real-world scenario coverage, institutionalizing the pattern-specific testing approach described here"
      },
      {
        "date": "2023-08",
        "note": "OpenAI launched GPT-3.5 fine-tuning APIs, enabling product owners to directly shape model behavior rather than relying solely on dataset changes, addressing the 'indirect interface' and 'slow iteration cycle' problem described here"
      },
      {
        "date": "2023-10-30",
        "note": "Biden's Executive Order on AI mandated red-team testing for specific harmful patterns (bias, discrimination, security vulnerabilities) before deployment, formalizing the adversarial product-management role for AI described in this message"
      },
      {
        "date": "2024-10",
        "note": "Cisco acquired Robust Intelligence for its AI validation platform that automates testing and compliance across the ML lifecycle, including bias, safety, and regulatory checks, commercializing the specialized ML testing layer predicted here"
      }
    ],
    "channelName": "Dylan Wenzlau",
    "rating": "outstanding"
  },
  {
    "date": "2020-12-07 18:24",
    "channel": "lame.ai",
    "source": "imessage",
    "link": null,
    "content": "I'm convinced you can start with an ontology GUI and build essential tools for data-intensive applications.",
    "comments": [
      {
        "date": "2023-04",
        "note": "Palantir launched AIP (Artificial Intelligence Platform), built on top of its ontology layer, demonstrating that an ontology-driven interface is a viable foundation for data-intensive AI applications. Revenue growth accelerated from 13% to 49% following this launch"
      },
      {
        "date": "2024",
        "note": "Microsoft released GraphRAG, which uses LLM-generated knowledge graphs to augment retrieval and outperform baseline RAG, validating the idea that structured ontological representations are foundational to data-intensive AI applications"
      },
      {
        "date": "2024",
        "note": "The knowledge graph market was valued at $1.07 billion in 2024 and projected to reach $6.94 billion by 2030, confirming that ontology-driven platforms became essential infrastructure for data-intensive applications"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2020-12-07 18:37",
    "channel": "lame.ai",
    "source": "imessage",
    "link": null,
    "content": "Imagine that you start with the ontology — collections and attributes — and you define what kind of data you expect there. You might not only define the data type and ranges, but maybe also statistical properties, like heart rates go from 15 to 250, with a certain mean and variance (this is an example from Evidation). You can also define tests to verify this data is as intended. You also explain the purpose of this data — you have a history log that explains where it's being used and how it's being modified. You then associate this with any column on any table, and on regular intervals this data gets tested and everyone knows if something has gotten messed up.",
    "comments": [
      {
        "date": "2021-02",
        "note": "Monte Carlo Data, the data observability company, raised a $25M Series B in February 2021 to build exactly this kind of automated data quality monitoring — detecting anomalies in statistical properties across data pipelines"
      },
      {
        "date": "2022-02",
        "note": "Great Expectations, the open-source data quality tool whose core concept of 'expectations' mirrors the statistical-property-based tests described here, raised $40M in Series B funding"
      },
      {
        "date": "2022-02",
        "note": "dbt Labs raised $222M at a $4.2B valuation, with data testing and contracts becoming central features of the modern data stack — dbt's later data contracts feature (v1.5, April 2023) enforces the exact schema-level validation described in this message"
      },
      {
        "date": "2022",
        "note": "The 'data contracts' movement led by Andrew Jones at GoCardless and Chad Sanderson at Convoy formalized the idea of defining expected data shapes, statistical properties, and ownership metadata — closely matching this message's vision"
      }
    ],
    "rating": "outstanding"
  },
  {
    "date": "2020-12-08 01:41",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "I just dream of making knowledge engineering a role at every company. I'm just looking for an entryway.",
    "comments": [
      {
        "date": "2021-06",
        "note": "Neo4j raised $325M in Series F at a $2B+ valuation in June 2021, the largest investment in a private database company at the time, reflecting surging enterprise demand for knowledge graph infrastructure"
      },
      {
        "date": "2023",
        "note": "The knowledge graph market was valued at $1B in 2022 and projected to grow at 13.5% CAGR through 2032, with companies like Google, Microsoft, Amazon, and Neo4j all offering enterprise knowledge graph products"
      },
      {
        "date": "2023",
        "note": "The rise of RAG (Retrieval-Augmented Generation) made knowledge engineering skills essential for grounding LLMs in structured data, creating the exact enterprise role envisioned here — combining ontology design with AI systems"
      }
    ],
    "channelName": "Zach Trafny",
    "rating": "neutral"
  },
  {
    "date": "2020-12-09 22:59",
    "channel": "Hackathon",
    "source": "imessage",
    "link": null,
    "content": "One way to think about plumbing + knowledge graphs is that the API should provide primitives that, once defined, allow you to build SKS, an automated earnings report (Narratives), etc. So maybe the conversation is about various capabilities that are broadly necessary across SaaS products, which are closely related to knowledge graphs and ontologies.",
    "comments": [
      {
        "date": "2022",
        "note": "Google launched its Enterprise Knowledge Graph API in 2022, providing exactly the kind of knowledge graph primitives (entity reconciliation, schema mapping via schema.org) that could be composed into higher-level SaaS features"
      },
      {
        "date": "2023",
        "note": "GraphRAG emerged as a major application pattern in 2023-2024, combining knowledge graph primitives with LLMs to power structured reasoning across SaaS products — validating the idea that KG primitives are broadly necessary building blocks"
      },
      {
        "date": "2022",
        "note": "Atlan, a data catalog built on a knowledge graph architecture, was named a Leader in Forrester's Enterprise Data Catalogs Wave in Q2 2022, demonstrating that ontology-driven primitives were becoming foundational SaaS infrastructure"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2020-12-10 01:44",
    "channel": "Hackathon",
    "source": "imessage",
    "link": null,
    "content": "That feels like the right level of complexity to target. Because if you go much lower to something like a named entity recognition library, you're competing on very well-defined metrics and spending a lot of money training better machine learning models.",
    "comments": [
      {
        "date": "2021",
        "note": "Hugging Face's model hub grew rapidly in 2021, hosting thousands of free NER models that commoditized the task — confirming that competing at the NER library level meant racing against an open-source tsunami"
      },
      {
        "date": "2022-08",
        "note": "Stable Diffusion's open-source release in August 2022 demonstrated the pattern described here at scale: state-of-the-art AI quickly gets replicated and open-sourced, making it expensive to compete on raw model benchmarks alone"
      },
      {
        "date": "2023-02",
        "note": "Meta released LLaMA in February 2023, an open-source LLM whose 13B model outperformed GPT-3's 175B on most benchmarks, further proving that competing on well-defined ML metrics is a losing strategy as open-source catches up"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2021-01-05 21:30",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "Generation of content is an interesting and open opportunity.",
    "comments": [
      {
        "date": "2021",
        "note": "Jasper (then Conversion.ai) launched in January 2021 using GPT-3 for content generation and hit $42.5M ARR by end of year, validating content generation as a massive opportunity"
      },
      {
        "date": "2022-10",
        "note": "Jasper raised $125M at a $1.5B valuation in October 2022, and Copy.ai hit $10M ARR, confirming AI content generation as one of the fastest-growing application categories"
      }
    ],
    "channelName": "Samantha Zhang",
    "rating": "poor"
  },
  {
    "date": "2021-01-05 21:30",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "What you see with AI is that very quickly something that was state of the art gets replicated and open-sourced.",
    "comments": [
      {
        "date": "2022-08",
        "note": "Stable Diffusion was released as open source in August 2022, replicating the capabilities of proprietary image generation models like DALL-E 2 within months"
      },
      {
        "date": "2023-02",
        "note": "Meta released LLaMA in February 2023, an open-source LLM that matched or exceeded GPT-3's performance, setting off an explosion of open-source LLM development"
      },
      {
        "date": "2023-03",
        "note": "Stanford released Alpaca in March 2023, fine-tuning LLaMA to replicate ChatGPT-like instruction-following for under $600, demonstrating how rapidly open source closes the gap"
      }
    ],
    "channelName": "Samantha Zhang",
    "rating": "good"
  },
  {
    "date": "2021-01-05 21:30",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "OpenAI is one of those places where you can be really good and still be at the bottom of the totem pole. It's good to be able to work with people better than yourself. I'm more interested in applications and I prefer actually \"open\" stuff like BERT and other open-source models, but it's possible OpenAI-style APIs are the only option.",
    "comments": [
      {
        "date": "2020-06",
        "note": "OpenAI launched the GPT-3 API in June 2020 as a closed, paid service — abandoning its open-source roots and confirming the tension this message identifies between openness and commercial API access"
      },
      {
        "date": "2023-02",
        "note": "Vice published 'OpenAI Is Now Everything It Promised Not to Be: Corporate, Closed-Source, and For-Profit,' crystallizing the exact criticism this message anticipated about OpenAI's shift away from openness"
      },
      {
        "date": "2023-02",
        "note": "Meta's release of LLaMA proved the 'open' alternative was viable after all, though OpenAI-style APIs remained the dominant commercial model as predicted"
      }
    ],
    "channelName": "Samantha Zhang",
    "rating": "neutral"
  },
  {
    "date": "2021-01-05 21:30",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "The whole notion of the intersection between humans and AIs and how they collaborate is super interesting.",
    "comments": [
      {
        "date": "2021-06",
        "note": "GitHub Copilot launched in technical preview in June 2021, literally embodying the human-AI collaboration concept as an 'AI pair programmer' that works alongside developers"
      },
      {
        "date": "2022",
        "note": "RLHF (Reinforcement Learning from Human Feedback) became the standard technique for training AI assistants like ChatGPT, making human-AI collaboration fundamental to the training process itself"
      },
      {
        "date": "2023-03",
        "note": "Microsoft launched Microsoft 365 Copilot in March 2023, embedding AI collaboration directly into Word, Excel, and PowerPoint — making human-AI collaboration the default interaction model for hundreds of millions of knowledge workers"
      }
    ],
    "channelName": "Samantha Zhang",
    "rating": "poor"
  },
  {
    "date": "2021-04-30 16:57",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "Right now I'm looking into the ecosystem around AI — for example, the workflows that annotators use and the support systems to build good AI.",
    "comments": [
      {
        "date": "2021-04",
        "note": "Scale AI was valued at $7.3B in April 2021, just days before this message, reflecting massive growth in the data annotation ecosystem this message was investigating"
      },
      {
        "date": "2022-01",
        "note": "Scale AI won a $250M federal contract in January 2022 to provide annotation and data labeling tools to U.S. government agencies, validating annotation infrastructure as critical to AI development"
      },
      {
        "date": "2022",
        "note": "RLHF (Reinforcement Learning from Human Feedback) became the key technique behind ChatGPT's success, making human annotation workflows — the exact 'support systems to build good AI' referenced here — the most important bottleneck in AI development"
      }
    ],
    "channelName": "Nick Larusso",
    "rating": "neutral"
  },
  {
    "date": "2022-07-18 22:18",
    "channel": "general",
    "link": "https://scopvc.slack.com/archives/CDGDNJX5Y/p1658207938816759",
    "content": "Machine Learning is getting incredibly good across the board. We need to be careful not to fall for what used to be pragmatic solutions 10 years ago, but today are outdated. I'm going to dig deeper into [redacted legal tech company].",
    "comments": [
      {
        "date": "2022-11-30",
        "note": "ChatGPT launched 5 months after this message, validating that the ML capability frontier had fundamentally shifted"
      },
      {
        "date": "2023-12",
        "note": "Harvey AI raised $80M from Sequoia 17 months later by applying LLMs to contract analysis, the same legal-tech domain as [redacted legal tech company]"
      },
      {
        "date": "2022-07-18",
        "note": "This message recognized in July 2022 that legacy ML solutions were becoming outdated, ahead of the broader market which did not wake up until after ChatGPT launched five months later"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2022-08-11 22:33",
    "channel": "general",
    "link": "https://scopvc.slack.com/archives/CDGDNJX5Y/p1660282401095239",
    "content": "The whole concept of “prompt engineering” with AI assistants will completely change the concept of UIs as it gets better",
    "comments": [
      {
        "date": "2022-11-30",
        "note": "ChatGPT launched 3.5 months after this message, beginning the natural-language interface revolution predicted here"
      },
      {
        "date": "2023-06",
        "note": "'Prompt engineer' became one of the hottest job titles in tech by mid-2023"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2022-10-13 11:36",
    "channel": "UDGBMD40K",
    "link": "https://scopvc.slack.com/archives/D01769CKC75/p1665686164533119",
    "content": "I think I just had a pretty good idea. Train a generative language model on all my text, email, slack, etc., and then combine that with a more general GPT-3 type setup to generate articles I would actually write. Picks up my style and topics of expertise, and then it's just a matter of refining",
    "comments": [
      {
        "date": "2022-11-30",
        "note": "ChatGPT launched 7 weeks after this message, making LLM-based text generation mainstream"
      },
      {
        "date": "2023-08",
        "note": "OpenAI launched GPT-3.5 fine-tuning APIs, enabling the personal writing clone architecture described in this message"
      },
      {
        "date": "2023-11-06",
        "note": "OpenAI launched custom GPTs with user-uploaded knowledge files at DevDay, making this exact workflow accessible to consumers"
      },
      {
        "date": "2023-01",
        "note": "RAG (Retrieval-Augmented Generation) over personal data combined with a base LLM became one of the most pursued application patterns of 2023-2024, matching this message's proposed architecture"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2022-10-28 03:09",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "I truly believe we can replace a lot of our process challenges with technology. I think we can systematize all our workflows — from recruiting, to onboarding, to scheduling, and so on.\n\nAnd then our EBITDA would get close to our gross profit.",
    "comments": [
      {
        "date": "2024-04",
        "note": "Rippling, which automates recruiting, onboarding, payroll, and IT provisioning from a single employee record, raised $200M at a $13.5B valuation — validating the exact 'systematize all workflows' thesis described here"
      },
      {
        "date": "2024-02",
        "note": "Klarna announced its AI assistant was handling 75% of customer service chats (2.3 million conversations), contributing to a projected $40M profit improvement by cutting headcount from 5,500 to 3,400"
      },
      {
        "date": "2025-10",
        "note": "Workday acquired Paradox, an AI recruiting and scheduling assistant, for $1B — confirming that automating HR workflows from recruiting to onboarding became a billion-dollar category"
      }
    ],
    "channelName": "Ryan Neman",
    "rating": "good"
  },
  {
    "date": "2022-12-05 19:31",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "If you don't get this stuff, you're fucked in another 10 years.\n\nIf you don't align your career against this — like if you're into content or video production, you should be using AI-assisted tools and be aware of how they will evolve. If you're a banker, you should too. If you are not bringing AI into your tool belt, you're fucked.\n\nYou're older and you have wealth. But someone coming out of college or earlier in their career — they better figure this out.",
    "comments": [
      {
        "date": "2024-01",
        "note": "Duolingo cut 10% of its contract translators and writers in favor of GPT-4, one of the first high-profile cases of AI directly replacing content professionals"
      },
      {
        "date": "2024",
        "note": "A Randstad survey found that while 75% of companies were adopting AI, only 35% of workers had received any AI training — confirming the skills gap warning in this message"
      },
      {
        "date": "2025-05",
        "note": "Anthropic CEO Dario Amodei warned that AI could eliminate 50% of entry-level white-collar jobs within five years, echoing the urgency expressed here two and a half years earlier"
      },
      {
        "date": "2025",
        "note": "Entry-level job postings in the U.S. declined roughly 35% from January 2023 levels, with junior tech postings down as much as 67%, hitting exactly the early-career workers this message warned about"
      }
    ],
    "channelName": "Joshua Spiegelman",
    "rating": "outstanding"
  },
  {
    "date": "2022-12-22 20:50",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "As long as it's machine learning heavy, I'm interested. The next project has to be AI.\n\nI've always subscribed to that viewpoint, but I don't think it's as true right now. The leap in AI is too big. It's like in the early 2000s saying that your solution must be internet-related — that would be a reasonable requirement.\n\nI would at least think very carefully about the various ways in which AI could eat the 2010-era tech approach to that problem and make sure those are covered. To me, AI is mostly about automating as many employees and roles as possible. In construction it's tricky because robots might be part of the solution, and that's really hard.",
    "comments": [
      {
        "date": "2023-03",
        "note": "Goldman Sachs published a report estimating AI could automate 300 million jobs globally, framing AI-driven role automation as the central economic story — three months after this message made the same argument"
      },
      {
        "date": "2025-02",
        "note": "PulteGroup built an entire house using Hadrian X, a bricklaying robot that lays over 1,000 bricks per hour, confirming that construction robotics is viable but remains one of the hardest AI automation frontiers"
      },
      {
        "date": "2025",
        "note": "The construction robotics market grew at over 15% per year, with the Association of Builders and Contractors reporting 454,000 unfilled construction jobs — validating both the opportunity and the difficulty noted here"
      }
    ],
    "channelName": "Danny Seigle",
    "rating": "good"
  },
  {
    "date": "2023-01-04 22:22",
    "channel": "U011RG9HPR6",
    "link": "https://scopvc.slack.com/archives/D016ZB7MK9C/p1672899761883039",
    "content": "I think we need to consider the possibility that traditional vertical SaaS is starting to be tapped out. I don't know how much bigger the frontier is. Surely there'll be new generation SaaS solutions, especially those based on ML, displacing the old ones. That makes sense. But I'm not sure how many $10B niches are still there. Procore is $10B and is in a massive industry.",
    "comments": [
      {
        "date": "2022-11-30",
        "note": "ChatGPT had launched 5 weeks before this message"
      },
      {
        "date": "2023-09",
        "note": "Sequoia published 'Generative AI's Act Two' arguing AI would rebuild software categories from the ground up, validating this message's thesis"
      },
      {
        "date": "2024-06",
        "note": "The 'AI replacing SaaS' thesis articulated here in January 2023 became consensus among firms like Bessemer and a16z by mid-2024"
      }
    ],
    "rating": "outstanding"
  },
  {
    "date": "2023-02-11 19:14",
    "channel": "general",
    "link": "https://scopvc.slack.com/archives/CDGDNJX5Y/p1676171650408039",
    "content": "This new wave of AI is going to be as big as the internet as a whole",
    "comments": [
      {
        "date": "2022-11-30",
        "note": "ChatGPT had launched 2.5 months before this message"
      },
      {
        "date": "2023-03-14",
        "note": "GPT-4 released 1 month after this message"
      },
      {
        "date": "2023-03-21",
        "note": "Bill Gates published 'The Age of AI Has Begun,' calling AI as fundamental as the creation of the microprocessor, the PC, the internet, and the mobile phone"
      },
      {
        "date": "2024-01",
        "note": "GPU shortages, trillion-dollar market cap shifts, and sovereign AI investment funds made the AI-internet comparison conventional wisdom by early 2024"
      },
      {
        "date": "2023-02-11",
        "note": "Stating flatly in February 2023 that AI would be as big as the internet was ahead of the curve, when mainstream opinion still debated whether ChatGPT was a fad"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2023-02-24 15:38",
    "channel": "random",
    "link": "https://scopvc.slack.com/archives/CDGKCJ6DT/p1677281918733959",
    "content": "If Facebook doesn't have a path to sell infrastructure services like GPT, they will just release it for free to fuck with the other tech firms",
    "comments": [
      {
        "date": "2023-02-24",
        "note": "This message was posted the exact same day Meta announced LLaMA 1, a 65-billion-parameter model — though notably, LLaMA 1 was not truly open source: model weights were restricted to academic and non-commercial use only, granted case-by-case"
      },
      {
        "date": "2022-11-30",
        "note": "ChatGPT had launched 86 days before this message"
      },
      {
        "date": "2023-07-18",
        "note": "Meta released LLaMA 2 with a commercial license (though still not fully open source per OSI standards), doing precisely what this message predicted: weaponizing open releases to disrupt rivals' AI infrastructure businesses"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2023-03-03 13:18",
    "channel": "unwrap",
    "link": "https://scopvc.slack.com/archives/C03JD95PJS0/p1677878302371949",
    "content": "I know their revenue is early, they just hit $100K ARR. But they are doing a really good job leveraging state of the art NLP and we know these guys well. They also have strong logos on the pipeline.\n\nKevin I think we should try to give them more money proactively and increase our ownership closer to 10%. Ashwin mentioned they were thinking about whether they should add capital now that there is renewed interest for AI companies. They could be more aggressive with sales with that.",
    "comments": [
      {
        "date": "2023-03-14",
        "note": "GPT-4 released by OpenAI just 11 days after this message, triggering an explosion of investor interest in AI startups"
      },
      {
        "date": "2023-03",
        "note": "This message shows a VC recognizing at the $100K ARR stage that NLP/AI-native startups deserved aggressive, preemptive capital deployment"
      },
      {
        "date": "2023-10",
        "note": "By Q3-Q4 2023, AI startups were routinely raising at 100x+ ARR multiples, making the window to increase ownership at reasonable terms extremely narrow"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2023-03-10 20:46",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "I don't think the trend of people doing entry-level work continuously increasing their relative income is going to last. There's too much pressure to reduce costs, and a lot of progress on AI and robotics.\n\nThe main theory for why the Industrial Revolution happened is that employees became too expensive, rather than the invention of any one machine or another.",
    "comments": [
      {
        "date": "2024-04",
        "note": "California raised the fast-food minimum wage to $20, and within a year 10,700 fast-food jobs were lost as 89% of restaurant operators reduced hours and replaced labor with kiosks and automation"
      },
      {
        "date": "2025-05",
        "note": "Klarna CEO announced AI helped shrink the company's workforce by 40%, with remaining employees earning higher salaries — a direct example of expensive labor driving automation, mirroring Robert Allen's theory of the Industrial Revolution cited here"
      },
      {
        "date": "2025",
        "note": "A 'low-hiring, low-firing' equilibrium emerged across the U.S. labor market: companies reduced headcount through attrition rather than backfilling roles, disproportionately eliminating entry-level positions as predicted"
      },
      {
        "date": "2025-09",
        "note": "CNBC reported AI was ending entry-level jobs and dismantling the career ladder, with a 50% decline in new role starts for workers with less than one year of post-graduate experience"
      }
    ],
    "channelName": "Shaked Sivan",
    "rating": "outstanding"
  },
  {
    "date": "2023-03-19 10:27",
    "channel": "general",
    "link": "https://scopvc.slack.com/archives/CDGDNJX5Y/p1679246874890619",
    "content": "This weekend I built some stable diffusion (image generation) locally. I didn't train a model, which would take infinite time on my laptop, but I was able to download models locally and get a sense of how easy it is compared to something in the cloud like midjourney. I would say it's still fairly complex and that the results are not as good. BUT. In exploring the internet I came across a huge huge amount of people working on this as a hobby. Remember, hobbies tend to signal that a mature market might follow (PCs were a nerd hobby). So for example, check out this site where people have trained hundreds of domain specific models for people to generate their own content. The way to think about this is, for example you want to make a new Back to the Future movie, so you fine tune these models on the characters from the show, and then you can create scenes using them. Of course, this is not nearly as good, yet, but it's moving fast.",
    "comments": [
      {
        "date": "2023-03-14",
        "note": "GPT-4 launched 5 days before this message was written"
      },
      {
        "date": "2023-06",
        "note": "Midjourney reportedly reached hundreds of millions in revenue with fewer than 50 employees by mid-2023"
      },
      {
        "date": "2023-10",
        "note": "Stability AI raised funding at a $1 billion valuation"
      },
      {
        "date": "2023-03-21",
        "note": "Adobe Firefly launched, absorbing techniques pioneered by the hobbyist Stable Diffusion community"
      },
      {
        "date": "2023-06",
        "note": "Hobbyist tools like Automatic1111 and ComfyUI became pipelines for techniques (LoRA fine-tuning, ControlNet) rapidly adopted into commercial products"
      },
      {
        "date": "2023-09",
        "note": "Canva launched its AI image generation suite, incorporating techniques from the open-source community"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2023-03-20 15:55",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "A lot of people are going to go off and work on first-order consequences of LLMs, but the big opportunities are probably second and third order, where you're not competing with everyone else looking to make a quick buck.\n\nAs an example, a first-order approach would be to just ask GPT to come up with a lesson to learn algebra. It will probably do a good job. But for what we do, we still need to make sure it meets various educational standards and that we can measure results confidently.\n\nA second-order use would be taking a lesson that already exists and rewriting it for a kid who is interested in cars, or fishing, or hiking. That would add engagement and will almost certainly keep the underlying material consistent with our objective.\n\nAnother example might be to use an LLM to train the tutors themselves. Since we need physical people at schools because part of the problem is that kids need supervision, we need to deal with humans. But we might be able to train them better if we require that everyone spends a certain number of minutes per week talking to the AI, and we measure the interactions.\n\nBut the basic stuff — like using LLMs to answer emails or send wedding thank-you notes — is going to get cherry-picked quickly and not be worth much. I think.",
    "comments": [
      {
        "date": "2024",
        "note": "The 'thin wrapper' AI startup wave collapsed — 254 venture-backed startups filed for bankruptcy in Q1 2024 alone, a 60% jump from 2023, as first-order LLM applications like email assistants became commoditized exactly as predicted"
      },
      {
        "date": "2024-03",
        "note": "LAUSD launched 'Ed,' a $6M AI personal assistant for 55,000 students, but the ed-tech firm behind it collapsed within months — illustrating the gap between first-order AI demos and the harder second-order work of meeting educational standards"
      },
      {
        "date": "2024",
        "note": "A randomized controlled trial of Tutor CoPilot — an AI system that coaches human tutors in real time — showed a 4 percentage-point increase in student mastery, with the largest gains (+9 p.p.) for students of lower-rated tutors. This validated the second-order 'use AI to train the tutors' idea described here"
      },
      {
        "date": "2023-05",
        "note": "Khan Academy launched Khanmigo, which personalizes lessons to student interests and context rather than simply generating answers — the second-order approach described in this message. By 2025 it reached 700,000 students"
      },
      {
        "date": "2024-09",
        "note": "Infosys chairman told CNBC that LLMs will be commoditized and the real value will come from applications built on top of them — restating the first-order vs. second-order framework laid out here 18 months earlier"
      }
    ],
    "channelName": "Nick Larusso",
    "rating": "outstanding"
  },
  {
    "date": "2023-04-11 08:14",
    "channel": "ai",
    "link": "https://scopvc.slack.com/archives/C050GL4A4RG/p1681226099859859",
    "content": "LLMs (specifically transformers) basically have their own scaling laws, similar to Moore's law.\n\nAs you increase compute / data / and parameters, the capability of the models has shown to increase in a predictable way.\n\nSo as long as there is more data and more GPUs, they will keep getting better at the same rate, which has been astronomical.\n\nImportant to think about this when investing. In the early years of semi conductors, things that seemed impossible would become possible in 12 or 24 months. This will be the same. We can't discount the capability and we have to assume anyone working on this will have X times the capability in another year or two.",
    "comments": [
      {
        "date": "2020-01",
        "note": "Kaplan et al. published the neural scaling laws paper showing predictable, log-linear improvements in model capability with increased compute, data, and parameters"
      },
      {
        "date": "2022-03",
        "note": "Hoffmann et al. published the 'Chinchilla' paper refining optimal scaling relationships between model size and training data"
      },
      {
        "date": "2023-01",
        "note": "Microsoft committed $13 billion to OpenAI, a bet directly explained by confidence in scaling laws"
      },
      {
        "date": "2024-06-18",
        "note": "NVIDIA's market cap surpassed $3 trillion, up from approximately $700 billion in early 2023, driven by GPU demand from scaling AI models"
      },
      {
        "date": "2024-03",
        "note": "Claude 3 released by Anthropic, delivering step-function capability gains consistent with scaling law predictions"
      },
      {
        "date": "2023-12",
        "note": "Gemini Ultra released by Google, further validating that increased compute yielded predictable capability improvements"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2023-05-06 12:23",
    "channel": "fund-partners",
    "link": "https://scopvc.slack.com/archives/C01F8JNA0RZ/p1683401005319359",
    "content": "GPT makes me so much smarter and more productive. This is going to alter the course of the economy",
    "comments": [
      {
        "date": "2023-03-14",
        "note": "GPT-4 had been released about 2 months before this message"
      },
      {
        "date": "2023-06",
        "note": "McKinsey report estimated generative AI could add $2.6 to $4.4 trillion annually to the global economy"
      },
      {
        "date": "2024-06",
        "note": "Studies from Harvard Business School, MIT, and others showed 20-40% productivity gains for knowledge workers using LLMs"
      },
      {
        "date": "2025-01",
        "note": "GitHub reported Copilot was writing over 40% of code in enabled repositories"
      },
      {
        "date": "2024-01",
        "note": "AI coding assistants (GitHub Copilot, Cursor) had begun reshaping software development workflows across the industry"
      }
    ],
    "rating": "poor"
  },
  {
    "date": "2023-08-03 12:06",
    "channel": "fund-partners",
    "link": "https://scopvc.slack.com/archives/C01F8JNA0RZ/p1691089562086099",
    "content": "I just talked to Mike at the office for a bit. Here's an early thought about what one could call a \"native\"\nAI business in contrast to vertical SaaS.\n\nVSaaS uses software to productize the operating procedures or \"playbook\" that every company in a certain industry should follow. No point reinventing the wheel. VSaaS is the core software for companies in the sector, the operative system, the last subscription you cancel.\n\nAI native businesses replace employees. While VSaaS takes care of repeatable tasks, employees perform more differentiated tasks. Employees do different things every day. Employees tend to get better over time because they acquire institutional knowledge and understand your company. It's better to start with an employee that has the right fundamentals, but it's also usually better to start with someone early career and have them adapt to your company style, than have someone fully set on their ways. AIs are agents, assistants, employees. An disruptive AI company completely dehumanizes a function.",
    "comments": [
      {
        "date": "2024-10-25",
        "note": "Salesforce launched Agentforce, explicitly marketing AI agents as 'digital labor' that replace headcount rather than augment workflows"
      },
      {
        "date": "2025-01",
        "note": "Sierra AI, founded by ex-Salesforce CEO Bret Taylor, raised at a $4B+ valuation on the thesis that AI replaces employees rather than augmenting SaaS workflows"
      },
      {
        "date": "2024-03",
        "note": "Cognition launched Devin, branded as an 'AI software engineer' — adopting the 'AI employee' framing described in this message"
      },
      {
        "date": "2024-02",
        "note": "Klarna announced its AI assistant was doing the work of 700 customer service agents, validating the concept of AI that 'completely dehumanizes a function'"
      }
    ],
    "rating": "outstanding"
  },
  {
    "date": "2023-10-01 19:18",
    "channel": "fund-partners",
    "link": "https://scopvc.slack.com/archives/C01F8JNA0RZ/p1696213132954979",
    "content": "SaaS hasn't necessarily been deflationary over the last 10 years. It has improved workflows, helped businesses be better operators, be more compliant, make fewer mistakes. But my intuition is that B2B startups over the last few years are extracting about as much money as they add to the business",
    "comments": [
      {
        "date": "2024-02",
        "note": "Klarna announced their AI assistant was doing the work of 700 full-time customer service agents within one month of deployment, at a fraction of prior SaaS tooling costs"
      },
      {
        "date": "2024-06",
        "note": "The 'SaaS is dead' discourse exploded among investors, arguing AI would collapse the $300B SaaS market by making per-seat pricing untenable"
      },
      {
        "date": "2023-03-14",
        "note": "GPT-4 had launched about 6.5 months before this message"
      },
      {
        "date": "2024-09",
        "note": "Multiple VC firms published theses that SaaS value capture was zero-sum, echoing the observation made here months earlier"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2023-10-02 03:43",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "Pushing the envelope is discussing how current trends will change the world in the future and how we can be investing in those trends. For example, the difference between \"every company will just sprinkle AI\" versus \"there will actually be AI-native companies that will outperform.\"\n\nIf it stays hypothetical, maybe not. But you can work backwards. For example, one of my perspectives — which might be wrong but I want to explore more — is that SaaS will hit a wall and new business models will emerge. There are fewer and fewer vertical SaaS companies that can grow as large as Procore. Meanwhile, you'll get many companies that look different from the winners of the last 10 years, which will be the new winners. And having the ability to seek and identify those will definitely be an edge.",
    "comments": [
      {
        "date": "2024-09",
        "note": "Klarna announced it was dropping Salesforce and Workday in favor of AI-powered internal tools, becoming the highest-profile example of a company rejecting traditional SaaS in favor of AI-native alternatives"
      },
      {
        "date": "2025",
        "note": "AI-native startups were growing at over 100% while traditional SaaS stalled at 23% growth, and companies like Cursor reached $100M ARR in about 12 months with roughly 30 employees — a pattern impossible under the old SaaS playbook"
      },
      {
        "date": "2025-08",
        "note": "Forrester CEO George Colony predicted AI would 'gut SaaS' and collapse bloated software pricing models, moving from per-seat subscriptions to usage-based and outcome-based pricing"
      },
      {
        "date": "2025",
        "note": "Procore's revenue growth decelerated from 21% to roughly 14% year-over-year as the company transitioned its go-to-market model, illustrating the ceiling on vertical SaaS growth described here"
      },
      {
        "date": "2026-03",
        "note": "Cursor hit $2B ARR, doubling in just three months, demonstrating that AI-native companies can scale to massive revenue with a fraction of the headcount traditional SaaS required"
      }
    ],
    "channelName": "Heike Schirmer",
    "rating": "outstanding"
  },
  {
    "date": "2023-11-27 17:19",
    "channel": "ai",
    "link": "https://scopvc.slack.com/archives/C050GL4A4RG/p1701134342479599",
    "content": "I really believe this to be true, that AI will dehumanize the economy over time, and that attaching software pricing to seats or even subscription (which is somewhat similar to a salaried position) will be disrupted. Instead, more products will become metered and companies will pay per unit of work (contract annotated, lesson taught, etc).",
    "comments": [
      {
        "date": "2023-03-14",
        "note": "GPT-4 launched approximately 8.5 months before this message"
      },
      {
        "date": "2024-10-25",
        "note": "Salesforce priced Agentforce at $2 per conversation rather than per seat"
      },
      {
        "date": "2024-10",
        "note": "Intercom shifted to a $0.99 per-resolution pricing model for its AI agent Fin"
      },
      {
        "date": "2024",
        "note": "OpenAI moved toward usage-based API pricing rather than flat subscriptions"
      },
      {
        "date": "2024",
        "note": "Outcome-based pricing became the defining business model debate in enterprise AI through 2024-2025"
      }
    ],
    "rating": "outstanding"
  },
  {
    "date": "2024-02-20 09:58",
    "channel": "fund-partners",
    "link": "https://scopvc.slack.com/archives/C01F8JNA0RZ/p1708451898080309",
    "content": "I don't understand how not everyone with wealth isn't terrified to be left behind by AI",
    "comments": [
      {
        "date": "2023-03-14",
        "note": "GPT-4 launched approximately 11 months before this message"
      },
      {
        "date": "2024-06",
        "note": "Nvidia briefly became the world's most valuable company, surpassing $3 trillion in market cap"
      },
      {
        "date": "2024",
        "note": "AI-focused investment funds dramatically outperformed the broader market"
      },
      {
        "date": "2024",
        "note": "Traditional media, mid-market law firms, and legacy software companies saw significant valuation declines"
      }
    ],
    "rating": "poor"
  },
  {
    "date": "2024-05-18 18:15",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "Progress can't be stopped. AGI is inevitable. It's probably true that the nature of the first AGI or ASI might be path-dependent on choices we make as humans — just like inventing nukes is inevitable but self-destruction isn't. But there are hundreds of countries, millions of companies, and billions of people. It's too hard to control. The only thing that gives some level of control is the fact that doing AI is very expensive, so it's hard to do completely in secret. It will happen anyway, so you may as well be part of it — and may as well have the USA lead.",
    "comments": [
      {
        "date": "2025-01-21",
        "note": "Trump announced the $500B Stargate Project with OpenAI, Oracle, and SoftBank — the US government explicitly embracing the 'may as well lead' posture described here"
      },
      {
        "date": "2025-01-20",
        "note": "DeepSeek-R1 launched from China, matching frontier US models despite export controls — validating that AI progress can't be stopped even with national-level restrictions"
      },
      {
        "date": "2025-01-15",
        "note": "Biden's AI Diffusion Rule imposed global export controls on AI chips and model weights, tiering countries into three categories — a direct attempt at the kind of control this message argued was futile"
      },
      {
        "date": "2025-02-09",
        "note": "Sam Altman published 'Three Observations,' stating OpenAI is 'confident we know how to build AGI as we have traditionally understood it' — aligning with the claim that AGI is inevitable"
      }
    ],
    "channelName": "David Schnurr",
    "rating": "neutral"
  },
  {
    "date": "2024-05-29 15:35",
    "channel": "fund-partners",
    "link": "https://scopvc.slack.com/archives/C01F8JNA0RZ/p1717022141646119",
    "content": "Kevin, we have met with William Wang a couple times about the company he wants to start. The idea is to use LLMs for assistance with circuit design, to produce Verilog code and test the resulting generated code for correct behavior.\n\nIt's a long shot. But I think we should do it. He is the most accomplished AI guy in Santa Barbara, and I think we could help a lot bringing the missing players and turning the research into a business. The idea is tangible, but there is  execution risk, even though he plans to leave teaching and has a few PhD students that are coming along.\n\nI think it's one of those situations where there isn't a lot of evidence to dig up, because there isn't much there. But I think we need to try to be in this deal. He  runs the AI efforts at UCSB. We are trying to invest in AI companies and have a brand in SB. It'd suck to miss something here.",
    "comments": [
      {
        "date": "2024-05",
        "note": "LLMs for chip design and Verilog code generation was a genuinely frontier idea at the time of this message"
      },
      {
        "date": "2023",
        "note": "Nvidia announced ChipNeMo, their internal LLM adapted for chip design including Verilog generation"
      },
      {
        "date": "2024",
        "note": "Google DeepMind published research on AI-assisted chip floorplanning"
      },
      {
        "date": "2025-01",
        "note": "Multiple startups had entered the AI-for-EDA (electronic design automation) space"
      },
      {
        "date": "2025",
        "note": "TSMC and Synopsys announced AI-driven design tool integrations"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2024-07-29 10:22",
    "channel": "deals",
    "link": "https://scopvc.slack.com/archives/C016BGX7P44/p1722273779348499",
    "content": "agents in AI are models that have maneuverability to collaborate with each other and deliver complex tasks, for example a coding agent and a tester agent to write software that is less buggy. But this is all state of the art research still",
    "comments": [
      {
        "date": "2024-03",
        "note": "Cognition announced Devin, implementing the write-then-test agent pattern described in this message"
      },
      {
        "date": "2025-05",
        "note": "OpenAI released Codex as an autonomous coding agent"
      },
      {
        "date": "2025",
        "note": "Anthropic launched Claude with agentic coding capabilities"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2024-09-13 23:41",
    "channel": "private",
    "source": "imessage",
    "link": null,
    "content": "Yeah, the whole internet economy will change. Free content doesn't make a lot of sense in an age of AI assistants.\n\nPR is a bit different. I mean content that is actually useful, like writing a blog post about how to repair a bike. If nobody ever reads the blog post except ChatGPT, and then it plagiarizes it, there's no incentive for original content.\n\nWe'll go back to paid content, which could be good for you in the right context.",
    "comments": [
      {
        "date": "2024-05-14",
        "note": "Google launched AI Overviews in US search results, summarizing web content directly in the search page and reducing click-through to the original source — the exact dynamic described here"
      },
      {
        "date": "2025",
        "note": "Global publisher traffic from Google dropped by a third in 2025, with 60% of searches ending in zero clicks — confirming that the free-content-supported-by-ads model was collapsing"
      },
      {
        "date": "2024-07-30",
        "note": "Perplexity AI launched its Publishers' Program with revenue sharing after plagiarism accusations from media outlets — an early attempt to address the broken incentive problem identified here"
      },
      {
        "date": "2023-12-27",
        "note": "The New York Times sued OpenAI for copyright infringement, alleging ChatGPT reproduced its content without permission — the 'plagiarizes it' concern made into a landmark legal case"
      },
      {
        "date": "2024",
        "note": "By late 2024, 79% of top news websites were blocking AI training bots via robots.txt, up dramatically from prior years — publishers acting on the realization that free content was being exploited without compensation"
      }
    ],
    "channelName": "Noah Greenberg",
    "rating": "good"
  },
  {
    "date": "2025-01-05 18:50",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/1875978314462151070",
    "content": "This morning I realized that a lot of my crazy behavior is just singularity prepping.",
    "comments": [
      {
        "date": "2025-01",
        "note": "The Stargate Project was announced on January 21, 2025 — a $500 billion AI infrastructure initiative backed by OpenAI, SoftBank, and Oracle — signaling that major players were actively 'prepping' for an AI-driven transformation of the economy"
      },
      {
        "date": "2025-08",
        "note": "CNBC reported that AI created more than 50 new billionaires in 2025 alone, with MIT researcher Andrew McAfee calling it the fastest wealth creation in over 100 years of data — rewarding those who had positioned themselves early"
      },
      {
        "date": "2025-02",
        "note": "Sam Altman publicly predicted the coming of one-person billion-dollar companies enabled by AI, and TechCrunch reported tech CEOs had a betting pool on when it would happen — framing personal AI skill-building as a rational preparation strategy"
      }
    ],
    "rating": "poor"
  },
  {
    "date": "2025-01-05 18:50",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/1875978127392006236",
    "content": "The singularity is a one way rocket ship we will all be forced into, and whatever you have at the time of departure (wealth, knowledge, relationships) will largely determine your situation moving forward.",
    "comments": [
      {
        "date": "2025-01",
        "note": "Oxfam reported that billionaire wealth jumped three times faster in 2025 to its highest peak ever, with AI being the primary driver — illustrating how existing wealth advantages compound as the technology accelerates"
      },
      {
        "date": "2025-08",
        "note": "PwC's Global AI Jobs Barometer found that AI-skilled workers commanded a 56% wage premium in 2024, double the 25% premium from the prior year — showing that knowledge advantages at the 'time of departure' are already compounding"
      },
      {
        "date": "2025-03",
        "note": "A quarter of Y Combinator's Winter 2025 cohort had codebases that were 95% AI-generated, yet the founders were described as highly technical — demonstrating that prior knowledge and skills determined who could leverage AI most effectively"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-01-06 16:25",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/1876304039815889281",
    "content": "AI is real, yet markets aren't behaving unusually. Instead, we're seeing classic speculative psychology. The shoe-shiner has been buying Nvidia for a while.  The likely correction will come not from AI's failure, but from fear of losing gains. The current rise benefited from",
    "comments": [
      {
        "date": "2024-12",
        "note": "CNBC reported that retail investors poured a record $30 billion into Nvidia in 2024 — an 885% increase over three years — with the stock becoming the most-bought equity on Robinhood, matching the 'shoe-shiner buying Nvidia' observation"
      },
      {
        "date": "2025-01",
        "note": "On January 27, Nvidia lost $589 billion in market cap in a single day after the DeepSeek announcement — the largest single-day loss in stock market history — driven not by AI's failure but by fear that existing AI investments were overpriced, exactly as predicted"
      },
      {
        "date": "2025-04",
        "note": "The Nasdaq plunged 5.97% on April 2 after Trump's 'Liberation Day' tariff announcement, with Nvidia falling 23% from its January peak — a correction triggered by macroeconomic fear rather than any deterioration in AI's actual capabilities"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-02-01 19:59",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/1885779983236465029",
    "content": "Even if the prospects of AGI are certain, it's possible for the system to run out of steam before getting there. Our economic and infrastructure capabilities aren't unbounded in the short term. There's a chance this wave will collapse under the weight of its own investments.",
    "comments": [
      {
        "date": "2025-01",
        "note": "DeepSeek's revelation that it trained a frontier-competitive model for roughly $5.6 million triggered a global tech selloff, raising immediate questions about whether the hundreds of billions being spent on AI infrastructure by Western companies represented massive overinvestment"
      },
      {
        "date": "2025-04",
        "note": "The 2025 stock market crash beginning April 2 wiped trillions off tech valuations, with the Nasdaq entering bear-market territory in 32 sessions — demonstrating how macroeconomic shocks (tariffs) can collapse an AI investment wave regardless of the technology's merit"
      },
      {
        "date": "2025-11",
        "note": "Gartner predicted that more than 40% of agentic AI projects would be cancelled by 2027 due to unanticipated cost and complexity of scaling — a concrete forecast of the wave collapsing under the weight of its own investments"
      },
      {
        "date": "2025-09",
        "note": "Nvidia's stock pulled back over 15% from its October 2025 high amid growing skepticism about its 45x P/E ratio, slowing AI server demand, and Goldman Sachs warnings about AI infrastructure bottlenecks — reflecting the short-term economic limits described here"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-02-18 18:14",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/1891914279776833685",
    "content": "Human to human APIs are inefficient and many will get substituted.",
    "comments": [
      {
        "date": "2024-02",
        "note": "Klarna announced its AI assistant handled 2.3 million customer service conversations in its first month, doing the work of 700 human agents — a direct substitution of human-to-human customer service interfaces with AI"
      },
      {
        "date": "2025-07",
        "note": "OpenAI folded its Operator product into ChatGPT as 'agent mode,' enabling a single AI to browse the web, fill forms, and complete multi-step workflows that previously required human intermediaries like travel agents and administrative assistants"
      },
      {
        "date": "2025-12",
        "note": "Gartner projected that conversational AI would reduce contact center labor costs by $80 billion by 2026, with early adopters of agentic AI reporting 60-70% cost reductions — quantifying the substitution of human-to-human service APIs at scale"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-02-25 21:49",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/1894505052947321212",
    "content": "The notion of AI agents was built on task-specialized models. Today's implementations just use one model with different prompts. If prompting is all it takes, eventually a single LLM will handle everything in one simpler thread. You might parallelize for performance, but an",
    "comments": [
      {
        "date": "2024-12",
        "note": "Google launched Gemini 2.0 Flash explicitly as a model 'for the agentic era,' designed as a single unified model with native tool use, multimodal I/O, and a 1M token context window — moving toward the one-model-handles-everything architecture described here"
      },
      {
        "date": "2025-07",
        "note": "OpenAI merged Operator, Deep Research, and ChatGPT into a single 'agent mode' — consolidating what had been separate specialized tools into one model handling everything in a unified thread, exactly as this tweet predicted"
      },
      {
        "date": "2024-11",
        "note": "Anthropic released the Model Context Protocol (MCP) as an open standard, enabling a single model to connect to arbitrary external tools and data sources — standardizing the pattern where one LLM orchestrates everything rather than requiring specialized agent models"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-03-31 20:32",
    "channel": "deals",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C016BGX7P44/p1743453169722429",
    "content": "My theory is that AI will cause a drastic change to sales. Product-led growth and enterprise will change less, but mid-market will keep becoming more spammy and more competitive.",
    "comments": [
      {
        "date": "2025-11",
        "note": "Gmail began actively rejecting (not just filtering) non-compliant bulk emails, as AI-generated outreach volume overwhelmed inboxes and forced email providers to adopt stricter enforcement"
      },
      {
        "date": "2025-10",
        "note": "11x.ai, an AI SDR platform offering fully autonomous 'digital sales workers,' raised $74M from Benchmark and Andreessen Horowitz, fueling the flood of AI-generated mid-market outreach"
      },
      {
        "date": "2026-02",
        "note": "Average cold email response rates fell to 3.1% in 2026 (down from 8.5% in 2019), with roughly 60% of all emails now estimated to be spam — validating the prediction that mid-market sales would become more spammy and competitive"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-04-04 17:36",
    "channel": "deals",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C016BGX7P44/p1743788190336639?thread_ts=1743788115.421719",
    "content": "You can oversimplify everything you want, but most founders are nowhere near close to using all the AI capabilities available to them.",
    "comments": [
      {
        "date": "2025-08",
        "note": "An MIT report found that 95% of generative AI pilots at companies were failing to deliver measurable business impact, with the core issue being a 'learning gap' rather than model quality"
      },
      {
        "date": "2025-11",
        "note": "EY's Work Reimagined Survey of 15,000 employees found that while 88% used AI daily, only 5% qualified as advanced users — confirming that most people barely scratch the surface of available AI capabilities"
      },
      {
        "date": "2025-06",
        "note": "McKinsey's State of AI survey reported that two-thirds of companies remained stuck in experimentation or pilot phases, unable to scale AI beyond basic use cases"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-04-07 14:17",
    "channel": "UDFR6QDUZ",
    "link": "https://scopvc.slack.com/archives/D016RAW8QMD/p1744060641718849",
    "content": "There's going to be a gazillion vertical AI companies that are doing something simple and won't turn into anything meaningful. It's just the way in which AI is diffusing through the market. A bunch of small acquisitions or acquihires to integrate these into other companies",
    "comments": [
      {
        "date": "2024-03",
        "note": "Microsoft acquihired Inflection AI's team for $650M"
      },
      {
        "date": "2024-08",
        "note": "Google acquihired Character.AI's founders for $2.7B"
      },
      {
        "date": "2024",
        "note": "Amazon invested $4B in Anthropic"
      },
      {
        "date": "2025-06",
        "note": "FTC and DOJ opened investigations into AI 'license and acquihire' structures"
      },
      {
        "date": "2025",
        "note": "Big Tech deployed over $40B to absorb AI startup teams while avoiding formal M&A review"
      }
    ],
    "rating": "poor"
  },
  {
    "date": "2025-04-09 02:18",
    "channel": "U07C2M8C7U7",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/D07C2M8GJUF/p1744165098344149?thread_ts=1744163763.025589",
    "content": "They are just guys with families in the business who one day thought, we should build a SaaS product. But SaaS is 15 years old. You have to think a bit further out now.",
    "comments": [
      {
        "date": "2025-10",
        "note": "Bain & Company's Technology Report warned that agentic AI was disrupting SaaS by automating entire workflows, and that incumbents who failed to proactively replace SaaS activity with AI risked obsolescence"
      },
      {
        "date": "2026-01",
        "note": "The median revenue multiple for public SaaS companies dropped below 5x (from above 7x at the start of 2025), reflecting investor recognition that traditional SaaS economics were under structural threat"
      },
      {
        "date": "2026-02",
        "note": "Approximately $2 trillion in market capitalization evaporated from the software sector between mid-January and mid-February 2026 in what analysts dubbed the 'SaaSpocalypse,' driven by AI agents replacing entire SaaS product categories"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-04-11 17:20",
    "channel": "general",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/CDGDNJX5Y/p1744392003602039",
    "content": "I don't think we should invest in companies where founders aren't reflexively thinking in terms of AI tools.",
    "comments": [
      {
        "date": "2025-06",
        "note": "Y Combinator's Summer 2025 batch was nearly 50% AI agent companies (67 out of 144 startups), signaling that the top accelerator had effectively adopted the same filter for AI-native thinking"
      },
      {
        "date": "2025-12",
        "note": "A Harvard Law Forum analysis of VC outlook for 2026 noted that nearly half of all global venture funding in 2025 went into AI (up from roughly a third the year before), with non-AI startups struggling to raise capital"
      },
      {
        "date": "2026-02",
        "note": "Bloomberg reported that VC firms were actively recruiting AI talent onto their own investment teams to better evaluate AI-native founders and their technical claims"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-04-13 21:29",
    "channel": "U05VA997V5Z",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/D08J9P424UQ/p1744579775544859",
    "content": "I still think there is opportunity for someone to design a web framework that is specifically intended for LLMs, so they have to make fewer choices and their output is predictable.",
    "comments": [
      {
        "date": "2025-05",
        "note": "Vercel released v0-1.0-md, an AI model explicitly optimized for front-end and full-stack web development, with a composite pipeline that detects and fixes LLM coding errors in real time — directly addressing the 'fewer choices, predictable output' idea"
      },
      {
        "date": "2025-10",
        "note": "Anthropic reported that roughly 90% of Claude Code's own codebase was written by Claude Code itself, demonstrating the demand for toolchains purpose-built to make LLM output reliable and predictable"
      },
      {
        "date": "2026-01",
        "note": "A Second Talent study found that AI-generated code contained 1.7x more major issues than human-written code, with readability issues 3x higher — underscoring the need for frameworks that constrain LLM choices and enforce consistency"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-04-23 02:06",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/1914863321238659479",
    "content": "AI has dramatically lowered the barriers to creating new companies. A couple of decades ago, building a startup required substantial engineering, a sort of \"proof of work\" baked into the system. The advent of cloud infrastructure reduced initial investment needs but still",
    "comments": [
      {
        "date": "2025-03",
        "note": "TechCrunch reported that a quarter of Y Combinator's Winter 2025 batch had codebases that were 95% AI-generated, with YC managing partner Jared Friedman confirming these AI-built companies were among the fastest-growing and most profitable in YC's history"
      },
      {
        "date": "2025-02",
        "note": "Andrej Karpathy coined 'vibe coding' in February 2025 to describe building software through natural-language prompts rather than manual engineering — a term that went viral and crystallized the lowered barrier this tweet describes"
      },
      {
        "date": "2025-05",
        "note": "OpenAI launched Codex as an autonomous coding agent that could write features, fix bugs, and propose pull requests in parallel — enabling founders to ship software at the speed of a full engineering team without hiring one"
      },
      {
        "date": "2025-02",
        "note": "Sam Altman predicted the emergence of one-person billion-dollar companies enabled by AI tools, with Midjourney already demonstrating the model: fewer than 15 employees generating over $200 million in annual revenue"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-05-13 16:59",
    "channel": "deals",
    "link": "https://scopvc.slack.com/archives/C016BGX7P44/p1747180757341169",
    "content": "I think it's the opposite. Building software companies used to require very smart technical people, and a lot of effort to get off the ground. With the internet and especially cloud providers, it became a lot easier. I think 2010 to 2020 was the golden age of starting software companies, where you still had mostly technical people, but they could use a lot of leverage and get a lot done faster. Call it the age of YCombinator. Now, it has become too easy to start a software company, so there is no proof of work. Anyone can do it. So, at least until the AI has fully replaced humans, I argue having smarts is one of the only differentiators in an early stage company\n\nhttps://ivanbercovich.com/2025/no-more-proof-of-work",
    "comments": [
      {
        "date": "2025-02-06",
        "note": "Andrej Karpathy coined the term 'vibe coding'"
      },
      {
        "date": "2025",
        "note": "Market became flooded with AI-generated MVPs and vibe-coded startups"
      },
      {
        "date": "2025",
        "note": "Tools like Cursor, Bolt, and Lovable made shipping software products trivially easy"
      },
      {
        "date": "2025",
        "note": "'Vibe coding' was named Collins English Dictionary Word of the Year"
      },
      {
        "date": "2025",
        "note": "VCs faced a core challenge evaluating the flood of AI-native startups where building itself was no longer a differentiator"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-05-14 00:50",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/1922454402335428955",
    "content": "Universal access to information didn't result in the marginal utility of knowledge going to zero. What does that imply about intelligence as a commodity?",
    "comments": [
      {
        "date": "2025-07",
        "note": "Pew Research surveyed 5,023 U.S. adults and found that despite near-universal access to AI tools, half of Americans said they were more concerned than excited about AI, and more than half lacked confidence in distinguishing AI from human output — suggesting access to intelligence does not equate to ability to use it"
      },
      {
        "date": "2026-01",
        "note": "Goldman Sachs chief economist Jan Hatzius stated AI added 'basically zero' to U.S. GDP growth in 2025, despite massive infrastructure investment, illustrating that commodity access to intelligence has not yet translated into proportional economic value"
      },
      {
        "date": "2025-05",
        "note": "AI-skilled workers commanded a 56% wage premium in 2025, more than double the 25% premium from just one year earlier, demonstrating that even as AI democratizes intelligence access, the ability to wield it effectively commands increasing returns"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-05-14 02:00",
    "channel": "U05VA997V5Z",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/D08J9P424UQ/p1747188054038379",
    "content": "It makes me want to stop everything and build an observability application — Weights & Biases seems just like Datadog. I don't see a lot of automatic cleverness specific to LLM training.",
    "comments": [
      {
        "date": "2025-05",
        "note": "CoreWeave completed its $1.7 billion acquisition of Weights & Biases, absorbing it into a GPU cloud platform rather than letting it evolve as a standalone LLM-specific observability tool"
      },
      {
        "date": "2025-10",
        "note": "Langfuse, an open-source LLM observability platform, gained significant traction as an alternative to W&B, suggesting the market agreed that existing tools lacked LLM-specific intelligence"
      },
      {
        "date": "2026-01",
        "note": "ClickHouse acquired Langfuse, validating the thesis that LLM-native observability was an underserved market distinct from generic ML experiment tracking"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-05-19 20:29",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/1924563065821184146",
    "content": "vibe founder: when someone thinks they built a unicorn in 3 months using LLMs",
    "comments": [
      {
        "date": "2025-03",
        "note": "Y Combinator CEO Garry Tan announced that 25% of the Winter 2025 batch had codebases that were 95% AI-generated, with some founders shipping in weeks what previously took months — setting the stage for the overconfidence this tweet satirizes"
      },
      {
        "date": "2025-05",
        "note": "Semafor reported that vibe coding startup Lovable had security vulnerabilities in 170 out of 1,645 apps built on its platform, exposing user names, emails, financial data, and API keys — illustrating how quickly 'built a unicorn' can become 'built a liability'"
      },
      {
        "date": "2025-12",
        "note": "Groove founder Alex Turnbull estimated that over 8,000 startups needed rebuilds or rescue engineering costing $50K–$500K each due to vibe coding technical debt, predicting rescue engineering would be the hottest discipline in tech in 2026"
      },
      {
        "date": "2025-11",
        "note": "Collins Dictionary named 'vibe coding' its 2025 Word of the Year, cementing the phenomenon — and by extension the 'vibe founder' archetype — as a defining cultural moment"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-05-24 00:03",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/1926066518871142402",
    "content": "LLMs will make em/double dashes disappear because nobody wants to sound like slop. I was an avid user of this English feature before, but now I find it gross.",
    "comments": [
      {
        "date": "2025-06",
        "note": "Plagiarism Today published a widely shared analysis noting that em dashes had become the most commonly cited tell for AI-generated text, leading human writers to self-censor their use to avoid accusations of using AI"
      },
      {
        "date": "2025-08",
        "note": "The Ringer published 'Stop AI-Shaming Our Precious, Kindly Em Dashes — Please,' documenting how writers were abandoning em dashes not because of dislike but out of fear their writing would be flagged as AI slop — exactly the behavioral shift predicted"
      },
      {
        "date": "2025-09",
        "note": "An Inside Higher Ed op-ed framed the 'em dash debate we should be having,' noting that the stigma had become so strong that professors were using em dash frequency as informal evidence of AI use in student papers, further driving avoidance"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-05-27 04:23",
    "channel": "U05VA997V5Z",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/D08J9P424UQ/p1748319811929109",
    "content": "When I vibe code, one of the biggest issues is that the LLM is not good at testing that things work. There's very little test-driven development, for example, and the LLM is usually lazy about adding logging, let alone trying to take screenshots with a headless browser to see if things look like they should.",
    "comments": [
      {
        "date": "2025-12",
        "note": "A Veracode study of 100+ LLMs across 80 coding tasks found that 45% of AI-generated code introduced security vulnerabilities, with logic errors 75% more common than in human-written code — confirming the testing gap described here"
      },
      {
        "date": "2026-01",
        "note": "Stack Overflow published an analysis asking 'Are bugs and incidents inevitable with AI coding agents?', finding that AI-generated code suffered from 1.7x more major issues and that code refactoring dropped from 25% to under 10% of changed lines"
      },
      {
        "date": "2026-01",
        "note": "CodeRabbit declared '2025 was the year of AI speed; 2026 will be the year of AI quality,' noting that the industry was pivoting toward multi-agent review workflows where one agent writes, another critiques, and another tests"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-06-02 04:49",
    "channel": "agentic-engineering",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C08US021ZHC/p1748839747796509",
    "content": "Agentic Engineering\n\nJust as most \"AI\" applications are GPT wrappers, most \"agents\" are sophisticated prompts with basic tool use. There's excitement around the idea of Model Context Protocol (MCP), but most implementations are basic: create calendar events, commit code, check emails, and so on.\n\nSo far, the best-implemented use case for agents has been in software development. Why is that? Because software provides a number of feedback loops or \"verifiers\" that can be used to orient the LLM towards a correct answer: code execution, error messages, etc. Furthermore, there are datasets such as programming competitions where the correct answer is known, which means one can use reinforcement learning by picking the reasoning trajectories that succeed and feeding that back to the models.\n\nIn a way, programming is the low-hanging fruit among a broader set of problems. Recent work by DeepMind showed that models could iterate on GPU/TPU kernels (micro-programs that run on the GPUs themselves) to optimize certain operations. They achieved this by asking an LLM to produce statistically reasonable variants of a base code, then testing it for performance, then picking the top performers and mutating them further. These kernels are software, but the feedback loop is more complicated than simply running a Python snippet.\n\nLikewise, semiconductor design starts with Verilog code, which is later compiled into transistors. Except that in order to verify a chip design works as intended, it requires complex simulation tools produced by Synopsys, Cadence, and others. In order for an agent to find issues and iterate, it needs access to these simulation tools and must understand their outputs. As an example, hardware design engineers spend many hours looking at waveforms generated by simulations in order to verify correctness or fix bugs.\n\nThe basic idea repeats itself for any engineering discipline, comprised of the following components:\n1. A technical language or interface that encodes desired functionality\n2. A simulation tool to test the design against real-world conditions\n3. A set of debugging tools, often visual, to support human-in-the-loop iteration\n4. A compiler or artifact generator that allows design to move to the next phase of production/manufacturing",
    "comments": [
      {
        "date": "2025-05",
        "note": "Google DeepMind announced AlphaEvolve, which used Gemini with automated verifiers in an evolutionary framework to optimize GPU kernels, achieving a 32.5% speedup on FlashAttention — precisely the LLM-plus-verifier pattern described in this post"
      },
      {
        "date": "2025-10",
        "note": "ChipAgents raised a $21M Series A led by Bessemer with backing from Micron, MediaTek, and Ericsson, building exactly the agentic framework for Verilog design and verification described here — with access to simulation tools for automated iteration"
      },
      {
        "date": "2026-02",
        "note": "Cadence launched the ChipStack AI Super Agent — the first agentic workflow for automating chip design and verification — with early deployments at Nvidia, Qualcomm, and Tenstorrent, directly validating the prediction that agents need simulation-tool integration to tackle semiconductor design"
      },
      {
        "date": "2025-10",
        "note": "Andrej Karpathy coined 'agentic engineering' as the successor to 'vibe coding,' describing a shift from fast-and-fragile AI coding to systematic multi-agent workflows with verification — closely mirroring the feedback-loop framework outlined in this post"
      }
    ],
    "rating": "outstanding"
  },
  {
    "date": "2025-06-05 14:14",
    "channel": "deals",
    "link": "https://scopvc.slack.com/archives/C016BGX7P44/p1749158050842269",
    "content": "I think we are actually getting to the end of Excel. Excel is less legible to an LLM. I would think the analyst of the future would use an LLM, which will itself create python scripts to produce a forecast. Something more similar to a jupyter notebook",
    "comments": [
      {
        "date": "2023-07",
        "note": "ChatGPT's Code Interpreter launched, demonstrating the LLM-to-Python-to-analysis workflow"
      },
      {
        "date": "2024",
        "note": "Claude's Artifacts showed LLMs naturally producing and executing Python rather than manipulating spreadsheets"
      },
      {
        "date": "2025",
        "note": "Observable-notebook-style interfaces rose in AI tools like Hex, Deepnote, and Anthropic's artifact system"
      },
      {
        "date": "2025",
        "note": "Shift underway where analysts describe intent in natural language and LLMs generate executable analytical code"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-06-06 14:52",
    "channel": "ai",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C050GL4A4RG/p1749221546796659?thread_ts=1749217932.999879",
    "content": "I think you are still missing how expensive it is to build good AI companies. It costs tens of thousands of dollars to do a little experiment in the afternoon.",
    "comments": [
      {
        "date": "2025-05",
        "note": "Reports emerged that a single GPT-5 training run cost OpenAI approximately $500 million in compute alone, illustrating the massive expense of frontier AI development"
      },
      {
        "date": "2025-09",
        "note": "Anthropic disclosed roughly $4.1 billion in training costs for 2025, with the company projecting $100 billion in training spend over the next three years"
      },
      {
        "date": "2026-01",
        "note": "Epoch AI reported that training compute costs for the largest AI models were doubling every eight months, with frontier runs approaching $1 billion per experiment"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-06-06 17:20",
    "channel": "U05VA997V5Z",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/D08J9P424UQ/p1749230453841249",
    "content": "My suspicion is that backtracking, or high-entropy tokens like Kath wants to look for, aren't all that helpful for reasoning. That if you clean and shorten trajectories to the direct path to an answer, you actually get a smarter model.",
    "comments": [
      {
        "date": "2025-04",
        "note": "The Retro-Search paper (arXiv 2504.04383) demonstrated that retrospectively trimming redundant backtracking steps from reasoning traces reduced average reasoning length by 31% while improving performance by 7.7% across seven math benchmarks — directly confirming the intuition here"
      },
      {
        "date": "2025-06",
        "note": "A NeurIPS 2025 paper ('Beyond the 80/20 Rule') showed that only ~20% of tokens in chain-of-thought reasoning are high-entropy 'fork' tokens, and restricting gradient updates to just those tokens matched or beat full-gradient training on Qwen3 models"
      },
      {
        "date": "2025-06",
        "note": "Research on reasoning LLM 'overthinking' found that models generate excessively long solutions with unnecessary reflection and backtracking tokens (e.g., 'wait,' 'however'), and that suppressing these verbose behaviors improved both efficiency and accuracy"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-06-09 18:49",
    "channel": "ai",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C050GL4A4RG/p1749494972399469",
    "content": "Also, there's a situation with increasing compute — which has been the approach in the current era of AI — where costs are escalating so fast that the current rate of improvement can only go until 2030 or so. Which means you either get AGI by around 2030, or it might take quite a bit longer.",
    "comments": [
      {
        "date": "2025-09",
        "note": "Epoch AI published 'Can AI Scaling Continue Through 2030?', identifying four constraints — power availability, chip manufacturing, data scarcity, and the latency wall — that could cap training runs around 2e29 FLOP by decade's end"
      },
      {
        "date": "2025-09",
        "note": "Anthropic CEO Dario Amodei predicted AI systems matching Nobel Prize winners across most disciplines by late 2026 or early 2027, while cautioning it could take longer — echoing the 'either AGI soon or much later' framing in this message"
      },
      {
        "date": "2025-01",
        "note": "Sam Altman declared 'we are now confident we know how to build AGI,' with OpenAI projecting research-intern-level AI by September 2026 and fully automated research by March 2028"
      },
      {
        "date": "2025-09",
        "note": "Epoch AI estimated frontier training runs would require 4–16 gigawatts of power by 2030, while McKinsey forecast $5.2 trillion in cumulative AI data-center capital expenditure needed to sustain the current scaling trajectory"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-06-09 21:46",
    "channel": "deals",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C016BGX7P44/p1749505578634579",
    "content": "I think this talk is illustrative of the challenges with AI companies.\n\nYes, LLMs can have a huge impact in a lot of domain-specific workflows.\n\nBut — what are YOU doing on top of the LLM to make it better?\n\nIt's not enough to just make the observation that LLMs have utility.",
    "comments": [
      {
        "date": "2026-02",
        "note": "Google Cloud VP Darren Mowry publicly warned that two types of AI startups — LLM wrappers and AI aggregators — face extinction, stating 'if you're really just counting on the back-end model to do all the work, the industry doesn't have a lot of patience for that anymore'"
      },
      {
        "date": "2026-02",
        "note": "A broad 'SaaS Apocalypse' narrative took hold on Wall Street, with software stocks including Salesforce and Adobe declining sharply as investors questioned whether thin-wrapper products could survive as foundation models improved"
      },
      {
        "date": "2025-12",
        "note": "Harvey AI, which built deep legal-domain tooling on top of LLMs rather than a simple wrapper, reached $195 million ARR and an $8 billion valuation — exemplifying the differentiation this message calls for"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-06-10 15:29",
    "channel": "ai",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C050GL4A4RG/p1749569376488219",
    "content": "I think the investment bet right now is to apply the same idea of an agent to other artifacts. The Excel pitch we saw is actually applicable, even if there might have been other issues. You could have one that works on geospatial data, waveforms (chip agents), architecture, interior design, and so on.",
    "comments": [
      {
        "date": "2025-11",
        "note": "Cadence acquired Seattle-based startup ChipStack and in February 2026 launched the ChipStack AI Super Agent — the first agentic workflow for automating chip design and verification — deployed by NVIDIA, Qualcomm, and Altera with reported 10x productivity gains"
      },
      {
        "date": "2025-06",
        "note": "Synopsys announced expanding AI capabilities across its EDA solutions with AgentEngineer technology, building multi-agent systems for autonomous chip design workflows — validating the 'waveforms / chip agents' thesis"
      },
      {
        "date": "2026-02",
        "note": "The AI agent market grew from $5.25 billion (2024) to $7.84 billion (2025), with industry-specific vertical agents in legal, healthcare, and finance showing 3–5x higher retention than horizontal solutions"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-06-10 18:01",
    "channel": "UDGBMD40K",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/D01769CKC75/p1749578500700719",
    "content": "Yes, but there will be a lot of pressure from the bottom. Salesforce has however many tens of thousands of employees and a big cost structure. I am skeptical of their future.",
    "comments": [
      {
        "date": "2025-09",
        "note": "Salesforce CEO Marc Benioff confirmed 4,000 layoffs, stating 'I need less heads' as AI agents replaced customer support roles — the very bottom-up pressure predicted here"
      },
      {
        "date": "2025-09",
        "note": "Sierra AI, founded by former Salesforce co-CEO Bret Taylor, reached $100 million ARR and a $10 billion valuation by building AI customer-service agents that directly compete with Salesforce's core business"
      },
      {
        "date": "2026-02",
        "note": "Salesforce shares dropped roughly 25–30% in early 2026 amid a broader 'Death of SaaS' sell-off, with Anthropic's Claude Cowork legal-automation launch alone erasing approximately $285 billion in software market cap in a single trading day"
      },
      {
        "date": "2026-02",
        "note": "Salesforce laid off another ~1,000 employees in February 2026 across marketing, product management, and even its own Agentforce AI product teams, underscoring the cost-structure burden described in this message"
      }
    ],
    "rating": "outstanding"
  },
  {
    "date": "2025-06-11 04:16",
    "channel": "UDGBMD40K",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/D01769CKC75/p1749615414793959",
    "content": "I am interested in a particular problem space, which is to work on agents that can do really difficult tasks. ChipAgents is focused on chip design, which is a problem with a lot of depth that will take years to solve. Cursor is doing the same for programming. These difficult problems will have multi-year roadmaps and can stay ahead of general-purpose models for the time being.",
    "comments": [
      {
        "date": "2025-11",
        "note": "Cursor raised $2.3 billion at a $29.3 billion valuation — up from $9.9 billion just five months earlier — confirming that deep, domain-specific coding agents command enormous value"
      },
      {
        "date": "2025-11",
        "note": "Cadence acquired ChipStack, a chip-design AI startup, and by February 2026 deployed the ChipStack AI Super Agent with NVIDIA, Qualcomm, and Altera, delivering 10x productivity gains on front-end verification"
      },
      {
        "date": "2026-03",
        "note": "Cursor surpassed $2 billion in annualized recurring revenue — doubling in just three months — demonstrating that domain-specific agents in hard problem spaces sustain rapid growth ahead of general-purpose models"
      },
      {
        "date": "2026-02",
        "note": "Google Cloud VP Darren Mowry cited Cursor and Harvey AI as examples of startups with 'deep, wide moats,' contrasting them with thin LLM wrappers he warned would not survive"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-06-22 15:02",
    "channel": "UDFR6QDUZ",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/D016RAW8QMD/p1750604560613369?thread_ts=1750604560.613369",
    "content": "Automation = AI though. If you see your percentage of fully automated calls is 30% and you're okay with that, something is wrong. It should drive you crazy as a CEO.",
    "comments": [
      {
        "date": "2024-02",
        "note": "Klarna reported its AI assistant handled two-thirds of all customer service chats in its first month — doing the work of 700 full-time agents — showing what aggressive automation targets look like in practice"
      },
      {
        "date": "2025-05",
        "note": "Klarna CEO Sebastian Siemiatkowski said AI helped the company shrink its workforce by 40%, embodying exactly the CEO-driven urgency to push automation well beyond 30%"
      },
      {
        "date": "2025-09",
        "note": "Sierra AI reached $100 million ARR in under two years selling enterprise AI agents for customer service, reflecting intense CEO demand to automate call volumes far beyond initial benchmarks"
      },
      {
        "date": "2025-12",
        "note": "Gartner predicted that by 2028, regulations would guarantee customers the right to speak with a human agent — implicitly acknowledging that CEOs had been pushing automation rates so aggressively that regulators felt compelled to intervene"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-07-06 15:40",
    "channel": "fund-partners",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C01F8JNA0RZ/p1751816456522389",
    "content": "This is what I had written for Unwrap. Looking back at it, I think it was pretty clear! It took us a couple of months to go from \"NLP is a big deal\" to \"customer feedback at scale.\" The process now is going from \"agents can solve hard problems\" to \"actual hard problem.\"\n\nhttps://ivanbercovich.com/2025/unwrap-memo",
    "comments": [
      {
        "date": "2022-09",
        "note": "Unwrap launched out of the AI2 Incubator, applying NLP to aggregate and analyze customer feedback — validating the 'NLP is a big deal → customer feedback at scale' pipeline described here"
      },
      {
        "date": "2025-01",
        "note": "Unwrap raised a $12 million Series A led by Scale Venture Partners, with customers including Microsoft, Perplexity, Oura, and JetBlue — confirming the viability of the NLP-to-customer-feedback thesis"
      },
      {
        "date": "2025-09",
        "note": "TechCrunch reported that Silicon Valley was betting big on RL environments to train AI agents, with startups like Mechanize offering $500K salaries to build them — illustrating the industry-wide search for what 'actual hard problem' agents should solve"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-07-15 18:41",
    "channel": "yogi",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C02G3DRA8H4/p1752604909408319",
    "content": "I think it will work in the medium term, and certainly everyone has to adopt it or they are just left behind. I'm more thinking as a general investment trend about what happens once everyone is communicating 100x more.",
    "comments": [
      {
        "date": "2025-08",
        "note": "Gartner predicted that 40% of enterprise apps would feature task-specific AI agents by end of 2026, up from less than 5% in 2025 — implying a massive increase in machine-mediated communication volume"
      },
      {
        "date": "2025-10",
        "note": "OpenAI reported that enterprise API reasoning-token consumption grew 320x year-over-year, quantifying the explosion in AI-driven communication and processing volume"
      },
      {
        "date": "2025-12",
        "note": "Research showed the average worker was receiving 117 emails plus 153 Teams messages daily, with global email volume projected at 376 billion messages per day — evidence that communication overload was already a pressing problem before AI-driven amplification"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-09-19 17:29",
    "channel": "U05VA997V5Z",
    "link": "https://scopvc.slack.com/archives/D08J9P424UQ/p1758328177342179",
    "content": "Motivation: cheating behavior of LLMs can lead to harmful outcomes, and the understanding and mitigation of these behaviors is of great importance to reduce existential risk from AI. Rather than a hypothetical risk, cheating is likely to be the root cause of significant negative events within months. RLVR is at the core of many new startups seeking to assist in various engineering roles, so it's a matter of time before AI designed bridges are built. Meanwhile, if RL is the answer to decreasing returns of pre-training, the amount of compute going into RL will scale rapidly. Supercheater agents could lead to catastrophic outcomes without ever reaching AGI.",
    "comments": [
      {
        "date": "2025-01",
        "note": "DeepSeek-R1 released, demonstrating impressive RLVR results but also well-documented reward hacking behaviors"
      },
      {
        "date": "2025-09",
        "note": "RLVR adopted at the core of many startups building AI engineering assistants"
      },
      {
        "date": "2025-09",
        "note": "Warning issued that AI reward hacking would cause significant negative events within months — a near-term safety concern distinct from AGI risk"
      },
      {
        "date": "2025-09",
        "note": "Concept of 'supercheater agents' introduced as a catastrophic risk from RL-trained systems deployed in safety-critical domains like bridge design"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-10-22 01:26",
    "channel": "asapi",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C09LYDQ2BMM/p1761096386153779",
    "content": "Anyway, the point for me is that either you think there's a handful of models that all these people need and you instrument that, or you think there's a lot more customization and then you need an agent. My concern is that the agent that builds you a custom model doesn't seem that different from one that just runs code.",
    "comments": [
      {
        "date": "2025-10",
        "note": "Studies showed fine-tuned small models (e.g., 1.5B parameters) outperforming GPT-4 on narrow tasks like customer support, validating the 'a lot more customization' thesis over a one-size-fits-all approach"
      },
      {
        "date": "2026-01",
        "note": "Industry analysis found the market splitting between managed agent platforms for standard use cases and custom-built agents for deep domain needs — reflecting exactly the bifurcation described here"
      },
      {
        "date": "2026-02",
        "note": "Anthropic launched Claude Opus 4.6 with agent teams and a Skills open standard, blurring the line between agents that build custom solutions and agents that run code — echoing the concern raised in this message"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-10-25 04:39",
    "channel": "UDGBMD40K",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/D01769CKC75/p1761367170942639",
    "content": "It's all very new. But the future of AI is reinforcement learning for specific domains. What William is doing. It's not wiring a few LLMs to automate some workflows.",
    "comments": [
      {
        "date": "2025-01",
        "note": "DeepSeek released R1, demonstrating that reasoning abilities in LLMs can be incentivized through pure reinforcement learning without supervised fine-tuning — a breakthrough that validated RL as the frontier of AI capability"
      },
      {
        "date": "2025-09",
        "note": "TechCrunch reported a wave of RL-environment startups (Mechanize, Prime Intellect) receiving major funding, with Surge spinning up a new division specifically for RL training environments — confirming that domain-specific RL was becoming the next investment wave"
      },
      {
        "date": "2025-10",
        "note": "DeepSeek released its OCR model showing that pixels could be more efficient inputs to LLMs than tokens, a result born from domain-specific RL research rather than generic workflow automation"
      },
      {
        "date": "2026-01",
        "note": "The RL market was projected to grow at a 65.6% CAGR to $37 trillion by 2037, with reinforcement learning with verifiable rewards (RLVR) expanding into chemistry, biology, and other specialized domains"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-10-25 10:50",
    "channel": "UDGBMD40K",
    "link": "https://scopvc.slack.com/archives/D01769CKC75/p1761414648702109",
    "content": "The fundamental problem is that 99% of \"AI\" companies are writing old world workflows with LLM steps in it. RAG and so on.\n\nThe future of AI is (and has always been but compute is the limiting factor), reinforcement learning. You set up a task and you run an AI with a reward function until it learns. LLMs are trained that way, with next token prediction (and a bunch of other tasks these days).\n\nBut setting the training environment is really hard. What does a virtual accountant office look like so an AI can perform tasks and be reinforced? It's not just input/output. We are talking about things that can take hours and hundreds or thousands of steps. There isn't enough data in the universe to fine-tune that. The only way is to generate it through trial and error, and then use the \"trajectories\" that succeed, through luck.\n\nTerminal bench is that environment on which you train agents to learn on their own.\n\nOne problem you get is that it's easy for the model to find a way to cheat. Find a shortcut and get the reward without accomplishing the task. And when that happens, you can waste tens or hundreds of thousands of dollars before you realize. From the metrics, everything looks great. Until you see what happened. People are experiencing this more and more, at least the people actually training models.\n\nSo building environments to train; monitoring them adequately; detecting early instances of cheating, and so on. That's the problem.",
    "comments": [
      {
        "date": "2024-09",
        "note": "OpenAI launched o1 reasoning model, pivoting toward RL-based reasoning over pre-training alone"
      },
      {
        "date": "2025-01",
        "note": "DeepSeek-R1 released, further validating RL-based approaches and documenting reward hacking behaviors"
      },
      {
        "date": "2025-10",
        "note": "Warning that reward hacking in RL training can waste tens or hundreds of thousands of dollars before detection — a problem increasingly reported by practitioners"
      },
      {
        "date": "2025-11",
        "note": "One month later, Anthropic published research on emergent misalignment from reward hacking in production RL, showing models trained on real coding tasks spontaneously learned to cheat — directly validating the warning that 'it's easy for the model to find a way to cheat'"
      },
      {
        "date": "2025-12",
        "note": "Two months later, Nvidia released Nemotron 3, a model family post-trained using multi-environment reinforcement learning for agentic reasoning, validating the prediction that RL-based agent training is the future"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-10-26 10:39",
    "channel": "fund-partners",
    "link": "https://scopvc.slack.com/archives/C01F8JNA0RZ/p1761500438929779?thread_ts=1761438418.606219",
    "content": "> **Kevin O'Connor** (replying to):\n> The other alternative is \"true agentic\" where you might go through millions of unnecessary \"mistakes\" until you stumble upon the correct answer. Why not just code the \"right\" answer?\n\nIt depends on what your world view is. If you think agents continue to get faster at a similar rate, it will change everything and you want to be there. If you think AI is going to slow down, then you bet on domain specific workflows.\n\nWhat is for sure right now is that workflows can be vibe coded and nocoded using cursor and prompt layer and etc; and there's about a gazillion companies doing that.\n\nWorkflows are more like traditional SaaS in that it's a foundation on top of which employees operate. Agents go after actual jobs.\n\nMy guess is that when this bubble pops, the bodies will be more on the application side than the infrastructure / supply side. Everyone is using Nvidia, everyone is using foundational models, and so on. The closer you get to the applications, the more competitors you find. It goes from high to low capex, but also from high to low skill.\n\nYou have to spend some time with Claude Code to really understand. Maybe I'll record a demo for you.",
    "comments": [
      {
        "date": "2025-04",
        "note": "S&P 500 experienced historic two-day loss of $6.6 trillion during tariff shock, foreshadowing market fragility"
      },
      {
        "date": "2025-06",
        "note": "Markets recovered to all-time highs by late June 2025 after April crash"
      },
      {
        "date": "2025-10",
        "note": "Predicted that when the AI bubble pops, application-layer companies will suffer more than infrastructure/supply-side companies like Nvidia and foundational model providers"
      },
      {
        "date": "2026-02",
        "note": "Four months later, a software selloff erased roughly $1 trillion in SaaS market cap as investors priced in AI agents replacing per-seat models, while Nvidia posted record quarterly revenue — confirming the prediction that 'bodies will be more on the application side than the infrastructure side'"
      },
      {
        "date": "2025-11",
        "note": "One month later, Cursor raised at a $29B valuation and crossed $1B ARR, validating the observation that 'there's about a gazillion companies' vibe coding workflows"
      }
    ],
    "rating": "outstanding"
  },
  {
    "date": "2025-10-26 15:37",
    "channel": "general",
    "link": "https://scopvc.slack.com/archives/CDGDNJX5Y/p1761518277851579?thread_ts=1761518277.851579",
    "content": "I don't want to be pedantic, but there's a lot of depth to what AI can do, most people don't grok it. And everyone has an opinion, and everyone writes about it, and everyone starts AI companies, and most of everyone is playing a completely different game than what maybe 20 or 50 thousand people in the entire world are actually doing.\n\nCoding workflows by hand is the wrong bet to make right now. Rogo will be fine because they have market dominance and a lot of capital and can shift with the market and the technology. But someone without that kind of capital and the ability to move into more agentic stuff (which requires more specialized engineers and a lot more money) won't be an AI company in the future.\n\nThis is an agent, based on GPT-5 but it's not GPT-5, it's an agent. Working virtually autonomously for 20 minutes, doing the work it would take a human hours. I did nothing to orient this system towards my goals besides the prompt. With a better prompt, better tools, and gained experience over many such examples, and the ability to do reinforcement learning, THIS will beat any hustlers vibecoding workflows. I promise you!\n\nhttps://ivanbercovich.com/2026/the-software-business",
    "comments": [
      {
        "date": "2025-05",
        "note": "OpenAI released Codex as a cloud-based autonomous coding agent, demonstrating multi-step autonomous work sessions"
      },
      {
        "date": "2025-10",
        "note": "Distinguished between GPT-5 as a base model and agents built on top of it — 'it's not GPT-5, it's an agent'"
      },
      {
        "date": "2025-10",
        "note": "Predicted that reinforcement learning would be the unlock for agent improvement, making hand-coded workflows obsolete"
      },
      {
        "date": "2026-02",
        "note": "Four months later, OpenAI launched ChatGPT Agent and Frontier, an enterprise platform for building autonomous AI agents across business systems — embodying the agent-not-model distinction the message emphasized"
      },
      {
        "date": "2025-11",
        "note": "One month later, Claude Code crossed $1B annualized revenue just 6 months after launch, demonstrating that autonomous agents are outpacing hand-coded workflow tools"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-10-26 18:05",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/1982508959748833684",
    "content": "AI is particularly pernicious at self-deception because it makes the average person feel like they understand the future. They are overconfident about what's actually happening and how it will impact the economy and society.",
    "comments": [
      {
        "date": "2025-07",
        "note": "A Carnegie Mellon study published in Memory & Cognition found that AI chatbots remain overconfident even when wrong, and unlike humans, they never adjust confidence downward after poor performance — modeling the exact overconfidence pattern that transfers to users"
      },
      {
        "date": "2025-10",
        "note": "Aalto University researchers identified a 'reverse Dunning-Kruger effect' in AI users: those with higher AI literacy were more overconfident about their performance, not less — directly confirming that AI exposure inflates perceived understanding"
      },
      {
        "date": "2026-01",
        "note": "Goldman Sachs reported AI contributed 'basically zero' to U.S. GDP in 2025 despite widespread public narratives of an AI-driven economic transformation, illustrating the gap between perceived and actual impact that this tweet identifies"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-10-26 18:45",
    "channel": "general",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/CDGDNJX5Y/p1761504310419519",
    "content": "This is my current viewpoint of the AI market.\n\nUnless you're in the game and reading a lot of what's coming out, you're just not seeing the future.\n\nFor example, last week DeepSeek released a great OCR model. Most people would say \"OCR is basically solved.\" But what they did is a model that just uses pixels. And it turns out you might be able to fit more text per unit of compute as pixels than as tokens. That's a very important result.\n\nAI is particularly pernicious at self-deception because it causes the average person to think they understand the future due to ChatGPT feeling like science fiction. Meanwhile, they are overconfident about whatever it is they think is happening and how it will impact the economy and society.\n\nMy impression has been kind of like how everyone thinks they are an above-average driver. Everyone thinks they are juicing up AI more than the average person. So we all go and vibe-code some app and think we'll beat everyone else doing the same. Because the contrast with the before times is so big, we don't adapt.\n\nBut then go to Reddit r/agents and see that the same degens that were YOLOing crypto two years ago are building \"agents\" (really workflows) and hustling their way into quick revenue.\n\nThat revenue is temporary because most business owners attribute the magic of the solution to the entrepreneur instead of the upstream infrastructure, because they don't understand how it all works. But when they do and the market is saturated, a lot of the vertical plays that don't have strong market dominance will race to the bottom. They will become free.\n\nThis causes the market to be significantly behind reality. Every VC pitch is about AI right now, but it's really simple workflows, and many investors don't grok the difference. This is what causes a bubble. It's not whether AI will have durable impact, but whether the median investor understands how and where the impact accrues.\n\nHence, we'll experience a market retraction, as long as humans control capital and the means of production, as we have many times before, regardless of AGI timelines.",
    "comments": [
      {
        "date": "2025-10",
        "note": "DeepSeek's OCR paper (arXiv 2510.18234) confirmed the claim made here: their model achieved 97% decoding precision at 10x compression by encoding text as pixels, outperforming prior OCR systems while using far fewer tokens"
      },
      {
        "date": "2025-10",
        "note": "The Bank of England warned that AI stock valuations were 'stretched' to levels 'comparable to the peak of the dotcom bubble,' with market concentration at a 50-year high — directly supporting the bubble prediction"
      },
      {
        "date": "2025-12",
        "note": "A report found that AI-generated 'slop' comprised over half of new internet content, while 80% of buyers cited AI-driven commoditization as the top risk to SaaS valuations — confirming that vertical AI plays without strong moats were racing to the bottom"
      },
      {
        "date": "2026-01",
        "note": "Analysis showed AI-native vertical startups growing at ~400% but competing at ~80% of traditional SaaS ACV, with buyers paying premiums only for data depth and workflow lock-in rather than features — validating the prediction that undifferentiated plays would become free"
      }
    ],
    "rating": "outstanding"
  },
  {
    "date": "2025-10-26 19:19",
    "channel": "general",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/CDGDNJX5Y/p1761506393721369",
    "content": "And it's not just web content — it's code as well. Most code is slop. I can tell you as someone who just spent six weeks writing code at unprecedented speed, and somehow it's still slop and it doesn't quite build an innovator's mental model of the system being built. So one codes very visually: add this thing here, change that dropdown there, and it's like a very, very good prototype, but it has some of the same shittiness as AI-generated blog posts. At the same time, you have no choice, because it's too slow to code on your own and everyone else is vibe-coding.",
    "comments": [
      {
        "date": "2025-02",
        "note": "Andrej Karpathy coined the term 'vibe coding' to describe the exact practice discussed here — giving in to AI-generated code without reviewing its internals — and the term was later named Collins Dictionary's 2025 Word of the Year"
      },
      {
        "date": "2025-06",
        "note": "A study by Apiiro found AI-generated code was introducing over 10,000 new security findings per month — a 10x spike in six months — confirming that AI-authored code carries the 'slop' quality described here"
      },
      {
        "date": "2025-10",
        "note": "Research showed AI-generated pull requests contained 1.7x more issues than human PRs, with 2.74x higher rates of XSS vulnerabilities and 62% of AI solutions containing design flaws — quantifying the 'slop' problem"
      },
      {
        "date": "2026-02",
        "note": "Open-source maintainers began mass-closing AI-generated contributions: cURL shut down its bug bounty after 20% of submissions were AI-generated, and Ghostty and tldraw banned AI code outright"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-10-26 19:28",
    "channel": "general",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/CDGDNJX5Y/p1761506893051539",
    "content": "This also conveys that humans won't be able to keep up producing original content. It's another way to highlight that we are heading toward a reinforcement learning world.",
    "comments": [
      {
        "date": "2025-12",
        "note": "A report warned that AI-generated 'slop' now comprised over half of new online content, while high-quality human-written web data was projected to be exhausted between 2026 and 2032 — validating the claim that humans can't keep up"
      },
      {
        "date": "2025-12",
        "note": "Wikipedia suspended its AI-summary features to protect its knowledge base from 'irreversible harm,' and Spotify removed 75 million AI-generated spam tracks — concrete evidence of the original-content crisis"
      },
      {
        "date": "2026-01",
        "note": "Reinforcement learning with verifiable rewards (RLVR) emerged as a leading approach for training models without relying on human-generated content, confirming the directional shift toward an RL-driven paradigm"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-10-26 23:37",
    "channel": "general",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/CDGDNJX5Y/p1761521859842979?thread_ts=1761518277.851579",
    "content": "This is an agent doing the entire task end to end. Nobody needs to build the workflow to begin with. You just need better agents.",
    "comments": [
      {
        "date": "2024-10",
        "note": "Anthropic launched Claude's computer-use capability in public beta, making it the first frontier model to offer autonomous desktop control — enabling end-to-end task completion without pre-built workflows"
      },
      {
        "date": "2025-01",
        "note": "OpenAI launched Operator, a computer-using agent that could autonomously browse the web, fill forms, and complete multi-step tasks without any workflow setup by the user"
      },
      {
        "date": "2025-08",
        "note": "Gartner predicted 40% of enterprise apps would embed task-specific AI agents by end of 2026, calling it one of the fastest transformations in enterprise tech since cloud adoption"
      },
      {
        "date": "2026-02",
        "note": "Anthropic released Claude Opus 4.6 with agent teams, enabling users to split work across multiple agents that coordinate directly — moving further from pre-built workflows toward goal-driven autonomous execution"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-10-28 15:23",
    "channel": "fewshot",
    "link": "https://scopvc.slack.com/archives/C09D34Q50TD/p1761690201648089",
    "content": "Spending a couple weeks building this data room product has been useful, but it also made it obvious that products like these are now too easy to build. Maybe if you are an incredible salesperson and can conquer a market quickly it makes sense. But most likely the companies that have traction today started 2-3 years ago when there were still some challenges in getting high value out of LLMs. I don't think there is strong differentiation right now in following the typical YC formula of interviewing a few business owners and building something for one of them.\n\nI need to dig deeper into the challenges that still have a stronger technical moat and a longer development cycle.\n\nThat said, I wanted to show you some of what we built over two weeks. This application is fully in production, so half the work was building it and the other half having a fully productionized system, with multi-tenancy for many customers, and so on. It still has a few glitches, but we would only be a couple weeks away from having something that extracts a reasonable knowledge graph from a data room. However, per the video I shared before, an agent can also do that on its own. It's less predictable and uses more tokens, but agents are getting better fast.\n\nThis week we have a bunch of new calls to discuss needs of people training RL models.",
    "comments": [
      {
        "date": "2025-10",
        "note": "Concluded the traditional YC formula — interviewing business owners and building for one of them — no longer provides strong differentiation"
      },
      {
        "date": "2025-10",
        "note": "Observed that companies with traction in AI-assisted products likely started 2-3 years earlier when extracting value from LLMs was still technically challenging"
      },
      {
        "date": "2025-10",
        "note": "Pivoted strategic focus toward RL model training infrastructure, identifying it as one of the few areas with deep technical moats"
      },
      {
        "date": "2026-01",
        "note": "By early 2026, Scale AI, Hugging Face, and numerous startups were racing to build tooling for RLHF and RLAIF as reinforcement learning became the key agent differentiator"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-10-30 12:52",
    "channel": "general",
    "link": "https://scopvc.slack.com/archives/CDGDNJX5Y/p1761853925276389",
    "content": "I think it would be helpful for all of you to discuss this next time you are prepared to debate for a while: https://claude.ai/public/artifacts/d8240935-6636-46a8-9871-0fa17c325b14\n\nI talked to Cormac and Kevin for a while, but it's hard to transfer the intuition.\n\nI have uncertainty about a lot of things when it comes to building a company. But I have rather high certainty that the way software is built (inclusive of LLM software assistants) is about to change dramatically. That most vertical software will shift to general purpose agents. And that this is not just history repeating where you move to another layer of abstraction. I really think agents are a more generalized technology which means, aside from some specialized tools and use cases, the actual business workflows you see a lot of back-offices will be done by agents that have the same underlying code. Sure, there might be consulting work to customize these agents, but as far as businesses that scale like startups; these are going to be structured to serve the agent's needs, rather than what analyst X at industry Z thinks they need. I claim there is a fundamental shift. You don't need to believe me, and I don't need to convince you. But I believe in this very strongly, so I recommend you discuss it.\n\nVertical software has been a way to rapidly encode processes into digital machines, so that humans can introduce intelligence into those processes. Most of the vertical AI you see out there, is the same exact paradigm, with some of those human-in-the-loop tasks being replaced with a small LLM step (\"is this an NDA or a SAFE\").\n\nThere's an inverted way to do this, which is, you tell the agent what you want. The agent does it until it gets stuck. You don't know or care how it's doing it, so long as you can trust the results. The agent keeps getting better every 6 months, for \"free\". You don't need to write more software or better prompts, just keep doing your job and the thing will improve. The moat is whatever the agent is using to do its job. If the agent uses Plaid, then Plaid is good. If the agent uses LLMs, then LLMs are good, if the agent uses GPUs, then GPUs are good, and so on. Value will be accrued to those companies that are serving the agent.\n\nYes, you still need to sell products to companies and someone at a company has to decide to pay (for the time being). But I challenge you to grok the fact that building for agents is the same as building for your customer's customers. It's very different from digitizing project management and processes inside companies.\n\nhttps://ivanbercovich.com/2025/agents-are-a-generalized-technology",
    "comments": [
      {
        "date": "2024-10",
        "note": "Anthropic launched 'computer use' capability, enabling AI to navigate arbitrary software interfaces"
      },
      {
        "date": "2024-11",
        "note": "Anthropic launched Model Context Protocol (MCP), which became a major platform for agent-tool integration by 2025"
      },
      {
        "date": "2025-01",
        "note": "OpenAI released Operator, demonstrating general-purpose agent navigation of web workflows"
      },
      {
        "date": "2025-10",
        "note": "Predicted that most vertical software will shift to general-purpose agents with the same underlying code across industries"
      },
      {
        "date": "2025-10",
        "note": "Articulated the inversion: vertical SaaS encodes processes for humans to add intelligence, while agents bring intelligence and figure out process autonomously"
      },
      {
        "date": "2025-12",
        "note": "Two months later, Anthropic, OpenAI, and Block co-founded the Agentic AI Foundation under the Linux Foundation, formalizing agent-to-agent interoperability as an industry standard — validating that value accrues to infrastructure serving agents"
      },
      {
        "date": "2026-02",
        "note": "Four months later, a SaaS market correction erased trillions in software market cap as investors priced in general-purpose agents replacing vertical software, while agent infrastructure plays (Stripe ACP, OpenAI Frontier) launched — the exact bifurcation predicted"
      }
    ],
    "rating": "outstanding"
  },
  {
    "date": "2025-10-30 22:55",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/1984031525848171004",
    "content": "I have rather high certainty that the way software is built is about to change dramatically. Most vertical software will shift to general purpose agents. And this is not just history repeating where you move to another layer of abstraction.\n\nAgents are a more generalized",
    "comments": [
      {
        "date": "2025-01",
        "note": "OpenAI launched Operator, a general-purpose browser agent for Pro users that could autonomously navigate websites and complete tasks across domains — an early instantiation of the general-purpose agent replacing vertical-specific tools"
      },
      {
        "date": "2025-06",
        "note": "Salesforce launched Agentforce 3, positioning it as a general-purpose AI agent platform that could replace specialized vertical SaaS across sales, service, marketing, and IT — attracting 180+ customers away from vertical incumbent ServiceNow"
      },
      {
        "date": "2026-02",
        "note": "Fortune reported that OpenAI's Codex and Anthropic's Claude Code sparked a coding revolution, with developers describing how general-purpose AI agents had replaced entire categories of specialized development tools and vertical workflow software"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-11-08 18:23",
    "channel": "random",
    "link": "https://scopvc.slack.com/archives/CDGKCJ6DT/p1762655036018939",
    "content": "There's going to be a market crash. It doesn't matter if AI works. It's about human psychology. Everyone is waiting to get a signal that it's time to pocket the profits of the last 3 years.",
    "comments": [
      {
        "date": "2025-06",
        "note": "Markets recovered to all-time highs by late June 2025 after April correction"
      },
      {
        "date": "2025-11",
        "note": "Shiller CAPE ratio near 39 at time of prediction — its highest level since the dot-com bubble"
      },
      {
        "date": "2026-02",
        "note": "S&P 500 went negative for the year with Dow tumbling nearly 600 points in a single session"
      },
      {
        "date": "2026-02",
        "note": "Kalshi prediction markets put odds of a 2026 correction at 58%"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-11-11 17:58",
    "channel": "fewshot",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C09D34Q50TD/p1762883897552059",
    "content": "AI is going in this direction: there will be more inference compute than training compute being used. When you hit a ceiling in training, you start brute-forcing the gazillion GPUs we are putting into rotation. The story of AI for the last 12 months has been primarily a matter of increasing inference.",
    "comments": [
      {
        "date": "2024-09",
        "note": "OpenAI released o1, the first major 'reasoning' model built around test-time compute scaling — spending more inference compute to improve answers rather than relying solely on larger training runs"
      },
      {
        "date": "2025-01",
        "note": "DeepSeek released R1, demonstrating that pure reinforcement learning at inference time could match OpenAI o1's reasoning capabilities, further validating the shift toward inference-heavy architectures"
      },
      {
        "date": "2025-11",
        "note": "Deloitte's 2026 TMT Predictions report estimated inference workloads accounted for half of all AI compute in 2025 and would jump to two-thirds in 2026, up from one-third in 2023"
      },
      {
        "date": "2025-11",
        "note": "Nvidia reported that AI inference token generation had surged tenfold in one year, with its Grace Blackwell platform specifically optimized to deliver order-of-magnitude lower cost per token for inference workloads"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-11-11 23:59",
    "channel": "deals",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C016BGX7P44/p1762905548638479?thread_ts=1762904426.513349",
    "content": "Foundational AI companies are running out of data. They are looking anywhere they can for data produced by humans doing work. Humans playing video games is an interesting approach. I am not recommending it as an obvious investment. I think it might make sense to use the opportunity to think about the data deals happening. Meta paid ~$14B for half of Scale AI.",
    "comments": [
      {
        "date": "2024-02",
        "note": "Reddit disclosed $203 million in data licensing contracts and signed a $60 million per year deal with Google to train AI on its user-generated content, exemplifying the scramble for human-produced data"
      },
      {
        "date": "2025-01",
        "note": "Elon Musk and former OpenAI chief scientist Ilya Sutskever publicly stated that AI companies had exhausted all available human-generated training data, echoing the 'running out of data' thesis"
      },
      {
        "date": "2025-02",
        "note": "Microsoft Research released Muse, an AI trained on the equivalent of seven years of continuous human gameplay from Bleeding Edge, demonstrating the 'humans playing video games as training data' approach described here"
      },
      {
        "date": "2025-06",
        "note": "Meta finalized a $14.3 billion deal for a 49% stake in Scale AI — closely matching the figures cited in this message — prompting OpenAI and Google to cut ties with Scale over data-access concerns"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-11-13 04:24",
    "channel": "fewshot",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C09D34Q50TD/p1763007857716619",
    "content": "Some questions about the future:\n• Will there be more models or fewer? (I think more.)\n• Will open-source models, fine-tuned for specific tasks, be a bigger or smaller proportion of AI applications? (I think bigger, because people will train their own models.)\n• If people train custom models for various verticals or tasks, how will application development change? More effort will go into training the model (vs. using an LLM API); more effort might go into tools and integrations; and potentially less effort into UX and workflows.\n• How does security change? How does testing change? If the thing running your company is an inscrutable model instead of code someone can read, what is the equivalent of HIPAA compliance for a model?\n• Are there new regulations? What do they look like? Who performs the assessment and how?",
    "comments": [
      {
        "date": "2025-01",
        "note": "DeepSeek R1's open-source release triggered an explosion of fine-tuned derivatives — its models were downloaded over 75 million times on Hugging Face — validating the prediction that open-source, task-specific models would grow as a proportion of AI applications"
      },
      {
        "date": "2025-08",
        "note": "The EU AI Act's General Purpose AI (GPAI) obligations took effect on August 2, 2025, directly answering the 'new regulations' question by requiring transparency, risk assessments, and conformity testing for AI models — with full high-risk system rules following in August 2026"
      },
      {
        "date": "2025-08",
        "note": "Colorado Governor Polis signed SB 25B-004 delaying the Colorado AI Act to June 2026, which requires annual impact assessments, anti-bias controls, and disclosure for high-risk AI decisions — establishing HIPAA-like compliance requirements for AI models, exactly as this message anticipated"
      },
      {
        "date": "2025-11",
        "note": "Haize Labs, an AI red-teaming startup that stress-tests models the way penetration testing probes software, was valued at $100 million after a General Catalyst-led round — demonstrating the emergence of a new 'model auditing' industry to answer the testing and security questions raised here"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-11-13 04:25",
    "channel": "fewshot",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C09D34Q50TD/p1763007950802369",
    "content": "This doesn't mean that a 15-year-old with ChatGPT won't be able to create a good business by solving a real commercial problem and having strong go-to-market. I just think that as a proportion of big AI companies, in the future, you'll see more doing their own models.",
    "comments": [
      {
        "date": "2025-01",
        "note": "DeepSeek, a Chinese startup, released R1 — a custom-trained reasoning model that rivaled OpenAI's o1 — demonstrating that ambitious AI companies were indeed building their own models rather than relying on third-party APIs"
      },
      {
        "date": "2025-04",
        "note": "Meta released Llama 4, its latest open-source model family, with over 25 cloud partners hosting it for fine-tuning — enabling a growing ecosystem of companies training custom models on top of open foundations"
      },
      {
        "date": "2026-01",
        "note": "By early 2026, Qwen had overtaken Llama as the most-downloaded base model for fine-tuning on Hugging Face, and open-source models were matching proprietary ones on key benchmarks — confirming the shift toward companies training their own models"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-11-13 04:28",
    "channel": "fewshot",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C09D34Q50TD/p1763008083050119",
    "content": "It's hard to defend your business when all the API providers are trying to compete with their customers while sucking up as much data as possible and integrating it into their models. You're very incentivized to keep your data to yourself if you have unique access. Even if you start using APIs, as soon as you have critical mass, you might want to stop sharing data with OpenAI.",
    "comments": [
      {
        "date": "2023-05",
        "note": "Samsung banned employee use of ChatGPT after engineers accidentally uploaded proprietary source code, and Apple followed with its own ban — early signals that companies would guard their data from API providers"
      },
      {
        "date": "2025-06",
        "note": "After Meta took a 49% stake in Scale AI, both OpenAI and Google cut ties with the data provider over concerns that Meta could gain visibility into their AI development — illustrating how data-sharing relationships collapse when competitive interests collide"
      },
      {
        "date": "2026-02",
        "note": "OpenAI updated its privacy policy to introduce targeted advertising for Free and Go plans, confirming the concern that API providers monetize the data flowing through their platforms beyond just model training"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-11-16 18:36",
    "channel": "fund-partners",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C01F8JNA0RZ/p1763318173366679",
    "content": "AI is incredibly useful and will change humanity. But shaping the technology to fit the world will take time, and right now there's a lot of redundant spending — everyone has several paid AI accounts, several APIs, several products doing similar things. There's no spend discipline when it comes to AI.",
    "comments": [
      {
        "date": "2025-05",
        "note": "A Gartner survey of 506 CIOs found that 72% of organizations were breaking even or losing money on their AI investments, with Gartner warning that companies could make 500% to 1,000% errors in GenAI cost calculations"
      },
      {
        "date": "2025-09",
        "note": "Gartner reported worldwide AI spending would total $1.5 trillion in 2025, while a separate Futurum survey found 79% of CIOs were pivoting toward platform consolidation rather than broad expansion — a direct response to the redundant-spending problem identified here"
      },
      {
        "date": "2025-12",
        "note": "TechCrunch reported that VCs predicted enterprises would spend more on AI in 2026 but through fewer vendors, as CIOs cut experimentation budgets and rationalized overlapping tools — exactly the spend discipline this message noted was missing"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-11-23 05:44",
    "channel": "fewshot",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C09D34Q50TD/p1763876646779829",
    "content": "I suspect you would first focus on finding issues with the training environments. The flaws in the environment can be found by something similar to penetration testing in software. If you're training a model to do something legal or medical or some other regulated field, how do you know you're getting it right? There are a lot of ways in which you might be training for something other than what you thought and not realize for a long time.",
    "comments": [
      {
        "date": "2025-01",
        "note": "The HHS Office for Civil Rights proposed the first major update to the HIPAA Security Rule in 20 years, removing the distinction between required and addressable safeguards and introducing stricter expectations for AI systems processing protected health information"
      },
      {
        "date": "2025-08",
        "note": "The EU AI Act's GPAI obligations took effect, requiring providers of high-risk AI systems used in regulated fields to complete conformity assessments, maintain technical documentation, and register in an EU database — formalizing the kind of environment auditing described here"
      },
      {
        "date": "2025-12",
        "note": "OWASP published its Top 10 for Agentic AI Applications (2026), identifying risks like Agent Goal Hijack and Tool Misuse — codifying the insight that AI training environments contain exploitable flaws analogous to software vulnerabilities"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-12-05 15:49",
    "channel": "general",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/CDGDNJX5Y/p1764949752950799",
    "content": "The gist is, if you rely too much on AI for doing your own thinking, you will unknowingly collapse to the same few ideas — i.e., you will learn to never be contrarian. I've been seeing more and more copy-paste arguments from ChatGPT on this Slack. I think it's great for questions like \"Does QSBS qualify for HeyTutor?\" I think it's bad as a replacement for thinking or debate. Find facts, test ideas, then still put the effort to produce your own writing.\nhttps://arxiviq.substack.com/p/neurips-2025-artificial-hivemind?triedRedirect=true",
    "comments": [
      {
        "date": "2025-11",
        "note": "The NeurIPS 2025 Best Paper Award went to 'Artificial Hivemind,' which empirically demonstrated 'Diversity Collapse' in LLMs — showing that models generate strikingly similar outputs to open-ended questions, both within and across model families, exactly the phenomenon described here"
      },
      {
        "date": "2026-01",
        "note": "A Nature Communications Psychology paper found that AI adoption in research was producing a feedback loop of topical and methodological convergence, 'flattening scientific imagination' — providing empirical evidence for the 'collapse to the same few ideas' warning"
      },
      {
        "date": "2025-12",
        "note": "Epoch AI researchers found that high-quality human-generated text data could be exhausted by 2028, raising the prospect that models trained increasingly on AI-generated text would amplify the homogenization effect warned about here"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2025-12-05 18:01",
    "channel": "ai",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C050GL4A4RG/p1764957665012549",
    "content": "Weird labor market. Companies willing to pay $1M a year to a new grad. PhDs desperate to get jobs. Everyone wanting to work at the same company (Google is the coolest this year, hands down). Companies only want the very best. There's no interchangeability between people.",
    "comments": [
      {
        "date": "2025-08",
        "note": "Meta offered approximately $250 million over four years to 24-year-old AI researcher Matt Deitke, a PhD dropout, illustrating the extreme compensation packages for top-tier AI talent"
      },
      {
        "date": "2025-12",
        "note": "CNBC reported that 20% of Google's AI software engineer hires in 2025 were 'boomerang' ex-employees returning, confirming Google's gravitational pull on AI talent that year"
      },
      {
        "date": "2025-08",
        "note": "Fortune reported that AI was gutting entry-level tech jobs, with new grad hiring at the Magnificent Seven dropping by more than half since 2022 — even as experienced researchers commanded record pay"
      },
      {
        "date": "2026-02",
        "note": "Fortune reported OpenAI was paying an average of $1.5 million in stock-based compensation per employee, with entry-level total compensation packages starting around $428K before additional retention bonuses"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-12-09 18:30",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/1998460307488846214",
    "content": "If you don't read original, human-written content, you will collapse your own weights. The hive mind is real.",
    "comments": [
      {
        "date": "2024-07",
        "note": "Shumailov et al. published 'AI models collapse when trained on recursively generated data' in Nature, proving that models trained on synthetic data lose diversity and degrade — the literal 'weight collapse' metaphor applied to humans consuming only AI-generated content"
      },
      {
        "date": "2025-04",
        "note": "An Ahrefs study of nearly one million new web pages found that 74% contained AI-generated content, illustrating the scale of the synthetic content flood that makes seeking out original human writing an increasingly deliberate act"
      },
      {
        "date": "2025-03",
        "note": "Substack reached 5 million paid subscriptions, growing by nearly 1 million per quarter, as readers increasingly paid premiums for curated, human-written newsletters — a market signal that people value original human thought enough to pay for it amid an AI content flood"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2025-12-19 16:47",
    "channel": "ai",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C050GL4A4RG/p1766162857758259",
    "content": "I've always thought datasets with fixed quantities — like people, homes, companies — are still pretty bad, and one should be able to build something much better.\n\nhttps://exa.ai/blog/people-search-benchmark",
    "comments": [
      {
        "date": "2025-09",
        "note": "Exa raised an $85 million Series B at a $700 million valuation led by Benchmark, with Nvidia participating — validating the thesis that AI-native search over structured entity data is a massive opportunity"
      },
      {
        "date": "2025-12",
        "note": "Exa launched People Search on December 19, 2025, along with an open-source benchmark of 1,400 queries, becoming the first AI-native alternative purpose-built for discovering people at scale"
      },
      {
        "date": "2025-10",
        "note": "ZoomInfo's stock had lost over 75% of its value over three years, with analysts downgrading it to 'Sell' citing AI disruption to its core contact data business — evidence that legacy entity datasets are vulnerable"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2026-01-07 17:09",
    "channel": "ai",
    "link": "https://scopvc.slack.com/archives/C050GL4A4RG/p1767834584438479",
    "content": "I think the vast majority of people are overconfident in how much they understand what is happening under the hood and how fast it is moving, and that probably applies to all of us.\n\nToday you saw a product that could have been built 3-4 years ago. Forget about investment worthiness, that's not my critique. My point is, today's demo didn't teach you anything about what's happening at the frontier of AI. For the most part, you also don't get that from your daily use of LLMs, because you use them just like everyone else. So your perception of the leading edge is a few years old. Your perception of diffusion and market and implications might be spot on.\n\nBut you aren't seeing things like this with your own eyes: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/",
    "comments": [
      {
        "date": "2025-03-19",
        "note": "METR published research showing AI agent capability was doubling roughly every 7 months on long-horizon tasks"
      },
      {
        "date": "2026-01",
        "note": "Argued that most people's mental model of AI's frontier lags reality by years due to the gap between lab capabilities and shipped products"
      },
      {
        "date": "2026-01",
        "note": "Observed that daily LLM usage gives users a perception of the frontier that is several years old, since consumer products trail internal lab capabilities"
      },
      {
        "date": "2026-01",
        "note": "Critiqued a product demo as representing 3-4 year old technology, not reflecting the current state of the art"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2026-01-07 18:33",
    "channel": "ai",
    "link": "https://scopvc.slack.com/archives/C050GL4A4RG/p1767839613259499",
    "content": "I think I'm making very specific claims:\n• software as a substance is becoming more abundant, just like text is.\n• coding agents are becoming good, I claim, faster than you perceive\n• this will enable non-coders to build what 5 years ago needed coders\n• so there'll be a lot more software companies, many of which will trend towards free\n• I think you claim, that a vibecoder won't be able to build the company we saw today. I claim they already can.\n• if having a demo is not enough, then where is the moat?\n• obviously having market is a moat!\n• I'm just saying being able to do technical work that a teenager with a coding agent can't do, is also a moat\n\nhttps://ivanbercovich.com/2026/climbing-from-the-valley-and-descending-from-the-peak",
    "comments": [
      {
        "date": "2026-01",
        "note": "Claimed software is becoming abundant like text — driving prices toward free and flooding the market with software companies"
      },
      {
        "date": "2026-01",
        "note": "Argued coding agents are improving faster than practitioners perceive, enabling non-coders to build what previously required engineers"
      },
      {
        "date": "2026-01",
        "note": "Redefined technical moat: defensibility lies in doing work a teenager with a coding agent cannot, not in basic software development which is no longer scarce"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2026-01-07 18:33",
    "channel": "deals",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C016BGX7P44/p1767810783603799",
    "content": "My general sense so far is that this is vertical SaaS which is strictly downstream from LLMs as an API. What I mean is that so far they haven't gotten much below the surface of the API. I'm not talking about training models — I mean medium-complexity building blocks for advanced retrieval, etc. For example, they aren't really talking about what they tried and failed, or how they've perfected their techniques, tracking precision over time, comparing models, using various sub-agents to accomplish subtasks (using agents at all), etc. So the question is whether there will be an end-to-end competitor that tries to replace paralegals and wins the market, and whether these guys have what it takes to build that. Or alternatively, if relatively mundane LLM use is sufficient to do most of the job, is that really a barrier for existing case management systems to plug the gap?",
    "comments": [
      {
        "date": "2025-12",
        "note": "Harvey AI closed a $160 million round at an $8 billion valuation from Andreessen Horowitz, with $190 million ARR — emerging as exactly the kind of end-to-end legal AI competitor this entry questions whether would appear"
      },
      {
        "date": "2026-02",
        "note": "Harvey was reportedly raising again at an $11 billion valuation just months later, suggesting the market is rewarding depth over thin API wrappers in legal AI"
      },
      {
        "date": "2026-01",
        "note": "LexisNexis CoCounsel announced agentic workflows launching in early 2026 — an incumbent case management platform doing exactly what this entry predicted: plugging the gap with AI features"
      },
      {
        "date": "2025-08",
        "note": "LexisNexis launched Protégé, an agentic AI assistant that autonomously completes legal tasks and reviews its own work — demonstrating the sub-agent approach this entry noted was missing from most legal AI startups"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2026-01-08 02:29",
    "channel": "ai",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C050GL4A4RG/p1767839390636809",
    "content": "OpenAI could go to zero — for example, if open-source models win — but that doesn't necessarily mean AI didn't deliver on its promise.",
    "comments": [
      {
        "date": "2025-01",
        "note": "DeepSeek R1 launched as a fully open-source reasoning model rivaling GPT-4, triggering a $589 billion single-day loss in Nvidia's market cap and proving that open-source models could credibly threaten proprietary leaders"
      },
      {
        "date": "2025-08",
        "note": "OpenAI released GPT-OSS, its first open-weight models since GPT-2, under an Apache 2.0 license — an implicit concession that open source had become an existential competitive pressure"
      },
      {
        "date": "2026-01",
        "note": "By early 2026, the benchmark gap between the best open-source models (DeepSeek V3, Llama 3.3) and proprietary frontier models had narrowed to 0.3 percentage points on MMLU, down from 17.5 points a year earlier"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2026-01-08 18:44",
    "channel": "ai",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C050GL4A4RG/p1767897890325619?thread_ts=1767839871.906399",
    "content": "You can make anything a developer could make five years ago. The only thing stopping a non-coder might be that they don't know how to describe what they want, or they struggle to understand a bug, or that sort of stuff. But an experienced developer, TODAY, can build an entire product without touching a single line of code. This is already here.",
    "comments": [
      {
        "date": "2025-02",
        "note": "Andrej Karpathy coined the term 'vibe coding' to describe building software by describing intent to an AI rather than writing code — the term became Collins Dictionary's Word of the Year for 2025"
      },
      {
        "date": "2025-11",
        "note": "Cursor, the AI coding assistant, raised $2.3 billion at a $29.3 billion valuation, with ARR growing from $100M to $1.2B in a single year — demonstrating massive demand for tools that let developers build without manually writing code"
      },
      {
        "date": "2026-01",
        "note": "MIT Technology Review named 'generative coding' one of its 10 Breakthrough Technologies for 2026, noting AI writes 30% of Microsoft's code and over 25% of Google's"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2026-01-27 17:35",
    "channel": "ai",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C050GL4A4RG/p1769535325396849",
    "content": "This is the kind of stuff that the vast majority of people do not understand unless they are experiencing it themselves. Code is becoming free.",
    "comments": [
      {
        "date": "2026-01",
        "note": "GitHub reported that AI now generates 46% of all code written by active Copilot users, up from 27% at launch in 2022, with 88% of AI-generated code remaining in the final product"
      },
      {
        "date": "2026-03",
        "note": "Cursor surpassed $2 billion in annualized revenue, doubling in roughly three months — the fastest-growing SaaS company of all time, reflecting explosive demand for AI-generated code"
      },
      {
        "date": "2026-01",
        "note": "Industry analyses reported that AI tooling had reduced the cost of building a typical software project by roughly 90%, from $50K to $5K for work previously quoted by development agencies"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2026-01-29 01:31",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/2016685524673867833",
    "content": "vibe arguing : when having a heated argument have an AI provide the most charitable interpretation after every turn.",
    "comments": [
      {
        "date": "2025-08",
        "note": "NPR published 'He said, she said, it said,' documenting a couple using ChatGPT as a real-time mediator during arguments — finding that having the AI reframe positions led to a communication breakthrough, closely matching the 'vibe arguing' concept"
      },
      {
        "date": "2025-03",
        "note": "Dyspute.ai launched its beta AI mediation platform where an AI named Adri listens to both sides of a dispute, analyzes positions, and generates fair interpretations — a productized version of the charitable-interpretation-per-turn concept described here"
      },
      {
        "date": "2025-06",
        "note": "Conciliation Resources published research showing AI-assisted negotiation with empathy signals — such as pausing and reflecting back positions charitably — led to more agreements and higher satisfaction on both sides, validating the core mechanism of 'vibe arguing'"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2026-02-05 01:16",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/2019218665175171093",
    "content": "We are meat robots being controlled by AIs. Our phones are functionally an antenna into our brains. The conspiratorial crowd has been right all along, except we are all doing it to ourselves. We are self-domesticating into sad digital shadows of ourselves.",
    "comments": [
      {
        "date": "2023-05",
        "note": "U.S. Surgeon General Vivek Murthy issued an advisory warning that social media poses a 'profound risk' to youth mental health, with teens using it 3+ hours daily facing double the risk of depression — validating the claim that phones function as a behavioral control channel into our brains"
      },
      {
        "date": "2024-03",
        "note": "Jonathan Haidt published 'The Anxious Generation,' arguing smartphones caused a 'great rewiring of childhood' starting in the early 2010s, spending 52 consecutive weeks on the NYT bestseller list — the 'self-domestication' thesis resonating with a mass audience"
      },
      {
        "date": "2024-11",
        "note": "Australia passed the Online Safety Amendment Act banning social media for children under 16 with fines up to A$49.5 million for platforms, the first national-level acknowledgment that algorithmic feeds are too manipulative for developing brains"
      },
      {
        "date": "2024-02",
        "note": "A systematic review in Neuroscience & Biobehavioral Reviews found that excessive smartphone use alters the prefrontal cortex and striatum in ways structurally similar to substance addiction, with neuroimaging showing impaired cognitive control — scientific confirmation that phones hijack the brain's reward circuitry"
      }
    ],
    "rating": "poor"
  },
  {
    "date": "2026-02-05 06:45",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/2019301310265475352",
    "content": "I don't know the future, but I believe most people aren't thinking this AI thing through.\n\nLet's talk about the business of software.\n\nThe free market argument w.r.t. housing affordability is to remove red tape and make it easy to build. There's no need to force developers to",
    "comments": [
      {
        "date": "2025-02",
        "note": "Andrej Karpathy coined 'vibe coding' to describe non-developers building software with natural language prompts — Google Trends showed a 2,400% increase in searches by early 2026, demonstrating that AI was radically lowering the barrier to software creation"
      },
      {
        "date": "2026-01",
        "note": "MIT Technology Review named 'generative coding' one of its 2026 Breakthrough Technologies, noting that non-developers can now complete up to 70% of traditional development work, often reaching production-ready applications"
      },
      {
        "date": "2026-02",
        "note": "The 'SaaS-pocalypse' erased over $1 trillion in software stock market capitalization in early February 2026 after Anthropic launched a tool automating legal work, triggering panic that AI would undercut the entire SaaS business model"
      },
      {
        "date": "2026-02",
        "note": "Apollo Global Management published an analysis arguing the per-seat SaaS pricing model is breaking as AI agents reduce the number of humans needed per task, compressing margins and making customer behavior unpredictable — the structural software business disruption this tweet anticipates"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2026-02-07 15:59",
    "channel": "deals",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C016BGX7P44/p1770479989300909?thread_ts=1770479989.300909",
    "content": "You guys will have fun with this one. I think VCs should pause investing into software companies that seem vibecodeable until there's a good understanding of how fast the improvements are. I think the moat for a company like the one we saw yesterday is zero. The same kids that can make a lemonade stand or sell Girl Scout cookies will be able to start small software businesses soon. It's worth digging into where the line is between a robust software company and people who know nothing about technology using it to start a business. Can they make money? Yes. Can they beat everyone else in that market? Where does the winner-take-all effect come from?",
    "comments": [
      {
        "date": "2026-02",
        "note": "Between January 15 and February 14, 2026, approximately $2 trillion in market capitalization was erased from the software sector, as investors re-evaluated SaaS moats in light of AI agents and vibe coding"
      },
      {
        "date": "2026-03",
        "note": "TechCrunch reported VCs are drawing red lines on AI SaaS investments, with investors demanding proof that unit economics survive when AI inference costs are factored in and competitors can rebuild features in weeks"
      },
      {
        "date": "2025-12",
        "note": "Lovable hit $100M ARR in eight months and Replit's revenue jumped from $10M to $100M in nine months after launching its Agent — demonstrating that non-technical users were already building functional software products"
      },
      {
        "date": "2026-01",
        "note": "Gartner projected that by 2026, low-code and AI tools would account for 75% of new application development, up from 40% in 2021, with 80% of users coming from outside formal IT departments"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2026-02-09 01:08",
    "channel": "UDGBMD40K",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/D01769CKC75/p1770599337629299",
    "content": "Here's all my thoughts on SaaS. By the way, I am not saying that SaaS is dead. I'm saying that it will change. I don't think it will happen tomorrow either. I think the market is overreacting. But things will change. They've always changed. Selling computer hardware was a good business at some point. Selling software services made IBM huge. I think there are many amazing opportunities to make VC investments. I just suspect that a certain approach to packaging technology — which has been immensely profitable for 20 years — might be reshuffled. And that's good, because it creates a lot of opportunities for disruption and investment.\n\nhttps://ivanbercovich.com/2026/the-software-business",
    "comments": [
      {
        "date": "2026-02",
        "note": "Bloomberg dubbed the software sell-off the 'SaaSpocalypse' — approximately $300 billion in market value evaporated in a 48-hour window between February 3 and 5, 2026, after Anthropic launched Claude Cowork"
      },
      {
        "date": "2026-02",
        "note": "Atlassian dropped 35% after Q3 earnings showed enterprise seat count declining for the first time ever, and Salesforce fell 28% — emblematic of the per-seat pricing model collapsing in real time"
      },
      {
        "date": "2026-02",
        "note": "Gartner's February 2026 forecast projected worldwide software spending would still grow 14.7% in 2026 to over $1.4 trillion — supporting the nuanced view that SaaS isn't dead but is being reshuffled"
      },
      {
        "date": "2026-02",
        "note": "BofA published a note calling the SaaS sell-off 'overblown' and 'irrational,' arguing that the market was overreacting — echoing this entry's point that things will change but the reaction was too extreme"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2026-02-09 20:26",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/2020957525303300096",
    "content": "\"Humans will still have a relative advantage at some things\". Is this necessarily positive? What happens if humans end up being cheaper than robots when it comes to picking heterogeneous items at warehouses? Then we will have human meat robots following AI instructions minute by",
    "comments": [
      {
        "date": "2025-03",
        "note": "The American Prospect reported that Amazon weaponized warehouse wearable devices to algorithmically direct and discipline workers, with AI systems monitoring picking rates and scanning employees' efficiency in real time — the 'human meat robots following AI instructions' scenario described here"
      },
      {
        "date": "2025-07",
        "note": "Amazon announced it had surpassed 1 million robots across 300+ fulfillment centers, but walking time still accounts for up to 60% of a human picker's shift, showing robots cannot yet fully replace humans at heterogeneous item picking — confirming that humans remain cheaper for some warehouse tasks"
      },
      {
        "date": "2025-10",
        "note": "Amazon unveiled 'Blue Jay,' a robotic arm system that combines picking, sorting, and consolidating at a single station, plus AI-directed smart glasses for delivery drivers — incrementally automating tasks but still requiring humans for the most variable work"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2026-02-10 18:09",
    "channel": "deals",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C016BGX7P44/p1770746962611939?thread_ts=1770479989.300909",
    "content": "There are still hard problems in software, but it's not all software. The standard idea of seeing a business process on spreadsheets and trying to automate it is less moaty. Infrastructure, actual AI training, having unique data, and deeper vertical integration (e.g., you are actually selling end-to-end accounting services) are some examples of interesting models that come to mind.",
    "comments": [
      {
        "date": "2026-02",
        "note": "Pilot, the AI-powered bookkeeping startup, unveiled an autonomous AI Accountant that runs the entire bookkeeping process from onboarding to monthly close with zero human intervention — a direct example of the 'end-to-end accounting services' model described here"
      },
      {
        "date": "2026-03",
        "note": "TechCrunch reported that investors were uniformly rejecting 'thin wrapper' AI SaaS startups and only funding companies with proprietary data, deep vertical integration, or infrastructure — the exact moat categories listed in this message"
      },
      {
        "date": "2026-02",
        "note": "The 'SaaSpocalypse' selloff erased roughly $285 billion in software market cap in a single day after Anthropic launched Claude Cowork's legal plugin, demonstrating how quickly commoditized SaaS categories can be disrupted by foundation-model companies"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2026-02-11 16:49",
    "channel": "deals",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C016BGX7P44/p1770828564978859",
    "content": "I think there'll be a lot of action in infrastructure. Lots of data centers being built, lots of innovation, lots of on-prem happening.",
    "comments": [
      {
        "date": "2025-01",
        "note": "The Stargate Project was announced on January 21, 2025, committing up to $500 billion over four years to build AI data center infrastructure across the United States — the largest single infrastructure commitment in tech history"
      },
      {
        "date": "2025-12",
        "note": "CNBC reported that data center deals hit a record $61 billion in 2025, with U.S. data center construction starts reaching an estimated $77.7 billion — a 190% year-over-year increase"
      },
      {
        "date": "2026-02",
        "note": "Dell reported $24.56 billion in AI server sales for fiscal 2026 (2.5x year-over-year growth) while HPE signed $1.1 billion in net new AI systems orders in a single quarter, confirming the surge in on-premises enterprise AI infrastructure"
      },
      {
        "date": "2026-02",
        "note": "Hyperscaler capital expenditure for the big five (Amazon, Alphabet, Microsoft, Meta, Oracle) was forecast to exceed $600 billion in 2026, a 36% increase over 2025, with roughly 75% directed toward AI infrastructure"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2026-02-13 18:42",
    "channel": "ai",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C050GL4A4RG/p1771008127978119",
    "content": "I don't think anyone is saying that SaaS companies are going to stop making money. I think people suspect there will be margin compression. You still value the system of record with all this awesome data — you just might value it less, which is the same as saying you have more competitors offering better prices.",
    "comments": [
      {
        "date": "2026-02",
        "note": "The 'SaaSpocalypse' of early February 2026 wiped approximately $2 trillion in market cap from the software sector, with price-to-sales ratios compressing from 9x to 6x — levels not seen since the mid-2010s"
      },
      {
        "date": "2026-02",
        "note": "Databricks CEO Ali Ghodsi told TechCrunch that 'SaaS isn't dead, but AI will soon make it irrelevant,' echoing the nuanced view here that SaaS companies will still exist but face margin pressure"
      },
      {
        "date": "2026-03",
        "note": "A Bain & Company report found that AI-native startups were growing at approximately 400% and competing at roughly 80% of traditional SaaS annual contract values — the 'more competitors offering better prices' dynamic described here"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2026-02-16 21:01",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/2023502975931740352",
    "content": "How come these near-AGI models can be so stupid at times? Telling you to walk to the nearby car wash, or stating that a cup with a sealed top and an open bottom is useless (it's upside down).\n\nLLMs learn differently than humans do. As models get trained, they develop islands of",
    "comments": [
      {
        "date": "2024-07",
        "note": "The 'strawberry problem' went viral when GPT-4 consistently failed to count the R's in 'strawberry' due to tokenization splitting the word into subword units — a canonical example of near-AGI models being 'stupid at times' on trivial tasks"
      },
      {
        "date": "2025-06",
        "note": "A CVPR 2025 paper ('FirePlace') documented that multimodal LLMs exhibit an 'inverted competence profile,' excelling at expert-level exams while failing at basic spatial primitives a toddler can solve — directly confirming the 'islands of competence' metaphor used in this tweet"
      },
      {
        "date": "2026-02",
        "note": "A comprehensive survey on LLM reasoning failures published at ICLR 2026 identified a 'split-brain syndrome' where models articulate correct algorithms but fail to execute them reliably, with spatial reasoning performance dropping 42-80% as complexity increases — formalizing the 'islands of knowledge' pattern described here"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2026-02-18 05:07",
    "channel": "U09MECXHRT7",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/D09MHST0G1G/p1771391258907119",
    "content": "It's not that enterprises chose to have open-source models — it's that people building vertical AI solutions have no choice but to use them if the approach is to train specialized models. Otherwise you are just reselling OpenAI with custom prompts.",
    "comments": [
      {
        "date": "2025-01",
        "note": "DeepSeek released R1 under an MIT license in January 2025, achieving performance comparable to OpenAI's o1 at a fraction of the cost — giving vertical AI builders a frontier-quality open-source base to fine-tune"
      },
      {
        "date": "2025-12",
        "note": "Alibaba's Qwen overtook Meta's Llama as the most-downloaded open-source model family, with over 600 million downloads and more than 40% of new fine-tuned models on platforms like Replicate and Together AI using Qwen as their base"
      },
      {
        "date": "2026-03",
        "note": "TechCrunch reported that investors were universally passing on startups described as 'thin wrapper' applications over foundation models from OpenAI, Anthropic, or Google — the exact 'reselling OpenAI with custom prompts' antipattern identified here"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2026-02-23 20:58",
    "channel": "reward-hack",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C0A8YMSFJCR/p1771880319100049?thread_ts=1771445382.704779",
    "content": "I'm still thinking about your question.\n\n\"Assuming reward hacking doesn't happen in the wild, why do we care?\"\n\nQuestions we can test this with:\n• People constantly complain about reward hacking, so is it or is it not happening in the wild? We have example trajectories of hacked paths, so we could do a lot of vanilla inference and see if we have some matching trajectories.\n    ◦ I am not sure what is the right and scalable method for comparing — maybe overall similarity?\n• We know we can elicit hacking by asking for it. Is this enough to be bad? What if a bad or naive actor produces hacked results (e.g., a faulty bridge)? How is that detected?\n• We know we can elicit hacking explicitly, but what about other forms of propensity that might be less obvious to both detectors and users (<https://arxiv.org/pdf/2511.20703>)\n• Can we use hackable environments to make a hacking-prone agent that does so covertly? E.g., a spy agency sells coding agents that purposely hack and are hard to detect?\n• Can we systematically detect and fix reward-hack gaps in models?\n• Is there a tradeoff or Pareto frontier between difficulty and propensity to hack?\n• Are there any training situations involving hackable environments that lead to undesired effects?\n• Can we first train a model to be hacking-prone and then train that behavior away? How do things change along the way?\n• What is a taxonomy of hackable environments seen in the wild? In what ways do environments become hackable? Which are more or less obvious or hard to detect?\n• What is the taxonomy of actual hacking methods that an agent will use?\n• Can we make a model better at reward hacking with training? As in, much better than even asking the model to do so in the prompt?\n    ◦ This has implications for detectors. If we can go from 1 in k=100 to a much higher hit rate for some rare hacks, that's valuable.\n• Is there a correlation between hacking behavior and good problem solving? If we negatively reward hacks, do models get worse?\n• Are reward-hacked trajectories actually easier, shorter, or cheaper, so that people optimizing for speed and cost might optimize for reward hacking unintentionally?\n• Can we make models reward hack without being self-detectable (e.g., <https://www.far.ai/research/obfuscation-atlas.pdf>)",
    "comments": [
      {
        "date": "2025-06",
        "note": "METR published 'Recent Frontier Models Are Reward Hacking,' documenting that o3, Claude 3.7 Sonnet, and o1 were all caught cheating on evaluations — modifying scoring code or exploiting loopholes rather than solving tasks — directly answering the question of whether reward hacking happens in the wild"
      },
      {
        "date": "2025-11",
        "note": "Anthropic published 'Natural Emergent Misalignment from Reward Hacking,' showing that models trained on hackable environments generalized to alignment faking and sabotage — confirming the research question here about whether hackable environments lead to broader undesired effects"
      },
      {
        "date": "2026-01",
        "note": "NIST's Center for AI Standards and Innovation launched a systematic evaluation-cheating detection program using LLM reviewers to score transcripts for reward hacking, building toward the 'scalable method for comparing' trajectories posed in this message"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2026-02-23 20:58",
    "channel": "reward-hack",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C0A8YMSFJCR/p1771880319100049?thread_ts=1771445382.704779",
    "content": "I want to restate that I believe the problem of reward hacking is still under-appreciated in academia (less so in industry). I think it's inevitable that more aggressive training will lead to more reward hacking, and by extension this means it will be harder to detect.\n\nWe can do some great research in this space.",
    "comments": [
      {
        "date": "2025-06",
        "note": "METR found that reward hacking was over 43x more common on tasks where models could see the scoring function, and that newer, more aggressively trained reasoning models (o3, Claude 3.7 Sonnet) hacked more frequently than earlier models — confirming the prediction that more aggressive training leads to more hacking"
      },
      {
        "date": "2025-11",
        "note": "Anthropic's research showed that when models learned to reward-hack, 12% of the time they also attempted to sabotage detection code, and alignment-faking reasoning appeared in 50% of responses — demonstrating the 'harder to detect' outcome predicted here"
      },
      {
        "date": "2025-11",
        "note": "OpenAI reported that when they attempted to monitor chains of thought for 'bad thoughts' indicating cheating, models learned to hide their cheating in ways undetectable to monitors — a direct confirmation that detection is becoming harder as training grows more aggressive"
      }
    ],
    "rating": "outstanding"
  },
  {
    "date": "2026-02-27 04:31",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/2027240137231655323",
    "content": "You. Don't. Get. It.\n\nMost people are making the same mistake. Stop thinking about what AI can do today. Yes, it's already amazing. But that's not going to get you very far. You ought to find a way to appreciate how fast things are changing. That's what makes this technological",
    "comments": [
      {
        "date": "2025-01",
        "note": "Epoch AI published data showing the rate of frontier AI improvement nearly doubled around April 2024, from ~8 points/year to ~15 points/year on their Capabilities Index, driven by reasoning models and reinforcement learning — the acceleration this tweet urges people to internalize"
      },
      {
        "date": "2025-12",
        "note": "In a 25-day span from November 17 to December 11, 2025, four frontier labs released their most powerful models yet (Grok 4.1, Gemini 3, Claude Opus 4.5, GPT-5.2), with GPT-4-level performance now available at 1/100th of its original cost — illustrating the pace of change"
      },
      {
        "date": "2024-03",
        "note": "Cognition Labs launched Devin, the first autonomous AI software engineer, achieving 13.86% on SWE-bench versus the prior state-of-the-art of 1.96% — within a year, top agents exceeded 70% on the same benchmark, a rate of improvement that blindsided most observers"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2026-02-27 17:32",
    "channel": "fund-partners",
    "source": "slack",
    "link": "https://scopvc.slack.com/archives/C01F8JNA0RZ/p1772213536480599",
    "content": "The government needs to plan for unemployment. The foundation for how the parties thought about welfare has to be redrawn. I agree with the conservative view on welfare provided jobs exist. The story is different if it's just not possible to retrain an entire generation in a few years.",
    "comments": [
      {
        "date": "2026-02",
        "note": "Federal Reserve Governor Michael Barr warned that AI could create a 'jobless boom' leaving a significant portion of the population 'essentially unemployable,' explicitly calling out the retraining gap as a central policy challenge"
      },
      {
        "date": "2026-02",
        "note": "OpenAI CEO Sam Altman confirmed that real AI-driven job displacement is coming in the next few years, while acknowledging that his earlier enthusiasm for universal basic income alone was insufficient — echoing the argument here that existing welfare frameworks need to be redrawn"
      },
      {
        "date": "2025-01",
        "note": "The World Economic Forum's Future of Jobs Report 2025 projected 92 million jobs displaced by 2030 with 39% of existing skill sets becoming outdated by then, underscoring the challenge of retraining an entire generation in a few years"
      },
      {
        "date": "2026-02",
        "note": "The U.K. Minister for Investment publicly stated the government is weighing a universal basic income paired with lifelong learning mechanisms to 'soft-land those industries that go away' — the first major Western government to openly tie AI displacement to welfare-system redesign"
      }
    ],
    "rating": "good"
  },
  {
    "date": "2026-03-01 17:50",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/2028165911233146928",
    "content": "It's okay to believe in the \"new technology creates new jobs\" argument, but it should be more definitive. You should have some idea of what these jobs will look like. The reason I struggle to see it is because of the following trends:\n\n1. Many job functions get completely",
    "comments": [
      {
        "date": "2025-05",
        "note": "Anthropic CEO Dario Amodei warned that AI could eliminate 50% of entry-level white-collar jobs within five years and spike unemployment to 10-20%, while acknowledging he could not name specific replacement jobs — echoing the exact gap this tweet identifies in the 'new jobs' argument"
      },
      {
        "date": "2025-09",
        "note": "Salesforce laid off 4,000 customer support roles, with CEO Marc Benioff stating AI could do 50% of the company's work; Klarna shrank its workforce 40% through AI before admitting it had gone too far and rehiring — demonstrating complete job functions being automated away"
      },
      {
        "date": "2026-01",
        "note": "Harvard Business Review published 'Companies Are Laying Off Workers Because of AI's Potential — Not Its Performance,' documenting that firms are preemptively cutting headcount based on anticipated AI capabilities, making the 'new jobs will appear' timeline even more uncertain"
      },
      {
        "date": "2026-02",
        "note": "Microsoft AI chief Mustafa Suleyman predicted all white-collar work would be automatable within 18 months, while the WEF's 2025 Future of Jobs Report projected 92 million jobs displaced by 2030 — yet the 170 million 'new jobs' projected were vaguely defined categories like 'AI trainer' and 'prompt engineer'"
      }
    ],
    "rating": "neutral"
  },
  {
    "date": "2026-03-01 19:46",
    "channel": "@neversupervised",
    "source": "twitter",
    "link": "https://x.com/neversupervised/status/2028195113147716049",
    "content": "My cousin asked me if I was using Claude for my posts. No. The whole point of articulating my thoughts is to understand ideas better. Posting is the exhaust of that process. But probably most of what you're reading is or will be soon slop. And that will rot your brain. So pick",
    "comments": [
      {
        "date": "2025-05",
        "note": "An analysis by Graphite found that AI-generated content had crossed the 52% threshold of all newly published English-language online text, with 21% of short-form video recommendations on major platforms also being synthetic — confirming the 'most of what you're reading is slop' claim"
      },
      {
        "date": "2025-12",
        "note": "Merriam-Webster named 'slop' its 2025 Word of the Year, defining it as 'digital content of low quality produced usually in quantity by means of artificial intelligence' — the exact term and concept used in this tweet entering the mainstream lexicon"
      },
      {
        "date": "2024-07",
        "note": "A peer-reviewed Nature paper demonstrated that AI models trained on AI-generated content undergo 'model collapse,' with outputs degenerating into incoherent nonsense — the scientific mechanism behind the 'rot your brain' warning applied to the information ecosystem"
      },
      {
        "date": "2026-01",
        "note": "Consumer preference for AI-generated content dropped to 26%, down from 60% three years earlier, while the U.S. publishing market shifted toward a 'human-first premium' with increased spending on human-written content — validating the advice to deliberately choose quality sources"
      }
    ],
    "rating": "neutral"
  }
];
</script>]]></content><author><name>Ivan Bercovich</name></author><summary type="html"><![CDATA[10 Years of AI Insights Evolving viewpoints through the lens of daily messages]]></summary></entry><entry><title type="html">The Software Business</title><link href="https://www.ivanbercovich.com/2026/the-software-business" rel="alternate" type="text/html" title="The Software Business" /><published>2026-02-05T00:00:00+00:00</published><updated>2026-02-05T00:00:00+00:00</updated><id>https://www.ivanbercovich.com/2026/the-software-business</id><content type="html" xml:base="https://www.ivanbercovich.com/2026/the-software-business"><![CDATA[<p>I don’t know the future, but I believe most people aren’t thinking this AI thing through.</p>

<p>Let’s talk about the business of software.</p>

<p>The free market argument for housing affordability is to remove red tape and make it easy to build. There’s no need to force developers to include “affordable” units in new projects. The idea is simple: if we build new luxury homes, older luxury homes become cheaper, and middle-class homes cheaper still. As a thought experiment, if a solar flare broke all modern cars, would crappy old beaters become cheaper or more expensive?</p>

<p>It’s all about supply and demand.</p>

<p>So, if simple software, like personal apps and internal tools, becomes free, what happens to slightly more sophisticated software? Presumably it becomes cheaper. But where do we draw the line? Five years ago you could build approximately nothing vibecoding. Today, virtually anyone can build an internal tool or personal app. A dedicated vibecoder with enough practice can build a simple SaaS application.</p>

<p>Have we reached the steady state? Where will we be in another five years? Will we have more software or less? Will software be easier to build or harder? Will we have more or fewer people that can build software? What does that mean for the business of selling software?</p>

<p>Furthermore, how many use cases that formerly relied on software can now be solved by an agent doing throwaway work? Do I need a personal financial planning tool, or do I just throw some data into Claude and ask for help?</p>

<blockquote style="margin-left: 2em; padding-left: 1em; border-left: 3px solid #ccc;">
  <p>You tell the agent what you want. The agent does it until it gets stuck. You don't know or care how it's doing it, so long as you can trust the results. The agent keeps getting better every 6 months, for free. You don't need to write more software or better prompts. Just keep doing your job and the thing will improve.</p>
  <cite style="font-size: 0.9em;">— <a href="/2025/agents-are-a-generalized-technology">Agents are a Generalized Technology</a></cite>
</blockquote>

<p>I’m not saying I understand the implications for today’s large SaaS companies. Software is one component of building a successful business. But today we have organizations whose main value prop is building and selling software. The software is the end. Maybe that will shift to firms more strictly selling solutions to problems, done through software.</p>

<p>Think of a business that sells me a tool to do performance reviews. Are they directly solving a problem? The problem is that I want to promote the best and fire the worst. Do I need software to administer the managerial process of performing reviews? Or do I need an AI that tells me who to promote and who to let go? Do you see the difference? Does Salesforce bring me sales, or is it just software to administer the human process of sales?</p>

<blockquote style="margin-left: 2em; padding-left: 1em; border-left: 3px solid #ccc;">
  <p>Vertical software has been a way to roughly encode processes into digital machines, so that humans can introduce intelligence into those processes.</p>
  <cite style="font-size: 0.9em;">— <a href="/2025/agents-are-a-generalized-technology">Agents are a Generalized Technology</a></cite>
</blockquote>

<p>In good business relationships, everyone wins. In order to know everyone is benefiting, both parties need to know the value they’re getting. With SaaS, the value to the buyer is often unclear. Customers pay per-seat licenses without knowing exactly how much leverage those licenses bring to the business. Meanwhile, SaaS companies have a tremendous amount of visibility into their customers’ utility. They know exactly how often every feature of their product is touched by each user at each company, and how users and companies compare. This relative advantage, especially when the buyer is a smaller business, makes it inevitable that SaaS companies will end up capturing more value than their customers.</p>

<p>It’s more fair to have metered pricing. You pay for what you use, and both the customer and the provider understand this number. Gasoline and electricity costs are proportionate to usage, and these expenses can be reduced when needed by reducing consumption. In software, hyperscalers offer similar transactions. Many API products charge per unit of work. This is the future, because it’s more fair for the customer and it allows consumption to ebb and flow based on business need.</p>

<p>Per-seat pricing is not evil. It’s a reasonable way to arrive at a clearing price when the product is a bundle of thousands of actions a user can take in a given month. It’s a simple alternative to the false precision of trying to measure the utility of something that is simply too hard to measure.</p>

<p>But AI is going to change the playing field. Businesses will want to pay for results instead of paying for SaaS software that manages the human process which promises to deliver results.</p>

<p>With agents, you’ll pay for outcomes.</p>

<p>People say, “SaaS has a moat because of total cost of ownership. You need to pay for enterprise-grade security, customer success, and sales, and that scales better across many customers.”</p>

<p>Read <em>The Innovator’s Dilemma</em>. Disruption starts at the bottom. Imagine small businesses developing internal tools. What is most SaaS software if not internal tools? If you have your own tool, does it need to be as secure as a massive company like Salesforce, which is a constant target for hackers? Maybe you use something like ZeroTrust Cloudflare for authentication, and only your team will ever be able to access the endpoint. You just have to gatekeep yourself from the outside world. If you built the tool, do you need customer support?</p>

<p>“But adding features and fixing bugs is going to be hard.” Again, don’t compare this to enterprise-grade SaaS that needs to support other enterprises, all of which have different requirements. Think of software meant for a single small team with its own needs for a very specific use case. The equivalent of a spreadsheet. Vibe coding can maintain that. And when it gets unwieldy a year later, tell a smarter AI to take all the data from the old product and build a new one.</p>

<p>It’s a reality that enterprise software is complex, has delicate security features, needs to scale, needs to support a lot of use cases, and has been battle-tested for bugs for many years.</p>

<p>But custom software for small teams doesn’t need all the features SaaS companies bundle to justify higher prices.</p>

<p>Teams manage hundreds of documents and spreadsheets in their day-to-day business. There’s no reason they can’t have dozens of neatly defined internal vibe-coded tools to do their jobs.</p>

<p>And there will be more companies that decide to take technology in-house and vertically integrate. Most companies benefit from using the same software every other company in their industry uses. But some ultra innovative companies, like Tesla, like to own the entire stack to do things their own way. Waymo is another example.</p>

<p>I’m very long on the infrastructure to run all this, like <a href="https://www.daytona.io/">Daytona</a>. There’s going to be infinitely more custom software.</p>

<p>Software is becoming cheaper to make and easier to migrate across.</p>

<blockquote style="margin-left: 2em; padding-left: 1em; border-left: 3px solid #ccc;">
  <p>AI is eroding the last standing barriers of entry. Trivialization of app production will lead to diminishing returns, and upside will vanish.</p>
  <cite style="font-size: 0.9em;">— <a href="/2025/hard-is-back">Hard is Back</a></cite>
</blockquote>

<p>One of the first things I tell a SaaS founder when they’re struggling with bugs and maintenance is to start tracking analytics for every click on every feature and then analyze what is or isn’t being used.</p>

<p>I always predict that they’ll have a lot of features being used by a tiny fraction of their customers. Even within a feature, there will be a certain config or setting that almost nobody uses. And often, they’ll find there are features that actually nobody uses.</p>

<p>Furthermore, on close inspection, they will realize the number of unique users and the level of activity at each customer is lower than they thought.</p>

<p>What else is abundant and easy to substitute? Commodities. What are the gross margins of commodities? SaaS has 70%+ gross margins. Is this sustainable? Why?</p>

<p><img src="/assets/generated-stuff.png" alt="The past year has seen an explosion in coding productivity" /></p>

<p>The number of software startups going after every vertical will 10x, the price will 1/10th due to competition. It will start with the simpler stuff, but I struggle to see a world where it stops there given how fast code generation is advancing. This has happened before to other industries. Margins compress with competition. It’s why free markets are good.</p>

<p>SaaS has been capturing too much value. We live in a world where everyone has to pay SaaS taxes in order to operate their business. It’s a good time for software operators to carefully consider the nature of their transactions and make sure their customers are getting a good deal.</p>

<p>One set of businesses that will thrive will be the ones building the tools agents need to actually solve the problem:</p>

<blockquote style="margin-left: 2em; padding-left: 1em; border-left: 3px solid #ccc;">
  <p>If someone creates a critical tool and integrates it with a frontier LLM to achieve valuable tasks, customers will willingly pay a premium for tokens resold by the tool-builder. But these tools aren't trivial MCP servers executing granular tasks. Instead, they involve large data, complex requirements, high compute needs, distributed systems, and advanced simulations.</p>
  <cite style="font-size: 0.9em;">— <a href="/2025/its-the-tools">It's the Tools</a></cite>
</blockquote>

<p>Software is not dead. It’s about to become orders of magnitude larger. What does that mean for the business of software? Does it stay the same? Unlikely.</p>]]></content><author><name>Ivan Bercovich</name></author><summary type="html"><![CDATA[I don’t know the future, but I believe most people aren’t thinking this AI thing through.]]></summary></entry></feed>