Roam Notes on “I Could Do That in a Weekend! by Dan Luu

  • "Author::" [[Dan Luu]]
  • "Source::" https://danluu.com/sounds-easy/
  • "Tags:: " #software #programming #[[Dunning-Kruger Effect]] #[[Big Business]]
  • Summary

    • Outside developers often look at a large software company and think "that’s easy, I could build that in a weekend". This article describes why that’s misguided.
    • Reasons why are divided into two categories: technical and organizational. #[[To Ankify]]
      • Technical: Large businesses find it profitable to do lots of optimization and build new features that are often more complex than outsiders realize. This requires hiring engineers. You want to keep hiring engineers until the marginal benefit of hiring 1 more engineer = marginal cost of hiring 1 more engineer. Companies that are smart do this math and often find they should hire many, many engineers.
      • Organizational: Large organizations have complex communication barriers that outsiders underestimate.
  • Excerpts

    • I can’t think of a single large software company that doesn’t regularly draw internet comments of the form “What do all the employees do? I could build their product myself.”
    • Businesses that actually care about turning a profit will spend a lot of time (hence, a lot of engineers) working on optimizing systems, even if an MVP for the system could have been built in a weekend. There’s also a wide body of research that’s found that decreasing [[latency]] has a roughly linear effect on revenue over a pretty wide range of latencies for some businesses. Increasing performance also has the benefit of reducing costs. Businesses should keep adding engineers to work on [[optimization]] until the cost of adding an engineer equals the revenue gain plus the cost savings at the margin. This is often many more engineers than people realize. #[[To Ankify]] #Hiring
    • And that’s just performance. Features also matter: when I talk to engineers working on basically any product at any company, they’ll often find that there are seemingly trivial individual features that can add integer percentage points to revenue. Just as with performance, people underestimate how many engineers you can add to a product before engineers stop paying for themselves. #features
    • Additionally, features are often much more complex than outsiders realize. If we look at search, how do we make sure that different forms of dates and phone numbers give the same results? How about [[internationalization]]? Each language has unique quirks that have to be accounted for.
    • It’s fine to ignore this stuff for a weekend-project [[MVP]], but ignoring it in a real business means ignoring the majority of the market! Some of these are handled ok by open source projects, but many of the problems involve open research problems.
    • Everything we’ve looked at so far is a technical problem. Compared to [[organizational problems]], [[technical problems]] are straightforward.
    • Distributed systems are considered hard because real systems might drop something like 0.1% of messages, corrupt an even smaller percentage of messages, and see latencies in the microsecond to millisecond range. When I talk to higher-ups and compare what they think they’re saying to what my coworkers think they’re saying, I find that the rate of lost messages is well over 50%, every message gets corrupted, and latency can be months or years. #communication #[[communication issues]]
    • It’s entirely plausible that someone will have an innovation as great as PageRank, and that a small team could turn that into a viable company. But once that company is past the VC-funded hyper growth phase and wants to maximize its profits, it will end up with a multi-thousand person platforms org, just like Google’s, unless the company wants to leave hundreds of millions or billions of dollars a year on the table due to hardware and software inefficiency.
    • It’s not that all of those things are necessary to run a service at all; it’s that almost every large service is leaving money on the table if they don’t seriously address those things. #[[To Ankify]]
    • This reminds me of a common fallacy we see in unreliable systems, where people build the happy path with the idea that the happy path is the “real” work, and that [[error handling]] can be tacked on later. For [[reliable systems]], error handling is more work than the happy path. The same thing is true for large services — all of this stuff that people don’t think of as “real” work is more work than the core service