You’re reading Startup Cities, a newsletter about startups that build neighborhoods and cities.
This week: a very exploratory essay on A.I. neural networks and cities.
“The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.” - Richard Sutton
The Bitter Lesson for A.I.
In Artificial Intelligence, big, dumb, simple models keep beating “smart” models imbued with expert knowledge. In 2019, pioneer Richard Sutton called this the “Bitter Lesson” that A.I. research keeps re-learning.
Researchers may spend months hand-coding rules to spot edges in images or syllables in speech. But it’s big, dumb models with simpler architectures and lots of training data that win. (Here’s a fun paper from 2020 showing how this happened in video).
Sutton says the Bitter Lesson unfolds like this:
AI researchers try to build knowledge into their agents
This always helps in the short term, and is personally satisfying to the researcher, but...
In the long run it plateaus and even inhibits further progress
Breakthrough progress eventually arrives by an opposing approach based on ... search and learning.
Put a bit unfairly, we might say that super-genius researchers gratify their own egos by hand-coding expert knowledge into the model. Then they’re crushed by someone who just uses search and learning.
The Unreasonable Effectiveness of Trial and Error
But what does search and learning mean?
To simplify quite a bit, Sutton’s search and learning is a computer’s version of trial and error. A model sees some data and makes a prediction. If the prediction is wrong, the model — a web of numbers — tweaks its values and tries again. In a successful model these values converge and encode a stable “understanding” of the domain.
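As a toy illustration of this loop, here is a minimal sketch of search and learning: a one-parameter model guesses, measures its error, and keeps whichever random tweak reduces it. No expert knowledge about the data is built in; the numbers and functions are illustrative, not any real training algorithm.

```python
import random

def error(w, data):
    """Mean squared error of the prediction y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(data, steps=1000, step_size=0.1, seed=0):
    """Pure trial and error: propose a tweak, keep it only if error drops."""
    rng = random.Random(seed)
    w = 0.0  # start with no "understanding" at all
    for _ in range(steps):
        candidate = w + rng.uniform(-step_size, step_size)  # try a tweak
        if error(candidate, data) < error(w, data):         # keep it if better
            w = candidate
    return w

# The hidden rule is y = 3x; the model converges toward w ≈ 3
# purely by trial and error, never having been told the rule.
data = [(x, 3 * x) for x in range(1, 6)]
w = train(data)
```

The point of the sketch is that the model's final value encodes a stable "understanding" of the data, arrived at by search rather than by an expert writing the rule down.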
Nodes form layers, and layers form neural networks. This makes networks feel fractal: they’re built of sub-networks, which may look like the overall network. Layers specialize. You can even look inside a neural network and see how each layer “understands” its slice of the domain.
Here’s a vision model with layers specialized from random squiggly lines to shapes like insects and eyeballs.
So why does search and learning work so well? Because, says Sutton, it “scales arbitrarily.”
As any burned-out programmer will tell you, it’s hard to keep adding expert knowledge to a system. But it’s pretty easy — at least in the era of Huang’s Law — to add more simple, dumb layers to your neural network. Search and learning scales well.
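The contrast can be sketched in a few lines. In an expert system, every new capability is another hand-written rule; in a layered model, "more capacity" is just a bigger number. All names here are illustrative placeholders, not a real framework.

```python
# Expert approach: every improvement is more bespoke code,
# each rule written, debugged, and maintained by hand.
def detect_edges(image): ...
def detect_corners(image): ...
def detect_syllables(audio): ...

# Search-and-learning approach: capacity grows by changing one argument.
def make_network(depth, width):
    """Stack `depth` identical simple layers of size `width`."""
    return [[0.0] * width for _ in range(depth)]

small = make_network(depth=4, width=64)
large = make_network(depth=400, width=64)  # "scaling" is this easy to express
```

This is what "scales arbitrarily" means in practice: growing the dumb model is a parameter change, while growing the expert system is an engineering project.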
Search and learning is also neutral. Expert knowledge comes full of assumptions. Programmers may try to model how the human ear perceives sound or how the human eye perceives an image. These representations feel rational to the expert. But it turns out they’re not so good at processing data in practice.
In AI, search and learning at scale beats expert knowledge.
Cool story, bro! But what does this have to do with cities? Possibly much, dear reader.
A City Is a Computational System
I’ll now politely offend a whole class of urbanists by arguing that, while a city is not a computer, a city is a computational system.
Like a neural network, a city is in an ongoing process of self-optimization. It must optimize to geography, to labor markets, to natural disasters, to population change, and a huge list of other unstable variables.
We can imagine every person and parcel as a node in this city-as-neural-network. Each person and parcel tries to adapt to local conditions through trial and error. People move. Businesses rise and fall. Houses are built and torn down.
Over time, these nodes form specialized layers. In a city, these would be neighborhoods. Iconic neighborhoods tend to be highly optimized (though rarely by design!) for a particular experience: nightlife, art, business.
Each node — each person or parcel — represents a tiny slice of knowledge of the city. A node does not “understand” or represent the entire system: just a little corner of it. Given enough time and the ability to change, the whole network — the city — becomes a many-layered system optimized to its surrounding conditions.
As in a neural network, how the city optimizes itself isn’t predictable. But it happens.
The Bitter Lesson for Cities
Neural networks are basically big math functions optimized little by little. Expert systems attempt to pre-optimize this function. Reasoning by analogy, much of urban planning is an expert attempt to pre-optimize a city.
We design expert knowledge modules and program them into the computational system called a city. These modules go by names like floor area ratios, setbacks, parking minimums, single-family zoning, and environmental review.
In the short-term, more expert knowledge seems better. It sounds inspiring and visionary. Consultants get paid. Mayors cut ribbons with giant scissors. People can be proud of “smart growth” and “smart planning.” But in the long term, it damages the product.
Sweeping urban plans and complex policies are like Sutton’s ill-fated expert systems. They pre-specify so much expert knowledge that they constrain a city’s long-term potential. This analogy suggests that Brasilia is less dynamic than Rio in part because Brasilia had more expert knowledge pre-programmed into it.
We might argue that the problems in many of today’s legacy cities stem from the failure of expert systems to scale. Whereas Tokyo (lightly zoned and full of search and learning) has scaled well, San Francisco (deeply zoned and expert-controlled) has not. The American housing market drowns in expert-knowledge pre-optimization: special policies for affordability, layers of permits meant to control externalities, detailed specifications of the built form. This hasn’t scaled.
When faced with urban problems, many urbanists favor putting more expert knowledge into the city. But the Bitter Lesson should make us skeptical of this path.
As in AI, expert knowledge relies on biased ways to represent a domain. A zoning map is a human-friendly representation of the city. It’s “rational.” But the expert-friendly representation doesn’t map to how the network optimizes itself. A restrictive zoning map is not the friend of search and learning.
Administrative complexity also hurts search and learning. Each new bureaucratic step, each layer of permissions, each restriction constrains search and learning within the city. We destroy little plans (locally-optimized sub-networks) in pursuit of a grand plan. We accumulate expert modules, making our network more falsely pre-optimized, more complicated, and less able to scale search and learning.
This is the bitter lesson for cities:
The smartest experts and their top ideas may be the source of city dysfunction and failure. Seemingly “dumb” search and learning by normal people scales better.
The Bitter Lesson for Cities is no one’s fault. It’s a general feature of computational systems like cities. We just haven’t learned the lesson yet.
The Problem of Computation
We might call this challenge to urban planning: the problem of computation.
This argument is quite old. It harks back to Friedrich Hayek, Jane Jacobs, and their modern heirs like Sandy Ikeda and Alain Bertaud. But it’s only recently that we have the stark example of successful neural networks.
Diehard libertarians who see the problem of computation often conclude: “Cool. So cities should have no plan! ANYTHING GOESSSSSSS!” But, as with neural networks, the opposite of pre-optimizing with expert knowledge isn’t exactly “no plan.”
First, cities hold millions of plans. People have little plans that reflect their expectations, budgets, and preferences:
“I should open a restaurant in this vacant corner unit.”
“I should add a cottage to my back yard.”
“This warehouse should be a nightclub.”
“I want to live within walking distance of my job.”
“I should sell mixed drinks from my garage.”
These tiny plans by firms and families unfold in parallel rather than as part of a single “master” plan or party agenda. So there’s plenty of planning, just at a smaller scale.
And, as complexity economists are fond of saying, markets don’t exist in a vacuum. They evolve within rules and norms — which I’ve previously called “social technologies” — that (hopefully) support them.
So the challenge for cities isn’t “should we plan anything in advance?” but “how do we set things up to encourage search and learning… without pre-optimizing our network?”
The problem of computation applies to private Startup City developers just as it does to planners in city hall. Whether public or private, the problem remains.
Pre-Optimization vs. Laissez Faire Activism
If we take the Bitter Lesson for Cities seriously, what do we do when designing our computational system?
There’s an analogy from A.I. here, too!
You rarely train a neural network from scratch. Instead, you use special weight initializations, pre-trained weights, fine tuning, and other techniques. This biases the model slightly. Search and learning still powers the model, but it’s steered modestly in a positive direction.
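The spirit of this "steer, don't specify" approach can be sketched as fine-tuning: start from pre-trained weights rather than a blank slate, and nudge them gently rather than redesigning them. The objective, weights, and step sizes below are toy illustrations, not any real model.

```python
def fine_tune(weights, gradient_fn, lr=0.01, steps=10):
    """Nudge existing weights downhill; a small learning rate preserves
    most of the knowledge already encoded in them."""
    w = list(weights)
    for _ in range(steps):
        grads = gradient_fn(w)
        w = [wi - lr * gi for wi, gi in zip(w, grads)]
    return w

# Toy objective: minimize sum((w - target)^2). The pre-trained weights
# are already close to the target, so a few gentle steps finish the job.
target = [1.0, 2.0, 3.0]
pretrained = [0.9, 2.1, 2.8]  # a decent starting point, not a blank slate
grad = lambda w: [2 * (wi - ti) for wi, ti in zip(w, target)]
tuned = fine_tune(pretrained, grad, lr=0.1, steps=50)
```

The initialization biases the outcome modestly; search and learning still does the work of closing the gap.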
This brings the urban planner back in. The AI researcher — the analog to the urban planner in our story — sets up the basic pieces for a neural network’s evolution. They program a basic architecture with lots of room for search and learning. They prepare the data and let ‘er rip!
The planner/programmer is not a pre-optimizing expert, but a steward of the environment within which search and learning will occur. There’s some foundational work to be done, but then the planner/programmer must let nodes in the network search and learn.
So the Bitter Lesson doesn’t suggest zero role for urban planning or a prohibition on all upfront design decisions. Some upfront decisions are inevitable. But it does suggest that urban planners — whether in a legacy or startup city — should be extremely cautious about their plans.
The problem of computation suggests that grand, utopian visions for cities are a bad idea. They’re delusional pre-optimization. But even mundane visions, such as an “industrial policy,” could be seen as attempts to build an ill-fated expert system.
What might we call this approach where we design for search and learning?
In the language of Roland Kupers and David Colander, a city must construct a reasonably laissez-faire “ecostructure” that allows massive, parallel, (and easily scalable) search and learning. Kupers and Colander call this approach “laissez-faire activism.”
Early Hong Kong looks like laissez-faire activism. And the internal plans of the more sensible Startup Cities I’ve seen also look like laissez-faire activism. There is no grand expert strategy (recall that Hong Kong’s John Cowperthwaite refused even to collect economic statistics!). But there is a gentle, sensible structure that allows search and learning to occur.
Just Add People & Big Dumb Boxes
What are practical ways that a city might follow laissez-faire activism? What levers do we have?
One maxim might be: “just add people!” More people is the urban equivalent of more nodes in each layer of a neural network. More people means more micro-plans, more specialization, and more trial and error.
Of course, “just add people” isn’t easy — as so many failed new city projects show. But, if we believe in the Bitter Lesson, a growing city is doing many things right.
A growing city that doesn’t see massive change in cost indicators — think of Houston or Tokyo’s rapid growth with relatively stable housing prices — seems to really have its act together. “Just add people” is a guiding light, a heuristic to measure the health of our city-as-computational-system.
A complement to “just add people” would be “let existing people do more things.” By analogy, this is like adding more layers (or activation functions) to a network. This allows the network to assume a wider range of states.
“Let people do more things” would seem to encourage mixed use. It might favor non-committal architecture such as the European “Big Dumb Box” or a simple road grid. A private developer might offer “build to suit” arrangements. It might also favor land-lease models rather than freehold, to avoid the premature lock-in to a given use.
Or perhaps the Bitter Lesson is stronger as a guide for what not to do: no utopian dreams, no army of consultants, no “if you build it they will come.” Just the modest work of building a safe, calm, sensible environment for people to live and do business. Just add people. And let those people build a life.
While abstract, this AI-to-city analogy feels like a rich vein to mine. There’s a Simple Rules for a Complex World vibe here. Perhaps urban planning that respects the Bitter Lesson looks more like data-driven surgery: a theme we touched upon in a recent interview with planner Joni Baboçi.
Pre-optimization by experts may better suit smaller scales: perhaps a neighborhood or town-sized project is different from “building a city from scratch.” And there are many implications for product management and design principles for a Startup City.
Is the argument from computation real? What might Startup City developers learn from it? Tell me what you think.
Thanks for reading and don’t forget: Startups Should Build Cities!
lovely! Thanks Zach.
I mostly agree, but the city as a self-optimizing system does run into limitations such as transaction costs and externalities. It's often so costly to measure externalities that only governments can and have a reason to do this, and Coasean bargaining may not be a viable solution to externalities because of transaction costs. Think of the architectural externalities I wrote about, for instance.
But I especially think it teaches an important lesson for designing street layouts. Roman-style street grids can adapt much more easily to changes in density and transportation demand: smaller blocks can be merged into superblocks, and groups of blocks can be made car-free to create a more pedestrian-friendly neighborhood, or the other way around.