If you’re building in any of the areas mentioned, ping me: [email protected] + sign up.

https://stateofthefuture.substack.com/


It’s still early.

It’s on the record. Sam, Ilya, Demis, and Dario are on the podcasts. They are saying it loudly into the microphone. Leopold wrote most of it down in Situational Awareness. Even if the U.S. Government can’t or won’t do a Manhattan Project 2.0: This Time It’s Serious, the wheels are in motion. The next 5 years are bought and paid for. Land has been bought. Power contracts secured. GPUs pre-ordered. HBM and CoWoS capacity secured. This thing isn’t stopping. New scaling laws around chain-of-thought (CoT) inference time, rather than raw GPU power, change the balance between training and inference and between data center and edge, but they don’t change the fundamentals: we have already embarked on the largest capital allocation project in human history.

Listen carefully. The people closest to the action tell us scaling works. Scaling has plenty of headroom for data center training. And as of September, we’ve likely just begun a new scaling path in inference time with OpenAI’s o1. For training, just plug in more GPUs, add more data and parameters, and off we go to 5th-level models and maybe 6th. Sure, there is lots of work to do on synthetic data generation, efficient data sampling, post-training optimization and so on, but we have the contours of where we are headed. But it’s been nearly two years and we haven’t got 5th-level models. Nvidia fell 7% in a day. McKinsey thinks AI might be overhyped. It’s so over?

It was never over, and we never had to come back. The cost for 2 million tokens has come down 240x in 2 years. Claude Haiku, GPT-4o mini, and Gemini Nano are on smartphones. Attention-based transformers can be natively multimodal, inputting text, voice, image, and video and outputting any combo you want. Practical agents are within touching distance. o1 has fired the starting gun on better-than-human reasoning. Demis says embodiment might be as simple as just adding another modality. You don’t have to believe that attention-based transformers and diffusion models will get us to AGI. You have to believe that the richest companies in history, venture capitalists, and increasingly governments have the incentive to scale AI.
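For a feel of what a 240x drop in 2 years means per year, here is a rough sketch. It assumes the decline happened at a constant annual rate, which is my simplification and not something the figure itself claims; the variable names are illustrative only.

```python
# Figures from the text: token cost fell 240x over 2 years.
TOTAL_DROP = 240
YEARS = 2

# Assumption (not from the text): the decline compounds at a constant yearly rate.
annual_factor = TOTAL_DROP ** (1 / YEARS)

# Share of the price that disappears each year under that assumption.
annual_pct_cut = (1 - 1 / annual_factor) * 100

print(f"Implied yearly drop: about {annual_factor:.1f}x")
print(f"Equivalent to a price cut of about {annual_pct_cut:.1f}% per year")
```

Under that assumption, prices fall by an order of magnitude or more every year, which is the pace the rest of this piece takes for granted.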

It’s happening, but people aren’t updating their priors fast enough. With situational awareness, you see three opportunities: scaling, deploying, and securing AI. First, scale. By 2026, we'll see 5th-level GPT, Claude, Gemini and Llama, with 6th-level models likely in 2028 requiring $100bn+ and 10GW of power. The push to 7th-level models could lead us towards $1 trillion and 100GW. We need to create and transport many more electrons to data centers and use them efficiently. Second, deploy. For systems to be pervasive, we need to massively reduce token costs, making intelligence too cheap to meter at $0.0001/million tokens. Finally, secure. For society to accept AI, we must protect privacy with privacy-enhancing technologies, ensure fair and unbiased models, and have open-access AI infrastructure. These are the problems. These are the opportunities.

Scale, Deploy and Secure Opportunities



Scale: Pathway to AGI

The scaling hypothesis worked. It’s a foot race. A foot race with Sam, Demis and Dario carrying billions in their pockets. And now Ilya too, with the classic $1bn seed round at a $5bn valuation. Labs are investing genuinely unprecedented capex in scaling their models. This isn’t speculative. Capex has been budgeted for. Chips have been pre-ordered. Three Mile Island is being restarted, and Microsoft has agreed a 20-year contract to buy its power for their data centers. By 2026, we'll see 5th-level GPT, Claude, Gemini and Llama, with 6th-level models following in 2028/29. As Leopold outlined, this needs $100 billion training clusters by 2028 and something like $1 trillion for 7th-level clusters by 2030. Big numbers. But it’s also likely Elon Musk will be worth $1 trillion by 2030. So I dunno, it’s all relative in the fight for the future.

At these scales, algorithms and hardware matter, but it’s a power game: securing enough of it, delivering it into data centers, and using it efficiently. While GPT-4 used about 10 MW, 5th-level models may need in the order of 1GW, equivalent to a large nuclear reactor. For 2030+ clusters we might be staring down the barrel of 100GW, far exceeding current data center capacities. Scaling to this level mainly requires increasing power supply. But in parallel we should also aim to reduce power consumption and improve system efficiency.

I used to joke that a future AI fund is a rare earth mining fund. But for now investing in AI is basically an energy fund. If you are a ClimateTech fund reading this, I strongly suggest new positioning as “AI and Climate”. You’re welcome.
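To make the jumps concrete, here is a small sketch that lines up the power figures quoted in this piece (10 MW, 1GW, 10GW, 100GW) and works out the step-up between generations. The labels and helper name are my own shorthand, and the one-reactor-per-GW conversion is the rough equivalence used above, not a precise engineering figure.

```python
# Power figures quoted in this piece, in megawatts (labels are shorthand).
POWER_MW = {
    "GPT-4": 10,
    "5th-level": 1_000,
    "6th-level": 10_000,
    "7th-level": 100_000,
}

# Rough equivalence used in the text: 1 GW is about one large nuclear reactor.
REACTOR_MW = 1_000


def reactor_equivalents(power_mw: float) -> float:
    """Number of ~1 GW reactors needed to supply `power_mw`."""
    return power_mw / REACTOR_MW


# Growth factor from each generation to the next.
steps = list(POWER_MW.items())
growth = {
    name: mw / prev_mw
    for (_, prev_mw), (name, mw) in zip(steps, steps[1:])
}

for name, mw in POWER_MW.items():
    step = f", {growth[name]:.0f}x the previous step" if name in growth else ""
    print(f"{name}: {mw:,} MW = {reactor_equivalents(mw):g} reactors{step}")
```

On these numbers the first jump is the steepest at 100x, followed by 10x per generation, ending at roughly a hundred large reactors for a single cluster.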

1. Increase power into data centers

Hyperscale data centers typically consume between 100 and 250 MW of power. A 1 GW (1,000 MW) AI training cluster would draw 4-10x the power of today's largest data centers. We need to increase the amount of power into data centers. A lot.
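As a quick sanity check on that gap, this sketch divides the 1 GW cluster by the 100-250 MW range quoted above. The function and variable names are illustrative only.

```python
# Figures from the text: hyperscale sites draw roughly 100-250 MW; the cluster is 1 GW.
CLUSTER_MW = 1_000
HYPERSCALE_MW = (100, 250)


def cluster_multiple(cluster_mw: float, site_mw: float) -> float:
    """How many sites drawing `site_mw` it takes to match the cluster's draw."""
    return cluster_mw / site_mw


small_site_mw, large_site_mw = HYPERSCALE_MW
multiples = (
    cluster_multiple(CLUSTER_MW, large_site_mw),  # versus the biggest sites
    cluster_multiple(CLUSTER_MW, small_site_mw),  # versus the smaller ones
)

print(f"A 1 GW cluster is {multiples[0]:.0f}x to {multiples[1]:.0f}x a hyperscale site")
```

Even against the largest sites in that range, one training cluster is several data centers' worth of power in a single location.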

1.1. On-site Power Generation