Johnny Lee

Dynamism as a supercomputing race

Fri, 17 Jul 2026 13:43:36 GMT

Trust: why companies buy software

Mon, 23 Feb 2026 04:25:44 GMT

Something Small is Happening

Wed, 11 Feb 2026 00:00:00 GMT

How to even keep up with explosive AI progress?

Mon, 19 Jan 2026 12:57:34 GMT

How AI Is Changing the Way Students Learn

Mon, 24 Nov 2025 22:28:36 GMT

OpenAI's road to become a hyperscaler

Sun, 19 Oct 2025 22:36:00 GMT

Power & Fab Capacity: Last Jigsaw Pieces in the Dash for Compute

Wed, 08 Oct 2025 22:36:00 GMT

Cut-throat AI competition, captive resources, Google's AI moat

Sun, 20 Jul 2025 22:35:00 GMT

A.I. usage surging across America

Tue, 24 Jun 2025 22:34:00 GMT

Crossroads Ahead: US Market Confidence Belies Historic Uncertainty in Economic Growth

Fri, 02 May 2025 01:58:15 GMT

One month after Liberation Day, the US stock market looks more or less re-assured about the future economic outlook. For the month of April, the S&P 500 is only down ~1% and the NASDAQ is up ~1%.

Liberation Day unleashed the next stage of trade warfare

Shortly after inauguration, the new US administration moved to a trade war footing. The US placed new tariffs on goods from China, Canada, and Mexico. Throughout February and March, the counterparties raised their own tariffs in response, tit for tat, with occasional pauses and backsliding.

However, on April 2nd (Liberation Day), the US announced broad tariffs on nearly every US trading partner, starting from an indiscriminate minimum rate of 10%. In further tit-for-tat moves, the US raised tariffs on China to a cumulative 145%, and China retaliated with a cumulative 125%.

The new “reciprocal” tariffs exceeded Wall Street’s worst expectations. US equity and fixed income markets reacted dramatically to the broad protectionist measures.

In about a week, the S&P 500 lost ~11%. US junk bond yields increased ~100bps. The US 10-year Treasury yield increased ~50bps, the largest weekly increase since 2001. The 10-year is frequently used as the world’s risk-free rate for capital markets, and as a global barometer of investment risk appetite and trust in the US financial system.

While Trump suspended the reciprocal tariffs (10%+ broad tariffs) for 90 days for all trading partners on April 9, the US-China tariffs remained. As of May 1, 2025, the US and China representatives are not in any active bilateral negotiations.

2025 starts with economic contraction

In the first contraction since the first quarter of 2022, the US economy contracted 0.3% in the first quarter of 2025, mostly driven by imports front-running tariffs. The 2022 contraction had the backdrop of the highest inflation rates since the 1980s. In June 2022, annualized inflation reached a high of 9.1% . Around the same time, the Federal Reserve started tightening financial conditions to combat inflation: raising interest rates from 0% to a high of 5.5% in 2023.

This time around, there are more signs of structural obstacles to the growth of the US economy, relative to the ex-US world.

1. The “US” brand as a symbol of free trade and safe haven has been damaged forever. “Liberation Day” and the fallout from the progression of US trade policy continue to puzzle US and foreign investors. The lack of coherent US policy objectives creates tremendous uncertainty in global trade and development.

Year to date, the dollar has continued to weaken, with the US dollar index down ~7%. Tourism to the US sharply declined in the days after tariffs.
Ex-US equities are up ~8.4% for the year, outperforming US equities by ~13 points.
The bond selloff in the treasury and corporate bond market was mostly foreign driven, while the recent rebound in prices was mostly inflow from US domestic investors.
The lack of policy coherence creates a large disincentive for 1-5 year capital investments in the US, especially as it relates to reshoring manufacturing and infrastructure capacity.
For example, Apple, the symbol for US consumer brands, has no plans to move manufacturing to the US, but instead will shift more assembly volume to India, mostly as a means to side-step the US tariffs.

2. The supply chain shock from this man-made disruption to the global trade system has yet to arrive.

As much as the US administration may want to decouple global trade, it cannot be done overnight.
Much like the dual-sided (supply + demand) COVID shock, the supply chain is being hit from both sides. Sea-bound containers from China to the US fell 45% year-over-year in April 2025.
While US merchants front-loaded inventory, they are simply delaying the inevitable: a combination of (1) a shortage of goods and/or (2) a massive drop in demand due to increased prices due to tariffs.
This type of supply shock will impact small businesses the most, those who do not have the working capital capacity to soften the supply chain shock with increased inventory. Small businesses employ 80% of America and are responsible for ~40% of US imports from China. When small businesses suffer, America suffers.
Even assuming the US and China reach some equilibrium that restarts trade (current conditions are tantamount to a trade embargo for most goods), the infamous bullwhip effect will come back into play.
The US inventory to sales ratio will give us a good indicator of how severe the shocks will be this time around, with March 2025 figures releasing May 15.

3. A shift in global security and perspectives is already becoming evident. Geopolitically, the US retreat from global alliances and partnerships will have longer-term consequences beyond one US presidential term.

Canada: Most obviously, Trump’s policies have literally reversed the Canadian election, enabling Mark Carney’s Liberal party to erase the Conservative party’s ~10-15 point lead over a span of 40 days. The Conservative party leader even lost his own seat.
European security: Germany is gearing up for arms. Its defense industrial base is being re-ignited after Germany approved a new 500 billion euro infrastructure fund. Shepherded by Merz (who was elected in Feb 2025, after Trump started the trade war and embarrassed Ukraine at the White House), the new law exempts defense and security spending from Germany’s strict debt guidelines, enabling the state to issue more Bunds to finance its security objectives. German defense companies (e.g. Rheinmetall) have more than doubled in value since then.
Indo-Pacific security: Australia and the Philippines will hold elections in May 2025, on the 3rd and 12th respectively. Both countries play a crucial role in Indo-Pacific security, as US allies and a regional balance of power against China. In recent weeks, the Australian election has already been an intense balancing act between the country’s Chinese and US ties.

4. Pending new tax and budget legislation, the US debt ceiling and fiscal outlook remain as uncertain as trade policy.

The White House has set a July 4th deadline for passing Trump’s tax agenda.
The X date (when the US Treasury will reach the debt ceiling) is likely to be sometime this summer or early Fall.
With uncertain economic outcomes over the short and medium term, drastic changes that lower tax receipts relative to the debt burden will place additional stress on Treasury yields.

The crossroads ahead

The most important crossroad will be on the path to earnest US-China trade negotiations.

While the two parties have yet to talk, there are increasing unilateral signals that both sides are willing to sit down.

On May 2, China (via its state-owned media CCTV) signaled some willingness to approach the U.S. under the right circumstances. But China’s other arms of state continue to message a strong stance of no concessions to the US.

The backdrop for these prospective talks is not good: (1) uncertain economic growth, (2) declining consumer confidence, (3) prospects of skyrocketing prices, and (4) uncertain US fiscal and tax outlook.

Events to watch

May 3: Australian Parliament elections
May 7: Federal Reserve interest rate decision
May 12: Philippines midterm elections
May 15: U.S. March 2025 inventory-to-sales survey data
May 28: Federal Reserve May meeting minutes release

Optimizely for Intelligence

Tue, 11 Mar 2025 20:14:10 GMT

The Road Towards Intelligence

The AI community continues to drive model improvements by pulling 2 key levers:

1. Compute

Algorithmic Improvements: Innovations such as new forms of reinforcement learning, and attention mechanisms contribute to increasing efficiency. Improvements also focus on achieving the same performance with less compute intensity.
Access to More Computing Power: The industry–for the most part–relies on NVIDIA chips, large-scale data centers, and capital investment to expand computational resources and supply.

2. Data

AI models learn from structured and unstructured data sources.
High quality data with previously unincluded knowledge is non-negotiable in improving intelligence.
Companies continuously invest in refining/curating datasets, via humans or machines.

Country of Geniuses vs. Country of Yes-Men

Dario Amodei, CEO of Anthropic, envisions a future where data centers house a "country of geniuses," a concept he explores in his essay Machines of Loving Grace. He argues that intelligence will continue to advance, though with physical constraints in sectors like biology, where real-world rate limitations affect progress.

Conversely, Thomas Wolf of Hugging Face (unsurprisingly, the open source foil to the closed-source AI CEO) presents a counterpoint, arguing that we won’t experience a “compressed 21st century” of rapid innovation. He argues that AI models could function more like a "country of yes-men" than a "country of geniuses." While models will be a valuable source of knowledge, he believes they are more akin to a very obedient A+ student than to a genius making scientific discoveries.

Personally, I’m more inclined to believe Thomas’s argument. AI models primarily predict the most probable token during pre-training. While post-training techniques like reinforcement learning introduce more nuanced rewards for exploration (a la search), the core predictive mechanism may limit the model’s ability to challenge existing knowledge or pursue unconventional reasoning–especially if it is deemed improbable by historic data.

Regardless, I do agree with Tyler Cowen’s viewpoint. Humans and our “sticky” modes of societal interaction will be a rate-limiting factor in how quickly we can leverage the new sources of intelligence.

AI usage today

Right now, there are a few ways most people access large language models (LLMs) and AI:

Chatbot Products

People simply use the labs’ chatbot products like ChatGPT, Claude, or Gemini, and interact with them in a chat interface.

The chatbot application might select some default model settings or have access to certain tools to make the conversation more insightful.
If you’re more knowledgeable, you might pick a different model—like gpt-4o vs. o1, o3, or some other variation—but for a typical user, those model names and versions don’t mean much.

LLMs Embedded in Existing Applications

A straightforward example is in software engineering, where developers use an AI-augmented IDE like Cursor.

The model’s text understanding and knowledge base helps autocomplete and suggest code.
If you define a new function or module and your codebase is well documented—or is otherwise known to the model—it can help you quickly generate boilerplate or complete new code.
You then run and test it to ensure correctness. Essentially, it’s augmenting existing software (an IDE) with capabilities that can enhance developer productivity.

APIs in the Background

This is where LLMs are accessed through APIs to add conversational or classification features, object identification, editing suggestions, and so on.

For example, if you contact customer support via a chat, there’s likely a large language model in the background analyzing or drafting replies.
A human agent might still finalize the response—or it may be fully automated—depending on the setup.
In either case, the AI is an API-powered component within a larger process.

Adapting to the New Age of Intelligence & Cost-Benefit Analysis

Users need to adapt, change, or add new modes of interacting with computers. I think that will be critical for how we bring this technology into the world as intelligence that creates leverage and growth.

It may be slightly unhelpful to think we can simply apply deterministic frameworks from the last couple of decades and hope they apply similarly here.

For instance, there’s a lot of hype and optimism around the word “agents.” The idea is that you can delegate a certain task to an AI system.

Think about the agent loop:

User Input: A user provides a task, specification, or request to an AI system.
Inference + Tools: The AI system—powered by one or multiple large language models—interprets that request and makes inferences. Part of the inference involves accessing various tools, which might be deterministic ones like a calculator, a code interpreter, a web browser, or even other models (e.g., an image generator if you need an image).
Execution & Feedback: The system predicts what it should do next using these tools, retrieves the result, and repeats this loop until it believes the request has been completed.
Response: Finally, it returns the response to the user.
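
The loop above can be sketched in a few lines of Python. This is a minimal sketch under loud assumptions, not any lab's actual agent framework: `fake_model`, the `TOOLS` table, and the step budget are all hypothetical stand-ins, and a real system would call an LLM where `fake_model` sits.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical deterministic tools the agent may call.
TOOLS: Dict[str, Callable[[str], str]] = {
    # Toy calculator that only understands "a+b+c" style sums.
    "calculator": lambda expr: str(sum(float(x) for x in expr.split("+"))),
}


def fake_model(task: str, history: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Stand-in for LLM inference: returns (action, argument).

    A real system would send the task and the history to a model here.
    """
    if not history:
        return ("calculator", task)      # first step: pick a tool
    return ("finish", history[-1][1])    # next step: report the tool result


def run_agent(task: str, max_steps: int = 5) -> str:
    history: List[Tuple[str, str]] = []
    for _ in range(max_steps):           # bounded: the model may never "finish"
        action, arg = fake_model(task, history)
        if action == "finish":
            return arg                   # Response
        result = TOOLS[action](arg)      # Execution
        history.append((action, result)) # Feedback for the next inference
    return "gave up: step budget exhausted"


answer = run_agent("2+3")
print(answer)
```

The step budget is the one design choice worth noting: because the "am I done?" decision is itself a prediction, the loop needs a hard stop that does not depend on the model.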

In a classic API setting, you have a request and a defined contract with an expected response type. You typically get back a success status or some outcome. Here, we’re trying to fit that framework onto a probabilistic system by slowly injecting and abstracting away the probabilistic compute under the hood.

That’s a good way to start adopting these tools, but the challenge remains that these systems will make mistakes.

Some tasks can tolerate that, and others cannot. In cases where error rates are relatively low and the cost of an error isn’t too high, the cost-benefit equation may still favor deploying these agents.

This often applies to areas like customer service or fraud detection, where the cost of a false positive might not be extremely high, but manually reviewing each case would be very expensive.

In effect, you’re balancing the cost of mistakes against the total volume of tasks.
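
As a rough illustration of that balance, here is a minimal sketch of the expected-cost comparison. Every number below (cost per run, error rate, cost of an error, cost of human review, monthly volume) is a hypothetical placeholder, not data from any real deployment.

```python
def expected_cost_per_task(cost_per_run: float, error_rate: float,
                           cost_per_error: float) -> float:
    """Expected cost of letting the agent handle one task."""
    return cost_per_run + error_rate * cost_per_error


def break_even_error_cost(cost_per_run: float, error_rate: float,
                          human_cost: float) -> float:
    """Largest cost-per-error at which the agent still matches a human."""
    return (human_cost - cost_per_run) / error_rate


# Hypothetical inputs, for illustration only.
AGENT_COST_PER_RUN = 0.05   # USD of inference per task
ERROR_RATE = 0.02           # 2% of tasks handled incorrectly
COST_PER_ERROR = 10.00      # USD to clean up one mistake
HUMAN_COST = 2.00           # USD for a person to handle one task
MONTHLY_VOLUME = 100_000    # tasks per month

agent_cost = expected_cost_per_task(AGENT_COST_PER_RUN, ERROR_RATE, COST_PER_ERROR)
monthly_savings = (HUMAN_COST - agent_cost) * MONTHLY_VOLUME
tolerable_error_cost = break_even_error_cost(AGENT_COST_PER_RUN, ERROR_RATE, HUMAN_COST)

print(f"agent: ${agent_cost:.2f}/task vs human: ${HUMAN_COST:.2f}/task")
print(f"monthly savings: ${monthly_savings:,.0f}")
print(f"agent stays ahead while an error costs less than ${tolerable_error_cost:,.2f}")
```

The break-even figure is the useful output: it turns "can this task tolerate mistakes?" into a single threshold on how expensive one mistake is allowed to be.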

Optimizely for Intelligence

“Optimizely for Intelligence,” in my mind, is another way of saying we should help the average user navigate the world of AI in a way that feels more deterministic and controlled.

Optimizely gave creative marketers a deterministic method of navigating a probabilistic decision making process of choosing the right content and creative for their audiences.

In a world where we have variable-cost intelligence—easily supplied, nearly a commodity—and yet so many different “SKUs” or choices, the current market paradox is that people are not actually leveraging any choice.

They’re just sticking to a single model or setup.

So the question is: How can we inject that choice in a way that empowers the user or consumer to pick what’s best for them?

Maybe this doesn’t have to be explicit, it may be abstracted away with signals from the user’s experience.

The challenge remains, and it is just as hard for humans:

Choosing the right mode of intelligence at the right time for the right task.

GPT4.5: Good or not, its launch is good for the LLM serving business

Tue, 04 Mar 2025 20:58:10 GMT

GPT-4.5 launched as a preview last Thursday. GPT-4.5 topped benchmarks like LMArena, only to be matched by Grok-3 shortly after.

Regardless of the reception on quality, OpenAI’s decision to roll out their largest model to the market is a clever business maneuver.

OpenAI is the undisputed market leader in the LLM serving business. The company projected $3.7B in revenues in 2024. Its closest competitor–Anthropic–was projected to have closed 2024 with $1B in revenues.

GPT-4.5 is the most expensive API model on the market today, and also OpenAI’s most expensive model to date.

Prompt tokens are $75 per 1M tokens
Completion tokens are $150 per 1M tokens

Previously, OpenAI’s short-lived GPT-4 (32k context) (launched March 2023) was the most expensive model:

Prompt tokens were $60 per 1M tokens
Completion tokens were $120 per 1M tokens
GPT-4’s 8k context variant was 50% cheaper, at $30/$60.
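
To make these per-token prices concrete, here is a small sketch that prices a single request. The rates are the ones quoted above; the dictionary keys are just labels for this example (not necessarily official API model identifiers), and the request size of 10,000 prompt tokens and 1,000 completion tokens is an arbitrary assumption.

```python
# USD per 1M tokens as (prompt, completion), taken from the prices quoted above.
# Keys are illustrative labels, not necessarily official API model names.
PRICES = {
    "gpt-4.5-preview": (75.0, 150.0),
    "gpt-4-32k": (60.0, 120.0),
    "gpt-4-8k": (30.0, 60.0),
}


def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD of one request at the listed per-1M-token rates."""
    prompt_price, completion_price = PRICES[model]
    total = prompt_tokens * prompt_price + completion_tokens * completion_price
    return total / 1_000_000


# Hypothetical request: 10,000 prompt tokens and 1,000 completion tokens.
example = {model: request_cost(model, 10_000, 1_000) for model in PRICES}
for model, cost in example.items():
    print(f"{model}: ${cost:.2f}")
```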

Price dynamics in a competitive market

In competitive markets, pricing decisions have massive impacts on the market’s overall profit potential.

This is especially true in markets with oligopoly characteristics. When a select few suppliers control the market, one firm’s decisions can dramatically change the profits of the entire market.

In many ways, these markets present a classic prisoner’s dilemma, especially if the products are fairly interchangeable.

Take the classic gas station example.

There is one small town with only 3 gas stations.
They likely have similar cost-structures. Wholesale gasoline costs the same amount.
Given a small town, the demand is relatively predictable as price fluctuates.

The best profit outcome (for the gas stations) is if all 3 firms cooperate (collude) to keep prices high. In a prisoner’s dilemma, this is the “collusion” outcome: no one goes to prison if no one cooperates with the police.

They can reach what is known as a high monopoly profit position (not possible in a competitive market at equilibrium).
This profit can be generated by raising prices to the level where demand and price maximizes profits.
The town will still be willing to buy gasoline at some high price, but at some lower volume. The higher prices will more than compensate for the lower volume.

The worst profit outcome (for the gas stations) is if 1 gas station lowers prices to generate more demand for their own gas station.

Then, other gas stations will lower prices to stay competitive, which is a race to the bottom.
The small town now has cheaper gas, but the gas stations likely make much less profit.
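
The gas station example can be written down as a tiny payoff model. This is a sketch with made-up numbers: the wholesale cost, the two price points, and the town's demand at each price are all hypothetical, and the rule that the cheapest stations split all demand evenly is a simplifying assumption.

```python
from itertools import product

# Hypothetical numbers, for illustration only.
COST = 3.00                           # wholesale cost per gallon
PRICE = {"high": 5.00, "low": 4.00}   # the two prices a station can post
DEMAND = {"high": 900, "low": 1200}   # gallons the town buys at the lowest posted price


def profits(choices):
    """Profit per station; the cheapest stations split all demand evenly."""
    lowest = "low" if "low" in choices else "high"
    winners = [i for i, c in enumerate(choices) if c == lowest]
    share = DEMAND[lowest] / len(winners)
    margin = PRICE[lowest] - COST
    return tuple(share * margin if i in winners else 0.0
                 for i in range(len(choices)))


collude = profits(("high", "high", "high"))   # everyone keeps prices high
defect = profits(("low", "high", "high"))     # station 0 undercuts
price_war = profits(("low", "low", "low"))    # everyone follows down

# Prisoner's dilemma check: whatever the other two stations do,
# station 0 earns more by posting the low price.
low_dominates = all(
    profits(("low",) + others)[0] > profits(("high",) + others)[0]
    for others in product(PRICE, repeat=2)
)

print(collude, defect, price_war, low_dominates)
```

With these numbers, undercutting is always individually tempting, yet the outcome where everyone undercuts leaves each station worse off than if all had kept prices high, which is exactly the dilemma described above.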

Outright price collusion is illegal, but signaling and tacit collusion are not

For the players in consumer technology, price fixing is not new. Because of its attractiveness, examples are plentiful.

Yet, there are legal ways to achieve cooperation, and game theory shows they work. Here are some examples:

Price leadership: In a small market, firms implicitly follow the leader. The leader signals price levels first. For example, Apple often sets the market-leading price levels for high quality smartphones around the world, and others follow.
Price matching: Retailers put in place “price matching” programs, signaling to others that, as long as others set a high price, they will follow.
Advance notice of new supply: In slow moving markets like aircraft manufacturing (e.g. Boeing vs. Airbus), firms publicly announce order books for not-yet-developed aircraft types, so that new capacity does not lead to a supply glut that tanks the market.

GPT4.5 signals price level for leading frontier models, its closest competitor is signaling too

With the most expensive API model on the market, OpenAI is definitively setting the price for this class of leading edge models. When the market leader moves, others in the market will take note.

Its second place competitor (Anthropic) has not yet released a similar class model.

Though, Dario Amodei, Anthropic’s CEO, has doubled down on a new phrase: “race to the top”.

Now, that may be in the context of model safety and responsibility.

But if OpenAI reads the tea leaves, it may understand there is potential willingness to cooperate tacitly.

For these firms, their current gross margins are too good to leave unprotected.

Are frontier labs making 80+% gross margins on LLM inference?

Mon, 03 Mar 2025 04:01:37 GMT

Rare insights into the gross margin profile of an LLM provider

Recently, as part of their open source week, DeepSeek disclosed their online inference system design and performance statistics. This system serves all of DeepSeek’s first-party model services (API and chat services).

Over a single day, DeepSeek utilized an average of 1,814 H800 GPUs (up to 2,224 GPUs at peak loads) to serve all of its inference workloads. H800 GPUs are US export-eligible variants of NVIDIA’s H100 GPUs.

At the moment, H100 GPUs are likely around ~$1.75 to ~$2.50 (per GPU per hr) on a long-term reserved basis. Using the same $2.00/H800 gpu/hr assumption as DeepSeek, their daily GPU inference costs are $87,072 for the entire inference cluster.

Currently, only usage of DeepSeek’s API services is monetized. Their chat services (via the web and mobile app) continue to be free.

Using the current DeepSeek R1 API pricing, the company said it could theoretically generate ~$562k in daily revenues, representing ~84.5% in gross margins.
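
The arithmetic behind these figures is short enough to check directly. The sketch below uses only the numbers quoted above (average GPU count, the $2.00 per GPU-hour assumption, and the rounded ~$562k theoretical daily revenue), so the margin it prints is approximate.

```python
AVG_GPUS = 1814            # average H800 GPUs in use over the day
GPU_HOUR_COST = 2.00       # USD per GPU per hour (DeepSeek's stated assumption)
DAILY_REVENUE = 562_000    # USD, theoretical daily revenue, rounded (~$562k)

daily_cost = AVG_GPUS * 24 * GPU_HOUR_COST
gross_margin = 1 - daily_cost / DAILY_REVENUE

print(f"daily GPU cost: ${daily_cost:,.0f}")
print(f"gross margin: {gross_margin:.1%}")
```

Note that this counts GPU rental as the only cost of revenue; staff, networking, storage, and the free chat traffic that the same cluster serves are outside this simple calculation.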

Extrapolating from DeepSeek, how much margin could other providers be making?

Mostly as a thought experiment, if other labs were able to incur some multiple N of DeepSeek’s inference cost structure, what would be their gross margins?

With DeepSeek R1’s very low pricing, they are already able to produce ~85% in gross margins.

If OpenAI served gpt-4o and Anthropic served claude-3.7-sonnet at 1x DeepSeek’s cost structure, they would be making ~96.9% and ~97.5% gross margins!

Now, that may not yet be realistic as DeepSeek’s innovations do not transfer instantaneously to other labs. It will take some time for these labs to absorb the same cost improvements into their sprawling model training pipeline, and then into their inference systems.

But suppose we take a less favorable view, and assume the gpt-4o and claude-3.7-sonnet cost structures are 5x less efficient than DeepSeek’s.

Astonishingly, the less favorable gross margins are still ~84.3% and ~87.3% respectively!
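
The thought experiment boils down to one line of algebra: if costs are N times higher at the same price, the cost share of revenue is N times larger. Below is a minimal sketch, assuming "5x less efficient" means exactly 5x the serving cost per token at unchanged prices. It starts from the rounded 1x margins quoted above, so its results differ from the quoted 5x figures by a few tenths of a point, presumably because of that rounding.

```python
def margin_at_multiple(base_margin: float, n: float) -> float:
    """Gross margin if costs are n times the base cost at the same price."""
    cost_share = 1 - base_margin    # cost as a fraction of revenue at 1x
    return 1 - n * cost_share


def break_even_multiple(base_margin: float) -> float:
    """Cost multiple at which the gross margin falls to zero."""
    return 1 / (1 - base_margin)


# 1x margins as quoted above (rounded).
BASE_MARGINS = {"gpt-4o": 0.969, "claude-3.7-sonnet": 0.975}

for model, base in BASE_MARGINS.items():
    print(f"{model}: 5x -> {margin_at_multiple(base, 5):.1%}, "
          f"break-even at {break_even_multiple(base):.0f}x")
```

The break-even multiple is the more striking number: under these assumptions a provider would need a cost structure tens of times worse than DeepSeek's before inference stopped being profitable at current prices.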

What kind of trade-offs did DeepSeek make?

DeepSeek’s V3 and R1 models created a lot of shock and awe early in the year.

Their transformer architecture improvements likely contributed significantly to their ability to serve V3 and R1 so cheaply with far fewer GPUs. Their MLA (multi-head latent attention) innovation significantly reduced the Key-Value cache requirements, allowing them to parallelize the attention computations across much larger batch sizes (without running out of memory).

High batch sizes lead to higher throughput, at the cost of latency. Higher throughput means higher utilization of GPUs, and thus lower costs. You can read much more in-depth explanations on inference system trade-offs from Google.
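
To see why a smaller Key-Value cache translates into larger batches, here is a back-of-the-envelope sketch. The cache formula is the standard one for multi-head or grouped-query attention; the model shape, the 40 GiB memory budget, the 4,096-token sequence length, and the 10x cache reduction are all hypothetical placeholders, not DeepSeek's actual configuration or MLA's real compression ratio.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """KV cache bytes for one token: keys + values, across all layers."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def max_batch(free_bytes: int, bytes_per_token: int, seq_len: int) -> int:
    """How many sequences of seq_len tokens fit in the cache budget."""
    return free_bytes // (bytes_per_token * seq_len)


# Hypothetical model and hardware, for illustration only.
FREE_BYTES = 40 * 2**30     # 40 GiB of GPU memory left over for the KV cache
SEQ_LEN = 4096              # tokens per sequence

baseline = kv_bytes_per_token(n_layers=60, n_kv_heads=8, head_dim=128)
compressed = baseline // 10   # assume a 10x smaller cache per token

baseline_batch = max_batch(FREE_BYTES, baseline, SEQ_LEN)
compressed_batch = max_batch(FREE_BYTES, compressed, SEQ_LEN)

print(f"baseline:   {baseline} B/token -> batch of {baseline_batch}")
print(f"compressed: {compressed} B/token -> batch of {compressed_batch}")
```

Under these assumptions the batch that fits in memory grows roughly in proportion to the cache reduction, which is the lever that lets a provider trade latency for throughput.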

Because their mixture-of-experts (MoE) has a very high sparsity factor (256 experts; 8 activated), they also need to utilize larger scale cross-node expert parallelism to achieve optimal load balancing for the high batch size.

As a result, DeepSeek’s inference is much slower than other providers’. DeepSeek is as slow as OpenAI’s recently released gpt-4.5, which is generally accepted to be a much larger model (larger models are slower to serve, and thus more costly).

DeepSeek: Compounding progress… delayed market reactions

Tue, 28 Jan 2025 01:54:35 GMT

It may surprise some people, but many of the improvements that DeepSeek (High Flyer’s AI lab) incorporated into DeepSeek V3 were released on May 7, 2024 as part of DeepSeek V2, more than 2 full months before Meta released their groundbreaking Llama 3 paper in July 2024.

Almost 9 months later, suddenly, public markets decided that NVIDIA should be worth ~$500B less. Okay…, there was a catalyst, DeepSeek released R1, their o1/3 comparable model the week before.

Not an overnight affair: Random walk with upward drift

Yet, the progress was gradual; an o1/3 class model in the public was an eventuality in 2025 [1].

Modern ML is mostly based on intuition, gut, and grunt work–betting on big ideas and empirically validating the results. One idea, built on top of the last useful one, without a clear line of sight on the next research breakthrough.

Most AI researchers are still compute/GPU constrained (or some people like to say the GPU rich vs. the GPU poor; maybe not the best turn of phrase). There is often a mile-long laundry list of experiments they wish they could implement/run on any given day. But not enough compute.

With some luck and sheer grunt work, researchers often stumble on new clever ideas that work. Folks learned a long time ago (The Bitter Lesson) that letting machines learn is often the best way. By sheer coincidence, another team of researchers (from the Hong Kong University of Science and Technology) converged on the same R1 reinforcement learning findings as DeepSeek, published only a few days apart.

DeepSeek’s journey training LLMs did not start in 2024, it started long before as a side project at High Flyer [2] from around 2019 [3].

From DeepSeek V1 to R1: Leveraging other open source research

DeepSeek’s progress emerged from a sea of impressive and accelerating research. Mistral’s and Meta’s open source stance only accelerated their progress. You can clearly see DeepSeek took much architectural and scaling guidance from the other labs (size of training runs in tokens trained, hyperparameters, etc.).

Timeline of highlighted milestones related to DeepSeek LLM development

JUNE 27, 2023: High Flyer announces proprietary internal HAI-LLM training framework
JULY 18, 2023: Llama 2 released - 2T token runs - 7B to 70B models
AUGUST 9, 2023: Reports of Chinese cloud providers stockpiling GPUs
SEPTEMBER 23, 2023: Mistral 7B - rumored 8T tokens - beats Llama 2 7B
OCTOBER 23, 2023: Biden administration phase 1 GPU restrictions effective
DECEMBER 11, 2023: Mixtral of Experts - open source MoE 8x7B (12.9B active)

Likely compounded/leveraged on top of their Mistral 7B runs/checkpoints

JANUARY 5, 2024: DeepSeek V1 - 2T token runs - 7B & 67B models

Very similar to Llama 2 runs, uses GQA but deeper instead of wider models.

MARCH 8, 2024: Gemini 1.5 - closed-source acknowledgement of MoE in frontier model
MAY 7, 2024: DeepSeek V2 - 8T run - 236B MoE (21B active) close to Llama 2 70B performance

DeepSeek starts varying from the pack substantially…

New form of attention: Introduced MLA, which alone yields a ~80%+ reduction in KV cache memory requirements (compared to comparable GQA). When combined with other memory optimizations, DeepSeek claims a 93.3% reduction in KV cache.
New form of MoE: Introduced a more flexible form of MoE with shared experts, while still using auxiliary loss for load balancing
With these improvements, compared to a dense 67B model, DeepSeek claims ~578% inference throughput improvement.

JULY 31, 2024: Llama 3: Meta continuing their scaling program to 405B while increasing to 15T of tokens trained.
DECEMBER 27, 2024: DeepSeek V3: scaling up 15T run 671B (with 37B active) close to frontier models, GPT-4o, Llama 3 405B, Claude 3.5 Sonnet.

DeepSeek continues their drive for training and inference efficiencies, faced with chip constraints (primarily memory bandwidth, as H800’s have ½ the bandwidth of H100’s)... the path now is clearly their own:

Improved MoE training: without auxiliary loss (learned routing, vs. heuristic)
Multi-token prediction objective during training: taking a lead from speculative decoding, though not used in inference
DualPipe training pipelining/scheduling framework: reduces bubble and communication bottlenecks during training
FP8 training dynamics: FP8 compute while accumulating in full precision, reducing memory bottlenecks without substantial loss in quality
NVIDIA SM allocation adjustments: low-level adjustment to dedicate some SMs solely to communication, reducing the bandwidth bottleneck
Modular inference infra [4]: 2 inference infra setups specialized in (A) prompt processing (input tokens) and then (B) sampling (output tokens) to separate and scale to fit workloads and optimize distributed batch processing at scale.

JANUARY 22, 2025: DeepSeek R1: R1 model built on top of DeepSeek V3, close to OpenAI o1 performance and enables test-time compute regime

LLM development is a compounding phenomenon.

Labs leverage their last generation of models and build on top of them. This is seen again and again.

Mixtral 8x7B is a MoE of their 7B (rumored).
Llama 3 models use Llama 2 models for filtering and data curation.
Gemini 1.5 teams used Gemini 1.0 generation models for evaluations, data curation, and hyperparameter extrapolations. Gemini 1.5 Flash distilled from Gemini 1.5 Pro.
DeepSeek V3 used DeepSeek V2.5 models for data generation in post-training.
DeepSeek R1 is built on top of V3 base model.

DeepSeek V2 release in May 2024 was the beginning of DeepSeek charting its own path, rather than simply imitating others.

Its V3 release very much cemented their self-sufficiency in continuing to progress LLM research. From DualPipe, multi-token prediction, and FP8 training dynamics with little loss in quality, to providing 2 full pages of hardware “suggestions” to “hardware designers” (read NVIDIA), these are not behaviors of imitators.

R1: Reinforcement learning and test-time compute will accelerate inference demands

DeepSeek R1 claims to be on par with OpenAI o1/3 in benchmarks. LLM expert users are already impressed and running these models on different infra and model configurations (DeepSeek released R1 distillations down to 1.5B size, which can run on most modern MacBook Pros with 16+GB memory).

As reinforcement learning will require some method of evaluation (to give feedback to the system whether something is correct or not), inference as a type of workload will only continue to increase.

For OpenAI, in the middle of 2024 (before the release of o1), inference costs were already dominating training costs (~$7B in compute costs, ~$4B for inference towards ChatGPT). This shift from training to inference workloads will accelerate, and will include inference workloads for research on test-time compute regime models (evaluations, sample generation, etc.).

Hundreds of Billions of CapEx in 2024-2026: Will only accelerate model scale ups and intelligence progress

"I will say that Deep Learning has a legendary ravenous appetite for compute, like no other algorithm that has ever been developed in AI. You may not always be utilizing it fully but I would never bet against compute as the upper bound for achievable intelligence in the long run. Not just for an individual final training run, but also for the entire innovation / experimentation engine that silently underlies all the algorithmic innovations."

– Andrej Karpathy

With open replications of R1 findings either in progress or complete, inference demands will only accelerate. Any model can become a reasoning model; where you can adjust a “knob” to increase test-time compute to get a more reliable answer.

Labs will be more comfortable scaling up when inference is cheaper with new forms of attention (MLA) and MoE’s. The test-time compute regime gives consumers of models the power of accuracy as a function of cost.

How much do you care about a more accurate response for a given task? $1k? $6k? $15k? This can now be within your control.
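
One simple way to picture that "knob" is majority voting over repeated samples (often called self-consistency). The sketch below is only an illustration under strong assumptions: each sample is independently correct with the same probability, answers are treated as simply right or wrong, and cost grows linearly with the number of samples. It is not a description of how R1 or o1 spend test-time compute internally, and the 60% single-sample accuracy is a made-up figure.

```python
from math import comb


def majority_correct(p: float, n: int) -> float:
    """Probability that a strict majority of n independent samples is correct.

    Assumes n is odd and each sample is correct with probability p.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))


SINGLE_SAMPLE_ACCURACY = 0.6   # hypothetical accuracy of one sample

# Reliability as a function of how many samples (and how much cost) you buy.
curve = {n: majority_correct(SINGLE_SAMPLE_ACCURACY, n) for n in (1, 5, 25)}

for n, accuracy in curve.items():
    print(f"{n:>2} samples ({n}x cost): {accuracy:.1%} chance the majority is right")
```

Reliability rises with the number of samples but with diminishing returns, so the buyer decides how far along the cost curve a given task deserves to go.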

---

[1]: Whether it was META open sourcing their work with LLAMA 4, or the community finding the needle in the haystack of speculation/experiments.

[2]: See interviews from 2023 and 2024 with the High Flyer CEO Liang Wenfeng.

[3]: See High Flyer AI blog posts going back to 2019.

[4]: This is probably not entirely novel, and is already being done by most inference providers: prompt processing can be incredibly cheap and can speed up inference when done with enough volume in the workload. DeepSeek’s wide MoE footprint, though, will gain extra benefit from this modular approach through increased parallelism across nodes (as shown by their minimal 40-node sampling setup).

Embracing the new era of computing, communication ... and energy

Wed, 22 Jan 2025 20:33:55 GMT

We're still in punchcard era of LLMs, designing prompts, copy pasting context around, hitting go, reading the thing, prompting occasionally. Pretty lame. If there are fewer than a few thousand tok/s of sustained throughput generated on my behalf do we even have AI

- Andrej Karpathy

Making it easier to communicate -- Attention

If we take a short journey back in time, generative AI (LLMs, diffusion models, etc.) started to emerge in the late 2010s, primarily driven by the clever introduction of attention into existing neural network architectures.

In 2014, researchers were still trying to make deep learning work for machine translation. Attention was a clever mechanism that gave models a way to communicate between parts of a sequence (whether between positions within the same sequence, i.e. self-attention, or between positions across different sequences, i.e. cross-attention). Since transformers formally appeared as a new neural network architecture in 2017, they have become a powerful tool, partly because attention can be computed in parallel on modern accelerators (GPUs, TPUs). That made it possible to build bigger models and train them on more data, i.e. scaling pre-training.
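To make the "communication" idea concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is a simplification: real transformers apply learned projections to produce queries, keys, and values, use multiple heads, and run on accelerators. The function names and the toy sequence are illustrative assumptions.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Each query scores every key: this is the "communication" step.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # The output is a weighted mix of the value vectors.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs

# Self-attention: the same sequence supplies queries, keys, and values.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed_sequence = attention(sequence, sequence, sequence)
print(mixed_sequence)
```

Each pass through the outer loop depends only on its own query, which is why the computation parallelizes so well across positions.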

Scaling large language models got off to a slow start and took almost half a decade, from 2018 to 2022: GPT-1 through GPT-3, BERT, T5, LaMDA, PaLM, etc.

But progress has been tremendous, and the quality of models has improved dramatically from new entrants and incumbents alike: GPT-4, GPT-4o, Claude, Gemini, Llama, Mistral, Cohere, Qwen, DeepSeek, o1, etc., until today.

Measuring progress, the year of benchmarks and evals

In addition to the cost of inference, measuring progress is becoming more difficult as the models are getting better and better and beating many contemporary benchmarks.

To measure a model's ability to reason with knowledge acquired from training, MMLU was released in 2020. By 2024, models were reaching 90+% on MMLU. In 2024, MMLU Pro was released to provide more difficult tasks, and in early 2025, models were already scoring 80+% on MMLU Pro.

For coding and software engineering, SWE-bench was released in 2024 as a set of realistic software engineering tasks, and in early 2025, models were already scoring 70+% on SWE-bench.

Leading AI labs have always been open about engaging the community to create novel benchmarks/evals to evaluate their models. Recently, some of these labs have been criticized for waiting to disclose that they funded new benchmarks. Personally, I doubt there was any bad intent.

It's clear that existing benchmarks are being saturated. Progress increasingly depends on the type of task and domain. 2025 will be the year of evals; perhaps new standards and systematic methods will emerge between the labs and the community.

Curse of success -- scaling inference supply to meet demand vs. research progress

Frontier AI labs have had tremendous success in distributing their models to the public. OpenAI expects $3.7B and $11.6B in sales in 2024 and 2025. Azure, AWS, GCP, Oracle Cloud, and others had fantastic AI-driven tailwinds in their cloud computing businesses [1].

For research organizations like OpenAI, there is a rising tension between compute for inference vs. compute for research.

At the same time, measuring progress is becoming more difficult and more costly. At the end of 2024, for the o3 model's ARC benchmark, OpenAI spent at least $1M-2M [2] to run a single benchmark (once!).

During model development, evals are typically run at intervals as a training run progresses to measure how training is going. Suppose that training o3 took 5 ablation experiments, each the equivalent of 10,000 training steps. If the ARC benchmark were run every 1,000 steps, it would have run 5 x 10 = 50 times, and 50 x $1.5M is $75M (!!). While the details are pure speculation, it's a good proxy for understanding the cost of measuring progress. For $75M (which is some X% of the o3 development cost, as it does not include GPU hours for training the model), one can train a GPT-2 class model (a circa-2019 generation LLM) ~110,000 times [3].
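The back-of-the-envelope estimate above can be written out as a short script. It only reproduces the speculative inputs from the text (5 ablations, 10,000 steps each, one eval every 1,000 steps, $1.5M per eval as the midpoint of $1M-2M); the function and constant names are hypothetical.

```python
ABLATION_EXPERIMENTS = 5
STEPS_PER_EXPERIMENT = 10_000
EVAL_EVERY_STEPS = 1_000
COST_PER_EVAL_USD = 1_500_000  # midpoint of the $1M-2M estimate in the text
GPT2_CLASS_RUNS = 110_000      # figure quoted in the text

def total_eval_cost(experiments, steps, eval_every, cost_per_eval):
    """Return (number of eval runs, total eval spend in USD)."""
    evals_per_experiment = steps // eval_every
    total_evals = experiments * evals_per_experiment
    return total_evals, total_evals * cost_per_eval

total_evals, total_cost = total_eval_cost(
    ABLATION_EXPERIMENTS, STEPS_PER_EXPERIMENT, EVAL_EVERY_STEPS, COST_PER_EVAL_USD
)
# Training cost per GPT-2 class run implied by the "~110,000 times" comparison.
implied_cost_per_gpt2_run = total_cost / GPT2_CLASS_RUNS

print(f"eval runs: {total_evals}")
print(f"total eval spend: ${total_cost:,}")
print(f"implied cost per GPT-2 class run: ${implied_cost_per_gpt2_run:,.0f}")
```

Changing any one input (say, evaluating every 2,000 steps instead) scales the total linearly, which shows how sensitive the figure is to the assumed eval cadence.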

Yet, there's a lot of research progress to be excited about.

Progressing transformer architectures

Within the transformer architecture alone, DeepSeek's developments in V3 and R1 stand out: MLA (a new form of attention), MoE routing without an auxiliary loss (letting the model learn routing rather than relying on heuristics), multi-token prediction (incorporating insights from speculative decoding), and the RL pipeline for reasoning in R1 (the first open source frontier-class test-time compute model). Together they show that a small team that is not distracted by inference demands can accelerate the rate of experimentation from ideas [4]. Their DualPipe distributed training improvement shows continued algorithmic gains in resource allocation and scheduling to maximize GPU use (they also made other clever adaptations, like FP8 mixed precision to reduce memory use, as the H800 has lower memory resources).

Meta's findings with the byte latent transformer show that there may be a path away from fixed tokenizers and vocabularies toward learned ones. Tokenizers can be blamed for a lot of problems with LLMs (i.e. the infamous question of how many R's are in "strawberry"). This could also provide a path to consume more data in training without using heuristics to pre-process data into sequences.
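As a small illustration of why bytes sidestep the tokenizer problem, the sketch below views a word as raw UTF-8 bytes. This is not an implementation of the byte latent transformer, which does considerably more (for example, grouping bytes into patches); it only shows that a byte-level view has a fixed alphabet of 256 symbols and keeps every letter visible. The helper name is hypothetical.

```python
def to_byte_ids(text: str) -> list[int]:
    """Encode text as a list of UTF-8 byte values (each in 0..255)."""
    return list(text.encode("utf-8"))

word = "strawberry"
byte_ids = to_byte_ids(word)

# At the byte level every letter is its own symbol, so counting is trivial.
r_count = byte_ids.count(ord("r"))

print(byte_ids)
print(f"bytes: {len(byte_ids)}, occurrences of 'r': {r_count}")

# Any other data (other scripts, emoji, binary formats) also maps to bytes.
print(to_byte_ids("日本"))
```

A subword tokenizer may merge several letters into one opaque ID, which is one commonly cited reason letter-counting questions trip models up.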

DeepSeek's and Meta's Llama teams both acknowledge using the previous generation's model to generate data for the next generation of models, a sign of compounding progress. Previous work compounds, and open sourcing lets the entire community progress at a lower global capital cost, so value accrues to more of the community.

Reducing compute burden for the same tasks and performance

Other than continuing to optimize and progress the transformer architecture, there are other ways to reduce the compute burden for the same tasks and performance.

There are many teams working on a variety of bets, some cool examples:

State space models (non-attention based architectures): Mamba, RWKV, etc., which aim to avoid attention's quadratic runtime complexity.
Extreme quantization: BitNet models use roughly 1.6-bit weights (with 4-bit activations), which makes them much cheaper to run, even on CPUs, because most multiplications in the matrix products reduce to additions and subtractions.
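The extreme quantization idea can be sketched in a few lines. The example below loosely follows the absmean scheme described for ternary BitNet-style weights (scale by the mean absolute weight, round, clip to -1, 0, or 1); it is a toy in plain Python under that assumption, not the actual BitNet code, and the function names are hypothetical. Three possible values per weight is where the "about 1.6 bits" comes from, since log2(3) is roughly 1.58.

```python
def quantize_ternary(weights):
    """Map each weight to -1, 0, or +1 plus one shared scale factor."""
    magnitudes = [abs(w) for row in weights for w in row]
    scale = sum(magnitudes) / len(magnitudes) or 1.0  # mean absolute weight
    quantized = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return quantized, scale

def ternary_matvec(quantized, scale, x):
    """Matrix-vector product using only additions and subtractions per weight."""
    outputs = []
    for row in quantized:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0 contributes nothing
        outputs.append(acc * scale)  # one multiply per output, not per weight
    return outputs

weights = [[0.9, -0.05, -1.1], [0.2, 0.8, -0.7]]
q_weights, q_scale = quantize_ternary(weights)
print(q_weights, q_scale)
print(ternary_matvec(q_weights, q_scale, [1.0, 2.0, 3.0]))
```

The inner loop never multiplies by a weight, which is the property that makes this style of model attractive on hardware without fast matrix-multiply units.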

Leading AI labs and hyperscalers are racing to build more data centers designed for AI workloads. In the short term, compute remains in a supply crunch (primarily due to a chip shortage; see NVIDIA's stock price). Data center construction has skyrocketed with the inflow of capital. The next physical constraint will be energy.

Energy and AI

Vaclav Smil published a book in 2022 called "How the World Really Works: The Science Behind How We Got Here and Where We're Going". The short answer: energy. Setting aside how Smil may feel about the progress/solutions of the energy transition, it is clear that our modern world, particularly the developed world, runs on energy.

An average inhabitant of the Earth nowadays has at their disposal nearly 700 times more useful energy than their ancestors had at the beginning of the 19th century.

[...]

Translating the last rate into more readily imaginable equivalents, it is as if an average Earthling has every year at their personal disposal about 800 kilograms (0.8 tons, or nearly six barrels) of crude oil, or about 1.5 tons of good bituminous coal. And when put in terms of physical labor, it is as if 60 adults would be working non-stop, day and night, for each average person; and for the inhabitants of affluent countries this equivalent of steadily laboring adults would be, depending on the specific country, mostly between 200 and 240.

-- Vaclav Smil, How the World Really Works (emphasis mine)

The continued adoption and development of AI will require more energy than ever.

In 2025, Microsoft announced $80B of capex for AI data centers. In January 2025, The White House, OpenAI, Softbank, and Oracle announced a $500B investment in data centers and energy over the next 4 years. Amazon expected $75B in capex in 2024 mostly related to AWS, only to grow in 2025. The trend is the same for other players like GCP, Meta, etc...

AI data centers are joining the ranks of other items in the energy transition (i.e. electric vehicles, etc.). Forecasts of AI energy demand vary widely (from ~2x to 5x current data center energy demand by 2030). BloombergNEF's Michael Liebreich lays out a more nuanced perspective on this new generation of data center growth: we've seen this before, but market dynamics, stakeholder governance, and energy efficiency will all play a role in moderating demand and supply.

Personally, I'm optimistic that physical constraints won't bottleneck the rate of progress derived from adding more compute. Capital is moving rapidly to balance supply, and energy constraints are likely to be mitigated by further algorithm and hardware improvements. Something similar happened in the 2000s and 2010s as cloud computing adoption took off: similar stakeholders cried out for more energy, but data center energy consumption grew relatively gradually as a proportion of US energy demand thanks to more efficient hardware and software design.

AI and the Physical World -- How Humans Communicate With Machines

In 2025, in the developed world, most consumers spend their time on their phones and laptops. Let's look at how humans do a common task: shopping on Amazon's mobile app.

Tapping on a phone screen: Shopping on Amazon

Each user interaction follows a carefully orchestrated flow:

1. Physical Input → Mobile OS

User taps or swipes generate touch events
OS interprets and routes events to the application layer

2. App ↔ Server Communication

App sends HTTP/TCP requests to backend servers (i.e. Amazon ecommerce backends with product listings, ads, etc...)
Servers process requests (database lookups, payment validation)
Communication travels over the internet backbone (i.e. TCP/UDP connections), routed through a host of hardware and network software, not to mention the cryptography that keeps the data secure.

3. Server ↔ Services

Servers coordinate with other systems (inventory, payments, authentication, etc...)
Data flows through multiple service layers to aggregate the information needed to respond to the user's request.

4. Response → User Interface

Results return to device via the internet backbone
App updates UI based on new state
User perceives change and decides next action

Each loop is tightly scoped and optimized, typically requiring only 100,000 to 100,000,000 FLOPs [5] per iteration. The system is engineered for speed and responsiveness through these small, discrete steps, which are repeated in rapid succession hundreds or thousands of times in a normal user session.

The Human Factor: Bearing the Cognitive Load

While the visible computation is relatively lightweight, humans shoulder most of the cognitive burden:

1. Persistent and Context-Aware Computation

Users must constantly perceive and interpret their environment
Humans translate intent into discrete interface actions

2. Interface Navigation

Humans learn and adapt to predetermined UI patterns
Users bridge gaps between their intent, environment, time, and available actions

The system's efficiency comes from delegating most adaptive intelligence to the human user, keeping machine computation minimal but requiring significant human cognitive work.

Yet, the efficiency didn't come for free. On the other side of the user consuming the interface, the interface was designed by other humans on top of a stack of software and hardware to enable the loop: chipset, operating system, network, application, etc...

The fixed cost is amortized over the many users that consume the interface over its lifetime.

LLM Inference: A Different Paradigm

Large language models with 70B parameters take a contrasting approach, requiring 20,000,000,000,000 to 200,000,000,000,000 FLOPs [6] per inference, roughly 200,000x to 2,000,000x more than even the heaviest typical app interactions. However, they enable open-ended, natural language communication in a single pass.
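A rough way to sanity-check these orders of magnitude is the common rule of thumb that a forward pass costs about 2 FLOPs per parameter per token. The sketch below applies it to a 70B-parameter model. The rule of thumb, the token counts, and the names are assumptions for illustration; the derivation behind footnote [6] may differ.

```python
PARAMS = 70e9                  # 70B-parameter model, as in the text
FLOPS_PER_PARAM_PER_TOKEN = 2  # rule-of-thumb forward-pass cost (assumption)
APP_LOOP_FLOPS_HIGH = 1e8      # upper end of the app interaction range in the text

def inference_flops(tokens: int, params: float = PARAMS) -> float:
    """Approximate FLOPs to process and generate `tokens` tokens."""
    return FLOPS_PER_PARAM_PER_TOKEN * params * tokens

# Hypothetical short and long requests, counted in total tokens.
for tokens in (150, 1_500):
    flops = inference_flops(tokens)
    ratio = flops / APP_LOOP_FLOPS_HIGH
    print(f"{tokens:>5} tokens -> {flops:.1e} FLOPs, ~{ratio:,.0f}x an app loop")
```

Under this approximation, the quoted range corresponds to requests of very roughly a hundred to a couple of thousand tokens, and the cost grows linearly with both model size and sequence length.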

With a fixed defined vocabulary space, the model can consume sequences of arbitrary length (up to a fixed length) and output a sequence of arbitrary length (up to a fixed length).

This is unlike programming, where there are syntax constraints that will limit the runtime/execution of the program with the underlying stack.

Can new AI systems balance the communication burden?

The evolution of human-computer interaction may point to a future where we're gradually shifting cognitive load from humans to machines. Traditional interfaces require humans to:

Learn specific interaction patterns
Maintain context and state
Translate high-level goals into discrete steps

The integration of LLMs into system architectures could happen at multiple levels:

1. Application Layer

LLMs could augment existing interfaces as an intelligent assistance layer
Natural language could complement rather than replace traditional UI elements
UIs could be fluid and generated on the fly depending on input from the user and the environment

2. Framework Layer

Web and mobile frameworks could incorporate LLM-powered components
Development tools could use LLMs to generate more adaptive interfaces easily

3. System Layer

Operating systems could employ LLMs for more intelligent resource management
System calls and memory allocation could become more context-aware (i.e. hardware aware, workload aware, etc...)
Kernel operations could adapt to usage patterns and requirements instead of using heuristics

Remembering attention is a form of communication

Large language models are marvels of deep learning. Modern human language is excellent at compressing information. Attention mechanisms allowed machines to learn via communication between the many different nodes of a sequence in multidimensional spaces.

If we move the unit of analysis from language and sequences to humans and the physical world, where else could we leverage the added benefits of learned machine communication?

In the LLM paradigm: Learned data processing vs. pre-determined data processing

Since the explosion of AI, industry has been predicting an explosion of data generated/stored/consumed. At NeurIPS 2024, Ilya Sutskever predicted that the era of pre-training is over, but data continues to be the fossil fuel of the AI era.

But if we extend the previous section's analysis of human-computer interaction, we can see that data today is mostly generated and consumed in small discrete steps/iterations. Media content is slightly more continuous (i.e. videos, audio) but the data are still discretely packaged, collected, and consumed at the user's direction.

Since 2024, the frontier models have also expanded in modalities (both input and output; image, audio, etc...) and inference speed (i.e. realtime voice/video API's from OpenAI and Google).

There are applications of transformers where the inputs and outputs are more continuous in nature, such as the classic self-driving car example: Waymo Research's 2024 work on transformer-driven trajectory prediction, which takes realtime continuous perception and scene data as input to predict the future motion trajectories of objects in the environment.

Yet, the data is still discretized and processed into features and tokens to form a sequence for processing; researchers spend effort empirically finding the best way to turn the data into a sequence that achieves the best performance.

To add new modalities and formats of data to models, models have to be re-trained or trained for longer with the new data in a different way. The vast majority of this is still determined by humans based on empirical heuristics.

Perhaps this is why I am so excited about ideas like Meta's byte latent transformer. All data can be encoded into a sequence of bytes, and the model can learn to process the data in a way that is optimal for the task. The caveat is that this will likely require larger compute budgets than labs are willing to experiment with (especially when the current methods are working well for their existing use cases and users).

At the top of the learning curve for Generative AI

Fri, 15 Nov 2024 01:56:39 GMT

The mass adoption of generative AI models has yet to arrive.

Its growth will be driven by fundamental factors as the technology starts its journey down the learning curve, lowering the cost of developing and deploying models for every general application. In the next 3-7 years, generative AI models will be as commonplace as gradient boosting and decision tree models for prediction and classification tasks.

By way of a simple analogy about the retail industry:

Within an apparel retailer, there are 2 critical business functions: branding & marketing, and supply chain & distribution. High performance supply chain operations and distribution are often buttressed by the accuracy of the firm’s demand and supply forecasts. Over-forecasting demand could easily wipe out the firm’s quarterly profits. Under-forecasting demand will greatly limit growth.

Prior to the arrival of modern sample and compute efficient ML techniques (i.e. XGBoost and random forest) for predicting time series data, leading global retailers (i.e. Nike, P&G, L.L.Bean) crafted demand and supply forecasting models by hand. Even in the 2000s and early 2010s, these firms hired enormous teams of business analysts and managers to fine-tune and manage supply chain and distribution forecasts.

Yet, by the late 2010s, a novice computer programmer could easily train a highly accurate demand/supply forecast model (based on gradient boosting) with a few lines of code and a small historical dataset. These models power the modern e-commerce world (i.e. Amazon, Walmart, etc.), predicting the demand and supply of their product lines to the minute and hour.

While large language models have disrupted the world, the underlying mechanics and intuitions have not changed. The primary difference is that transformer-based LLMs require substantially more (1) compute, and (2) data; many orders of magnitude more of both ingredients for the model to learn as generalized models.

The barriers to development will continue to decline.

Compute supply is currently constrained. Yet, chip supply is classically cyclical (over the past 40-60 years), and it will soon enter a cycle of oversupply. There will continue to be systems engineering constraint problems to be solved; but I am hopeful that the current rate of capital investment will be sufficient to unlock these barriers (i.e. the physical practicality of a $10B datacenter vs. a $100B datacenter in terms of connectivity, power, and management).

Data will be a more multi-faceted and complex problem; there are at least 3 factors: (1) specialized/fit-for-purpose data, (2) volume of world representational data, and (3) systems engineering to ease the distribution and ingestion of data in large-scale training.

Specialized data (#1) and volume of world representational data (#2) are somewhat orthogonal to each other, but it is important to understand them together.

Specialized data (#1) is a pre-existing constraint of ML for current users and firms; nothing has fundamentally changed about whether a firm or user has high-quality proprietary data or not. If a firm has historical data relevant to its products, then it is absolutely in the best position to use that data to develop and deploy generative AI applications.

World representational data (#2) is a continuous research problem. It is known that current datasets are not exhaustive and comprehensive of our world; yet there is some fear of “running out of data”. Capital continues to flow to firms which conduct foundational research for data formats, modalities, and techniques to represent our entire world.

Yet, with only a sliver of world representational data (mostly from the internet), leading research labs have already shown the tremendous potential of large models in their current releases. While returns to scale are likely diminishing in a continuous fashion, I am hopeful that research on modalities and formats will deliver a step-level, discontinuous improvement in this area. For example, neural networks are notoriously bad at representing three dimensions, and various tricks are used to work around this problem, yet humans experience the world only in three dimensions. Humans learn representations with much greater sample efficiency.

Scaling up data and compute infrastructure (#3) is a classic systems problem that is being solved by firms like Databricks and leading AI labs. This is unlikely to be a primary constraint to value creation.

The value created by generative AI will accrue to the users and the rest of the global economy.

As these barriers decline, training and/or deploying generative AI models and applications will become commonplace. We are likely far from reaching the bottom of the learning curve for this category of technology. As a result, leading model builders today must rely on their distribution-driven economies of scale to grow the market and retain share.

Fortunately, because we are still at the top of the curve and in the early innings, there is still much to do to realize this vision. So much has yet to be built, and many opportunities lie ahead.

Renting clothes in Japan, "Speak Now", and Jobs

Sun, 09 Jul 2023 07:00:00 GMT

Recent fun news

Save on baggage, rent clothes when visiting Japan, says Japan Airlines (JAL). Travelers visiting Japan via JAL are being offered a new service called “Any Wear, Anywhere” to ditch their heavy baggage and pack lighter. Through the service, you can rent six tops and three bottoms for 6,000 yen, or about $45, for two weeks. The airline believes this experiment can reduce luggage weight per flight, leading to less fuel use.

“Speak Now”: Taylor Swift continues to re-record her old albums, releasing the latest version on July 7, 2023. Ms. Swift has now re-recorded 3 of her first 6 albums. Shamrock Holdings, a private equity firm owned by the estate of Roy E. Disney, bought the rights to those albums for $300M in 2020. Perhaps unsurprisingly, each release of a re-recorded album lifts Taylor’s entire catalog in the streaming charts (original and re-recorded). As music rights holders are paid by streaming counts, each release provides a large injection of music sales for both Ms. Swift and Shamrock Holdings.

Jobs report: U.S. job creation in June was 209,000, falling short of the projected 225,000. This marks the first time in 14 months that actual numbers fell short of consensus, according to Bespoke Research.

Pension funds start unloading private equity assets. The New York State Teachers’ Retirement System is looking to unload $6 billion of assets into the secondary market. Private equity funds are usually closed-end for 8-10 years, so these movements in secondaries suggest the smart money wants to take profits while the marks are still frothy.

This week’s Speed Read

1. Some updates on office commercial real estate (CRE): office delinquency rates have trended upward since Jan 2023, from ~2% to ~4%.

2. In Q1 2023, offices dominated CRE foreclosures (~63% of all CRE foreclosures).

3. Higher quality office spaces fare substantially better, with many metros (Boston, D.C., Manhattan) even continuing to see positive net absorption (incremental occupied square footage) since 2021.

4. The CMBS (commercial mortgage-backed securities) market signals similar stress through the bond spread between higher-rated (Triple-A) and lower-rated (Triple-B) bonds. The spread is ~6%, nearing March 2020 pandemic levels.

5. Weekly bankruptcy filings moving back up to near GFC (2008 global financial crisis) and March 2020 (pandemic) levels.

6. Inflation stubbornly not going away, with core PCE around ~4%.

7. Wall Street continues to revise to higher inflation expectations in 2023 and 2024. 2023 consensus estimate at ~4.4%.

8. Global temperatures hit another record high on July 6, 2023, at 17.23 degrees Celsius.

Speed Read Charts

This week’s fun fact

On May 4, 2023, the Federal Reserve’s FOMC (Federal Open Market Committee) increased the federal funds rate by 25 basis points to a range of 5.00% to 5.25%.

As of Jul 9, 2023, the market has priced in a 93% probability that the committee will raise the range another 25 basis points at the Jul 26, 2023 meeting.

Bidenomics, AI money, problems for bankers

Sun, 02 Jul 2023 07:00:00 GMT

Recent fun news

This week, the White House unveiled a new push for “Bidenomics” (press memo). In Philadelphia a few weeks ago, President Biden apparently said he does not know “what the hell it is.” The memo mentions jobs, the middle class, and infrastructure, and it keeps saying it is working?

More money for chips and NVIDIA. Money continues to flow to generative AI startups making foundational models. MosaicML exited for $1.3B to Databricks. Inflection AI announced finishing a $1.3B round of funding, valuing the company at $4B.

Cumulative funding raised by foundational model startups:

OpenAI $11.3B
Inflection AI $1.525B
Anthropic $1.5B
Cohere $445M
Adept $415M
Runway $237M
Character.ai $150M
Stability AI $100M

Orcas have been intentionally ramming into ships on the ocean. The behavior has spread from the seas around the Iberian peninsula to the North Sea between Scotland and Norway, almost 2,000 miles away. Many theories, no answers.

Wall Street continues to downsize. Goldman Sachs cuts 125 Managing Directors globally. JPMorgan cuts 40 dealmakers in North America. The Swiss bank UBS plans to cut 30,000 jobs this year.

Move aside Silicon Valley Bank: FDIC data shows Bank of America sitting on ~$100B in paper losses from bond investments made during the pandemic-driven deposit increase. By comparison, SVB had a ~$16B loss in its held-to-maturity portfolio. Bank of America’s bet on securities is leading it to trail its leading competitor (see number 7 in Speed Read), JPMorgan, which decided to sit on its pandemic cash instead.

Thanks for reading Myriad Perspectives! Subscribe for free to receive new posts and support my work.


This week’s Speed Read

1. The monthly mortgage payment on a new average-size purchase loan is close to $3,000, nearly 2x the ~$1,000 to ~$1,500 range that held from 2000 to 2019.

2. Every single inflation forecast from the Federal Reserve since June 2021 has been wrong.

3. In emerging countries (i.e. Brazil, India, Mexico), the cost of capital to build renewable energy infrastructure (i.e. a solar farm) is ~2x higher than in developed countries: the EU’s ~4% cost of capital vs. India’s ~10% cost of capital.

4. For the first time, in Q1 2023, China topped Japan to become the world’s largest exporter of automobiles.

5. Morgan Stanley thinks the US will see its first negative payroll month around the end of this summer (August/September).

6. Since the start of the Ukraine war, ~25% of Russia’s crude oil has continued to flow to Western countries in the EU, the US, and the UK, through the conduits of Turkey, India, UAE, and China.

7. In Q1 2023, Bank of America’s net interest rate spread (the difference between the interest rate it earns from its loans and assets, and the interest rate it pays to depositors) was 1.43% compared to JPMorgan’s 2.04% (~43% higher).

Speed Read Charts

This week’s fun fact

France has laid the most undersea cables, closely followed by the United States.

Each of these countries has laid more than 500 thousand kilometers of cables, more than 1.3x the distance from Earth to the Moon.

Taylor Swift boon for hotels? Masayoshi Son is back.

Sun, 25 Jun 2023 07:00:00 GMT

Recent fun news

Taylor Swift tours are very good for hotels, an effect termed the “TSwift Lift”: a ~80% revenue bump in Nashville, and a 10-20% bump in large metros like Chicago and NYC. I guess scalpers are not the only winners. San Francisco hotel operators can rejoice once Ms. Swift arrives at Levi's Stadium later in July.

Masayoshi Son returns with his signature hyped slides at this year’s Softbank shareholder meeting. AI will solve everything? Find its full glory here.


This week’s Speed Read

1. Commercial credit card rates topped 20.1%, highest since ~1970.

2. New business applications remain elevated at ~175% of pre-pandemic levels.

3. U.S. home listings continue to sink to lowest level on record since ~2012.

4. Uneven home price level effects across regions: SF Bay Area generally down ~12%, whereas NYC neutral ~0%, Miami up 8%.

5. ~80% chance moderate El Nino, ~50% chance strong El Nino this winter.

6. Global sea surface temperatures have been abnormally high (+1.1 degrees from 40-year average) in 2023 so far.

7. Since Medicaid started unwinding COVID-19 renewal provisions, ~1.5M enrollees have been disenrolled (~300k in Florida, ~150k in Arizona).

Speed Read Charts

This week’s fun fact

Ensign Peak Advisors, Inc manages $124B in assets for The Church of Jesus Christ of Latter-day Saints, also known as the Mormon Church.

Full faith and credit of the United States

Fri, 26 May 2023 05:00:00 GMT

The strange world of a potential default on U.S. Treasuries

Risk-free rate

In 1964, William F. Sharpe introduced a theory of capital asset pricing[1], commonly referred to as the Capital Asset Pricing Model (CAPM). Today, CAPM is the foundation for modern financial asset price theory[2]. For his work, Professor Sharpe of Stanford University received the 1990 Nobel Prize in Economics for his foundational contribution to financial economics[3].

The CAPM assumes the existence of a type of riskless asset. If an investor invests in a riskless asset, the investor is guaranteed to receive a rate of return, called the risk-free rate.

This risk-free rate is a key input in the pricing of nearly all types of financial assets, including stocks, debt, options, and real estate. As a fundamental assumption in valuations, the cost of capital is calculated from an assumed risk-free rate.

While choosing the best risk-free rate is quite literally an academic exercise[4], the global financial system, for the most part, uses U.S. treasury yields[5] as proxies for the risk-free rate.
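To see how the risk-free rate feeds into a cost of capital, here is a minimal sketch of the standard CAPM relationship: expected return = risk-free rate + beta x (expected market return - risk-free rate). The function name and all input numbers are hypothetical and chosen only for illustration.

```python
def capm_expected_return(risk_free: float, beta: float, market_return: float) -> float:
    """Expected return of an asset under CAPM.

    risk_free:     assumed risk-free rate (e.g. a treasury yield)
    beta:          sensitivity of the asset to the market portfolio
    market_return: expected return of the market portfolio
    """
    return risk_free + beta * (market_return - risk_free)

# Hypothetical inputs: how the cost of equity moves with the risk-free rate
# for a low-beta asset when the expected market return is held fixed.
for risk_free in (0.02, 0.04, 0.06):
    cost_of_equity = capm_expected_return(risk_free, beta=0.8, market_return=0.09)
    print(f"risk-free {risk_free:.0%} -> cost of equity {cost_of_equity:.2%}")
```

Every output depends on the `risk_free` input, which is why an unreliable risk-free benchmark would ripple through valuations of everything priced this way.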

The global financial system assumes U.S. treasury debt is a type of riskless asset. In financial lingo, U.S. treasuries have zero credit risk. The investor assumes they will always receive their money on time:

Risk of U.S. treasuries defaulting = 0

So, what if risk > 0?

The United States federal government has never defaulted on its debt. Such an event has never been observed by the world. It would be unprecedented.

Since January 19, 2023, the United States federal government has been unable to issue more debt beyond its $31.381 trillion statutory limit[6]. As a result, the Department of Treasury projects that the federal government will run out of money on June 1, 2023[7]. If the debt limit is not raised by then, the United States will default on its debt.

The frightening reality is simple. The world likely will still refer to U.S. treasuries as the predominant proxy for riskless assets.

As a result, if the U.S. defaults on its debt, it is quite likely that investors will buy more U.S. treasuries.

Why? In a world of risky assets, U.S. treasuries will remain the safest type of asset, despite a default. In times of financial uncertainty, asset managers will run to safe assets, which ironically will be U.S. treasuries[8].

Will markets function?

Beyond the heavy financial pricing theory talk, if the U.S government defaults, there are simple questions that are fundamental to the largest and most liquid capital market in the world: the market for U.S. treasuries.

Will investors get advance notice of a delay in payment? Will the delay be one day or more?
When may payments resume? Will the Treasury Department provide compensation for defaulted Treasuries?
Will the Federal Reserve accept defaulted Treasuries as collateral?
Will other banks and parties continue to accept defaulted Treasuries as collateral?
Which types of Treasuries will default first?
Are defaulted Treasuries transferable?
Will money market funds be forced to sell Treasuries in the event of a default?

While some banks[9] seem to offer their clients answers to these questions, it is my hope that we never have to find out whether their answers were correct.

In 2015, the Swiss National Bank suddenly stopped holding the Swiss franc (CHF) to its minimum exchange rate of 1.20 francs per euro (EUR).

Some computers at trading firms continued to assume:

1 EUR ≥ 1.20 CHF.

Calamity ensued, instantly bankrupting firms due to lack of funds and/or liquidity[10].

Let us hope Congress decides to pay the country's debts, and that we do not have to find out how many computers have hard-coded:

Risk of U.S. treasuries defaulting = 0
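For readers who have not seen what such a hard-coded assumption looks like, here is a toy sketch. Both functions and the expected-loss haircut are hypothetical and invented for illustration; they are not taken from any real trading or collateral system. The first buries the assumption inside the arithmetic, while the second exposes it as a parameter that can be changed and stress-tested.

```python
def collateral_value_hard_coded(face_value: float) -> float:
    """Assumes zero default risk with no way for a caller to override it."""
    return face_value * (1.0 - 0.0)  # the 0.0 is the hidden assumption


def collateral_value(face_value: float,
                     default_risk: float = 0.0,
                     loss_given_default: float = 1.0) -> float:
    """Same toy calculation, with the assumption exposed as an input."""
    if not 0.0 <= default_risk <= 1.0 or not 0.0 <= loss_given_default <= 1.0:
        raise ValueError("default_risk and loss_given_default must lie in [0, 1]")
    expected_loss = default_risk * loss_given_default
    return face_value * (1.0 - expected_loss)


# Under the usual assumption both versions agree.
print(collateral_value_hard_coded(100.0), collateral_value(100.0))
# Only the parameterized version can answer "what if risk > 0?".
print(collateral_value(100.0, default_risk=0.02, loss_given_default=0.5))
```

The default argument still encodes today's convention, but it sits where someone can find it on the day the convention stops holding.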

[1] https://doi.org/10.1111/j.1540-6261.1964.tb02865.x

[2] This is not to say CAPM is completely accurate. Notably, in 1993, Fama and French from the University of Chicago identified other factors that should be considered besides the risk of the market portfolio. https://doi.org/10.1016/0304-405X(93)90023-5

[3] https://www.nobelprize.org/prizes/economic-sciences/1990/press-release/

[4] You can peruse Google Scholar for the mountain of work debating the best risk-free rate in every circumstance.

[5] https://home.treasury.gov/resource-center/data-chart-center/interest-rates/TextView?type=daily_treasury_yield_curve&field_tdr_date_value_month=202305

[6] https://home.treasury.gov/system/files/136/Debt-Limit-Letter-to-Congress-20230119-McCarthy.pdf

[7] https://home.treasury.gov/system/files/136/Debt-Limit-Letter-to-Congress-Members-20230522-McCarthy.pdf

[8] This is not just my word. It comes from a survey of asset managers conducted by JPMorgan Chase as part of “Q&A on a US Treasury Technical Default” (May 19, 2023, JPMorgan Chase Fixed Income). An excerpt can be found at https://www.ft.com/content/e4639f71-b69f-4d99-867d-dffd1733779d

[9] https://www.bloomberg.com/opinion/articles/2023-05-23/can-markets-handle-the-debt-ceiling

[10] https://www.wsj.com/articles/snb-shocks-bankers-and-markets-1421342951
