What's Happening With AI Token Prices, and What It Means for Your Budget

A practical look at why AI token prices can feel both expensive and rapidly cheaper, and how falling inference costs change what customers should expect from AI products.

Shrijith Venkatramana

Jun 7, 2026 — 7 min read

Every few months, we seem to hear two completely different stories about AI pricing.

On one side, AI labs and industry commentators tell us that AI is getting dramatically cheaper. If current trends continue, they argue, we'll eventually stop thinking much about token costs at all because intelligence will be abundant and inexpensive.

On the other side, developers are staring at real AI bills. Teams building agents are running into rate limits. Companies are spending meaningful amounts of money on inference. For many organizations, token costs are still a very real constraint.

So which story is true?

The answer is: both.

To understand why, it helps to look beyond the headlines and examine what's actually happening to the economics of AI inference.

From Software Eating the World to AI Eating Software

Over the last few decades, software transformed almost every industry.

Healthcare, finance, education, transportation, government—software touched all of them. Marc Andreessen famously described this trend as "software eating the world."

Now we may be entering the next phase.

Increasingly, AI agents are starting to eat software itself. Instead of clicking through interfaces and workflows, users can simply tell an AI what they want done and let it handle the work.

That's why token economics matter.

The cost of tokens ultimately determines the cost of the products, agents, and applications being built on top of modern AI models.

Why Token Economics Matters

Take AI-powered code review tools as an example.

At LiveReview, we compete with products like CodeRabbit and other AI-assisted code review platforms. Whether you're evaluating a hosted product or a lighter-weight Git-based workflow like git-lrc, one of the most important questions is simple:

If a customer spends one dollar, how much value do they get?

Can we provide more useful reviews? Better analysis? More actionable feedback?

Those questions are directly tied to token costs because token costs sit underneath almost every AI product.

One of the reasons we're optimistic about the future is that customers should be able to buy more intelligence for the same amount of money over time.

In practical terms, that means the same budget should eventually buy more reviews, more analysis, more automation, and better outcomes.

To understand why, it's worth revisiting a familiar idea: Moore's Law.

Moore's Law and the Economics of Compute

Moore's Law is one of the most important trends in the history of technology.

Originally proposed by Intel co-founder Gordon Moore, it observed that the number of transistors on a chip roughly doubled every two years. In practice, that meant the amount of compute available per dollar grew at a broadly similar pace.

If one dollar bought 100 units of compute today, then in two years it might buy 200 units. Two years after that, 400 units. Then 800.
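The doubling arithmetic above can be written as a small function. This is only a sketch of the compounding, assuming a perfectly steady two-year doubling period; the function name and parameters are illustrative and do not come from any library.

```python
def compute_per_dollar(initial_units: float, years: float,
                       doubling_period_years: float = 2.0) -> float:
    """Units of compute one dollar buys after `years` of steady doubling."""
    return initial_units * 2 ** (years / doubling_period_years)


# Reproduce the 100 -> 200 -> 400 -> 800 sequence from the text.
for years in (0, 2, 4, 6):
    print(f"after {years} years: {compute_per_dollar(100, years):.0f} units per dollar")
```

Real hardware never followed the curve this smoothly, so treat the output as the idealized trend rather than a forecast.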

That steady increase in compute-per-dollar helped drive the software revolution.

But what's interesting is that parts of the AI industry now appear to be moving even faster than traditional Moore's Law.

The Emergence of Tiered Super-Moore's Law

Recent research has introduced the idea of "Tiered Super-Moore's Law."

The basic observation is straightforward.

New AI capabilities usually appear first in expensive frontier models. Then competition, optimization, infrastructure improvements, and engineering innovations rapidly drive down the cost of delivering those same capabilities.

You can think of the market as having three broad tiers.

Frontier Models

These are the most capable models available.

Think GPT, Claude Opus, Gemini Pro, and other flagship systems operating at the cutting edge.

These models are typically priced above $5 per million input tokens.

Mid-Tier Models

These models offer strong performance at much lower prices.

Pricing generally falls between $0.50 and $5 per million tokens.

Economy Models

These are highly optimized models designed for large-scale production workloads.

Pricing is often below $0.50 per million tokens, making them attractive for repetitive or high-volume tasks.
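As a rough illustration, the three price bands can be expressed as a simple classifier. The thresholds are the ones quoted above; how the exact boundary values ($5 and $0.50) are assigned is an assumption, and the model names and prices in the example are placeholders, not real price quotes.

```python
def price_tier(usd_per_million_tokens: float) -> str:
    """Map a price per million tokens to one of the three tiers in the text."""
    if usd_per_million_tokens > 5.0:
        return "frontier"
    if usd_per_million_tokens >= 0.50:
        # Assumption: prices of exactly $0.50 or $5 count as mid-tier.
        return "mid-tier"
    return "economy"


# Hypothetical models with made-up prices, for illustration only.
example_prices = {"model-a": 15.00, "model-b": 1.25, "model-c": 0.10}
for name, price in example_prices.items():
    print(f"{name}: ${price:.2f} per million tokens -> {price_tier(price)}")
```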

The key insight is that capabilities don't stay in the frontier tier forever.

A capability that first appears in an expensive model eventually works its way down into cheaper and cheaper models.

AI Prices Are Falling Remarkably Fast

According to the research, frontier and mid-tier model prices have often fallen by 10× to 30× per year.

That's an astonishing rate of change.

Imagine a task that costs $100 today.

If prices fall by 10× over the next year, that same task costs about $10.

Another year of similar improvement brings the cost down to roughly $1.

That's not a small efficiency gain. It's a dramatic reduction in the cost of intelligence.
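To make the arithmetic concrete, here is a minimal sketch that projects the cost of a fixed task under a constant yearly price drop. It assumes the drop factor stays the same every year, which is a simplification; the function name is illustrative.

```python
def projected_cost(cost_today: float, annual_price_drop: float, years: int) -> float:
    """Cost of the same task after `years`, if prices fall by a fixed factor each year."""
    return cost_today / annual_price_drop ** years


# Compare the low and high ends of the 10x to 30x range quoted above.
for factor in (10, 30):
    schedule = [round(projected_cost(100.0, factor, year), 2) for year in range(3)]
    print(f"{factor}x per year: {schedule}")
```

At 10× per year the $100 task follows the $100, $10, $1 path described above; at 30× per year it falls faster still.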

The pattern is becoming increasingly familiar: new capabilities show up first in expensive models, and then the industry gets very good at making those capabilities cheaper.

How Frontier Capabilities Become Affordable

The process is surprisingly predictable.

A frontier model demonstrates a new capability. Then researchers and engineers figure out how to reproduce similar results more efficiently.

Some of the techniques used include:

Quantization — Reduces the numerical precision of model weights and computations to lower memory usage and speed up inference.
Distillation — Trains a smaller model to mimic the behavior of a larger model, preserving much of its capability at a lower cost.
Mixture-of-Experts architectures — Activate only a subset of specialized expert sub-networks for each token, reducing the compute needed per token.
Flash Attention — An optimized attention algorithm that improves speed and memory efficiency when processing long contexts.
Speculative decoding — Uses a smaller draft model to propose several tokens ahead, which the larger model then verifies in parallel, allowing it to generate responses faster.
KV-cache optimization — Reuses previously computed attention keys and values so the model does not need to recompute them for every token.
Prompt caching — Stores and reuses computations for repeated prompts, reducing latency and inference costs.
Parameter-efficient fine-tuning — Adapts models to new tasks by updating only a small subset of parameters instead of retraining the entire model.
Model routing — Directs requests to the most appropriate model for a given task, balancing cost, speed, and quality.

Every one of these innovations reduces the amount of compute needed to achieve a given result.
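Of the techniques listed, model routing is the easiest to sketch in a few lines. The example below is a toy version, assuming each request comes with a required capability level and that the router simply picks the cheapest model that meets it. The model names, prices, and capability scores are all hypothetical placeholders; production routers typically use learned classifiers or heuristics rather than a fixed score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                      # hypothetical model identifier
    usd_per_million_tokens: float  # illustrative price, not a real quote
    capability: int                # made-up score: higher means more capable


# Placeholder catalogue covering the three tiers described earlier.
CATALOGUE = [
    ModelOption("economy-model", 0.20, 1),
    ModelOption("mid-tier-model", 2.00, 2),
    ModelOption("frontier-model", 10.00, 3),
]


def route(required_capability: int, catalogue=CATALOGUE) -> ModelOption:
    """Pick the cheapest model whose capability meets the requirement."""
    eligible = [m for m in catalogue if m.capability >= required_capability]
    if not eligible:
        raise ValueError("no model meets the required capability")
    return min(eligible, key=lambda m: m.usd_per_million_tokens)


def request_cost(model: ModelOption, tokens: int) -> float:
    """Cost in dollars of sending `tokens` tokens to `model`."""
    return model.usd_per_million_tokens * tokens / 1_000_000


for level in (1, 2, 3):
    chosen = route(level)
    print(f"capability {level}: {chosen.name}, "
          f"${request_cost(chosen, 50_000):.4f} for 50,000 tokens")
```

The design choice here is deliberately simple: always pay for the least expensive model that is good enough, and reserve the frontier tier for requests that need it.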

Over time, capabilities that were once expensive become available to almost everyone.

That's one of the reasons AI pricing behaves differently from traditional software pricing. The thing being sold—the intelligence itself—is constantly becoming cheaper to produce.

The Economy Tier Effect

What's particularly interesting is that even economy models continue getting cheaper.

The rate of improvement is slower than at the frontier, but it's still meaningful.

Suppose a task costs $100 today using economy-tier models.

If costs fall roughly 2× per year:

Year 1: $50
Year 2: $25
Year 3: $12.50
Year 4: $6.25
Year 5: Approximately $3.13

Five years later, a workload that once cost $100 might cost only a few dollars.
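The same schedule can be generated in code, along with the number of years it takes to reach a target cost. This is a sketch that assumes a constant yearly drop factor; the function names and the $5 target are illustrative.

```python
def yearly_costs(cost_today: float, annual_price_drop: float, years: int) -> list[float]:
    """Cost at the end of each year 1..years, dividing by the drop factor yearly."""
    costs = []
    cost = cost_today
    for _ in range(years):
        cost /= annual_price_drop
        costs.append(cost)
    return costs


def years_until_at_or_below(cost_today: float, target_cost: float,
                            annual_price_drop: float) -> int:
    """Whole years needed for the cost to fall to `target_cost` or less."""
    if annual_price_drop <= 1:
        raise ValueError("annual_price_drop must be greater than 1")
    years = 0
    cost = cost_today
    while cost > target_cost:
        cost /= annual_price_drop
        years += 1
    return years


print(yearly_costs(100.0, 2.0, 5))               # economy tier, 2x per year
print(years_until_at_or_below(100.0, 5.0, 2.0))  # years to get under $5 at 2x
print(years_until_at_or_below(100.0, 5.0, 10.0)) # same target at 10x per year
```

Under these assumptions the $100 workload drops below $5 in five years at 2× per year, compared with two years at a 10× yearly rate.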

Even if model quality stayed exactly the same, the cost of accessing that quality would continue to decline.

Buying More Intelligence Per Dollar

Another way to think about all of this is that you're buying more intelligence for the same amount of money.

Of course, intelligence isn't a perfectly measurable unit.

But the practical effect is easy to see.

For the same budget, organizations can often:

Process more data
Analyze more context
Run more agents
Generate more outputs
Perform more reviews
Execute more reasoning steps

In other words, the amount of useful cognitive work available per dollar keeps increasing.
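One way to see this is to hold the budget fixed and count how much work it buys as prices fall. The sketch below uses code reviews as the unit of work. The 50,000 tokens per review and the prices are assumptions chosen for illustration, not measurements from any product.

```python
def reviews_for_budget(budget_usd: float, tokens_per_review: int,
                       usd_per_million_tokens: float) -> int:
    """Number of whole reviews a budget covers at a given token price."""
    tokens_affordable = budget_usd * 1_000_000 / usd_per_million_tokens
    return int(tokens_affordable // tokens_per_review)


# Same $100 budget, with the token price halving at each step (hypothetical prices).
for price in (2.0, 1.0, 0.5):
    count = reviews_for_budget(100.0, 50_000, price)
    print(f"${price:.2f} per million tokens -> {count} reviews")
```

Each halving of the token price doubles the number of reviews the same budget covers, which is the effect the list above describes.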

What Customers Should Demand

This has important implications for both buyers and builders of AI products.

Customers shouldn't expect AI products to stay static.

If intelligence is getting cheaper, then products should become more valuable over time.

That value might show up as:

More reviews per month
Better analysis
Larger context windows
Improved security scanning
More comprehensive monitoring
Higher reliability
Better user experiences

As the cost of intelligence falls, customers should benefit from those gains.

What This Means for LiveReview

For us at LiveReview, the goal is pretty simple.

As AI economics improve, we want those improvements to show up in the product.

That means:

More code reviews per dollar
More analysis per dollar
Better answers to developer questions
Increased security coverage
Improved production stability
Greater overall effectiveness

The objective isn't simply to capture efficiency gains internally.

The objective is to turn those gains into more value for customers.

Over time, customers should expect to get substantially more from the same subscription spend.

Why This Trend Is Likely to Continue

Many of the techniques responsible for previous cost reductions have already delivered enormous gains.

But there is still plenty of room for improvement.

Future advances may include:

Better model compression techniques, allowing models to deliver similar performance while using fewer resources.
More sophisticated routing systems that direct tasks to the most efficient model for the job instead of always using the most expensive option.
Improved reasoning architectures that can solve complex problems with fewer computational steps.
More efficient inference algorithms that reduce the amount of compute required to generate high-quality outputs.
Hardware improvements, including new generations of AI chips designed specifically for large-scale inference workloads.
Lower energy consumption, which can significantly reduce the operational costs of running AI systems.
Better datacenter utilization, helping providers get more value from existing infrastructure and pass some of those savings on to customers.

Each layer of optimization pushes the cost of intelligence lower.

Conclusion

The conversation around AI costs often sounds contradictory because people are looking at different parts of the same trend.

Yes, AI can still be expensive today.

Yes, developers still hit rate limits and budget constraints.

But the broader pattern is clear: the cost of intelligence is falling rapidly.

New capabilities appear first in expensive frontier models. Then optimization, distillation, compression, competition, and infrastructure improvements make those capabilities dramatically cheaper.

The result is that every year, a dollar tends to buy more intelligence than it did before.

For companies building AI products, the challenge is straightforward: take those falling costs and turn them into better outcomes for customers.

The companies that do that best are likely to be some of the biggest winners in the next phase of AI.

References

Tiered Super-Moore's Law: Price Evolution, Production Frontiers, and Market Competition in Large Language Model Inference Services (arXiv)
The Price of Progress: Price Performance and the Future of AI (arXiv)