From Logic to Linear Algebra: How AI is Rewiring the Computer
Computers were never just about wires and chips. They've always been built to serve human imagination -- or, more practically, a workload. Put the rule in reverse and it reads: the workload defines the hardware.
For half a century, workloads looked like this: operating systems making decisions, compilers checking conditions, databases searching records. All of them boiled down to logic -- if this, then that; otherwise, do something else.
The machine for the job was the CPU: fast, flexible, and great at juggling complex, branching tasks.
But AI has upended this. When you open up a modern neural network, you won't see forests of if-else statements. You'll see matrices -- massive grids of numbers. And the work is astonishingly repetitive: multiply, add, multiply, add. Not once, but billions of times.
This quiet shift in workload has triggered a hardware revolution. Computing has begun to move away from chips optimized for logic toward chips optimized for linear algebra.
The Physics Layer: From Sand to Switches
Underneath it all, computing is about materials.
- Copper is a conductor -- it lets electrons flow easily.
- Glass is an insulator -- it blocks electrons entirely.
- Silicon is a semiconductor -- its behavior changes with tiny tweaks ("doping"), letting it sometimes conduct and sometimes insulate.
That trick makes silicon the perfect switch -- the transistor.
The first transistor (Bell Labs, 1947) was thumb-sized. Today, billions can fit on a fingernail. Scaling transistors down has been the foundation of modern computing.
From Transistors to CPUs
One transistor is just a switch. Combine a few, and you get a logic gate (AND, OR, NOT). Chain thousands together, and you can build an arithmetic unit. Stack billions, and you get a processor.
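To make that jump concrete, here's a toy sketch in Python (gates-as-functions is a teaching model, not how hardware is actually described): it wires AND, OR, and NOT into a 1-bit half adder, the smallest building block of an arithmetic unit.

```python
# Toy model: logic gates as Python functions, composed into a 1-bit half adder.
def AND(a, b): return a & b
def OR(a, b):  return a | b
def NOT(a):    return 1 - a

def half_adder(a, b):
    """Add two 1-bit numbers: returns (sum_bit, carry_bit)."""
    carry = AND(a, b)                          # the carry bit is just AND
    sum_bit = AND(OR(a, b), NOT(AND(a, b)))    # XOR built from AND/OR/NOT
    return sum_bit, carry

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", half_adder(a, b))
# 0 0 -> (0, 0)   0 1 -> (1, 0)   1 0 -> (1, 0)   1 1 -> (0, 1)
```

Chain enough of these together, add carry logic, and you have the arithmetic circuits at the heart of a real processor.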
The CPU was designed as a general-purpose workhorse: a Swiss Army knife of computing. It handles operating systems, databases, spreadsheets, video games -- anything you throw at it.
But CPUs are optimized for complex control flow: branches, loops, conditionals. They excel at decision-heavy workloads.
And that's exactly why they stumble on AI.
Why Matrices Took Over
Neural networks are not forests of branches. They're layers of linear algebra.
Take a simple example: multiplying two 3×3 matrices.
To compute just one entry of the result, you multiply a row by a column and add things up:
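For the top-left entry, that means c11 = a11·b11 + a12·b21 + a13·b31. Here's the same idea as a minimal sketch in plain Python (illustrative values, no libraries needed):

```python
# One entry of C = A @ B for 3x3 matrices: a row of A dotted with a column of B.
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
B = [[9, 8, 7],
     [6, 5, 4],
     [3, 2, 1]]

def entry(A, B, i, j):
    """C[i][j] = sum over k of A[i][k] * B[k][j] -- multiply, then add."""
    total = 0
    for k in range(3):
        total += A[i][k] * B[k][j]   # one multiply-accumulate
    return total

print(entry(A, B, 0, 0))  # 1*9 + 2*6 + 3*3 = 30
```

The full product just repeats this dot product for every row-column pair -- nine times here, billions of times inside one layer of a large model.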
That's all matrix multiplication is: multiply, then add. Again and again. It's conceptually simple, but the scale is monstrous. A modern AI model requires trillions of these multiplications and additions.
And here's the key shift: AI workloads don't care about clever branching. They care about raw, parallel arithmetic.
This is why CPUs suddenly feel like the wrong tool for the job.
GPUs: Parallelism as a Superpower
The solution came from an unexpected place: graphics cards.
A GPU was originally designed to render pixels in parallel. Where a CPU has a few complex cores, a GPU has thousands of simpler cores, organized for throughput.
Key concepts inside a GPU:
- Streaming Multiprocessor (SM): cluster of cores executing together.
- Warp: group of 32 threads executing in lockstep.
- Memory hierarchy: registers, shared memory, global memory -- structured to keep math units fed.
This makes GPUs perfect for AI. Multiplying matrices looks a lot like drawing pixels: the same operation, repeated millions of times.
- CPU: handles one complex stream of instructions efficiently.
- GPU: handles thousands of simple instruction streams in parallel.
That's why training a neural net might take weeks on CPUs but hours on GPUs.
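You can see the gap for yourself with a rough sketch like the one below -- assuming PyTorch is installed and a CUDA-capable GPU is present; the exact numbers depend entirely on the hardware.

```python
# Rough sketch: time the same large matrix multiply on the CPU and, if one
# is available, on a CUDA GPU. Assumes PyTorch; numbers vary by machine.
import time
import torch

n = 4096
a = torch.randn(n, n)
b = torch.randn(n, n)

t0 = time.perf_counter()
c = a @ b                                   # CPU matmul
cpu_s = time.perf_counter() - t0
print(f"CPU: {cpu_s:.3f} s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()                # wait for the host-to-device copies
    t0 = time.perf_counter()
    c_gpu = a_gpu @ b_gpu                   # GPU matmul
    torch.cuda.synchronize()                # wait for the kernel to finish
    gpu_s = time.perf_counter() - t0
    print(f"GPU: {gpu_s:.3f} s")
```

On most machines with a discrete GPU, the second timing is dramatically smaller -- the same gap, writ large, is why training moved off CPUs.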
Beyond GPUs: TPUs and Groq
If GPUs are great, why stop there?
Google's TPU
Google realized GPUs were still too general-purpose. So they built the TPU (Tensor Processing Unit) -- a chip whose circuitry is explicitly designed for tensor math.
- The core is the Matrix Multiply Unit (MXU) -- a systolic array optimized for multiply-accumulate operations (sketched below).
- Result: higher efficiency, lower cost per AI operation, and reduced power.
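Here's a loose model, in Python with NumPy, of what "optimized for multiply-accumulate" buys you. It is not a faithful systolic array -- real ones pass operands between neighboring cells on a skewed schedule -- but it captures the rhythm: every cell performs exactly one multiply-accumulate per step, and the full product emerges after K steps.

```python
# Loose model of an output-stationary systolic array: one multiply-accumulate
# per cell per step; after K steps the accumulators hold A @ B.
import numpy as np

def systolic_like_matmul(A, B):
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    acc = np.zeros((n, m))                    # one accumulator per cell
    for t in range(k):                        # one "beat" of the array
        acc += np.outer(A[:, t], B[t, :])     # every cell: multiply + add
    return acc

A = np.random.rand(3, 4)
B = np.random.rand(4, 2)
assert np.allclose(systolic_like_matmul(A, B), A @ B)
```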
Groq
A newer contender, Groq, made a radical design choice: instead of thousands of cores, they built a deterministic dataflow architecture.
- Data streams across the chip like on a conveyor belt, with no scheduling overhead.
- This means predictable latency and high throughput for real-time inference.
Both TPU and Groq show the same theme: hardware bends to the workload.
Understanding FLOPS: A Common Benchmark
Before we compare these machines, we need a common measuring stick. How do you quantify computational power across radically different architectures?
Enter FLOPS -- Floating-Point Operations Per Second.
The Birth of a Benchmark
In the early days of computing, processors handled integers just fine, but real numbers (with decimal points) were a different beast. Scientists needed machines that could handle complex calculations: weather simulations, nuclear physics, engineering designs.
The solution was floating-point arithmetic -- a way to represent real numbers in binary. Think of it as scientific notation for computers:
3.14159 becomes something like 314159 × 10^-5
- The computer stores both the significant digits and the exponent
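You can peek at this split from Python -- keeping in mind that real hardware stores a base-2 exponent, not the base-10 one shown above:

```python
import math

m, e = math.frexp(3.14159)       # significand in [0.5, 1) plus a base-2 exponent
print(m, e)                      # ~0.7854 and 2, since 0.7853975 * 2**2 == 3.14159
assert math.ldexp(m, e) == 3.14159   # reassembling the pieces gives the number back
```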
But floating-point math is computationally expensive. Unlike simple integer addition, floating-point operations require:
- Aligning decimal points
- Performing the operation
- Normalizing the result
- Handling special cases (overflow, underflow, infinity)
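Here's a toy sketch of the first three steps, leaning on Python's math.frexp/ldexp to do the splitting; real hardware works on fixed-width binary significands and also has to round and handle the special cases:

```python
import math

def toy_float_add(x, y):
    mx, ex = math.frexp(x)          # split each number into significand + exponent
    my, ey = math.frexp(y)
    e = max(ex, ey)                 # 1. align: shift the smaller operand
    mx = math.ldexp(mx, ex - e)
    my = math.ldexp(my, ey - e)
    m = mx + my                     # 2. operate on the aligned significands
    return math.ldexp(m, e)         # 3. normalize back into a regular float

print(toy_float_add(1.5, 0.125))    # 1.625, same as 1.5 + 0.125
```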
Why FLOPS Became the Standard
By the 1960s, scientific computing was dominated by matrix operations -- the same operations driving AI today. Researchers needed a way to compare machines:
- How fast can this computer solve a system of linear equations?
- Which machine can simulate fluid dynamics more efficiently?
- What's the cost-per-calculation for different architectures?
FLOPS provided the answer -- a single number representing how many floating-point calculations a machine could perform each second.
The Scale Problem
Early computers were measured in plain FLOPS -- no prefix needed. Then came:
- KiloFLOPS (thousands) -- 1970s minicomputers
- MegaFLOPS (millions) -- 1980s workstations
- GigaFLOPS (billions) -- 1990s supercomputers
- TeraFLOPS (trillions) -- 2000s clusters
- PetaFLOPS (quadrillions) -- 2010s supercomputers
- ExaFLOPS (quintillions) -- 2020s frontier machines
Why FLOPS Matter for AI
Modern neural networks are built on floating-point calculations:
- Every neuron activation: floating-point multiply + add
- Every weight update: floating-point arithmetic
- Every matrix multiplication: billions of floating-point operations
When companies train models like GPT, or users generate huge numbers of images with DALL-E, the work adds up to quadrillions of floating-point operations. Hardware that delivers more FLOPS per dollar (and per watt) performs better for these workloads.
FLOPS became a useful benchmark because they measure exactly what AI workloads require: arithmetic throughput.
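A quick back-of-the-envelope shows the scale. A dense matrix multiply costs roughly 2·m·n·k floating-point operations (k multiplies and k adds per output entry); the hardware rate below is a placeholder, not a spec for any particular chip:

```python
# Count the floating-point operations in one dense matrix multiply:
# each output entry needs k multiplies and k adds, so ~2*m*n*k FLOPs total.
m, n, k = 4096, 4096, 4096
flops = 2 * m * n * k
print(f"{flops:.2e} FLOPs")                 # ~1.4e11 operations for one matmul

hardware_flops_per_sec = 10e12              # placeholder: a 10-TFLOPS accelerator
print(f"{flops / hardware_flops_per_sec * 1e3:.1f} ms at peak")   # ~13.7 ms
```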
Comparing the Machines
Let's benchmark the same task: multiplying two large matrices.
| Hardware | Optimized For | Approx. Speed | Energy Use | Strength | Weakness |
|---|---|---|---|---|---|
| CPU | Logic-heavy workloads | Few GFLOPS | Moderate | Flexible, runs anything | Poor at AI workloads |
| GPU | Parallel matrix math | 10,000+ GFLOPS | High | Excellent throughput | Expensive, power hungry |
| TPU | Tensor ops (MXUs) | Similar or faster | Medium | Efficiency, tuned for AI | Narrow use case |
| GroqChip | Deterministic dataflow | Competitive (low-millisecond latency) | Medium-Low | Predictable latency, real-time AI | Ecosystem still maturing |

GFLOPS = billions of floating-point operations per second.
Takeaway: CPUs are versatile, but GPUs, TPUs, and Groq dominate in cost, speed, and energy for matrix math.
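If you want to put a rough number on your own machine, a minimal sketch like the one below measures achieved GFLOPS with NumPy. In practice it exercises whatever BLAS library NumPy links against, i.e. the CPU row of the table; GPU and TPU figures need their own frameworks.

```python
# Measure achieved GFLOPS for one large CPU matrix multiply via NumPy.
import time
import numpy as np

n = 2048
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - t0

gflops = (2 * n**3) / elapsed / 1e9         # ~2*n^3 FLOPs for an n x n matmul
print(f"{elapsed:.3f} s  ->  {gflops:.1f} GFLOPS")
```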
Inference Economics
This shift isn't academic -- it's financial.
- CPU inference → slow, costly per operation.
- GPU inference → fast but power-hungry, good for large-scale training.
- TPU inference → highly efficient, cheap at scale (esp. in Google's cloud).
- Groq inference → predictable timing, useful for latency-sensitive tasks like real-time translation.
The hardware you choose doesn't just change performance -- it changes the economics of deploying AI at scale.
Has Computing Changed Forever?
Yes.
For decades, computing meant logic. CPUs reigned supreme.
Now, computing means linear algebra. GPUs, TPUs, and Groq are the new champions.
Does that make CPUs obsolete? Not at all -- they'll always power operating systems, compilers, and general-purpose software. But the center of gravity has shifted.
The deeper lesson: the nature of computing bends to the workload.
Right now, that workload is matrices.