
Intelligence as Compressed Search

In this entry, I offer a view of intelligence from a slightly different angle. It's inspired by my work at Corvus and by my work on variance-aware trajectory-search inference scaling for robotics action planning with diffusion policies. From both, I've gained one insight: you can empirically map how a learned model compresses its internal state space. There are two levels to this. First, the model compresses a high-dimensional environmental state space into a more compact internal representation space. Second, the model also learns to further compress that internal space by traversing only a small, structured subset of it during inference.

This dual compression reveals something interesting: intelligent systems don't just learn to model the world, they learn to compress it. From this perspective, intelligent systems may exist to reduce the effective compute required to search through large possibility spaces. It's not just optimization; it's optimization with amortized cost. In this view, intelligence is a compression engine for intractable search. The search can be over actions, thoughts, representations, optimal search methods (at the meta-cognition level), or any other space of possible outcomes, relative to some objective function. At Corvus, we see this in our systems when we're trying to find the best estimate of your location: the smaller the search space, the more granular, and therefore more accurate, the estimate. All of these spaces are potentially infinite, and would be very expensive to brute-force search.

1. Definitions

  • \( \mathcal{S} \): a space of possible actions, thoughts, or representations
  • \( T_{\text{brute}} \): expected compute cost of brute-force search in \( \mathcal{S} \)
  • \( T_{\text{intel}} \): expected compute cost using an intelligent system
  • \( \mathcal{I} \): intelligence as the compression ratio of search compute
\[ \mathcal{I} = \frac{T_{\text{brute}}}{T_{\text{intel}}} \]

Higher \( \mathcal{I} \) implies greater intelligence: the system achieves the same or better outcome with fewer evaluations or decisions. This metric is task-general and architecture-independent.
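To make the ratio concrete, here is a minimal, hypothetical sketch in Python: a brute-force searcher evaluates candidates in arbitrary order, while a "guided" searcher first ranks them with a stand-in learned heuristic. Only objective evaluations are counted as compute; the heuristic is assumed to be cheap, amortized knowledge. The names (`objective`, `heuristic`, `guided_search`) and all numbers are illustrative, not from any particular system.

```python
import random

random.seed(0)

N = 10_000                        # size of the search space S
target = random.randrange(N)      # the one action that satisfies the objective

def objective(a: int) -> bool:
    """Expensive check: does action `a` achieve the goal?"""
    return a == target

def brute_force() -> int:
    """Evaluate candidates in arbitrary order; return evaluations spent."""
    for cost, a in enumerate(range(N), start=1):
        if objective(a):
            return cost
    return N

def heuristic(a: int) -> float:
    """Stand-in for a learned scorer: noisy 'promisingness' of `a`."""
    return -abs(a - target) + random.gauss(0, 50)

def guided_search() -> int:
    """Rank all candidates by the heuristic, then evaluate in that order.
    Heuristic calls are treated as free (amortized); only objective
    evaluations count toward compute."""
    ranked = sorted(range(N), key=heuristic, reverse=True)
    for cost, a in enumerate(ranked, start=1):
        if objective(a):
            return cost
    return N

T_brute = brute_force()
T_intel = guided_search()
print(f"T_brute = {T_brute}, T_intel = {T_intel}, I = {T_brute / T_intel:.1f}")
```

The exact numbers depend on how noisy the heuristic is; the point is only that the same outcome is reached with far fewer objective evaluations, so the measured \( \mathcal{I} \) lands well above 1.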

2. Interpretation

This theory reframes intelligence as a search-accelerating process that distills the structure of a domain to avoid wasting compute on dead ends. In biological systems, this manifests as perception, memory, and predictive modeling. In artificial systems, it manifests as amortized inference, inductive biases, or learned heuristics.

Importantly, \( \mathcal{I} \) is not fixed. A system can learn to improve its own \( T_{\text{intel}} \) over time, i.e., it can compress search even further as it gains experience. This recursive efficiency is a possible axis for defining higher-order intelligence.
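As a hedged illustration of that idea (not a model of any real system), the sketch below gives the searcher a small memory of which regions of \( \mathcal{S} \) paid off in earlier episodes. Because later episodes try remembered regions first, \( T_{\text{intel}} \) falls with experience and the measured \( \mathcal{I} \) rises. The region sizes, the clustering of goals, and the memory scheme are all made-up assumptions.

```python
import random

random.seed(0)

N = 1_000                          # size of the search space S
REGION = 100                       # S is split into 10 regions of 100 actions

def region_of(a: int) -> int:
    return a // REGION

def episode(memory: dict) -> int:
    """One search episode; returns objective evaluations spent (T_intel)."""
    # Assumed structure: goals always land in region 7, which experience can exploit.
    target = 7 * REGION + random.randrange(REGION)
    # Try regions that paid off before first; unexplored regions keep their order.
    order = sorted(range(N), key=lambda a: -memory.get(region_of(a), 0))
    for cost, a in enumerate(order, start=1):
        if a == target:            # objective evaluation
            memory[region_of(a)] = memory.get(region_of(a), 0) + 1
            return cost
    return N

memory = {}
T_brute = N / 2                    # expected cost of unordered search
for i in range(5):
    T_intel = episode(memory)
    print(f"episode {i}: T_intel = {T_intel:4d}, I ~ {T_brute / T_intel:.1f}")
```

On the first episode the searcher is no better than brute force; once it has seen where goals live, its evaluation count collapses and \( \mathcal{I} \) climbs, which is the recursive-efficiency axis in miniature.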

Another important insight is that, in this view, thinking is about using compute to search more of the compressed space offered by learned models.
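One way to picture that, under similarly toy assumptions, is best-of-N sampling: a stand-in "learned model" proposes candidates from a narrow region of an enormous raw space, and thinking harder simply means drawing and scoring more proposals from that compressed region. Everything here (the goal, the proposal distribution, the raw-space bounds) is hypothetical.

```python
import random

random.seed(0)

GOAL = 0.0                          # what the objective cares about

def score(x: float) -> float:
    """Objective: higher is better (negative distance to the goal)."""
    return -abs(x - GOAL)

def learned_proposal() -> float:
    """Stand-in for a trained model: samples near the goal instead of
    uniformly over the raw space, imagined here as [-1e6, 1e6]."""
    return random.gauss(0.0, 1.0)

def think(n_samples: int) -> float:
    """Best-of-N search confined to the compressed (proposal) space."""
    return max((learned_proposal() for _ in range(n_samples)), key=score)

for n in (1, 8, 64, 512):
    best = think(n)
    print(f"compute = {n:4d} samples -> best score = {score(best):.4f}")
```

More samples buy a better outcome, yet even the largest budget only ever touches a sliver of the raw space; the learned model decides which sliver is worth touching.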

Intelligence, in this sense, is not the raw power to search everything; it is the ability to know what not to search.