Compression is Magic

One memory that stays with me from my early days of grad school in the mid-1990s was the buzz about a scrappy, yet highly effective, document classification method from CMU. As I heard it, a very clever student built a simple topic modeling system based on compressed file length for a class project. He used text crawled from Usenet discussion groups to build a set of codebooks, and then applied off-the-shelf compression routines to a novel text passage using each codebook, one at a time. Finally, he assigned the topic to the input passage as the name of the Usenet discussion group responsible for the most compressed file. So simple. So intuitive.

I entered Machine Learning from the Electrical Engineering/Digital Signal Processing tradition. From that perspective, I had seen compression algorithms that were undoubtedly *cool* (lossless methods like Huffman coding, or lossy methods based on Discrete Cosine Transforms), but the above example was the first time I thought of compression as *magic*.

Like most students of my vintage, however, I soon learned about Latent Semantic Analysis, a decidedly un-magical, yet sturdy method for topic modeling. Later, this was supplanted by Latent Dirichlet Allocation. These mathematically well-founded methods achieved state-of-the-art performance, and the clever method from CMU went into the gimmick dustbin.

Fellow grad students in Computer Vision or Natural Language Processing would often make remarks along the lines of “texture is compression” or “naming things is compression,” but for me this was just a nerdy way of talking about learned representations and Shannon entropy. I mean, when it comes to perception, what *isn’t* compressed?
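For the curious, here is a minimal sketch of how a compressed-length classifier along those lines might look. It is my own reconstruction from the secondhand description, not the student's actual system. I am assuming that a zlib preset dictionary (the `zdict` argument) can stand in for a "codebook," and the two tiny corpora and their group names are hypothetical placeholders rather than real Usenet crawls.

```python
import zlib

# Hypothetical toy corpora standing in for text crawled from discussion groups.
TOY_CORPORA = {
    "rec.sport.hockey": (
        "the goalie made a save in the third period and the team won "
        "the hockey game on the ice with a late power play goal"
    ),
    "sci.space": (
        "the rocket launch put the satellite into orbit around the moon "
        "and the shuttle crew docked with the space station"
    ),
}


def compressed_size(passage: str, codebook_text: str) -> int:
    """Bytes needed to compress `passage` when zlib is primed with `codebook_text`.

    Assumption: a zlib preset dictionary plays the role of the codebook.
    zlib only looks back over a 32 KB window, so a real corpus would need
    to be sampled or summarized to fit.
    """
    compressor = zlib.compressobj(level=9, zdict=codebook_text.encode("utf-8"))
    data = compressor.compress(passage.encode("utf-8")) + compressor.flush()
    return len(data)


def classify(passage: str, corpora: dict) -> str:
    """Return the name of the corpus whose codebook yields the smallest output."""
    sizes = {
        topic: compressed_size(passage, text) for topic, text in corpora.items()
    }
    return min(sizes, key=sizes.get)


if __name__ == "__main__":
    sample = "the team won the hockey game with a late power play goal"
    print(classify(sample, TOY_CORPORA))
```

The idea is that a passage sharing phrases with a group's text can be encoded largely as back-references into that group's dictionary, so the output shrinks. With corpora this small the result is only illustrative.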
These reflections resurfaced for me a few years ago when sci-fi author Ted Chiang went viral for his essay “ChatGPT Is a Blurry JPEG of the Web.” Many responses to this imperfect analogy surfaced in the wake of that article, and from that wave of social media posts I came to learn that “X is compression” is very much still a thing. For example, I’m hearing rumblings that consciousness is just compression. Do you also encounter this trope in your field? What’s your sense of what people really mean when they say this?