Why trackers eat cookies for breakfast

A brief explainer of the Internet

The internet is the backbone of modern life. We are online all day, every day. Whether we are working remotely, ordering food, or doom scrolling, all it takes is opening a URL or an app, and a fully rendered page appears almost instantly.

This seamless experience hides enormous complexity. Under the hood, the internet isn’t one monolithic system; it’s a patchwork of computer networks stitched together by the Internet Protocol Suite, known as TCP/IP. Think of it as a layered cake: the Physical Layer transmits data as signals over a physical medium, such as a cable; the Network Layer routes packets across the globe; the Transport Layer keeps that data organized and intact; and at the top, the Application Layer assembles these bits of data into web pages, emails, and file transfers.

The Application Layer turns the chaos of decentralized global networks into order. HTTP (Hypertext Transfer Protocol) defines how your browser or app requests pages and how servers deliver them. Here’s how it plays out: you type a URL into your browser. Instantly, the Domain Name System (DNS) translates that human-friendly address into the numeric IP address of the server that hosts it. The browser then sends an HTTP request to that server, asking for the web page. But that page isn’t a single block of data; it’s a mosaic of text, images, videos, stylesheets, and scripts, often stored on multiple servers. Modern browsers are multitasking wizards: they send parallel requests to fetch each piece and stitch them together on the fly until the page has fully loaded.
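To make “an HTTP request” concrete, here is a minimal Python sketch of the plain text a browser sends for one of those requests. It only formats the request and opens no network connection; the host, path, and header values are illustrative assumptions, not taken from a real site.

```python
from typing import Optional


def build_get_request(host: str, path: str, headers: Optional[dict[str, str]] = None) -> str:
    """Format a minimal HTTP/1.1 GET request as the text sent over the wire."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # An empty line (CRLF CRLF) marks the end of the headers; a GET has no body.
    return "\r\n".join(lines) + "\r\n\r\n"


# Every text, image, stylesheet, or script on a page triggers a request like this.
request = build_get_request(
    "great-store.com",
    "/category",
    {"User-Agent": "ExampleBrowser/1.0", "Accept": "text/html"},
)
print(request)
```

The server answers in the same text-based style: a status line, headers, an empty line, and then the content itself.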

If you understand the basics of HTTP requests and responses, you will become a tracking expert in no time.

Your average online newspaper sends around two hundred requests per page. Most of these requests don’t produce anything you can see on the page. They merely transmit data about your session for analytics and advertising.

Why do we need cookies?

In the summer of 1994, Netscape engineer Lou Montulli worked on a project that changed the course of the internet for good. It all started with the mundane shopping cart. Back then, you couldn’t browse a site and save items in your cart for later.

In those early days (remember modems roaring under your desk?), efficiency was critical. Engineers didn’t want servers to carry the burden of remembering every visitor. So HTTP was deliberately designed to be ‘stateless’: every request stands on its own, and the server remembers nothing about past requests.

Imagine walking into your neighborhood coffee shop on your daily commute to work. You anticipate the lovely smell of coffee, know all the staff by name, and order a cappuccino every day. Yet every morning, the barista has no clue who you are, asks for your name and order, and moves on to the next customer. That barista is a stateless server.

If the server needed to "remember" something like an order, it had to rely on other tools. An easy solution would have been to assign a permanent unique identifier to each browser. But Montulli understood that a unique key built into every browser would totally undermine online privacy, because every website could recognize it. He needed a different solution. So Montulli came up with a small text file that the server hands to the browser and that the browser sends back with later requests to the same site. The cookie was born.

Here’s how a cookie works for a shopping basket. A user visits the online store great-store.com for the first time. The server’s response includes a Set-Cookie header instructing the browser to store a cookie (e.g., “cart_id=unique_123”) that assigns a unique key to the session. The user adds an item to the shopping cart. The server responds with a new cookie that records the added item (e.g., “cart_items=product123:1”). The user then navigates to great-store.com/category. The browser automatically attaches these cookies to this page request in a Cookie header. As the page loads, the cart gets populated, and the item is still there.
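The shopping-cart walkthrough can be sketched in a few lines of Python. This is a deliberately simplified cookie jar: it assumes cookies are keyed by domain only and ignores the Path, Domain, Expires, and Secure attributes that real browsers honor. The class name `BrowserCookieJar` is hypothetical.

```python
def parse_set_cookie(header: str) -> tuple[str, str]:
    """Return (name, value) from a Set-Cookie header value, ignoring attributes."""
    pair = header.split(";", 1)[0]  # "cart_id=unique_123; Path=/" -> "cart_id=unique_123"
    name, _, value = pair.partition("=")
    return name.strip(), value.strip()


class BrowserCookieJar:
    """Hypothetical, simplified jar: cookies are stored per domain only."""

    def __init__(self) -> None:
        self.cookies: dict[str, dict[str, str]] = {}

    def store(self, domain: str, set_cookie_headers: list[str]) -> None:
        # Called when a response arrives carrying one or more Set-Cookie headers.
        for header in set_cookie_headers:
            name, value = parse_set_cookie(header)
            self.cookies.setdefault(domain, {})[name] = value

    def cookie_header(self, domain: str) -> str:
        # Called before every request: only cookies for that domain are attached.
        jar = self.cookies.get(domain, {})
        return "; ".join(f"{name}={value}" for name, value in jar.items())


browser = BrowserCookieJar()
# First visit: the server assigns a cart ID.
browser.store("great-store.com", ["cart_id=unique_123; Path=/"])
# Adding an item: the server records it in a second cookie.
browser.store("great-store.com", ["cart_items=product123:1; Path=/"])
# Visiting great-store.com/category: both cookies travel with the request.
print("Cookie:", browser.cookie_header("great-store.com"))
# Cookie: cart_id=unique_123; cart_items=product123:1
```

Note that the server never stores the cart between requests in this sketch; the browser carries the state back each time.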

The term cookie is supposedly derived from “magic cookie”. As its name suggests, ahum, magic cookies were used to pass small packets of data between programs in Unix operating systems. I suppose developers programming in C needed more magic in their lives. And cookies. Montulli was inspired by this trick and applied it to the browser. The patent application for cookies labelled them ‘persistent client state in a hypertext transfer protocol-based client-server system’. ‘Cookies’ is catchier, after all.

In short, we need cookies (or other forms of browser storage) to keep a snapshot of our user session across different HTTP requests. If we stick to the ‘cookie’ analogy, each cookie is really more like a ‘cookie jar’. The jar has a label, ‘chocolate chip cookies’ (name), 7 cookies inside (value), and a best-before date (expiry). When you return home from a long and frustrating day at work, you may not remember how many cookies are left in the jar (i.e., the state of your kitchen session). But you do know where the jar sits in your cupboard (i.e., the URL). You feel relieved to find 7 cookies. You take one and let your mind wander. By the next day, you have probably forgotten the exact number of cookies again. No problem. You can always go back and check in the same place. Unless you’re feeling really hungry.
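In HTTP terms, the jar’s label, contents, and best-before date map to a cookie’s name, value, and expiry. The Python sketch below assumes a Set-Cookie header that uses the standard Max-Age attribute (lifetime in seconds) and checks whether the cookie is still fresh at a given time. The `Cookie` dataclass and the `chocolate_chip` cookie are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cookie:
    name: str                     # the label on the jar
    value: str                    # what is inside the jar
    expires_at: Optional[float]   # best-before date as a Unix timestamp; None = session cookie

    def is_fresh(self, now: float) -> bool:
        return self.expires_at is None or now < self.expires_at


def parse_cookie(header: str, received_at: float) -> Cookie:
    """Parse 'name=value; Max-Age=N' into a Cookie, ignoring other attributes."""
    parts = [p.strip() for p in header.split(";")]
    name, _, value = parts[0].partition("=")
    expires_at = None
    for attribute in parts[1:]:
        key, _, arg = attribute.partition("=")
        if key.lower() == "max-age":
            expires_at = received_at + int(arg)
    return Cookie(name, value, expires_at)


jar = parse_cookie("chocolate_chip=7; Max-Age=86400", received_at=0.0)
print(jar.is_fresh(now=3600.0))   # True: one hour later, the jar is still good
print(jar.is_fresh(now=90000.0))  # False: past the best-before date
```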

So far, cookies sound reasonable, right? They are just places to store useful stuff in your browser. So how did we go from a shopping cart to the surveillance economy?


Good cookies, bad cookies

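Websites could offer more personal, less frustrating experiences while still preserving privacy for a simple reason: cookie access was siloed by domain. A browser only sends a cookie back to the domain that set it, so one website cannot read another website’s cookies. In other words, cookies didn’t travel across website domains.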

It didn’t take long before advertising agencies found a way around this restriction. This is where the distinction between first-party and third-party cookies comes in. If the domain of the cookie matches the domain of the website you are visiting (the one in your address bar), we call it a first-party cookie. If the domain is different, it’s a third-party cookie. Technically, there is no other distinction between first- and third-party cookies.
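The rule is simple enough to express in code. This Python sketch compares the site in the address bar with the site a request (and therefore its cookies) belongs to. As a simplifying assumption, it treats the last two labels of a hostname as the site, so `www.topsports.com` and `img.topsports.com` count as the same party; real browsers consult the Public Suffix List to handle cases like `.co.uk`. The hostnames are hypothetical.

```python
from urllib.parse import urlparse


def site_of(host: str) -> str:
    # Simplification: "img.topsports.com" -> "topsports.com".
    return ".".join(host.lower().split(".")[-2:])


def classify(page_url: str, request_url: str) -> str:
    page_site = site_of(urlparse(page_url).hostname or "")
    request_site = site_of(urlparse(request_url).hostname or "")
    return "first-party" if page_site == request_site else "third-party"


page = "https://www.topsports.com/soccer"
for url in ["https://img.topsports.com/logo.png", "https://pixel.adnet.example/track.gif"]:
    print(url, "->", classify(page, url))
# https://img.topsports.com/logo.png -> first-party
# https://pixel.adnet.example/track.gif -> third-party
```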

If the same third-party server is present on many websites, that server can follow users around and build advertising profiles. Online advertising agencies, most notably DoubleClick (later acquired by Google), figured out that an image file could become a tracking device: whenever a page loads an image from the agency’s server, the browser’s request carries the agency’s cookie along with it. Because the image itself doesn’t matter, tracking images were shrunk to invisible 1x1 pixels, hence the name ‘tracking pixels’.
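Why does being embedded everywhere matter? Because the browser attaches the same third-party cookie to every request for the tracker’s pixel, no matter which site the pixel sits on. The Python sketch below simulates a tracker server that logs each pixel request by its cookie ID and the page it came from (taken from the Referer header for simplicity). The tracker, the `uid` cookie name, and the site names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical tracker-side log: cookie ID -> pages where the pixel fired.
history: dict[str, list[str]] = defaultdict(list)


def handle_pixel_request(headers: dict[str, str]) -> bytes:
    """Record the visit, then answer with a (placeholder) 1x1 image."""
    uid = headers["Cookie"].split("=", 1)[1]  # "uid=abc123" -> "abc123"
    history[uid].append(headers["Referer"])
    return b"<1x1 transparent GIF>"  # placeholder, not real image bytes


# The same browser visits three unrelated sites that all embed the pixel.
for page in ["https://topsports.com/soccer",
             "https://newsdaily.example/politics",
             "https://recipes.example/pasta"]:
    handle_pixel_request({"Cookie": "uid=abc123", "Referer": page})

print(history["abc123"])
```

None of the three publishers can see the other two visits, but the tracker sees all of them under one identifier.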

DoubleClick signed up media publishers that wanted to sell advertising space. The publishers embedded DoubleClick’s script on their pages. Through that script, DoubleClick gained insight into their visitors and delivered personalized advertisements.

Say you visit topsports.com with your browser. As the page renders, a tracking pixel script from the advertising network AdNet is loaded. The script sends an HTTP request to the AdNet server carrying three pieces of information: a unique identifier stored in a third-party cookie, the URL of the page, and standard information about your browser and device. AdNet records that you viewed a page about soccer and stores this in its database, linked to your unique identifier. As you read more pages on topsports.com, AdNet can assign the segment ‘Sportslover’ or ‘Soccer fan’ to your unique identifier. How? AdNet also crawls and classifies the content of these pages separately and has defined rules for assigning interest segments to page views.
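AdNet’s last step, turning page views into interest segments, might look roughly like the Python sketch below. The page categories (which AdNet would get from crawling and classifying pages), the threshold of two page views, and the rule that any sports reading earns ‘Sportslover’ are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical output of AdNet's crawler: page URL -> content category.
page_categories = {
    "https://topsports.com/soccer/derby-report": "soccer",
    "https://topsports.com/soccer/transfers": "soccer",
    "https://topsports.com/tennis/final": "tennis",
}

# Hypothetical rules: category -> segment, plus a minimum number of views.
segment_rules = {"soccer": "Soccer fan", "tennis": "Tennis fan"}
MIN_VIEWS = 2


def assign_segments(page_views: list[str]) -> set[str]:
    counts = Counter(page_categories.get(url, "unknown") for url in page_views)
    # Specific segments require MIN_VIEWS pages in that category.
    segments = {segment_rules[c] for c, n in counts.items()
                if c in segment_rules and n >= MIN_VIEWS}
    # The broad segment requires MIN_VIEWS sports pages of any kind.
    if sum(n for c, n in counts.items() if c in segment_rules) >= MIN_VIEWS:
        segments.add("Sportslover")
    return segments


views_for_uid = list(page_categories)  # this identifier viewed all three pages
print(sorted(assign_segments(views_for_uid)))  # ['Soccer fan', 'Sportslover']
```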

None of this is voodoo magic. Trackers use the HTTP request standard just like any other text or image on your page. Information is carried in the request headers, appended to the URL as a query string, or included in the request body.

Let’s say you order a burger at McDonald’s. The domain is the restaurant you walk into. The header contains information about restrictions (e.g., allergies). The query string summarizes your preferences (e.g., bacon yes, pickles no). In response, you get your burger 🍔.
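Here is a comparable “order” as an actual tracking request, sketched in Python with the standard library’s `urllib.parse`. The endpoint `pixel.adnet.example/collect` and the parameter names `uid`, `event`, and `page` are hypothetical; every tracker uses its own. The identifier rides in the Cookie header, device details in the User-Agent header, and the page and action in the query string.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_tracking_request(uid: str, page_url: str, user_agent: str) -> dict:
    """Assemble a hypothetical tracking request: destination, headers, query string."""
    query = urlencode({"event": "page_view", "page": page_url})
    return {
        "method": "GET",
        "url": f"https://pixel.adnet.example/collect?{query}",  # the "restaurant" and your "order"
        "headers": {
            "Cookie": f"uid={uid}",    # who you are (third-party cookie)
            "User-Agent": user_agent,  # browser and device details
        },
    }


req = build_tracking_request("abc123", "https://topsports.com/soccer", "ExampleBrowser/1.0")
print(req["url"])
print(parse_qs(urlparse(req["url"]).query))
```

`urlencode` percent-encodes the page URL so that it can travel safely inside another URL; `parse_qs` reverses that on the receiving end.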


Every time you return to your McDonald’s of choice, you make requests. Over time, McDonald’s gains insight into your preferences. It sees where you order, gets a sense of your favorite burger, calculates your average order value, and can send you targeted e-mails with special offers.
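Once requests are tied to a single identifier, building the profile is plain arithmetic. A Python sketch with a made-up order history for one hypothetical customer:

```python
from collections import Counter
from statistics import mean

# Made-up order log for one customer ID: (store location, item, price in dollars).
orders = [
    ("Main St", "Cheeseburger", 5.50),
    ("Main St", "Cheeseburger", 5.50),
    ("Airport", "Chicken wrap", 4.50),
    ("Main St", "Cheeseburger", 5.50),
]

favorite_item = Counter(item for _, item, _ in orders).most_common(1)[0][0]
usual_location = Counter(loc for loc, _, _ in orders).most_common(1)[0][0]
average_order = mean(price for _, _, price in orders)

print(favorite_item, usual_location, average_order)  # Cheeseburger Main St 5.25
```

A tracker does the same with page views instead of burgers: the more requests share one identifier, the sharper the profile.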

Summary

A tracker collects data about a user’s activities across multiple domains. It works by sending requests with information about the user to a central server. The payload always boils down to an identifier, information about a page or action, and information about a device. Cookies store that identifier.

It is a common misconception that cookies track users. They don’t. Cookies are just small files waiting to be sent along with requests and responses. Cookies only get eaten by the domain that baked them; trackers simply make sure they are baking on as many websites as possible.