On Thursday, June 20th 1996, Edward J. Markey rose in the House of Representatives to speak on the Communications Privacy and Consumer Empowerment Act:
The issue of privacy in the information age and in particular, children's privacy protection, is quite timely as the Nation becomes ever more linked by communications networks, such as the Internet. It is important that we tackle these issues now before we travel down the information superhighway too far and realize perhaps we've made a wrong turn.
The act itself proved unimportant in the events that followed, but his words echo through time to our own day and place.
We have indeed made a wrong turn.
A Brief Introduction to Web Tracking
Cookies have been standard in web browsers for at least 20 years, and the debates around them are almost as old.
When cookies were first introduced, there was a conversation about their privacy implications. Cookies have many legitimate uses, such as recording session information so that the user does not have to re-enter a username and password with every request, but they are now regularly used to store information that makes it easier to track users' movements across the web.
Since then, other methods of tracking have evolved. Device fingerprinting is a method by which websites attempt to automatically distinguish one user from another using small variations in each user's computer setup, e.g. the operating system version, the browser version, which fonts or plugins the user has installed, the user's screen size and many other similar characteristics.
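To make the idea concrete, here is a minimal Python sketch of how such a fingerprint might be derived, assuming the characteristics have already been collected into a dictionary. The attribute names and values, and the choice of hashing sorted JSON with SHA-256, are illustrative assumptions rather than how any particular tracker works; the point is that a handful of ordinary settings, combined, can act as an identifier without any cookie.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Derive a short, stable identifier from a dict of device characteristics.

    The attribute names used below are hypothetical examples; real
    fingerprinting scripts collect many more signals.
    """
    # Serialise with sorted keys so identical attributes always hash identically.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


alice = {
    "os": "Windows 10",
    "browser": "Firefox 115",
    "screen": "1920x1080",
    "fonts": ["Arial", "Calibri", "Comic Sans MS"],
    "plugins": [],
}
bob = dict(alice, screen="1366x768")  # identical except for screen size

print(fingerprint(alice) == fingerprint(dict(alice)))  # True: same setup, same ID
print(fingerprint(alice) == fingerprint(bob))          # False: one small difference
```

No data is stored on the user's machine here: the identifier is recomputed from the setup itself on every visit.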
More recent developments use newer HTML5 features, such as the Audio and Battery APIs, to refine device fingerprints even further.
While these kinds of techniques are concerning, I want to focus on a particular trend in modern websites, one that I believe has major implications for how we view the web privacy conversation: the reliance on 3rd parties.
Any website can implement the tracking technologies I have described above. However, most will not and will instead rely on one of a handful of companies to collect the data, perform analysis or deliver "appropriate" advertisements.
This centralization of the Internet around a few companies with visibility over an enormous share of everyday browsing demonstrates how fragile our content ecosystem really is.
There are three main ways that 3rd parties inject themselves into places where they can collect your browsing history. For the purposes of this article, I am going to call these resource inclusion, script inclusion and content network centralization.
Resource Inclusion is based on the premise that your browser will load resources (images, videos, etc.) when it is rendering the page. Sites include resources from 3rd parties, and when the browser makes a request to the 3rd party server, the request is logged and analyzed alongside every other request. Cookies are usually, though not always, sent and received during the request.
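The following Python sketch shows why resource inclusion leaks browsing history. It assumes a hypothetical third-party image server that logs two things per request: the tracking cookie the browser sends back, and the Referer header naming the page that embedded the image. The log entries and domains are invented for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical log entries as a third-party image server might record them.
requests_log = [
    {"cookie": "uid=abc123", "referer": "https://news.example/politics/story-1"},
    {"cookie": "uid=abc123", "referer": "https://dating.example/profile/42"},
    {"cookie": "uid=def456", "referer": "https://news.example/sport"},
    {"cookie": "uid=abc123", "referer": "https://health.example/symptoms"},
]


def build_profiles(log):
    """Map each cookie ID to the distinct sites it was seen on, in request order."""
    profiles = defaultdict(list)
    for entry in log:
        site = urlparse(entry["referer"]).hostname
        if site not in profiles[entry["cookie"]]:
            profiles[entry["cookie"]].append(site)
    return dict(profiles)


for uid, sites in build_profiles(requests_log).items():
    print(uid, sites)
# uid=abc123 ['news.example', 'dating.example', 'health.example']
# uid=def456 ['news.example']
```

Browsers increasingly trim cross-site Referer headers down to the origin, but as the sketch shows, the hostname alone is enough to reveal which sites a given cookie was seen on.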
Content Network Centralization is not really a tracker in the traditional sense, but content-distribution networks (CDNs) are increasingly being given visibility over a large portion of Internet traffic due to their size and network position. Because hundreds of websites use the same CDN, the network is able to determine which sites your IP address visits, as well as perform other kinds of analytics much like a resource-based tracker would.
Visualizing the Information-Tracking Superhighway
In order to understand the scope of 3rd party intrusion into everyday web browsing, we looked at 1000 of the most popular Internet sites and counted the number and type of 3rd party images and scripts present on each.
After that we rendered the connections between sites and tracking resources.
In an ideal world, each website would be its own island, loading only a few resources. Instead, we found that many of today's popular websites are wired into a vast, centralized 3rd party tracking infrastructure that we have dubbed the Information-Tracking Superhighway.
In our small sample of 1000 popular websites, we found that 451 of them could be connected to our Information-Tracking Superhighway.
These sites all shared one or more common 3rd party resources. This means that the 3rd parties present on the highway have access to data from many, many of the most commonly visited websites and, as such, have the opportunity to build large, detailed profiles of the visitors to those websites.
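The notion of a site being "connected" can be sketched in a few lines of Python. This assumes the crawl has already produced, for each site, the set of third-party domains it loads, and treats a site as connected if it shares at least one of those domains with some other site in the sample. The crawl data is hypothetical, and this is not necessarily how the figure of 451 above was computed.

```python
from collections import Counter

# Hypothetical crawl results: each site mapped to the third-party domains it loads.
crawl = {
    "news-a.example": {"googleapis.com", "cloudfront.net", "ads.example"},
    "news-b.example": {"googleapis.com", "optimizely.com"},
    "shop.example": {"cloudfront.net"},
    "blog.example": {"fonts.example"},
    "island.example": set(),
}


def connected_sites(crawl):
    """Return the sites that share at least one third-party domain with another site."""
    # Count how many sites load each third-party domain.
    usage = Counter(domain for domains in crawl.values() for domain in domains)
    return {site for site, domains in crawl.items() if any(usage[d] > 1 for d in domains)}


on_highway = connected_sites(crawl)
print(sorted(on_highway))
# ['news-a.example', 'news-b.example', 'shop.example']
print(f"{len(on_highway)} of {len(crawl)} sites connected")
# 3 of 5 sites connected
```

Note that blog.example loads a third party but is still an island, because no other site in the sample uses that domain.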
Google had by far the largest number of scripts and other assets loaded in our sample. In fact, 6 of the top 10 3rd party inclusions were Google properties.
googleapis.com was the best represented, appearing on 9% of the sites in our sample.
Other popular 3rd parties were CloudFront and Optimizely.
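Figures like "9% of sites" are simple proportions: the number of sites that include a domain divided by the number of sites crawled. A hypothetical Python sketch of that ranking, using an invented four-site crawl:

```python
from collections import Counter

# Hypothetical crawl: site -> third-party domains observed on that site.
crawl = {
    "a.example": {"googleapis.com", "cloudfront.net", "optimizely.com"},
    "b.example": {"googleapis.com", "cloudfront.net"},
    "c.example": {"googleapis.com"},
    "d.example": set(),
}


def top_third_parties(crawl, n=3):
    """Rank third-party domains by the share of crawled sites that include them."""
    # Each site counts a domain at most once, however many times it loads it.
    counts = Counter(d for domains in crawl.values() for d in set(domains))
    total = len(crawl)
    return [(domain, count / total) for domain, count in counts.most_common(n)]


for domain, share in top_third_parties(crawl):
    print(f"{domain}: {share:.0%}")
# googleapis.com: 75%
# cloudfront.net: 50%
# optimizely.com: 25%
```

Sites with no third parties still count in the denominator, which is what keeps the percentages comparable across domains.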
Our results are very similar to those of a much larger 1,000,000-site survey conducted by Englehardt and Narayanan, which found that Google-owned domains made up 12 of the top 20 third parties.
What does this mean for privacy?
Frankly, the future of web privacy does not look to be a cheerful one. If the current state of affairs persists or gets worse, we can expect to see ever more centralization and, with it, ever more power handed to those who monitor our web traffic.
As in the million-site study, we found that the sites that included the most third party resources tended to be news websites: sites like time.com, salon.com, cnn.com and others all included 5 or more 3rd party resources.
Other categories of sites with large numbers of 3rd party inclusions were dating websites, porn websites and many other sites revealing people's passions, hobbies and lives.
It would be easy to accuse us of fear-mongering, but we must remember that the corporations that collect this information are not benevolent; they do it in their own self-interest, for profit. If it became profitable to spill out personal information or to hand it over to a government, or if refusing to do so would endanger profits, these organizations would do it in a heartbeat. Our data is not safe with them. Your data is not safe with them.
What can you do?
Thankfully there are tools that can be used to subvert the current system. To render it ineffective. To kill it.
Extensions that block 3rd party trackers, like uBlock Origin (Firefox and Chrome) and Privacy Badger, are an essential first step. Using these extensions doesn't make you invulnerable to web tracking, but it does greatly restrict the flow of information to these corporations.
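For a sense of how a learning blocker can work without a fixed list, here is a deliberately simplified, hypothetical Python sketch: a third-party domain is blocked once it has been seen on a set number of different first-party sites. The threshold of three, the class name and the exact-hostname comparison are all assumptions for illustration. It is loosely in the spirit of Privacy Badger's approach but is not its actual algorithm, and uBlock Origin works mainly from filter lists instead.

```python
from collections import defaultdict


class TrackerLearner:
    """Hypothetical, simplified tracker blocker (not any real extension's code).

    Blocks a third-party domain once it has been observed on `threshold`
    distinct first-party sites. The default of 3 is an assumption.
    """

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.seen_on = defaultdict(set)  # third-party domain -> first-party sites

    def should_block(self, first_party: str, request_domain: str) -> bool:
        # Simplification: exact hostname comparison. Real blockers compare
        # registrable domains, so cdn.news.example would count as first party.
        if request_domain == first_party:
            return False
        self.seen_on[request_domain].add(first_party)
        return len(self.seen_on[request_domain]) >= self.threshold


learner = TrackerLearner()
visits = ["news.example", "shop.example", "dating.example"]
results = [learner.should_block(site, "tracker.example") for site in visits]
print(results)  # [False, False, True]
print(learner.should_block("news.example", "news.example"))  # False
```

Real tools are more careful: they typically compare registrable domains rather than raw hostnames, and look for identifying behaviour such as cookies rather than mere presence.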
Going further, Tor can be used to hide your location and device identity from the websites you visit: an ultimate defense against tracking.
However, Tor is not for everyone. There are serious security considerations that must be taken into account when using the Tor Browser. Malicious exit nodes are known to attempt far worse things than simply tracking you. You can protect yourself, but right now the price is vigilance.
If you are reading this, you will likely settle somewhere between these two options, perhaps using blocking extensions most of the time and Tor some of the time. That's OK. The most important thing is to start taking back control and to start demanding your right to consent.
But tools are not enough: if privacy is something that you want, then privacy must be something that you demand.
You must act to enforce your consent over your browsing history: the only person who should know you visited a website is you, unless you choose to tell someone else.
Note: This research could not exist without our supporters, and we hope we can continue to deliver new insights, research and technology in the future. To help us do so, please support us.