Welcome to the third OnionScan Report. The aim of these reports is to provide an accurate and up-to-date analysis of how anonymity networks are being used in the real world.
In this report we will deliver some insights into how onion services are related to each other, through hosting, shared links and other identity correlations.
The dark web is more connected than previously thought. The classical portrayal of onion services is that of a lone, focused website unconnected to anything else. However, we show that a large proportion of sites can be connected to each other by following hyperlinks - even when we discount directory and wiki sites from our analysis.
We also show how relationships between seemingly unrelated sites can be outed through link graph analysis.
Further, by using other identity correlations, e.g. SSH fingerprints, we can estimate the number of unique web servers, demonstrating that even when sites don't link to each other, they can be connected by other means.
A Note on Numbers
The lifespan of an onion service can range from minutes to years. When providing generalized numbers, especially percentages, we report approximate figures based on multiple scans over a period of time, rather than a single snapshot.
The Dark Web is Highly Connected
Note on Images: The graphs displayed in this article are transitive reductions of the actual graphs that we constructed. A transitive reduction reduces the number of edges in the graph while preserving connectivity. That is why the graphs sometimes have odd features such as long "strings" of connections. In reality all of these nodes are connected to each other, but it is hard to create graphs that convey this information without looking cluttered!
When we examine the connections between Onion Services we find that most of the services we scan are actually connected to a large proportion of the rest of the services we scan. The most direct form of these links comes from wikis and other link directories; however, these aren't that interesting.
Instead, we examined links from sites with fewer than 20 outgoing links. This removes the link directories from our analysis and allows us to look at more concrete relationships between sites.
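To make the idea of a transitive reduction concrete, here is a minimal Python sketch. It assumes a directed acyclic graph stored as a dict of successor sets; real link graphs contain cycles and would need extra handling. The function names and the tiny example graph are made up for illustration and are not taken from OnionScan.

```python
def reachable(graph, start):
    """Return every node reachable from `start` by following one or more edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def transitive_reduction(graph):
    """Drop every edge u -> v that is implied by a longer path from u to v.

    Assumption: `graph` is acyclic (a DAG). Reachability is unchanged.
    """
    reduced = {}
    for u, successors in graph.items():
        keep = set(successors)
        for w in successors:
            # Anything reachable from w is already reachable from u via w,
            # so a direct edge from u to it is redundant.
            keep -= reachable(graph, w)
        reduced[u] = keep
    return reduced


# Hypothetical example: a links to b and c, and b links to c.
example_graph = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
reduced_graph = transitive_reduction(example_graph)
# The edge a -> c is dropped because a -> b -> c already connects them,
# which is how long "strings" of nodes appear in the drawn graphs.
```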
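As a rough illustration of this filtering step, the sketch below drops every site whose number of outgoing links reaches the threshold. The data layout (a dict mapping each site to the set of sites it links to), the function name and the example addresses are assumptions made for this example, not OnionScan's actual data model.

```python
MAX_OUTGOING_LINKS = 20  # threshold used in the text


def drop_link_directories(outgoing_links, limit=MAX_OUTGOING_LINKS):
    """Keep only links that originate from sites with fewer than `limit` outgoing links."""
    return {
        site: set(targets)
        for site, targets in outgoing_links.items()
        if len(set(targets)) < limit
    }


# Hypothetical data: one directory-like site and one ordinary site.
example_links = {
    "wikiexample.onion": {f"site{i}.onion" for i in range(50)},
    "shopexample.onion": {"forumexample.onion"},
}
filtered_links = drop_link_directories(example_links)
```

Only the links made by the directory-like site are removed; it can still appear as the target of a link from a smaller site.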
As well as links (green) we also looked at relationships detected through SSH public keys (blue), FTP banners (pink) and Apache mod_status leaks (red). SMTP banners were also inspected but only a single small cluster could be identified.
Below is the largest cluster from the June 2016 scan. We found that this large cluster contained ~50% of all scanned sites - most connected through web links, but with a significant number connected through SSH keys and FTP banners.
Many other clusters, like the one below, are small. However, even these clusters have good connectivity suggesting the presence of many active subcommunities.
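One way to compute clusters like these is to treat every relationship (link, shared SSH key, shared FTP banner, mod_status leak) as an undirected edge and take connected components. The sketch below does this with a small union-find structure. The edge format, function names and example sites are hypothetical, and this is not necessarily how the graphs in this report were produced.

```python
from collections import defaultdict


def find(parent, node):
    """Return the representative of node's cluster, compressing the path."""
    root = node
    while parent[root] != root:
        root = parent[root]
    while parent[node] != root:
        parent[node], node = root, parent[node]
    return root


def cluster_sites(sites, edges):
    """Group sites into clusters; edges are (site_a, site_b, kind) tuples."""
    parent = {site: site for site in sites}
    for a, b, _kind in edges:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b
    clusters = defaultdict(set)
    for site in sites:
        clusters[find(parent, site)].add(site)
    # Largest cluster first.
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical example with three kinds of relationship.
example_sites = ["a.onion", "b.onion", "c.onion", "d.onion", "e.onion"]
example_edges = [
    ("a.onion", "b.onion", "link"),
    ("b.onion", "c.onion", "ssh_key"),
    ("d.onion", "e.onion", "ftp_banner"),
]
site_clusters = cluster_sites(example_sites, example_edges)
largest_share = len(site_clusters[0]) / len(example_sites)
```

Here `largest_share` is the fraction of scanned sites that fall in the largest cluster, the same kind of figure as the ~50% quoted above.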
Outing Relationships through Link Analysis
While much of the web is connected, some sites are more connected than others.
Take a look at the image below. The image shows a small cluster of 19 nodes. These nodes, when we exclude directories and wikis, only link to each other in the pattern shown in the image.
There are several kinds of clusters that can be easily identified. When the links to and from an onion service are closed off to the larger web, the sites, ironically, stand out.
Estimating the Infrastructure of the Dark Web
Perhaps one of the most immediate insights from the above is that the dark web is smaller than previously estimated.
We have previously reported that Freedom Hosting II accounts for ~20% of all active onion sites. We are able to determine this because all of these sites present the same SSH public key, the same key presented by Freedom Hosting II, indicating that they share its infrastructure.
We are able to make other connections. For example, by linking FTP banners we have been able to identify another cluster of 60 sites which all share an identical FTP banner (including an internal IP address leak), and we have been able to link this cluster to another web hosting provider.
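The correlation described here amounts to grouping sites by an identifier they have in common. A minimal sketch, assuming scan results are available as (site, identifier) pairs where the identifier is an SSH key fingerprint or a raw FTP banner string; the function name and the placeholder values are invented for illustration.

```python
from collections import defaultdict


def group_by_identifier(observations):
    """Return {identifier: set of sites} for identifiers seen on 2+ sites."""
    groups = defaultdict(set)
    for site, identifier in observations:
        if identifier:  # skip sites where nothing was observed
            groups[identifier].add(site)
    return {ident: sites for ident, sites in groups.items() if len(sites) > 1}


# Hypothetical observations; the fingerprints are placeholders, not real keys.
example_observations = [
    ("a.onion", "placeholder-fingerprint-1"),
    ("b.onion", "placeholder-fingerprint-1"),
    ("c.onion", "placeholder-fingerprint-2"),
    ("d.onion", None),
]
shared_hosts = group_by_identifier(example_observations)
```

Each remaining group is a candidate for shared hosting infrastructure. A shared identifier is evidence of co-hosting rather than proof of common ownership.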
Thus we can place some bounds on the number of unique hosting providers being used:
- Of the ~5600 active sites that we scanned during June - 23% shared a single unique SSH Key - linking them to Freedom Hosting II.
- 1% can be attributed to the hosting provider we talked about above via FTP.
- We were able to identify another 9 clusters of sites which could be linked through SSH keys - another 2.5% of sites.
- Through Apache mod_status leaks we were able to identify another 10 clusters of sites sharing hosting infrastructure, another 1% of sites.
Therefore only 21 infrastructure setups account for 27% (~1500) of all active Onion Services!
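The tally behind these totals can be checked with a few lines of Python. The figures are the approximate ones listed above; the variable names are only for this example.

```python
ACTIVE_SITES = 5600  # approximate number of active sites scanned in June

# (source of the correlation, number of setups, approximate share in percent)
sources = [
    ("Freedom Hosting II (SSH key)", 1, 23.0),
    ("Hosting provider (FTP banner)", 1, 1.0),
    ("Other SSH key clusters", 9, 2.5),
    ("Apache mod_status clusters", 10, 1.0),
]

total_setups = sum(count for _, count, _ in sources)
total_share = sum(share for _, _, share in sources)
approx_sites = round(ACTIVE_SITES * total_share / 100)
# total_setups is 21 and total_share is 27.5, which gives about 1540 sites,
# consistent with the rounded 27% (~1500) quoted above.
```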
If we assume the relationships we identified in the previous section also leak co-hosting information we can push this estimate to ~30 setups accounting for 30-35% of all Onion Services!
It is important to reiterate that these numbers don't mean that 35% of Onion Services are operated by one of 30 groups, but that they are likely hosted by one of those groups. And remember, these are only the relationships we have been able to discern from rather trivial correlations; there are likely many more.
This lack of diversity in hosting infrastructure is concerning - it places the future of a large proportion of Onion Services in the hands of a limited number of groups.
Other OnionScan News
- We have a new home - mascherari.press. All future OnionScan reports will be published here, as well as lots of other articles and discussions about anonymity and privacy.
- OnionScan now supports XMPP thanks to Scott Ainslie!
- OnionScan was included in ArchStrike Linux! We are super excited that this tool is reaching more people in the security community.
If you would like to help please get in touch at firstname.lastname@example.org.
Goals for the OnionScan Project
- Increase the number of scanned onion services - We have so far only successfully scanned ~5000 (out of ~11,000 domains scanned).
- Increase the number of protocols scanned. OnionScan currently supports light analysis for HTTP and SSH, and detection analysis for FTP, SMTP, Bitcoin, IRC, XMPP and Ricochet - we want to grow this list, as well as provide deeper analysis on all protocols.
- Develop a standard for classifying onion services that can be used for crime analysis as well as an expanded analysis of usage from political activism to instant messaging.