Welcome to the inaugural OnionScan Report. The aim of these reports is to provide an accurate and up-to-date analysis of how anonymity networks are being used in the real world.
In this first report we will focus on The Tor Network: Security and Crime.
This report casts doubt on the reported level of crime on the Tor Network. Specifically, we analyze the assumption that crime using onion services can be calculated by simply counting websites. We find that the large amount of duplication of onion services (29%), coupled with the difficulty of accounting for instances of crime, invalidates previous estimates. We conclude by offering a path for future research in this area.
Additionally, we report that many onion services, including many high-profile ones, are leaking information that may be used to deanonymize their operators and users. We also make recommendations to onion service owners.
A Note on Numbers
The lifespan of an onion service can range from minutes to years. When providing generalized numbers, especially percentages, we report approximate figures based on multiple scans over a period of time, rather than a single snapshot.
Crime and the Tor Network
Summary: We believe that crime using the Tor network is overestimated, based on an analysis of data obtained from the recent DeepLight report.
In February, a presentation by INTELLIAGG and DARKSUM reported that they had classified 29,532 onion services and that 52% of these sites contained illegal content (as classified under UK/US law).
Besides the lack of details describing their methodology, the main problem with the DeepLight analysis is that they rely heavily on simply counting the number of onion services that happen to serve illegal content.
DeepLight did not provide a complete list of the services they scanned. The report is light (no pun intended) on details, but does provide us with a number of sites checked. Out of the 29,532 sites, fewer than half (46%, ~13,584) were online consistently enough to be analyzed, although the exact number is not given.
The visualization released alongside their report is based on a JSON file containing just over 8000 onion services. To understand how effective simply counting sites is, we scanned the services listed in that JSON file - we assume that the 8000 sites provided in this list are representative of the DeepLight scan.
The services were scanned in April 2016 over the course of 3 weeks using the OnionScan tool, which was subsequently released that month.
The Problem with Counting
Of the ~8000 services, only ~2000 were online consistently enough (on port 80) to be captured by scanning. Of the 2000, ~600 had at least one duplicate, or just over 29%. Here, a duplicate is defined as a site whose front page HTML has an identical SHA-1 hash to that of another site. This is a very strict definition of duplicate - there is opportunity for future research comparing weaker methods of fingerprinting to account for dynamic content.
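The duplicate check described above can be sketched in a few lines of Python. This is a minimal illustration, not the OnionScan implementation: the function names and the placeholder addresses are hypothetical, and it assumes the front page HTML of each service has already been fetched as raw bytes.

```python
import hashlib
from collections import defaultdict


def group_by_front_page_hash(pages: dict[str, bytes]) -> dict[str, list[str]]:
    """Group onion addresses by the SHA-1 digest of their front page HTML."""
    groups = defaultdict(list)
    for onion, html in pages.items():
        groups[hashlib.sha1(html).hexdigest()].append(onion)
    return {digest: sorted(members) for digest, members in groups.items()}


def duplicate_rate(pages: dict[str, bytes]) -> float:
    """Fraction of services whose front page is byte-identical to another's."""
    if not pages:
        return 0.0
    groups = group_by_front_page_hash(pages)
    duplicated = sum(len(m) for m in groups.values() if len(m) > 1)
    return duplicated / len(pages)


# Placeholder addresses and pages, for illustration only.
pages = {
    "aaaa.onion": b"<html>shop</html>",
    "bbbb.onion": b"<html>shop</html>",
    "cccc.onion": b"<html>blog</html>",
    "dddd.onion": b"<html>wiki</html>",
}
print(duplicate_rate(pages))
```

In this toy input two of the four services share a front page, so the rate is 0.5. Because the comparison is on exact bytes, any dynamic content (a timestamp, a session token) would make two otherwise identical sites hash differently, which is why this definition undercounts rather than overcounts duplicates.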
Taking this figure at face value, it is easy to see the problem with simply counting and reporting the number of “illegal” onion services. The large amount of duplication warps the number significantly.
The reasons for this duplication vary. Below we have collected observations from the OnionScan crawl.
Scam Sites and Onion Cloner
Spend any significant time browsing various lists of onion services and it won't take you long to find a site which appears to be an exact duplicate of another site.
These so-called “cloned” sites exist to spam the onion-sphere with identical copies of their target site, tricking unsuspecting customers of those sites into visiting and, usually, purchasing goods. The scammers then run off with the Bitcoin, and the victim usually has no recourse.
We believe that each instance of a cloned site was counted as a separate crime by DeepLight, with no attempt made to link the groups of sites together.
As a future recommendation we propose that new research into crime and the Tor Network make an attempt to distinguish cloned sites from the actual sites and treat all cloned sites as a single instance of a crime (scam site) rather than count each one as an instance of the cloned crime (e.g. credit card fraud).
Load Balancing / Distributing
Sometimes, duplicate sites aren't cloned sites but are simply one site being run over a number of different onion services, often to balance load or maintain redundancy.
Outlaw marketplace is a very obvious and visual example of this load balancing. As displayed on the front page of their website, all of the following domains will reach the marketplace:
The DeepLight analysis classifies each one of these services separately. This greatly inflates the amount of crime taking place, considering many sites have 5-10 different domains all pointing to the same site.
It is hard to come up with an exact number of cloned sites versus load balanced / distributed instances - unless the site in question, like Outlaw, provides a complete list to their users.
As with cloned sites we recommend that future research make an attempt to classify gateway services and count all gateway services as a single instance of crime.
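One way to apply this counting rule, and the equivalent rule for cloned sites, is to collapse every known group of addresses to a single representative before counting. The sketch below is illustrative only: `count_distinct_sites`, its inputs and the placeholder addresses are assumptions, and it presumes the groups have already been identified (for example from a list published by the site itself) and do not overlap.

```python
def count_distinct_sites(services: list[str], groups: list[set[str]]) -> int:
    """Count services, treating every address in a known group as one site."""
    representative = {}
    for group in groups:
        if not group:
            continue
        rep = min(group)  # any stable choice of representative works
        for onion in group:
            representative[onion] = rep
    # Addresses that belong to no group stand for themselves.
    return len({representative.get(onion, onion) for onion in services})


# Placeholder data: three addresses are gateways to the same marketplace.
services = ["m1.onion", "m2.onion", "m3.onion", "blog.onion", "wiki.onion"]
gateways = [{"m1.onion", "m2.onion", "m3.onion"}]

print(len(services), count_distinct_sites(services, gateways))
```

A naive count reports five services here, while the grouped count reports three.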
The OnionSphere is not just HTTP
Reports like DeepLight focus solely on HTTP traffic. While valuable, it is disingenuous to report >50% of onion services being illegal when you ignore a large selection of protocols (and thus services).
When scanning the DeepLight site list we also found hidden services serving only SSH, FTP, Bitcoin, SMTP & IRC. The most common of these was SSH which accounted for ~25% of all online services (10% of services were exclusively SSH).
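Figures like these can be derived from per-service scan results. The following sketch assumes a hypothetical data shape, a mapping from each onion address to the set of protocols detected on it, and is not a description of how OnionScan stores its results.

```python
def protocol_share(scan: dict[str, set[str]], protocol: str) -> tuple[float, float]:
    """Return (share serving the protocol, share serving only that protocol).

    Both shares are relative to services with at least one detected protocol,
    i.e. services that were online during the scan.
    """
    online = [protocols for protocols in scan.values() if protocols]
    if not online:
        return 0.0, 0.0
    serving = sum(1 for protocols in online if protocol in protocols)
    exclusive = sum(1 for protocols in online if protocols == {protocol})
    return serving / len(online), exclusive / len(online)


# Placeholder scan results, for illustration only.
scan = {
    "aaaa.onion": {"http"},
    "bbbb.onion": {"http", "ssh"},
    "cccc.onion": {"ssh"},
    "dddd.onion": set(),  # offline: nothing detected
}
serving, exclusive = protocol_share(scan, "ssh")
print(f"ssh: {serving:.0%} of online services, {exclusive:.0%} exclusively")
```

Keeping the two shares separate matters for the argument: a service that exposes only SSH is invisible to an HTTP-only survey, whereas one that exposes both is still counted through its web front end.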
Protocols like Ricochet & Pond, which are based on onion services, are also not accounted for in the usage analysis.
A note on Gleaning Relationships through Misconfigured Services
While scanning we came across ~200 servers which were misconfigured and exposed Apache mod_status pages.
Many of these exposed pages leaked more than just user requests and CPU utilization.
Many of these servers appeared to host multiple different sites. The exact nature of the relationship between these sites is speculative, but there are a few possibilities:
- All these sites use the same hosting provider, but are owned by different groups.
- These sites are all managed by the same person, but are owned by different groups.
- These sites are all operated by the same group.
It is important to note at this stage that we didn't just find a single occurrence of this kind of setup. All in all we counted 13 distinct groups of sites which appeared to be co-hosting partners.
The following observations can be made:
- These sites were hosted on different operating systems and software versions - ruling out a single bad hosting provider.
- On one occasion we found evidence that the sites were being managed by a single admin - another hidden service containing folders that held the running versions of all the sites in the cluster.
- Occasionally these sites linked to each other visibly through advertisements or endorsements - hinting at a firmer relationship (considering they were also operated on the same servers).
- Often the sites were similar in nature, e.g. 3 different sites selling stolen PayPal accounts, or 4 different sites selling marijuana.
For more information on mod_status leaks, see our Security report below.
So how much “OnionCrime” is there?
The DeepLight report does not provide any details on how each site was classified into various categories or how the services were divided into illegal and legal content (apart from that this classification was based on UK/US law).
However, if we assume that their classification was 100% accurate, then taking into account a 29% duplication rate and at least 10% of services being exclusively SSH (not to mention the various other protocols), we have to conclude that the amount of crime is far lower than the reported 52% figure.
Apart from the recommendations above, we also need the following in order to increase the accuracy of these kinds of estimates:
- A consistent definition of criminal activity using Tor - if a criminal syndicate creates 100 duplicate onion services, should that count as 100 different crimes?
- Open source, consistent classification methods - when does filesharing become illegal? Should a site selling firearms and drugs be counted as one crime or two? etc.
If you would like to help with any of these problems please see the Getting Involved section below.
The Security of Onion Services
While scanning sites for the above crime report we also inspected each site for security flaws which may lead to deanonymization.
Summary: Over 50% of all Onion Web Services are hosted on Apache servers. 12% of these installs are configured to leak server status. Over 7% of sites still leak EXIF information.
Approximately 6% of all onion web services exposed the Apache mod_status module. This number doubles when considering only Apache-hosted sites, of which 12% exposed server status information.
It should be noted that while 12% is relatively low, the distribution is not uniform. When limiting this analysis to sites hosted by the Hidden Wiki, a common gateway for many newcomers, this number approached 30% of all Apache-hosted sites.
As mentioned above, these server status leaks can lead to leaks of co-hosted services which can be used to analyze relationships between sites. These leaks also expose clearnet connected IP addresses (both clearnet clients and misconfigured servers) and also expose hidden directories and user behavior.
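A simple check for this kind of exposure is to request /server-status and look for the heading that the default mod_status page prints. The sketch below only covers the classification step: fetching the page over Tor is left out, the function name is hypothetical, and the marker string assumes the operator has not customized the page.

```python
# Heading printed by Apache's default mod_status page (assumed unmodified).
MOD_STATUS_MARKER = "Apache Server Status for"


def looks_like_mod_status(status_code: int, body: str) -> bool:
    """Heuristically decide whether an HTTP response is a mod_status page."""
    if status_code != 200:
        return False  # 403/404 means the endpoint is blocked or absent
    return MOD_STATUS_MARKER in body


# Placeholder responses, for illustration only.
exposed = "<html><h1>Apache Server Status for localhost</h1></html>"
blocked = "<html><h1>Forbidden</h1></html>"

print(looks_like_mod_status(200, exposed))
print(looks_like_mod_status(403, blocked))
```

Operators can run the same check against their own service: any response to /server-status that passes this test is visible to every visitor, not just to the administrator.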
OnionScan also found spot examples of exposed phpMyAdmin instances, personal wikis and other software exposed via onion services. Our advice to onion service operators is collected under Guidance for Onion Web Service Owners below.
Exif Metadata Leaks
About 7.5% of all websites leaked some level of Exif metadata.
We suspect this is a problem on the decline, although more data is needed. There are still high-profile marketplaces and forums which do not strip EXIF metadata from user-uploaded images.
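Stripping the metadata at upload time does not require heavy tooling. As a rough sketch, assuming well-formed JPEG input with no padding bytes between header segments, the following standard-library function drops the Exif APP1 segment and copies everything else unchanged. The function name is hypothetical, other metadata containers (such as XMP or IPTC) are not touched, and re-encoding the image with an imaging library is likely the more robust choice in production.

```python
import struct

SOI = b"\xff\xd8"            # start-of-image marker
APP1 = 0xE1                  # segment type that carries Exif data
SOS = 0xDA                   # start of scan: compressed image data follows
EXIF_HEADER = b"Exif\x00\x00"


def strip_exif_from_jpeg(data: bytes) -> bytes:
    """Return a copy of a JPEG byte string with Exif APP1 segments removed."""
    if data[:2] != SOI:
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("malformed JPEG: expected a segment marker")
        marker = data[pos + 1]
        if marker == SOS:
            break  # copy the image data that follows verbatim
        # The 2-byte big-endian length includes the length field itself.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment_end = pos + 2 + length
        payload = data[pos + 4:segment_end]
        if not (marker == APP1 and payload.startswith(EXIF_HEADER)):
            out += data[pos:segment_end]
        pos = segment_end
    out += data[pos:]
    return bytes(out)


# Tiny synthetic JPEG-like byte string, for illustration only.
exif_segment = b"\xff\xe1\x00\x0f" + EXIF_HEADER + b"GPSDATA"
jfif_segment = b"\xff\xe0\x00\x07JFIF\x00"
scan_data = b"\xff\xda\x00\x02\x12\x34\xff\xd9"
cleaned = strip_exif_from_jpeg(SOI + exif_segment + jfif_segment + scan_data)
print(b"Exif" in cleaned)
```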
Relatedly, many sites also fail to lock down directories. WordPress-hosted sites appear especially vulnerable to this, as the /wp-content/uploads directory is often readable. This makes it trivial for a crawler like OnionScan to gather lots of data on the site.
Guidance for Onion Web Service Owners
At this time we would like to provide the following advice. None of this advice is new or groundbreaking, but based on the above analysis we believe it is necessary.
- Onion services are easily discovered; do not host anything with the assumption that it won't be found.
- If you are using Apache, please check to see if the /server-status endpoint is exposed & disable it if so.
- Default XAMPP installs also leak server status as well as phpMyAdmin consoles and phpinfo() output. If you rely on this platform you should ensure mod_status is disabled and phpMyAdmin is secured with a password.
- If you collect or display images of any sort you should make an attempt to remove all EXIF metadata from the image.
- .htaccess or similar server configuration should be used to forbid directory listings.
- Operators should host different protocols on different onion services in order to avoid fingerprinting attacks. For example, if an operator hosts two web services using the same server and also exposes SSH through both, then the SSH public key fingerprint can be used to link the two sites together. We are currently gathering data on how useful this kind of fingerprinting can be.
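The SSH linking risk in the last item can be illustrated with a short sketch. It assumes the host public key of each service has already been collected in the usual one-line OpenSSH format (`type base64-blob`). The function names, addresses and key material are hypothetical placeholders, and the fingerprint is computed in the same style OpenSSH prints (SHA-256 of the key blob, base64 without padding).

```python
import base64
import hashlib
from collections import defaultdict


def ssh_fingerprint(public_key_line: str) -> str:
    """Compute an OpenSSH-style SHA256 fingerprint from a public key line."""
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def link_by_host_key(host_keys: dict[str, str]) -> list[list[str]]:
    """Return groups of onion addresses that present the same SSH host key."""
    by_fingerprint = defaultdict(list)
    for onion, key_line in host_keys.items():
        by_fingerprint[ssh_fingerprint(key_line)].append(onion)
    return [sorted(group) for group in by_fingerprint.values() if len(group) > 1]


# Placeholder keys: not real key material, only distinct byte strings.
key_a = "ssh-ed25519 " + base64.b64encode(b"placeholder-key-a").decode("ascii")
key_b = "ssh-ed25519 " + base64.b64encode(b"placeholder-key-b").decode("ascii")
host_keys = {"one.onion": key_a, "two.onion": key_a, "three.onion": key_b}

print(link_by_host_key(host_keys))
```

Here the first two addresses are linked because they present the same host key, even though nothing in their web content needs to match.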
Goals for the OnionScan Project
- Increase the number of scanned onion services - DeepLight reported ~13,000 active scanned services. We have so far only successfully scanned ~2,500. We would like to build a database of services, including online activity patterns.
- Increase the number of protocols scanned. OnionScan currently supports light analysis for HTTP and SSH, and detection analysis for FTP, SMTP, Bitcoin, IRC and Ricochet - we want to grow this list, as well as provide deeper analysis on all protocols.
- Develop a standard for classifying onion services that can be used for crime analysis as well as an expanded analysis of usage from political activism to instant messaging.