OnionScan Report: May 2016 - Technologies used by Onion Services

Welcome to the second OnionScan Report. The aim of these reports is to provide an accurate and up-to-date analysis of how anonymity networks are being used in the real world.

In this second report we focus on the technologies used by onion services.

Summary

This report provides a detailed analysis of the technologies used to host and run onion services. We report on the technology being used, and the dangers inherent in using technology not designed for the threat model facing anonymous services.

We also introduce a scanning policy, guiding the direction of the OnionScan project.

A Note on Numbers

The lifespan of an onion service can range from minutes to years. When providing generalized numbers, especially percentages, we report approximate figures based on multiple scans over a period of time, rather than a single snapshot.

Common Onion Technologies

We found over 50 different server technologies in use across the more than 5000 web servers we scanned. Of these, the most common technology was Apache, at over half of all servers. nginx was the second most common, making up close to 30%. Another 10% of servers had modified their service to not report a version (no attempt was made to identify these services through other means, e.g. 404 defaults).

There was no clear pattern of technology choice for the 20% of servers not falling into the buckets above. These servers ranged from hosting Etherpad instances to running the cyclone.io Python web server. Others were server-specific, e.g. OnionMail servers identify themselves with the “OnionMail” server version.
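The server tallies above are derived from the HTTP Server header each site returns. The following Go sketch (not taken from OnionScan itself; the onion address is a placeholder and a local Tor SOCKS proxy on port 9050 is assumed) shows one way such a header can be collected over Tor:

```go
package main

import (
	"fmt"
	"net/http"

	"golang.org/x/net/proxy"
)

func main() {
	// Assumes a local Tor client listening on the default SOCKS port 9050.
	dialer, err := proxy.SOCKS5("tcp", "127.0.0.1:9050", nil, proxy.Direct)
	if err != nil {
		panic(err)
	}
	client := &http.Client{Transport: &http.Transport{Dial: dialer.Dial}}

	// Placeholder onion address, not a real service.
	resp, err := client.Get("http://exampleonionaddress.onion/")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The Server header is what the counts in this section are based on,
	// e.g. "Apache/2.4.7 (Ubuntu)", "nginx/1.6.2", or empty when suppressed.
	fmt.Println("Server header:", resp.Header.Get("Server"))
}
```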

Operating System Leaks

Somewhat concerningly, we were able to identify the operating system from the server version string in over 10% of cases, with most of these identifying themselves as Ubuntu or Debian and a handful identifying themselves as Windows, BSD or Raspberry Pi variants.

It is worth noting that while in 5% of these cases the server version was identified via Apache mod_status leaks, the remaining 5% exposed the server version without any explicit misconfiguration.
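To illustrate the kind of leak involved: a Server header such as "Apache/2.4.7 (Ubuntu)" reveals the distribution in its parenthesised token, while a hardened server reports only "Apache" or nothing at all. A minimal, hypothetical extraction helper (not OnionScan's implementation) might look like this:

```go
package main

import (
	"fmt"
	"regexp"
)

// osToken matches the parenthesised platform token in a Server header,
// e.g. "Apache/2.4.7 (Ubuntu)" or "Apache/2.4.10 (Raspbian)".
var osToken = regexp.MustCompile(`\(([^)]+)\)`)

func operatingSystem(serverHeader string) string {
	if m := osToken.FindStringSubmatch(serverHeader); m != nil {
		return m[1]
	}
	return "unknown"
}

func main() {
	for _, h := range []string{
		"Apache/2.4.7 (Ubuntu)",
		"Apache/2.4.10 (Debian)",
		"nginx/1.6.2", // version only, no OS token
		"Apache",      // version and OS suppressed
	} {
		fmt.Printf("%-25s -> %s\n", h, operatingSystem(h))
	}
}
```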

Platforms of Choice

Counting platform technologies, while also accounting for duplicate, default and cloned sites, is a tricky prospect. For example, over 10% of sites we scanned simply display a default “Site Hosted by Freedom Hosting II” page. Over 20% of all sites are hosted by Freedom Hosting II, which we can identify through the fingerprint of the SSH host key exposed by every Freedom Hosting II site.
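The Freedom Hosting II grouping relies on the fact that every site it hosts presents the same SSH host key. As a rough illustration of the technique (with placeholder addresses, a direct TCP dial standing in for the Tor SOCKS connection a real scan would use, and no claim to match OnionScan's actual code), one could collect and cluster host key fingerprints like this:

```go
package main

import (
	"fmt"
	"net"

	"golang.org/x/crypto/ssh"
)

// hostKeyFingerprint performs just enough of an SSH handshake to learn the
// server's host key, then gives up. addr is e.g. "siteaaaaaaaaaaaa.onion:22";
// in a real scan the connection would go through the Tor SOCKS proxy.
func hostKeyFingerprint(addr string) (string, error) {
	var fingerprint string
	config := &ssh.ClientConfig{
		User: "scan", // no credentials; we only want the host key exchange
		HostKeyCallback: func(hostname string, remote net.Addr, key ssh.PublicKey) error {
			fingerprint = ssh.FingerprintSHA256(key)
			return nil
		},
	}
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	// The handshake invokes HostKeyCallback before authentication; the
	// subsequent authentication failure is expected and ignored.
	c, chans, reqs, err := ssh.NewClientConn(conn, addr, config)
	if err == nil {
		ssh.NewClient(c, chans, reqs).Close()
	}
	if fingerprint == "" {
		return "", fmt.Errorf("no host key observed for %s", addr)
	}
	return fingerprint, nil
}

func main() {
	// Placeholder addresses; sites sharing a fingerprint are how shared
	// hosting such as Freedom Hosting II shows up in the data.
	sites := []string{"siteaaaaaaaaaaaa.onion:22", "sitebbbbbbbbbbbb.onion:22"}
	clusters := map[string][]string{}
	for _, s := range sites {
		fp, err := hostKeyFingerprint(s)
		if err != nil {
			continue
		}
		clusters[fp] = append(clusters[fp], s)
	}
	for fp, members := range clusters {
		fmt.Println(fp, members)
	}
}
```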

Small, static pages make up over 50% of all web sites. On these pages we could identify no platform, and there were no links to other external pages.

All reported figures must be read with these considerations in mind.

We were able to fingerprint ~5% of sites as using commonly known self-hosted web applications:

WordPress is one of the most commonly used platforms in onion services, making up ~2% of all sites scanned. WordPress is not primarily used as a blogging platform, but as an e-commerce platform using plugins like WooCommerce and Shopify.

phpBB and Simple Machines split the forum market, with ~40 sites opting for phpBB and ~30 using Simple Machines. vBulletin, MyBB and others also have a few sites using them.

Of the other content management systems, Drupal is the most common with ~40 identified instances. 31 instances of MediaWiki were identified. Fewer than 15 instances of Joomla, TWiki and others were found.

We found fewer than 15 installed instances of ownCloud and about the same number of installed instances of Etherpad. We also found a few instances of Roundcube and other webmail clients.
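The applications above are typically recognisable from markers in their page source, such as the generator meta tag, or from well-known paths like /wp-login.php. A highly simplified, hypothetical matcher over a handful of such markers (real fingerprinting uses many more signals and handles case and version variations) might look like:

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative signatures only; these markers and their exact casing vary
// between versions and configurations.
var signatures = map[string]string{
	"WordPress": `<meta name="generator" content="WordPress`,
	"Drupal":    `content="Drupal`,
	"Joomla":    `content="Joomla`,
	"MediaWiki": `content="MediaWiki`,
	"phpBB":     "phpBB",
}

func identifyPlatform(html string) string {
	for name, marker := range signatures {
		if strings.Contains(html, marker) {
			return name
		}
	}
	return "unknown / static"
}

func main() {
	page := `<html><head><meta name="generator" content="WordPress 4.5" /></head></html>`
	fmt.Println(identifyPlatform(page)) // WordPress
}
```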

Perhaps the most surprising fact is how few sites can be fingerprinted as using a common web framework. The truth is that very few onion sites are actually big enough to warrant a whole platform; the most common use case is still a static site.

However, this does mean that the majority of unique sites lack a common framework that could be used for fingerprinting. Conversely, with so few sites overall using common frameworks, it may be possible to correlate those sites that do by identifying specific versions of platforms and plugins.

Update on Reporting

  • We report no change in the number of sites exposing Apache mod_status, which remains between 5% and 6%.
  • We report no change in the number of sites that can be linked to an SSH server fingerprint, which remains close to 25%. This number is inflated by Freedom Hosting II, which accounts for 95% of these: all of its sites relate to a single fingerprint.
  • We report an increase in the number of open directories from 4.2% to 5%. We believe this has been caused by changes to OnionScan which more intelligently inspect the source of a website to identify potential open directories (see the sketch after this list).
  • We report a decrease in the number of sites exposing EXIF metadata. However, we believe most of this change is due to OnionScan no longer probing open directories more than one level deep.
  • We report no change in the number of cloned sites, which still accounts for ~29% of all onion sites.
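As a small illustration of the open directory heuristic mentioned above (a sketch of the idea, not OnionScan's actual code): default Apache and nginx directory listings title the page "Index of /<path>", which can be spotted in the fetched source.

```go
package main

import (
	"fmt"
	"regexp"
)

// indexTitle matches the page title used by default Apache/nginx
// directory listings, e.g. "<title>Index of /images</title>".
var indexTitle = regexp.MustCompile(`(?i)<title>\s*Index of /`)

func looksLikeOpenDirectory(html string) bool {
	return indexTitle.MatchString(html)
}

func main() {
	sample := `<html><head><title>Index of /images</title></head><body>...</body></html>`
	fmt.Println(looksLikeOpenDirectory(sample)) // true
}
```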

Ethics of Onionscan: Scanning and Privacy Policy

In order to better interface with the community, I am publishing a small policy intended to guide the decisions made by OnionScan.

This policy is designed to respect and protect the identities of the services that we scan while balancing a need for accurate reporting.

  • Scans take place on an ad-hoc basis throughout the month. Once a scan has been completed, all data is aggregated at a high level, e.g. Apache-based servers: 2510 (~52%). The raw scan data is then destroyed.
  • We will never publish details which could be used to deanonymize an onion service e.g. leaked IP addresses / identified correlated clusters etc. We will never publish any raw scan data.
  • We do not currently respect robots.txt. After deliberation the decision was made to ignore the contents of robots.txt to avoid fingerprinting bias from the sites. As an example, all illegal sites could add a directive to Disallow OnionScan from crawling their site, biasing classification results.
  • To balance the last point, OnionScan attempts to make as few requests as possible to the site, e.g. if an open directory is discovered, we will not attempt to continue the crawl into child directories.
  • If you would like to contact us about this please see the contact details at the start or end of this report.

Other OnionScan News

  • OnionScan now has support for smarter scanning of directories and files, PGP key extraction and MongoDB identification, with thanks to contributions from Dan Ballard, JosephGregg, Nick Doiron, Bartłomiej Antoniak and Korons. A formal 0.1 release of OnionScan will be announced in the coming months.
  • OnionScan was included in the list of tools available in BlackArch Linux. We are very excited by the prospect of this tool reaching more people in the security community.

Get Involved

If you would like to build upon any of the requests for future research provided by this report, please get in touch at team@onionscan.org. You can also find us on Twitter at @OnionScan.

Goals for the OnionScan Project

  • Increase the number of scanned onion services - DeepLight reported ~13,000 active scanned services. We have so far only successfully scanned ~5000 (out of ~10,000 domains scanned).
  • Increase the number of protocols scanned. OnionScan currently supports light analysis for HTTP and SSH, and detection analysis for FTP, SMTP, Bitcoin, IRC and Ricochet - we want to grow this list, as well as provide deeper analysis on all protocols.
  • Develop a standard for classifying onion services that can be used for crime analysis as well as an expanded analysis of usage from political activism to instant messaging.