OnionScan Report: August 2016 - Revisiting CARONTE; Analytics, Bitcoins and Correlations

Welcome to the fifth OnionScan Report. The aim of these reports is to provide an accurate and up-to-date analysis of how anonymity networks are being used in the real world.

In this report we will examine how bitcoin addresses and 3rd party analytics services can be used to correlate or deanonymize onion services.

Summary

In 2015 a paper titled CARONTE: Detecting Location Leaks for Deanonymizing Tor Hidden Services was published. One aspect of the paper analysed how different identifiers, e.g. Bitcoin addresses and Google Analytics identifiers, could be used to deanonymize hidden services.

In this report we look for the same identifiers within the OnionScan dataset and show that CARONTE style vulnerabilities still exist, and may be more prevalent than the original paper determined.

A Note on Numbers

The lifespan of an onion service can range from minutes to years. When providing generalised numbers, especially percentages, we report approximate figures based on multiple scans over a period of time, rather than a single snapshot.

Methodology

We scanned ~12,000 Onion Service hosts over the period of a week from August 29th to September 4th. As has been seen previously, ~6000 services were active and online during this time.

During these scans we searched for instances of Google Analytics IDs, Google Publisher IDs and Bitcoin Addresses.
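
As an illustration of this kind of search, the sketch below pulls the three identifier types out of a page body using regular expressions. The patterns are assumptions (the report does not state the exact expressions OnionScan uses): UA- style Google Analytics IDs, pub- style publisher IDs, and legacy Base58 Bitcoin addresses, with a Base58Check checksum test to discard strings that merely look like addresses. The function names are hypothetical.

```python
import hashlib
import re
from typing import Dict, Set

# Assumed patterns; these are illustrative and not taken from OnionScan itself.
GA_ID = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")
PUBLISHER_ID = re.compile(r"\bpub-\d{16}\b")
# Legacy (Base58) addresses only; newer address formats are not covered.
BTC_CANDIDATE = re.compile(r"\b[13][1-9A-HJ-NP-Za-km-z]{25,34}\b")

BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def is_valid_base58check(addr: str) -> bool:
    """Return True if addr decodes to 25 bytes with a matching checksum."""
    num = 0
    for ch in addr:
        pos = BASE58.find(ch)
        if pos < 0:
            return False
        num = num * 58 + pos
    try:
        raw = num.to_bytes(25, "big")
    except OverflowError:
        return False
    payload, checksum = raw[:-4], raw[-4:]
    # The checksum is the first 4 bytes of a double SHA-256 of the payload.
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return digest[:4] == checksum


def extract_identifiers(html: str) -> Dict[str, Set[str]]:
    """Collect the unique identifiers of each type found in one page."""
    return {
        "analytics": set(GA_ID.findall(html)),
        "publisher": set(PUBLISHER_ID.findall(html)),
        "bitcoin": {
            addr for addr in BTC_CANDIDATE.findall(html)
            if is_valid_base58check(addr)
        },
    }
```

The checksum step matters because many random alphanumeric strings in a page match the address pattern; only strings that pass it are kept as Bitcoin addresses.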

The following sections provide an analysis of the crawl data.

Analytics and Publisher IDs

In total we found over 100 unique Google Analytics and Publisher IDs across ~150 different sites.

61 of these were Casino and Gambling related sites that all shared a single Google Analytics ID. We estimate these sites make up ~70-80% of all casino and gambling sites on the Dark Web.

All of these sites had clearnet domains and, on closer inspection, appear to be tightly related, e.g. many shared the same "Latest News" section, although the content was presented in different languages.

Apart from the casino cluster, we found a number of different clusters of 4-5 sites all sharing analytics IDs. Many of these were clone or load balancing type sites relaying the same content.
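
One way such clusters could be formed is to treat two sites as related whenever they share an identifier and then merge related sites transitively. The following is a minimal sketch using a union-find structure; it assumes the crawl has already been reduced to a mapping from site to the set of IDs found on it, and the function name and sample IDs are hypothetical rather than taken from the OnionScan dataset.

```python
from collections import defaultdict
from typing import Dict, List, Set


def cluster_sites(site_ids: Dict[str, Set[str]]) -> List[List[str]]:
    """Group sites that are linked, directly or transitively, by shared IDs."""
    parent = {site: site for site in site_ids}

    def find(site: str) -> str:
        # Follow parent pointers to the cluster representative.
        while parent[site] != site:
            parent[site] = parent[parent[site]]  # path halving
            site = parent[site]
        return site

    first_seen: Dict[str, str] = {}
    for site, ids in site_ids.items():
        for ident in ids:
            if ident in first_seen:
                # Merge this site's cluster with the cluster that already has the ID.
                parent[find(site)] = find(first_seen[ident])
            else:
                first_seen[ident] = site

    groups = defaultdict(list)
    for site in site_ids:
        groups[find(site)].append(site)
    # Only clusters with more than one site indicate a relationship.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)
```

Transitive merging is a design choice: it groups clone sites even when no single ID appears on all of them, at the cost of occasionally joining sites that are only indirectly connected.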

In a few instances we were able to identify relationships between sites that we had not previously seen, e.g. different sites selling counterfeit IDs under similar names but ultimately using the same Google Analytics IDs.

As well as relationships between Dark Web sites we were able to identify relationships between seemingly unrelated clearnet sites and illicit dark web sites.

In one particular case we found a number of illicit websites using the same analytics ID that is associated with a social media marketing company.

Bitcoin Addresses

Overall we found 384 sites advertising one or more Bitcoin addresses (due to limitations in scanning we believe the true number is likely much higher, with many addresses being hidden behind payment forms and login forms).

Across these sites we identified 13,889 instances of Bitcoin addresses, of which 2,293 were unique.
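
The distinction between instances and unique addresses comes down to simple counting. A minimal sketch, assuming the crawl output has been reduced to one (site, address) pair per occurrence; the function name and any sample data are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_addresses(observations: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count sites, total address instances, and unique addresses."""
    obs = list(observations)
    per_address = Counter(addr for _site, addr in obs)
    return {
        "sites": len({site for site, _addr in obs}),
        "instances": sum(per_address.values()),  # every occurrence counts
        "unique": len(per_address),              # each address counts once
    }
```

Counting this way also makes it easy to spot double counting: feeding the same observations in twice doubles the instance count but leaves the unique count unchanged.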

As you might imagine, many of these addresses related to large exchange/blockchain info type sites and don't provide much correlation value.

However, closer analysis revealed that many of these addresses were only seen in a small number of sites - indicating potential links.
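
A sketch of how such potential links might be surfaced: ignore addresses that appear on many sites (which are likely to come from exchange or blockchain explorer pages) and pair up the sites that share the remaining addresses. The threshold of 5 sites and the function name are assumptions made for illustration, not values used in this report.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def candidate_links(
    site_addresses: Dict[str, Set[str]], max_sites: int = 5
) -> Dict[Tuple[str, str], Set[str]]:
    """Map each pair of sites to the rarely seen addresses they share."""
    sites_by_address = defaultdict(set)
    for site, addresses in site_addresses.items():
        for addr in addresses:
            sites_by_address[addr].add(site)

    links = defaultdict(set)
    for addr, sites in sites_by_address.items():
        # Skip addresses seen on only one site or on too many sites.
        if 2 <= len(sites) <= max_sites:
            for a, b in combinations(sorted(sites), 2):
                links[(a, b)].add(addr)
    return dict(links)
```

Each resulting pair is only a lead: a shared address suggests a relationship between two sites but does not prove common ownership.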

In one instance we identified a cluster of different pastebin/image upload sites where some had an obvious reference to a bitcoin address and the others had that same reference hidden in a comment.

It could be that some of these sites used the one common site as a template, but we believe that is unlikely.

We found a number of other similar matches; however, due to the large number of addresses and sites involved we have not fully processed the dataset, e.g. looked up addresses on the blockchain or attempted to identify clearnet instances. We will dive deeper into Bitcoin address use in a future report.

Discussion

The original CARONTE experiment reported:

CARONTE extracted 58 unique identifiers: 24 Google Analytics IDs, 3 Google AdSense IDs, and 31 Bitcoin wallets. Only 66 hidden services (3.3%) contained an identifier, indicating that most administrators are careful to avoid them.

Overall, we found substantially more hidden services (1141 or ~20% of active hidden services) containing either analytics IDs (177 sites with 192 IDs / 99 unique), publisher/AdSense IDs (34 sites with 38 IDs / 12 unique) or bitcoin addresses (384 sites with 13,889 addresses / 2,293 unique addresses).

While the large number of bitcoin addresses is somewhat skewed by large blockchain info type sites, there are still a large number of correlations that can be made from and between sites using the addresses.

Likewise for publisher and analytics IDs, while the casino site mentioned above was a large outlier, skewing the number of hidden services upwards, there were still a significant number of sites publishing these kinds of identifiers.

The authors of the CARONTE paper were able to deanonymize 14 services using these kinds of identifiers, stating that 12 of these were likely intentional (e.g. clear non-dark web links such as news outlets or the casino pages we have identified) and that 2 were unintentional.

Given the much larger number of identifiers we collected, and based on casual observation, we believe that many more sites can be deanonymized using this method.

As such we conclude that CARONTE style attacks are more common than previously thought, and that site operators should seriously consider other options before relying on Google or any other 3rd party for analytics services. Operators should also minimize their reuse of the same bitcoin addresses across multiple different sites.

Get Involved

If you would like to help please read Sarah's post OnionScan: What's New and What's Next for some great starting off points. You can also email Sarah (see her profile for contact information).

Goals for the OnionScan Project

  • Increase the number of scanned onion services - We have so far only successfully scanned ~6500 (out of ~12,000 domains scanned).
  • Increase the number of protocols scanned. OnionScan currently supports light analysis for HTTP(S), SSH, FTP & SMTP and detection for Bitcoin, IRC, XMPP and a few other protocols - we want to grow this list, as well as provide deeper analysis on all protocols.
  • Develop a standard for classifying onion services that can be used for crime analysis as well as an expanded analysis of usage from political activism to instant messaging.

Correction

Due to an error in data analysis, some of the data in an earlier version of this report were counted twice. The affected figures were:

  • The number of Casino sites, which was originally reported as 122
  • The number of instances of Google Analytics IDs, which was originally reported as 393
  • The number of instances of Bitcoin Addresses, which was originally reported as 27,778
  • The number of sites containing Google Analytics IDs or Bitcoin Addresses, which were originally reported as 355 and 768 respectively

The number of unique instances was reported correctly.