Oniondildonics: Securing Sex Toys Using Privacy-Preserving Protocols

Introduction

A few weekends ago at least 5 people who follow me on Twitter connected to my vibrator through the Tor network. After connecting they were able to issue commands to my vibrator and cause it to vibrate or switch off. They were also able to read the battery indicator.

I even had a few journalists connecting to my vibrator.

Why Did You Do This?

I believe that technology should be consensual by default, and there is a very clear example of a set of technology that really should be consensual by default but that isn't - sex tech.

All sex tech devices currently on the market that feature remote interaction rely on communication mediated by a server. This server is almost exclusively owned and operated by the manufacturer of the device (or related associates).

This server has direct access to the content of play sessions between partners, and even if manufacturers took steps to provide end-to-end encryption for this content (they often don't), the servers would still be able to derive session metadata, e.g. which partners engaged in remote sex, when, and for how long.

I don't want corporations knowing who my sexual partners are and so I wanted to demonstrate that such an architecture was not necessary, and that these devices can be consensual by default.

I have long been a fan of the Ricochet protocol - a peer-to-peer, metadata-resistant messaging protocol built on Tor onion services. So I set out to prototype a remote sex architecture using Ricochet.

Architecture

[Diagram: Overall communicating with a vibrator over Tor is pretty simple...]


Practically all connected sex toys on the market today use Bluetooth LE to connect to a remote control or a phone application.

I first reverse engineered this Bluetooth protocol, which allowed me to talk to my vibrator directly.
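
As an illustration, here is a minimal sketch of that kind of direct Bluetooth control, assuming the bleak Python library; the device address, characteristic UUID and command format are hypothetical stand-ins for the values recovered during reverse engineering.

```python
# A sketch of writing a command to a BLE toy. The address, UUID and command
# string below are hypothetical - real values vary per vendor and must be
# recovered by reverse engineering (e.g. sniffing the vendor's phone app).
import asyncio
from bleak import BleakClient

DEVICE_ADDRESS = "00:11:22:33:44:55"                   # hypothetical
COMMAND_CHAR = "0000fff1-0000-1000-8000-00805f9b34fb"  # hypothetical

async def vibrate(level: int) -> None:
    async with BleakClient(DEVICE_ADDRESS) as client:
        # Many toys accept short ASCII commands written to a single
        # "command" characteristic; the exact format varies per device.
        await client.write_gatt_char(COMMAND_CHAR, f"Vibrate:{level};".encode())

asyncio.run(vibrate(5))
```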

The device is connected via Bluetooth to a computer running a program that uses the oniondildonics library. This program also creates a Tor hidden service running a Ricochet server.

The Ricochet protocol has a number of steps, which we will outline here for reference and future discussion (a sketch of how they map onto toy control follows the list):

  • Authentication - During this step the client is asked to prove its identity by completing a cryptographic proof using its public key.
  • Authorization - Once authenticated, if the client is known then the server may accept/reject the client based on its identity. If the client is not known then the server may wait for a contact request or simply reject the client.
  • Messaging - Once authorized, the client and the server may set up communication channels within the Ricochet protocol and send messages to each other.
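
To make the flow concrete, here is a minimal sketch, assuming hypothetical helper names, of how the authorization and messaging steps map onto controlling a toy: only pre-authorized Ricochet identities are accepted, and their messages are translated into device commands. send_command stands in for the Bluetooth write shown earlier; none of this is the library's actual API.

```python
# A sketch of the authorize-then-dispatch loop. The identity string and the
# command vocabulary are illustrative assumptions.
AUTHORIZED_PARTNERS = {"ricochet:rs7ce36jsj24ogfw"}  # hypothetical identity

def on_client_authorized(identity: str) -> bool:
    """Authorization step: accept only known partners, reject everyone else."""
    return identity in AUTHORIZED_PARTNERS

def on_message(identity: str, message: str, send_command) -> str:
    """Messaging step: translate an authorized partner's chat into commands."""
    if not on_client_authorized(identity):
        return "rejected"
    if message.startswith("vibrate "):
        level = int(message.split()[1])
        send_command(f"Vibrate:{level};".encode())
        return f"vibrating at {level}"
    if message == "stop":
        send_command(b"Vibrate:0;")
        return "stopped"
    if message == "battery":
        return "battery: 87%"  # would be read back over Bluetooth in practice
    return "unknown command"

print(on_message("ricochet:rs7ce36jsj24ogfw", "vibrate 5", lambda cmd: None))
```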

This architecture has a number of benefits when it comes to enforcing & protecting consent.

For example, by setting the contact manager to only allow partners with specific Ricochet identities to connect, a user can lock down their toy and ignore any new contact requests, ensuring that only people they have pre-authorized, and who have access to the corresponding Ricochet keys, can participate in play sessions.

Another possibility is to generate a new Ricochet identity for a one-time play session - once the session is over, the Ricochet identity can no longer be used to control the toy. This is a very useful property for those who like or need to engage in remote sex in a public or semi-public context, e.g. those on cam sites - and ensures that they do not have to provide any long-term public identifiers.

All connections & messages are encrypted end-to-end, and the metadata resistance of onion services means that there are no service providers in a position to spy on or record sexual activity between partners. The only people who know an intimate activity is taking place are the people taking part in it.

Future Work

As sex tech improves it may be possible to eliminate the Bluetooth connection entirely and integrate the Ricochet connection within the device itself.

The code is online, and is a very rough outline of this experiment; there are plenty of opportunities for improvement, including implementing the contact management & long term play relationships discussed above.

Overall, I hope this experiment inspires more people to come up with novel ways of using onion services and the Ricochet protocol, as well as looking for ways to make our world more private & consensual by default.

Note: This research could not exist without our supporters, and we hope we can continue to deliver new insights, research and technology in the future. To help us do so, please support us

The Information Superhighway has become The Information-Tracking Superhighway

On Thursday, June 20th 1996, Edward J. Markey rose to introduce the Communications Privacy and Consumer Empowerment Act:

The issue of privacy in the information age and in particular, children's privacy protection, is quite timely as the Nation becomes ever more linked by communications networks, such as the Internet. It is important that we tackle these issues now before we travel down the information superhighway too far and realize perhaps we've made a wrong turn.

The act itself is ultimately unimportant to the events that followed, but the words echo through time to our current day and place.

We have indeed made a wrong turn.

A Brief Introduction to Web Tracking

Cookies have now been standard in web browsers for at least 20 years and the debates around them are almost as old.

When cookies were first introduced, there was a conversation around their privacy implications. While cookies have many legitimate uses - e.g. recording session information so that the user does not have to repeatedly enter a username and password with every request - they are now regularly used to store information that makes tracking users' movements across the web easier.

Since then other methods of tracking have evolved. Device Fingerprinting is a method by which websites attempt to distinguish one user from another automatically using small variations in each user's computer setup, e.g. the operating system version, the browser software version, which fonts or plugins the user has installed, the user's screen size and many other similar characteristics.

More recent developments use newer HTML5 features, such as the Audio and Battery APIs, to refine device fingerprints even further.
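
To illustrate the idea, here is a toy sketch of how such attributes combine into a single identifier; the attribute set is illustrative, and real fingerprinting scripts collect far more signals than shown here.

```python
# A toy illustration of device fingerprinting: hashing a handful of
# per-device attributes into one stable identifier. Real trackers gather
# many more signals (canvas rendering, audio stack quirks, etc.).
import hashlib

attributes = {
    "os": "Windows 10",
    "browser": "Firefox 54.0",
    "fonts": "Arial,Calibri,Comic Sans MS",
    "plugins": "Flash 26.0",
    "screen": "1920x1080",
}

fingerprint = hashlib.sha256(
    "|".join(f"{k}={v}" for k, v in sorted(attributes.items())).encode()
).hexdigest()
print(fingerprint[:16])  # a (mostly) stable per-device identifier
```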

While these kinds of techniques are concerning, I want to focus on a particular trend in modern websites, one that I believe has major implications for the modern web privacy conversation - the reliance on 3rd parties.

Any website can implement the tracking technologies I have described above. However, most will not and will instead rely on one of a handful of companies to collect the data, perform analysis or deliver "appropriate" advertisements.

This centralization of the Internet around a few companies with visibility over an enormous volume of everyday browsing demonstrates how fragile our content ecosystem really is.

There are three main ways that 3rd parties inject themselves into places where they can collect your browsing history. For the purposes of this article I am going to call these resource inclusion, script inclusion and content network centralization.

Resource Inclusion is based on the premise that your browser will load resources (images, videos, etc.) when it is rendering the page. Sites include resources from 3rd parties, and when the browser makes a request to the 3rd party server the request is logged and analyzed with every other request. Cookies are usually sent and received during the request, although not always.

Script Inclusion is technically a form of resource inclusion, but the main work is performed by running the included JavaScript on your computer when you load a webpage. This JavaScript can additionally load resource-based trackers, but will also be used to perform more advanced device fingerprinting, as well as collecting information about links that you click or where your mouse pointer is on the screen. All of these are used to build a deeper profile of your behavior.

Content Network Centralization is not really a tracker in the traditional sense, but content-distribution networks (CDNs) are increasingly being given visibility over a large portion of Internet traffic due to their size and network posture. Because hundreds of websites use the same CDN, the network is able to determine which sites your IP address visits, as well as perform other kinds of analytics, much like a resource-based tracker would.

Visualizing the Information-Tracking Superhighway

In order to understand the scope of 3rd party intrusion into everyday web browsing we looked at 1000 of the most popular Internet sites and counted the number and type of 3rd party images and scripts that were present on each.

After that we rendered the connections between sites and tracking resources.
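
For readers curious about the mechanics, here is a minimal sketch of this kind of survey - not our actual pipeline: fetch each site, extract the domains of included images and scripts, and keep the third-party ones as edges in a site/tracker graph.

```python
# A crude sketch of building a site -> third-party graph. The HTML parsing
# here is a regex approximation; a real crawler would execute pages in a
# full browser to catch dynamically loaded trackers.
import re
from urllib.parse import urlparse
import requests

def third_party_domains(site: str) -> set:
    html = requests.get(f"http://{site}", timeout=30).text
    # extract <img src=...> and <script src=...> URLs
    srcs = re.findall(r'<(?:img|script)[^>]+src=["\'](https?://[^"\']+)', html)
    domains = {urlparse(u).netloc for u in srcs}
    return {d for d in domains if not d.endswith(site)}  # drop first-party

edges = set()
for site in ["example.com"]:  # the survey used 1000 popular sites
    for tracker in third_party_domains(site):
        edges.add((site, tracker))
print(edges)
```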

In an ideal world each website would be its own island - only loading a few resources. Instead, what we found is that many of today's popular websites are wired into a vast, centralized 3rd party tracking infrastructure we have dubbed The Information-Tracking Superhighway.

[Graph: The Information-Tracking Superhighway. 45% of sites in the 1000 we looked at can be connected through shared 3rd party infrastructure.]

In our small sample of 1000 popular websites we found that 451 of them could be connected to our Information-Tracking Superhighway.

These sites all shared one or more common 3rd party resources. This means that the 3rd parties present on the highway have access to data from many of the most commonly visited websites - and as such have the opportunity to build large, detailed profiles on the visitors to those websites.

Google by far had the largest number of its scripts and other assets loaded in our sample. In fact 6 out of the top 10 3rd party inclusions represented Google properties: googleapis.com, google.com, googletagservices.com, googleadservices.com, google-analytics.com, googlesyndication.com.

googleapis.com was the most well represented with 9% of sites in our sample using it.

Other popular 3rd parties were CloudFront and Optimizely.

Our results are very similar to a much larger 1,000,000 site survey conducted by Englehardt and Narayanan, which found that Google-owned domains made up 12 of the top 20 third parties.

What does this mean for privacy?

Frankly, the future of web privacy does not look to be a cheerful one. If the current state remains the same or gets worse then we can expect to see ever more centralization - and with it - ever more power being handed to those who monitor our web traffic.

Like in the million site study, we found that the sites that included the most third party resources tended to be news websites - sites like time.com, salon.com, cnn.com and others all included 5+ 3rd party resources.

Other categories of sites with large numbers of 3rd party inclusions included dating websites, porn websites and many other sites revealing people's passions, hobbies and lives.

It would be easy to accuse us of fear mongering, but we must remember that the corporations that collect this information are not benevolent; they do it for their own self-interest, for profit. If it became profitable to spill our personal information, or to hand it over to a government - or if refusing to would endanger profits - these organizations would do so in a heartbeat. Our data is not safe with them. Your data is not safe with them.

What can you do?

Thankfully there are tools that can be used to subvert the current system. To render it ineffective. To kill it.

Extensions that block 3rd party trackers, like uBlock Origin (Firefox & Chrome) and Privacy Badger, are an essential first step. Using these extensions doesn't make you invulnerable to web tracking, but it does greatly cut off the flow of information to these corporations.

Going further, Tor can be used to hide your location and device identity from a website that you are visiting; an ultimate defense against tracking.

However, Tor is not for everyone. There are serious security considerations that must be taken into account when using the Tor Browser. Malicious exit nodes are known to attempt far worse things than simply tracking you. You can protect yourself but, right now, the price is vigilance.

It is likely that if you are reading this that you will settle somewhere between these two options - perhaps using blocking extensions most of the time, and using Tor part of the time. That's OK. The most important thing is for you to start taking back control and to start demanding your right to consent.

But tools are not enough: if privacy is something that you want, then privacy must be something that you demand.

You must act to enforce your consent over your browsing history; the only person who should know you visited a website is you, unless you choose to tell someone else.

Note: This research could not exist without our supporters, and we hope we can continue to deliver new insights, research and technology in the future. To help us do so, please support us

OnionScan Report: Freedom Hosting II, A New Map and a New Direction.

Welcome to the ninth OnionScan Report. The aim of these reports is to provide an accurate and up-to-date analysis of how anonymity networks are being used in the real world.

Over the last year we have produced reports into the state of onion services. In this final report we will present the most up-to-date analysis of the OnionSphere, compiled from the largest OnionScan crawl we have ever conducted.

Note: This research could not exist without our supporters, and we hope we can continue to deliver new insights, research and technology in the future. To help us do so, please support us

A Year In OnionScan

Over the last year we have written many reports relating to anonymity, security and the dark web.

Our first ever report covered estimates of crime on the dark web, counting onion services, and hidden service security; we then looked at what platforms are used on the dark web.

For our third report we produced the first maps of the dark web analyzing relationships between onion services.

Our fourth report examined how HTTPS certificates were being used (and misused) by dark web sites.

In August we replicated part of the CARONTE study finding relationships between hidden services using various identifiers like email addresses, bitcoin addresses and analytics tokens.

In September we took a close look at the hosting provider Freedom Hosting II and demonstrated that they were hosting a significant portion of the dark web.

Towards the end of 2016 we demonstrated a new attack to extract co-hosting relationships from hidden services. And at the start of 2017 we demonstrated how much information we could extract from a popular dark marketplace.

Much has changed in the onionsphere in that time, and yet much has remained the same. The issues that we identified in all of those reports haven't gone away. Hosting an anonymous service is still very difficult.

The Death of Freedom Hosting II

In early February the free hosting provider Freedom Hosting II was compromised along with all of the sites it was hosting.

OnionScan data was used as part of the initial analysis to confirm the impact.

While it is still difficult to say how many of the 10,000 sites compromised in the database leak were serving content before the compromise, we can definitely say that over 2000 sites that were part of the OnionScan master list are no longer being hosted - and that this has had a significant impact on the overall size of the onionsphere, which we estimate at between 15-20% of all dark web site content on the Tor network.

An analysis of the database dump revealed that at least 3 of the largest databases in the dump were forums related to sharing and discussing child sexual exploitation - this would appear to align with statements made by the hackers that it was only after they had gained access to Freedom Hosting II that they decided to destroy the hosting provider and leak the database.

When we examine the smaller databases, we find them to be populated with blogs, small political forums and personal sites - whatever the main motive behind the attack, it is clear that it was not without collateral damage.

The Shape of the OnionSphere Today

Without venturing into guesswork, we can say that 2017 has not been a good year to be operating a hidden service.

After the downfall of Freedom Hosting II we undertook the largest OnionScan crawl to date and examined over 30,000 onion services extracted from both the freedom hosting database and our existing master list.

Of the 30,000 queried just over 4,400 were online when we scanned (as always the scans were conducted across a number of days in an attempt to make the set as large as possible). We believe that many of the others in our list are now defunct.

These 4,400 hidden services are far fewer than we have seen in previous scans. We believe that the Freedom Hosting II takedown not only removed many thousands of active sites but also may have affected other hosting providers who were hosting some infrastructure on top of Freedom Hosting II.

The sudden disappearance of Sigaint, an encrypted email provider, may also be associated with the decline of some hidden services.

Overall we present the following numbers regarding the makeup of the onionsphere in March 2017.

  • HTTP Detected - ~4000
  • TLS Detected - ~250 (In line with previous counts - seemingly unaffected by FHII)
  • SSH Detected - ~270 (much lower, mostly due to the FHII hack)
  • FTP Detected - < 10 (much lower, again expected to be related to FHII)
  • SMTP Detected - < 100
  • VNC Detected - < 10
  • Bitcoin Nodes Detected - ~220 (much higher, likely because of better bitcoin capability in OnionScan)

Since our initial explorations back in April we can see that web sites still dominate the landscape. SSH endpoints are down substantially, mostly linked to the downfall of Freedom Hosting II.

Otherwise not much has changed. Even in terms of security, the state remains much the same as it was a year ago.

Apache mod_status exposures still sit at between 7-10% of sites (we detected just over 400 in our recent scans) - this might even represent a slight increase in the number of misconfigured Apache installs.

Other kinds of vulnerabilities (open directories, EXIF metadata, host header co-hosting leaks) are also within the levels that we described over the last year.

We were able to extract nearly a thousand unique IP addresses from our data set belonging to both services and clearnet clients accessing misconfigured hidden services.

A New Map

When we examine larger scale correlations we still see that the vast majority of dark web sites can be connected to one or more clusters based on outbound/inbound links, bitcoin addresses and/or other types of identifiers.

[Image: A zoomed out view of the dark web connection graph produced by OnionScan. Multiple clusters of hidden services can be seen connected by a large number of identifiers and misconfigurations.]

Truth be told, it is simply far too easy to correlate hidden services to each other. This is because it is far too easy to misconfigure software or otherwise leak information.

Operational patterns appear across our new map; as can be seen in the images below, there are several clusters of hosted nodes that are connected through Apache mod_status leaks - only a few of these sites are ever linked to from outside the cluster.

[Images: clusters of co-hosted onion services linked together by Apache mod_status leaks.]

The fact that we can find multiple examples of this pattern likely indicates that, as we discussed back in our first report, most hidden services are operated by a small collection of organizations and people rather than by individuals.

This is possibly driven by a few factors - the difficulty of setting up and maintaining a website, combined with the skills required to run a Tor hidden service make offloading that work to a 3rd party tempting.

However, as seen with Freedom Hosting, and the other leaks we have demonstrated, these relationships create additional security risks - and may in the end completely compromise any anonymity or privacy.

In order to truly solve the problem of usable, anonymous publishing we need a new approach.

The Future of OnionScan

We have always stated that the end goal for OnionScan was to "make OnionScan not work" - by that we meant that we want to build a world where anonymity applications do not/cannot leak the kinds of information that OnionScan uses to draw correlations.

To achieve this goal we must move beyond pure OnionScan development and start investing in new technologies.

Over the course of the next few months we will be experimenting with and releasing prototypes and papers, with the aim of coalescing on a new direction for anonymous publishing.

The end goal: a tool that can be used by anyone that will host a hidden service securely, privately and anonymously. A tool that will empower and aid the user in making decisions that protect themselves and others.

This doesn't mean that OnionScan is going away.

OnionScan will still be under active development - we plan to release 0.3 later on this year which will make scanning more efficient, and introduce more in-depth scanning of various protocols.

OnionScan will also feature in many of the new projects we build - we believe that active scanning of anonymity applications is vital to maintaining their security - to that end we believe OnionScan, or a derivative, can be used to continually monitor for potential deanonymization vectors.

So, while this may be the last in-depth OnionScan Report we produce for a while - this is not the end of OnionScan - and will certainly not be our last report.

This research could not exist without our supporters, and we hope we can continue to deliver new insights, research and technology in the future. To help us do so, please support us

The First Contact Problem: Getting to SecureDrop

When a source wants to leak information, they must first become aware of their options & take steps to protect their operational security.

However the very act of seeking out, researching and accessing this information can place a source at risk of discovery.

Overview: The First Contact Problem

This first contact problem is an artifact of most privacy and anonymity systems - how do you gain access to a system that protects your privacy in a private fashion?

This article will examine some of the common ways in which the current leaking pipeline...leaks.

Problem: Choosing Where To Leak

Let us imagine that a source has heard of SecureDrop, and that they know that some organizations use it to protect sources.

Perhaps the source is fond of The Guardian and is aware that they have a SecureDrop instance at https://securedrop.theguardian.com/ and they visit the page...

In that one act the source has potentially compromised themselves.

Between entering the URL in the bar and the page loading they have twice created an association between SecureDrop and themselves (or at least their IP address).

Once in the DNS request for securedrop.theguardian.com and once in the TLS SNI extension:

[Screenshot: a traffic capture showing the DNS request and TLS SNI extension exposing securedrop.theguardian.com]
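
To see why the SNI leak happens, consider the sketch below: the hostname is sent in cleartext in the TLS ClientHello, before any encryption has been negotiated, so anyone watching the wire can read it.

```python
# A minimal demonstration of where the SNI value appears. The hostname
# passed as server_hostname is transmitted unencrypted in the ClientHello;
# a network observer sees it even though the page content is encrypted.
import socket
import ssl

hostname = "securedrop.theguardian.com"
ctx = ssl.create_default_context()
with socket.create_connection((hostname, 443)) as sock:  # DNS also leaks here
    with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
        print(tls.version())
```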

Now, it is important to state that neither of these metadata leaks, by themselves or even together, is substantial evidence of someone leaking - but during an investigation they could act as a smoking gun.

It is worth remembering at this point that multiple governments now require Internet Service Providers to keep a record of sites visited - and that we know other government departments regularly intercept Internet traffic and make such data available during investigations. An investigation having access to such information is no idle fantasy.

The above discoveries can be mitigated by hosting SecureDrop information as part of the main news site e.g. https://theintercept.com/securedrop/

This ensures that direct references to SecureDrop do not show up in DNS requests or the SNI extension.

[Screenshot: the same traffic capture for a SecureDrop page hosted on the main news domain - only the main domain appears in DNS and SNI]

Alternatively using Tor or a non-personal Internet connection also reduces the risk of association. However, these solutions also beget problems.

Problem: Acquiring and Using Tor Browser

The same DNS/SNI issues arise when downloading the Tor Browser. Most sources are likely going to end up on the Tor Project's website, and this fact will be exposed by their network traffic.

A further complication arises once they start using Tor. Tor traffic can be fingerprinted and thus it should be assumed that ISPs and others with access to traffic logs can construct activity logs for when people start and stop using Tor.

For sources who start using Tor immediately prior to leaking sensitive information this is another smoking gun during an investigation and one which we must assume will be used.

Possible Solutions

Providing potential sources immediate access to Tor is essential in ensuring that their activities are not caught up in the drift net of mass surveillance.

Emphasizing the importance of everyone using Tor as part of regular Internet activity is essential. The more people who use Tor, the harder it is for any one person to be singled out.

Having a history of using Tor might single a source out - so it is essential not only that they have a history of using Tor, but that others around them do also.

Outside of a personal environment there are other options - informing potential sources to use non-personal Internet connections e.g. coffee shops is also a good tactic when it comes to disassociating traffic data. However there are also various operational risks associated with public settings.

The Library Freedom Project has worked with libraries to make Tor available on community computers - making it easy for anyone to access anonymous resources without compromising clearnet metadata. Such schemes are likely necessary if we wish to protect a wide variety of potential sources.

Document Metadata

Finally, we must talk about the problem of document metadata. Most potential sources are unlikely to have any knowledge of the kinds of information that a Word document, PDF or image contains.

While SecureDrop contains software allowing journalists to inspect and remove metadata - we must consider the case where sources wish to keep this information hidden.

Practically every Darknet Marketplace now removes EXIF data from uploaded images - inspired by years of vendors making trivial opsec mistakes while uploading.

We need to build tools that allow sources to inspect what metadata various documents contain and that provide them with the ability to scrub it themselves - before interacting with SecureDrop.
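
As a sketch of what such a tool might look like, here is a minimal example using the Pillow library: show the user what EXIF metadata an image carries, then write a scrubbed copy containing only the pixel data. The filenames are hypothetical.

```python
# A minimal source-side metadata tool: inspect EXIF tags, then save a copy
# of the image that carries no metadata at all (pixels only).
from PIL import Image

def inspect_and_scrub(path: str, out_path: str) -> None:
    img = Image.open(path)
    for tag, value in img.getexif().items():
        print(f"EXIF tag {tag}: {value}")  # let the source see what leaks
    clean = Image.new(img.mode, img.size)
    clean.putdata(list(img.getdata()))  # copy pixel data, drop everything else
    clean.save(out_path)

inspect_and_scrub("photo.jpg", "photo-scrubbed.jpg")  # hypothetical filenames
```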

This creates yet another first contact problem like the ones above - but like those I believe this one is solvable.

SecureDrop could of course purge all metadata on upload, like darknet markets do - but we must take into account a threat model where a news organization or SecureDrop instance is compromised - and aim to protect the source from such a risk.

A generally marketable privacy extension for documents and images ("see what kind of information you are sharing") could gain enough ground to become well known and help sources make these decisions.

Right now, searching for any information on document metadata brings up a variety of technical documents with very little aimed at a general audience.

Moving Forward

New technology and innovations in user experience can make being an anonymous source less risky.

In addition, general education about the privacy concerns of Internet & document metadata is essential in helping sources make choices that protect themselves, long before they engage with someone with more expertise in operational security.

In an age where leaking might be the only way to gain access to accurate information regarding the state of various government departments, we should do everything we can to make it as easy, and as safe, as possible.

Sarah is a privacy & anonymity researcher working to make the world a fairer place. If you would like to support her research and development of tools like OnionScan you can become a patron at Patreon

Financial Censorship: When Banks Decide Morality

The social networking site FetLife has removed hundreds of groups & fetishes from its platform in response to a credit card company shutting down a merchant account.

The social networking site that serves people interested in BDSM, fetishism, and kink made an announcement about the takedowns after community outcry.

In the announcement (mirror), the owners explained the reason for the deletions - takedown requests from credit card companies based on the content that FetLife is hosting.

"Last Tuesday we got a notice that one of our merchant accounts was shutting us down. One of the card companies contacted them directly and told the bank to stop processing for us. The bank asked for more information, but the only thing they could get from the card company was that part of it had to do with 'blood, needles, and vampirism.'" - a snippet from the Fetlife announcement.

Three days after the first notice, FetLife received another notice, this time regarding another merchant account. They got a similar call from the same card company and were asked to close the FetLife account. This time they were told it was for "Illegal or Immoral" reasons.

At this time FetLife can no longer process credit cards. Without a merchant account, FetLife runs at a loss every month, because all of its advertisement revenue (50% of its income according to the announcement) is processed by credit card companies.

In response FetLife have scrambled to enforce new guidelines around what kind of content is allowed on their platform. Content that is now forbidden includes rape-play, race-play and kinks involving drugs or alcohol.

Many of the kinks that have been removed are very common. Multiple studies have found that forced-sex, in many different contexts, is a common fantasy among both women & men. Further the BDSM community have evolved strong models of consent to reduce, and in many cases, eliminate harm while acting out these kinds of fantasies.

Even for non-common fantasies, we must emphasize that they are just that - fantasies - and pressure applied by credit card companies with the aim of removing depictions of these fantasies amounts to censorship.

What about Cryptocurrencies?

Many readers might now be asking why FetLife does not simply move to a cryptocurrency like Bitcoin. They tried:

We used to accept bitcoins through Coinbase. They dropped us a year ago because we are a kinky site. No joke.

If a Bitcoin site wants to accept credit cards, then they have to adhere to rules set forth by the credit card companies.

Yes, there are other options, and we are going to look into them, but options like Bitcoin are a nice to have and not currently a viable replacement for being able to accept credit and debit cards on FetLife, no matter how much one might want to believe otherwise.

As appealing as the ideal of cryptocurrencies is in theory, FetLife found that it hasn't worked in practice.

According to the FAQ attached to the announcement, when FetLife offered Bitcoin as an option it was responsible for less than 0.1% of their daily transactions.

This was nowhere near enough to reduce their dependency on credit card companies.

With anti-porn, pro-censorship laws & taskforces being debated in the United Kingdom and other countries, the impact of financial censorship is likely to grow beyond niche kink communities.

For those who believe that banks & credit card companies shouldn't be the arbiters of moral behavior - FetLife is the canary in the coal mine.

Sarah is a privacy & anonymity researcher working to make the world a fairer place. If you would like to support her research and development of tools like OnionScan you can become a patron at Patreon

Title image Source: Wikicommons

Visualizing the Dark Web: Dark Market Flower

Much has been written about the dark web; about the violence, the crime, the technology. When people picture the dark web they think of police raids, stacks of seized drugs and perhaps some cartoonish image of a hooded hacker in a dark room.

I believe the dark web is more than that. I believe the technologies & the people who combine to form it represent something bigger than the constituent parts.

As the cliché goes, I can't tell you - you have to see it for yourself. This is the first in an unspecified series of works visualizing the dark web in a way you have never seen before.

Dark Market Flower

Every day on Hansa, thousands of dollars' worth of drugs & other illicit items are traded. These transactions, as the latest OnionScan report reveals, can be reconstructed using some simple data analysis.

I took these transactions and the relationships between the buyers & sellers and with a little bit of math magic formed a flower.

[Image: the Dark Market Flower - a rendering of Hansa transactions]

Each line in the image represents a single transaction on the marketplace. Where the lines meet at a point represents a vendor or a buyer. The more lines going to a point, the more that vendor or buyer has interacted with others.

The lines subtly change in thickness depending on the dollar amount of the transaction. You can see this more clearly in the closeup below.

[Image: a closeup of the flower showing line thickness varying with transaction amount]
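
For those curious how such an image can be produced, here is a minimal sketch using networkx and matplotlib; the original's exact layout ("math magic") isn't specified, so a force-directed layout stands in, and the transaction data shown is illustrative.

```python
# A sketch of rendering transactions as a graph: nodes are buyers/vendors,
# edges are transactions, and edge width scales with the dollar amount.
import matplotlib.pyplot as plt
import networkx as nx

transactions = [  # (buyer, vendor, USD) - illustrative sample data
    ("buyer1", "vendorA", 120.00),
    ("buyer2", "vendorA", 35.50),
    ("buyer2", "vendorB", 9.99),
    ("buyer3", "vendorB", 250.00),
]

G = nx.Graph()
for buyer, vendor, usd in transactions:
    G.add_edge(buyer, vendor, weight=usd)

pos = nx.spring_layout(G, seed=42)  # force-directed layout as a stand-in
widths = [0.5 + d["weight"] / 100 for _, _, d in G.edges(data=True)]
nx.draw_networkx_edges(G, pos, width=widths, edge_color="purple", alpha=0.6)
plt.axis("off")
plt.savefig("flower.png", dpi=300)
```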

Dark markets are seen as places to fear - but they also hold so much potential. In a world where governments have gained too much power over their people these markets can be seen as a way of taking back that power; to realizing consent.

I hope this has given you some greater understanding of the beauty inherent in the connections that underlie us all. I hope to share more of this world with you in the future.

If you would like to support further research and development of tools like OnionScan you can become a patron at Patreon

OnionScan Report: Reconstructing the Finances of Darknet Markets through Reputation Systems

Welcome to the eighth OnionScan Report. The aim of these reports is to provide an accurate and up-to-date analysis of how anonymity networks are being used in the real world.

In this report we will provide an in depth analysis of the financial information & business of a darknet marketplace, demonstrating how information publicly available can be used to build up a detailed profile.

Summary

Hansa Marketplace is a dark web marketplace where vendors can sell illicit products such as hacked accounts & drugs. We conducted a large scale crawl & analysis of the products & reviews listed on the marketplace.

[Screenshot of the front page of Hansa Marketplace]

By using those product listings & reviews we were able to gain a large amount of information about the financial transactions facilitated by Hansa Market, including how much money Hansa Market facilitates, which vendors earn the most through Hansa, which products are bought the most & where vendors are likely based.

We discuss the impact of such information leaking through market reputation systems & what the future might look like.

Methodology

Between December 10th and December 14th 2016 we conducted a large crawl of the dark web marketplace Hansa. To do this we modified a version of OnionScan to output CSV records for each configured user relationship.

The two relationships we configured were vendor listings & customer reviews. We have provided these relationship configurations at the end of this report.

Vendor Listings extracted information about each product sold on Hansa marketplace, including the Title, the Price in USD, the top level Category the item was listed under, the country the vendor claimed the product shipped from (if applicable), and the Vendor who was selling the product.

Customer Reviews extracted each review of the item listed on the product's page; this included the time of the review, whether the review was positive, neutral or negative, the delivery time, the redacted user name & any review text.

We configured OnionScan to perform a deep scan, discovering & scanning new pages to a depth of 100 link follows and conducted multiple scans over the course of a week.

The scans themselves took many hours to complete, and multiple scans were run to ensure complete coverage of the site.

Overall we collected 14,544 unique product listings & 43,841 user reviews. We believe this is the largest processed scrape of dark market listings & reviews publicly available*.

[Graph: the number of reviews posted on Hansa Marketplace over time]

After scanning we analyzed the data in the following ways (a sketch of this aggregation follows the list):

  • Number of products sold by each vendor
  • Amount of revenue generated by each vendor
  • Most popular products
  • Highest revenue products
  • Which vendors shipped from which country
  • Customer satisfaction with the product
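
Here is a minimal sketch, not the report's actual pipeline, of the core aggregation: given the scraped reviews joined to listings, per-vendor revenue is estimated as the number of reviews times the listed USD price. The CSV filename and column names are assumptions for illustration.

```python
# Estimate per-vendor gross sales from scraped listing/review data,
# treating one review as approximately one sale.
import csv
from collections import Counter

revenue = Counter()
sales = Counter()
with open("hansa_reviews.csv") as f:  # hypothetical export of the crawl
    for row in csv.DictReader(f):
        vendor = row["vendor"]
        revenue[vendor] += float(row["price_usd"])
        sales[vendor] += 1

for vendor, usd in revenue.most_common(20):  # top 20 vendors by gross sales
    print(f"{vendor}\t${usd:,.2f}\t{sales[vendor]} sales")
```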

Hansa Market Finances

Hansa Market was formed in late 2015, and according to a report published by RAND in January 2016 there were 4,829 listings & 219 vendors on the site.

Our scrapes, taken in mid December, show that those numbers have grown; showing Hansa as having 14,544 listings offered by 511 vendors.

Through our analysis we determined that Hansa likely facilitated over $3,000,000 USD worth of sales between September 2015 & December 2016.

These figures are based on the reviews logged for each scraped item and the price of the product as listed when scraped. Hansa earns between 2% and 5% commission on each item depending on the status of the vendor, which means we can estimate that Hansa itself likely made $100,000 USD - $150,000 USD between September 2015 & December 2016 (not including vendor fees & other miscellaneous income).

Popular Vendors & Products

Proceeds from Drugs (over $3,000,000 USD) dominate Hansa's income, with Counterfeit (~$120,000 USD) & Fraud (~$79,000 USD) related products trailing far behind.

[Bar chart: total sales in each Hansa category. Drugs outsell every other category.]

Despite drugs overwhelmingly contributing to the bottom line, it is digital items like accounts on porn sites & how-to books that dominate the top 20 most sold products. Only 5 products in the top 20 most sold can be classified as drugs - these are listings for Liquid Mushrooms, MDMA & Xanax.

The product with the highest number of reviews (and presumably sales) is titled +++++ Netflix Account(Premium + Lifetime) -Best price on Hansa +++++. This "Lifetime" Netflix account sells for just 99 cents and has 745 reviews. As with all reviews on Hansa, the majority are positive, with comments such as "All good, thanks a lot" & "Quick delivery, fixed issue quick as well, trust worthy and fixes issue if it happens. Highly recommend!" - lurking in between there are a few negative reviews: "1 month later, account isn't working again (4th time)" and "its not lifetime" - perhaps indicating that these are hacked Netflix accounts that, for the most part, remain hacked.

Being a vendor on a dark market appears to pay well, with the top vendor (dutchcandyshop) selling $171,747.71 USD worth of product, and all vendors in the top 20 selling over $50,000 USD worth of product each since September 2015.

[Bubble chart: the top 20 vendors on Hansa by gross sales income. Larger circles indicate more customer reviews/sales.]

These numbers are based on the price of the product when scraped & the number of reviews. It is quite possible that users do not leave reviews, or that products are removed from the site when they are no longer available - meaning that data isn't available to collect. That being said, based on the large number of reviews we analyzed, it seems fair to say that vendors are making serious money through Hansa Marketplace.

Positive Reviews

We mentioned above that practically all of the reviews that we scraped were found to be positive. In fact only 825 products had any neutral or negative reviews.

Overall, 96.4% of all reviews on Hansa Marketplace were positive.

This trend towards positivity likely stems from a few key factors:

  • Sales have to be finalized by the buyer & the seller - failure to deliver an item results in the sale not being finalized, and in many cases canceled. As such no review would be left.
  • Dark Marketplaces rely heavily on trust models and so there is a natural pressure on vendors to ensure what they are selling is legitimate (for some meaning of the word legitimate)

Shipping & Countries

Vendors have a bias towards only supplying within their country's borders - this is caused by the difference in the level of package inspection within a country versus at international borders (and most countries having stricter punishments for shipping illicit products across borders).

The above means that products on darkmarkets are often listed alongside their shipping origin & destinations. Buyers will use this shipping information to filter products that they would be unable to receive.

Using this information we can work out where vendors operate from and (based on information about the product itself) work out how much product from dark markets is moving through a given country's postal system.

After discounting digital items & those where the vendor claims they ship worldwide, the United States is the country with the largest number of vendors listing it as an origin, with 149. Germany (51) and the United Kingdom (50) follow. After that, the number of vendors per country drops quickly, from The Netherlands (43) to Canada (15) and China (6).

Discussion

We have been able to determine very specific information about the business of Hansa market just by using information publicly available on the marketplace itself.

This poses a big threat to Hansa and other darknet marketplaces. The anti-scraping technology deployed by marketplaces like Hansa is trivial & at most relies on easily defeated CAPTCHAs. Dark markets are unable to utilize modern robot detection frameworks because all of them rely on large centralized companies & are unsuitable for anonymous marketplaces.

To put it another way, the amount of data being put out by Hansa (and others) is a risk to the anonymity of themselves & the vendors & customers that use the site.

Because of this, crawling dark markets can result in a trove of information that can be analyzed to uncover not just the finances of the marketplace, but those of vendors & products.

We are able to tie Vendors to countries & product categories, as well as work out how much income they have taken in & when that occurred. In criminal investigations, this kind of information is used to correlate bank account transactions or other financial interactions and to narrow down suspect lists.

Towards Anonymous Reputation

This hints at the underlying problem that is pervasive in these kinds of marketplaces: customers need information to work out which vendors are trustworthy & what products to buy. In order to encourage sales, vendors & marketplaces are incentivized to make reviews available, as well as the overall reputation of the vendor. Without this information it is likely that customers will go elsewhere to buy.

But this information has a cost, and that cost comes when each data point is correlated with the others & external information. As we have shown before, correlation can kill anonymity, and the reputation & review data being provided by modern marketplaces is no exception.

If markets want to avoid these kinds of leaks in the future then there needs to be an entire overhaul of the foundations of these reputation systems. There are some theoretical schemes being proposed that use zero-knowledge proofs to demonstrate trustworthiness without revealing particular information - we imagine that such schemes will become more popular as time goes on.

Conclusion

The reputation systems that power dark markets are vulnerable to exploitation. The data that can be gathered by them is often enough to reverse engineer detailed finances of marketplaces, vendors & product lines. While we have only looked at Hansa in this report, cursory examinations of other markets seem to indicate that this problem is universal - and will likely be for a considerable time to come.

If you would like to support further research and development of tools like OnionScan you can become a patron at Patreon

Data

You can find the data used in this analysis at https://polecat.mascherari.press/onionscan/dark-web-data-dumps/tree/master

* This report originally stated that we believe the data behind this scrape to be the largest publicly available. It should have read that we believe this data is the largest processed dump of dark market listings & reviews publicly available.

Canada's Online Consultation on National Security - Investigative Capabilities in a Digital World - My Feedback

For the next couple of weeks "as part of the Government's commitment to openness and transparency, Public Safety Canada and the Department of Justice Canada are consulting Canadians on key elements of Canada's national security laws and policies to ensure they reflect the rights, values and freedoms of Canadians."

One section Investigative Capabilities in a Digital World caught my eye, and I decided to answer. My comments are duplicated below.

How can the Government address challenges to law enforcement and national security investigations posed by the evolving technological landscape in a manner that is consistent with Canadian values, including respect for privacy, provision of security and the protection of economic interests?

You have not presented any argument that there are challenges to law enforcement unique to the rise of technology.

The closest you have come is saying that law enforcement are frustrated because of having to follow procedure & not being able to immediately violate the privacy of anyone they choose - this isn't a problem, it's a feature.

In the physical world, if the police obtain a search warrant from a judge to enter your home to conduct an investigation, they are authorized to access your home. Should investigative agencies operate any differently in the digital world?

No. A search within the meaning of section eight of the Canadian Charter of Rights and Freedoms is determined by whether the investigatory technique used by the state diminishes a person's reasonable expectation of privacy.

The expectation of privacy extends into the digital world - I would go as far as to say that most people expect their digital lives to be more private than their physical ones.

Currently, investigative agencies have tools in the digital world similar to those in the physical world. As this document shows, there is concern that these tools may not be as effective in the digital world as in the physical world. Should the Government update these tools to better support digital/online investigations?

This is a leading question & you have not provided any evidence that existing tools are less effective in the digital world.

Provide evidence of cases you were unable to prosecute because of new technology.

Is your expectation of privacy different in the digital world than in the physical world?

No. Why would it be? For many, many people in this country those are not distinct worlds, they weave in and out of each other constantly & will continue to become ever more linked.

Since the Spencer decision, police and national security agencies have had difficulty obtaining BSI in a timely and efficient manner. This has limited their ability to carry out their mandates, including law enforcement's investigation of crimes. If the Government developed legislation to respond to this problem, under what circumstances should BSI (such as name, address, telephone number and email address) be available to these agencies? For example, some circumstances may include, but are not limited to: emergency circumstances, to help find a missing person, if there is suspicion of a crime, to further an investigative lead, etc…

The law is designed to limit the ability of the police to carry out investigations. We have a name for states that make it easy for police to do their jobs, they are called police states.

BSI should be available to law enforcement agencies once they have received an order from a court authorizing such information to be given to the police.

Do you consider your basic identifying information identified through BSI (such as name, home address, phone number and email address) to be as private as the contents of your emails? your personal diary? your financial records? your medical records? Why or why not?

Yes. Stalkers, abusive spouses, and others with access to even the most rudimentary information can cause a large amount of harm.

Contrasting it with financial & medical records is disingenuous - I don't give out my bank or medical records to everyone I meet, but I don't give out my home address or phone number either.

Do you see a difference between the police having access to your name, home address and phone number, and the police having access to your Internet address, such as your IP address or email address?

Nope.

The Government has made previous attempts to enact interception capability legislation. This legislation would have required domestic communications service providers to create and maintain networks that would be technically capable of intercepting communications if a court order authorized the interception. These legislative proposals were controversial with Canadians. Some were concerned about privacy intrusions. As well, the Canadian communications industry was concerned about how such laws might affect it.

This is not a question. But there is a reason it was so concerning, that's because it would have violated fundamental rights.

Should Canada's laws help to ensure that consistent interception capabilities are available through domestic communications service provider networks when a court order authorizing interception is granted by the courts?

Just so we are clear this should be more clearly rephrased as "Should Canada's laws help ensure we can spy on whoever we want when a court says so."

This is an impossibility. Laws should not be impossible to enforce.

If the Government were to consider options to address the challenges encryption poses in law enforcement and national security investigations, in what circumstances, if any, should investigators have the ability to compel individuals or companies to assist with decryption?

This would be a violation of section 11 of the Canadian Charter of Rights and Freedoms: "1. Any person charged with an offence has the right ...
(c) not to be compelled to be a witness in proceedings against that person in respect of the offence;"

So, in no circumstances because it would be a breach of fundamental rights.

How can law enforcement and national security agencies reduce the effectiveness of encryption for individuals and organizations involved in crime or threats to the security of Canada, yet not limit the beneficial uses of encryption by those not involved in illegal activities?

You can't. It is physically impossible. The laws of the universe state that you cannot.

Should the law require Canadian service providers to keep telecommunications data for a certain period to ensure that it is available if law enforcement and national security agencies need it for their investigations and a court authorizes access?

Should the law compel service providers to spend millions on infrastructure investments whose sole aim is to collect information on Canadians? No.

Why would you want to put all of Canadians' data in one place, making it easy for criminals to steal?

If the Government of Canada were to enact a general data retention requirement, what type of data should be included or excluded? How long should this information be kept?

Billing information seems to be a safe choice, since telecoms already have to keep it for a justifiable reason - so things like name, address, IP, data used, and who called whom at what time.

]]>
<![CDATA[OnionScan Report: This One Weird Trick Can Reveal Information from 25% of the Dark Web]]>Welcome to the seventh OnionScan Report. The aim of these reports is to provide an accurate and up-to-date analysis of how anonymity networks are being used in the real world.

In this report we will examine how a small change to a regular HTTP request can reveal information, and sometimes

]]>
https://mascherari.press/onionscan-report-this-one-weird-trick-can-reveal-information-from-25-of-the-dark-web-2/144a994e-e5dc-4407-9a0c-2a1bc992e9bfWed, 30 Nov 2016 09:25:24 GMT

Welcome to the seventh OnionScan Report. The aim of these reports is to provide an accurate and up-to-date analysis of how anonymity networks are being used in the real world.

In this report we will examine how a small change to a regular HTTP request can reveal information, and sometimes deanonymize a large number of hidden services.

Summary

Over a quarter of hidden services hosted on the Tor network are vulnerable to Hostname Hacking (compared to 7% vulnerable to mod_status leaks) - for many this simply reveals co-hosted sites; however, for a smaller subset, it means deanonymization.

A Note on Numbers

The lifespan of an onion service can range from minutes to years. When providing generalised numbers, especially percentages, we report approximate figures based on multiple scans over a period of time, rather than a single snapshot.

25% of Onions are Vulnerable to Hostname Hacking

We recently talked about Hostname Hacking, a technique that exploits misconfigured virtual hosts on a web server to trick a hidden service into revealing more information about itself.

We have scanned 15,000 onion domains (of which just over 11,000 were online consistently enough to be queried).

During these tests we simply replaced the Host: HTTP header with localhost instead of the site's onion domain. We then compared the responses of the normal Host and the hacked Host.
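
What that comparison looks like in practice - a minimal sketch, assuming a local Tor SOCKS proxy on port 9050 and a hypothetical target address:

# Hash the response to a normal request and to a Host: localhost request (examplexyz.onion is a placeholder).
NORMAL=$(curl -H "Host: examplexyz.onion" --socks5-hostname 127.0.0.1:9050 -s examplexyz.onion | sha1sum)
HACKED=$(curl -H "Host: localhost" --socks5-hostname 127.0.0.1:9050 -s examplexyz.onion | sha1sum)
# Differing hashes suggest the virtual hosts are not configured to serve one canonical site.
[ "$NORMAL" != "$HACKED" ] && echo "responses differ - possible hostname hacking candidate"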

This modification is not enough to fully exploit the site in many cases, but a difference in response tells us that the virtual hosts are unlikely to be configured correctly. (It is worth noting that even an error condition can be enough to reveal information - sometimes servers print their real IP address in server error pages, and sometimes a common error condition is enough to link servers.)

We found ~2,800 sites that responded differently to the regular Host compared to the hacked Host. This indicates that over 25% of hidden services are vulnerable to this technique.

However, ~1,400 (~50%) of these are all from the large Freedom Hosting II hosting provider, which defaults to a "Double Your Bitcoin" scam when you request Host: localhost.

When we remove FHII hosted sites from our figures we still find over 12% of online hidden services have this vulnerability.

Unlike Apache mod_status leaks, hostname hacking affects all major web servers including nginx & lighttpd. This means that hostname hacking vulnerabilities are far more pervasive (~12% of all non-FHII sites) than mod_status leaks (~7% of all sites).

One Weird Trick - Multiple Possibilities

We have only talked about hostname hacking as a way to detect co-hosting - however, further analysis has revealed a couple more interesting uses:

  • Certain server configurations won't expose /server-status over a regular Host: abcdefghijklmnop.onion request but will expose it over the Host: localhost request (see the sketch after this list).
  • Badly configured services will expose emails, IP addresses and more over error pages caused by a bad Host parameter that they would not expose in other contexts.
  • A large number of sites vulnerable to hostname hacking reveal an open directory file listing, or an otherwise personal site, on a Host: localhost request - this indicates that many are co-mingling personal data with anonymous sites, a very bad practice.
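
A quick way to probe for the first case - a hedged sketch, reusing the placeholder address from the list above and a local Tor SOCKS proxy:

# Compare the status codes /server-status returns under the real Host and under Host: localhost.
curl -H "Host: abcdefghijklmnop.onion" --socks5-hostname 127.0.0.1:9050 -s -o /dev/null -w "%{http_code}\n" abcdefghijklmnop.onion/server-status
curl -H "Host: localhost" --socks5-hostname 127.0.0.1:9050 -s -o /dev/null -w "%{http_code}\n" abcdefghijklmnop.onion/server-status
# A 403/404 on the first request but a 200 on the second indicates the bypass.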

Other HTTP Headers

It wasn't just the Host parameter - some sites expect other Tor Browser default HTTP headers to be present and will act differently if they are not. One of the starkest examples we found was a web hosting provider where all sites would show the same (custom) error if the Accept-Encoding: gzip, deflate header was not sent. This kind of bot detection is not only useless, it actually compromises all the sites that are co-hosted by the provider.
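
You can test for this class of behavior by toggling the header - a minimal sketch with a hypothetical address (curl does not send Accept-Encoding by default, so the two requests differ only in that header):

# Bare request, then a request mimicking Tor Browser's default Accept-Encoding header.
curl --socks5-hostname 127.0.0.1:9050 -s exampleonion.onion | sha1sum
curl -H "Accept-Encoding: gzip, deflate" --compressed --socks5-hostname 127.0.0.1:9050 -s exampleonion.onion | sha1sum
# Differing hashes mean the server fingerprints clients on this header - linking every site it co-hosts.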

Other OnionScan News

Get Involved

If you would like to help, please read Sarah's post OnionScan: What's New and What's Next for some great starting-off points. You can also email Sarah (see her profile for contact information).

If you would like to support further research and development of tools like OnionScan you can become a patron at Patreon

Goals for the OnionScan Project

  • Increase the number of scanned onion services - We have so far only successfully scanned ~6,500 (out of ~12,000 domains scanned).
  • Increase the number of protocols scanned. OnionScan currently supports light analysis for HTTP(S), SSH, FTP & SMTP and detection for Bitcoin, IRC, XMPP and a few other protocols - we want to grow this list, as well as provide deeper analysis on all protocols.
  • Develop a standard for classifying onion services that can be used for crime analysis as well as an expanded analysis of usage from political activism to instant messaging.
]]>
<![CDATA[Dark Web Diaries: Detecting Co-Hosted Hidden Services without mod_status Leaks]]>I've talked a bunch about mod_status leaks, mainly because they are so prevalent.

However, today I want to talk to you about a way to detect co-hosting without mod_status. This attack takes more work to pull off than simply requesting /server-status, but it is easily doable for someone

]]>
https://mascherari.press/dark-web-diaries-hostname-hacking/b8516cc7-a7e6-4dcc-89c9-f1fe49426e23Thu, 17 Nov 2016 01:00:39 GMT

I've talked a bunch about mod_status leaks, mainly because they are so prevalent.

However, today I want to talk to you about a way to detect co-hosting without mod_status. This attack takes more work to pull off than simply requesting /server-status, but it is easily doable for someone who has the time.

Virtual Hosting

Most web servers support the concept of virtual hosting, that is, hosting more than one domain on a single server. The web server interprets the Host header to determine which version of a site to serve.

This works well on the clearnet, where IP resolution is routine and it is already trivial to work out if two sites are co-hosted.

There are multiple ways of setting up virtual hosting such that a different Host header will not trigger content to be served. For example, using Caddy we can specify that two hidden services are served on separate ports:

[hshostname].onion:2016 {
 root ./site1
 tls off
}

[hshostname2].onion:2017 {
 root ./site2
 tls off
}

And in our torrc file we can then setup the hidden services like so:

HiddenServiceDir /var/lib/tor/hidden_service_1/
HiddenServicePort 80 localhost:2016

HiddenServiceDir /var/lib/tor/hidden_service_2/
HiddenServicePort 80 localhost:2017

However, configuring servers in such a way requires extra work and diligence which is often not carried out.

Hostname Hacking

Because of the above tendency for misconfiguration, plenty of sites, even those that disable mod_status, are vulnerable to hostname hacking.

curl -H "Host: [redacted].onion" --socks5-hostname 127.0.0.1:9050 -s [redacted].onion | sha1sum 
d756.... -

curl -H "Host: localhost" --socks5-hostname 127.0.0.1:9050 -s [redacted].onion | sha1sum
423a.... -

As you can see from the above trace, changing the Host header completely changes the output of the site (and this output change is consistent).

This technique can be used to test for co-hosting leaks in hidden services.

Depending on the configuration of the server, the Host header can be set to a different onion site, or to gibberish - in the first case you can confirm co-hosting directly, and in the latter case you can confirm co-hosting by testing whether each hidden service serves the same "default" page when you mess with the Host header.

This test will be included in OnionScan in the near future.

I have so far been able to use this attack to confirm co-hosting on a ring that did remove mod_status, but did not otherwise hide the co-hosting. I suspect a large scan trying a bunch of hostnames per site would be fairly lucrative - something like the sketch below.
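
Such a scan could look like this - a rough sketch, using a hypothetical target onion and an illustrative list of candidate Host values:

# Hash the baseline response, then try candidate Host headers and flag any that change the content.
TARGET=exampletarget.onion
BASELINE=$(curl -H "Host: $TARGET" --socks5-hostname 127.0.0.1:9050 -s "$TARGET" | sha1sum)
for HOST in localhost fhostingesps6bly.onion anothercandidate.onion; do
  RESPONSE=$(curl -H "Host: $HOST" --socks5-hostname 127.0.0.1:9050 -s "$TARGET" | sha1sum)
  [ "$RESPONSE" != "$BASELINE" ] && echo "$HOST changes the content served by $TARGET"
done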

This attack can also be used to test whether a site is hosted on Freedom Hosting II - as a separate test from the known SSH fingerprint attack.

curl --socks5-hostname 127.0.0.1:9050 -s [redacted].onion | grep -Po "<title>(.*)</title>"
<title>[Redacted]</title>

curl -H "Host: fhostingesps6bly.onion" --socks5-hostname 127.0.0.1:9050 -s [redacted].onion | grep -Po "<title>(.*)</title>"
<title>Freedom Hosting II</title>

curl -H "Host: localhost" --socks5-hostname 127.0.0.1:9050 -s [redacted].onion | grep -Po "<title>(.*)</title>"
<title>100x Your Coins in 24 Hours - the real underground service</title>

Yeaaaah......

Lesson: Virtual Hosting is a great way to limit infrastructure costs, but be sure you are not shooting yourself in the foot.

If you would like to support further research and development of tools like OnionScan you can become a patron at Patreon

]]>
<![CDATA[Tipping Point - What I Will Do Now]]>I was planning on publishing a new OnionScan report today, but instead I have found myself thinking about the events of the last few days.

This isn't about Donald Trump, he is a symptom of a zeitgeist that has been growing for nearly a decade.

We saw signs many years

]]>
https://mascherari.press/tipping-point/13eda59b-2098-4310-a819-9649ac1009c6Thu, 10 Nov 2016 10:01:15 GMT

I was planning on publishing a new OnionScan report today, but instead I have found myself thinking about the events of the last few days.

This isn't about Donald Trump, he is a symptom of a zeitgeist that has been growing for nearly a decade.

We saw signs many years back in the Greek financial crisis, we see it in the uptick of Islamophobic legislation in France, we see it in the vile language wrapping up the Brexit vote, and now we've seen its face in this US election.

The nationalist factions inside every border are feeling powerful tonight. They will take the lessons from this campaign and continue to make the media dance to an ever more ominous tune.

Fascism is alive, it is strong and, remember, it is always good for the economy.

Progress on trans rights, gay rights, civil rights, human rights will halt, and reverse. The train has been running on borrowed tracks.

What a government is powerful enough to give, it is powerful enough to take. We can no longer afford the optimism that progress is a one-way street.

Climate change, human displacement and the inevitable famine and war that follow them are now almost certain. We are over the edge, through the looking glass and there is no turning back.

Today is not the end for humanity, but it is certainly a fork in the road. One of those forks might see us grow, and love, and explore this vast and ancient universe. The other will see our home destroyed, our civilization reduced to ashes before it has a chance to spread its wings.

I choose to believe that we can grow, and love. I choose to believe that I can live in a world where I don't have to fear holding my partners' hand, where I don't have to fear for my friends around the world.

Where I don't have to fear.

All of that was just words. We need action. We need infrastructure.

Here is what I think I can do, what I am doing, and perhaps what you can do, now, today and over the next few weeks and months - because this is real life, and there is no next time around.

Start Exercising Compassion - There will be calls for unity. These should be ignored. The liberals will seek to improve the economy while turning a blind eye to the abuses of power to come. Some will fight the traditional political fight, and might even win - but not before huge damage has been done to our families, our friends and our planet.

  • Fund Crisis Orgs - With your time, and with your money. The next few years are going to be rough for many groups. No government is likely to continue funding their efforts for long. We need to understand that it will get worse before it gets better, and we need to invest effort into bringing as many people as we can along with us.

    To start:

    • Fund Trans Lifeline - they are going to get many more calls.
    • Fund Planned Parenthood - they are going to need to stay open.
    • Fund Local Shelters and Food banks - sometimes crisis lasts far too long.
  • Go Vegan, Be Vegan - I don't care if you do it for the animals, for your health or for the environment - industrial animal agriculture is a major contributor to climate change, and the practices promoted within are deeply unethical and immoral on practically every level. The more we can dent that, the more we increase our species' runway. That's the truth.

Start Enforcing Consent - We are all different and I believe that we are all capable of love and compassion. Systems of power are designed and structured to remove agency, to divide, and to rule. The only way I can see for all humans on this planet to be free, to have agency, is for us to withdraw our consent from these systems, or at least, to start enforcing it.

  • Use Alternative Information Infrastructures - Rulers can only rule if they have information, about events, about people. Prevent mass spying through encryption and anonymity technologies like Signal and Tor. Without information it is impossible to rule effectively without consent.

  • Build and Improve Alternative Information Infrastructures - We need more. We need to decentralize the media, decentralize the news and to decentralize the networks. We need better tools for organizing, ones not controlled by a central corporation, ones that are private by default.

    To start:

    • Start Talking to Journalists - they are one of the groups that need these tools first, and they are the ones that need to guide their development.
    • Build Hidden Services - for news sites, for blogs, for social networks, for instant messaging, for pretty much anything you can send across a computer network.

That's all I've got for now, that's my headspace. I expect this journey to be a long one, and I expect this list will grow and change.

Sarah

]]>
<![CDATA[Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016]]>https://mascherari.press/untangling-the-dark-web-hackfest-november-2016/8bb4df41-36d5-47b8-ba49-5ab632d72880Sun, 06 Nov 2016 21:02:50 GMT

The following is a rough transcript of my presentation at HackFest 2016, on the 5th of November 2016.

Good afternoon, everyone - I am so excited to be here today to talk to you about anonymity systems and how they break.

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

My name is Sarah, I am an independent anonymity and privacy researcher/engineer. Prior to this I was a security engineer at Amazon working on preventing fraud through abuse of autonomous & machine learning systems, and way before that I was a computer scientist at GCHQ in the UK, doing stuff I can't tell you about.

All the work that I am going to show you today is published on the site mascherari.press - where I write about anonymity and privacy issues.

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

So a quick overview of the agenda - I'm going to start off with a very quick introduction to hidden service operational security - I'm not going to bore you with how Tor works; the base model you need for this talk is much smaller. With that I am going to give you a quick tour through my top 5 risks to anonymous systems, from some research that I conducted at the beginning of the year.

Then we are going to get into the heart of the talk, and I am going to lead you through my last few months of research investigating exactly how anonymous the dark web is. And with that we are going to look at a few case studies - it is important to note that while I am going to talk about deanonymization vectors, I am not going to directly reveal information which would lead to deanonymization.

Finally, I'm going to end with some thoughts on future research and talk about how we can start to fix some of the issues that I have identified.

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

I am going to talk to you about hidden services, in particular how to deanonymize hidden services quickly, at scale and very cheaply. This isn't a new crypto breakthrough and it doesn't impact the Tor network or any other anonymity network. Instead, everything I am going to demonstrate today can be traced back to bad assumptions made by the operators or software designers about what kinds of information we need to protect.

The most important thing that you need to keep in mind with Hidden Services is that they only seek to hide location - that is, IP address - by design. The services hide behind an anonymity network like Tor, and so clients never learn the service's IP address and the service never learns the clients' IP addresses. Nothing about the design of hidden services prevents them from revealing information about themselves - for example, I operate a bunch of hidden services, all of them linked back to my clearnet identity - it's really easy to find out that I run them.

This talk is going to focus on unintentional identity leaks in hidden services - from the really obvious to the horribly persistent - to the ones that are only an issue if your adversary correlates across the whole dark and clear web. But as we will see, that is really not that hard.

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

At the beginning of this year I conducted my first mass scan of ~8,000 dark web sites looking for well-known, well-documented misconfigurations. Something to note about these kinds of scans is that they are done over multiple days and weeks, as the uptime of many hidden services is volatile. After extensive manual analysis I came up with the following top 5 risks to hidden services, based on the number and severity of issues seen in the wild.

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

The first one is open directories - these are very common on the clearweb too, but they can be particularly bad for hidden services. Administrators tend to leave things lying around in directories they think no one will look at, or they put information in commonly named folders like "backup" - the screenshot on the slide shows a directory listing I found on a service - each of those folders contains the hosting files for 22 hidden services, letting us determine that all those hidden services are hosted on the same server - these include multiple drug marketplaces and a social network - knowing that hidden services are co-hosted is very useful if you are trying to work out whether those services might be related. Perhaps worse than that is the top folder, which reads "Backup" and contained a bunch of SQL database dumps and config files for one of the sites.

I've also found large image caches, a backup of someone's Trello board containing their homework plans - and a bunch of zip files containing various content.

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

Number 4 is EXIF metadata - this is data encoded along with an image that tells you information about where it was taken, on what kind of device, or the software the image was edited with, etc. EXIF leaks are not as big a problem as they were a few years ago - but you can still download scrapes of marketplaces captured back then and find images of cocaine for sale with GPS coordinates which you can pin down to a house in Boston, and using Streetview you can pick out the window in the background where the photo was taken.

Most large marketplaces now re-encode all the images that are uploaded to them, ensuring no metadata is left for analysis - however on small sites or vendor pages you can still often find images containing EXIF metadata - although GPS data is much, much rarer than it used to be. The screenshot is a collection of really odd image-editing software tags I pulled out of a marketplace - this could be very useful if you are trying to learn more about the person making these images: do they use Windows or Mac? Are they using niche software, or software that came bundled with their camera? etc.
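
Checking for - and stripping - this metadata is straightforward with the standard exiftool CLI. A quick sketch against a hypothetical photo.jpg:

# Dump the location and software tags an analyst would look for first.
exiftool -GPSLatitude -GPSLongitude -Software -Model photo.jpg
# Strip all metadata before publishing (exiftool keeps a photo.jpg_original backup by default).
exiftool -all= photo.jpg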

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

Number 3 is cloned sites - during our scan we found that 29% of onion services have at least one duplicate - and by duplicate I mean the SHA1 hash of the page's content is identical - which is a high bar for duplication - more recent work by myself and others shows this number is likely higher once you account for certain dynamic content on the page.

Many of these duplicates are intentional - either because a site is load balancing across multiple onion domains, or because they have spammed 100 different versions of their site. However, a significant number of these are so-called cloned sites - sites which silently proxy requests from one domain to the legitimate domain and spy on the traffic in order to replace things like bitcoin addresses etc. Often the only way to detect these is to look at every image on the site and see if there is a watermark or similar information that matches the onion site you are visiting.

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

Number 2 is SSH fingerprinting - under here I also group software banner fingerprinting - this is the idea that certain things about the software you are using are unique to you - the key one, if you are exposing an SSH server along with your site, is the public key. Each server generally has a unique public key, and so we can determine that two hidden services share a server if they share an SSH fingerprint - also, if you have misconfigured your server and the ports are available to the clear internet too, then you are likely unlucky enough to end up in Shodan - and then it is a simple case of looking up your SSH public key to find your hidden server's IP address.

I have found at least 30 sites that are vulnerable to just SSH fingerprinting through Shodan. And it isn't just SSH keys - I've observed the same deanonymizations with unique combinations of server versions (e.g. a particular version of PHP and a particular Perl and a particular Python etc.) - combined, those can get unique quickly, and Shodan is very helpful in filtering them out!
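
Collecting the fingerprint itself is easy - a hedged sketch, assuming torsocks is installed and using a hypothetical onion address:

# Grab the SSH host key over Tor, then print its fingerprint for comparison across services (or against Shodan).
torsocks ssh-keyscan exampleonion.onion > hostkey.pub 2>/dev/null
ssh-keygen -lf hostkey.pub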

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

Number 1 is localhost bypasses - these occur when the software you run on your hidden service gives special permission to localhost traffic - or assumes that you have done extra configuration before putting it on the internet - the major one of these is Apache mod_status. When I scanned back in April I found that 6% of the servers I scanned had an Apache localhost bypass revealing everything from the IP address of the server, to other sites hosted on the same server, all the way up to the IP addresses of clearnet users in the cases where the clearnet and dark web sites were hosted on the same server.
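
A hidden service's requests arrive via the local Tor daemon, so the web server sees them as localhost traffic and mod_status's default localhost allowance applies. Checking for the leak is a one-liner - a sketch with a hypothetical address:

# If mod_status is enabled with its default localhost permission, this prints the full server status page.
curl --socks5-hostname 127.0.0.1:9050 -s http://exampleonion.onion/server-status | head -n 20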

These kinds of bugs can be devastating and they are really common. If you just consider Apache installs, then 12% of all Apache installs have mod_status leaks - and despite publicity over the last few years, that number is going up, not down.

It's not just Apache - there is a bunch of software which doesn't do the necessary diligence on localhost connections. I have found open i2p consoles, wikis and even a few phpMyAdmin instances not locked down - in one case the phpMyAdmin instance was clearly on the data-receiving end of a botnet: each table entry contained a request from a clearly compromised computer - which goes to show these things can be really bad.

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

So back in April, after all these examinations, I launched a tool called OnionScan which checks for all of these things, and many more - it's had really positive reviews from the clearnet and the darknet, and I know it is being used in practice by many darknet site operators to protect their pages. I've also been able to work with universities, journalists and other sites where I have found these issues and helped them correct them.

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

So that was April - since then I have continued to work on OnionScan and expanded it into a wider project - these are our goals, number 1 of which is to map the dark web: to understand what people are using it for, to understand where people are using it insecurely, and to make recommendations and build software to actively counteract that. Sub-goals of that are to increase the kinds of protocols we scan and the kinds of identifier correlations we uncover - since the initial release of OnionScan the community has been awesome at providing new protocols and new identifier correlations - apart from just HTTP, SSH and FTP we now have Bitcoin, VNC, XMPP, IRC and a bunch of others.

And since April I have also been busy scanning the dark web - every month I produce a report on some aspect of dark web usage, usually focused on security and opsec. And now I am going to lead you through the kinds of information you can extract from anonymity networks when you start correlating everything you scan.

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

This image is a graph of the dark web - it's very pretty, at least I think so. It's worth noting that this is a transitive reduction of the actual graph - it is that way because the actual graph has way too many connections to be visually useful - so keep that in mind as you look at this: where you see chains of nodes on this graph, all of those nodes can be connected to each other in the original graph.

There are 4 colours of links on this graph: green, pink, blue and red. Green connections are the least ominous - they are simply hyperlinks between services - what you see on this graph is a lot of starburst links - these are sites that link to a lot of other sites. You also see a few small clusters where all the sites link to each other - these are particularly interesting, as this kind of link connectivity might mean an underlying relationship, or, less ominously, the same ad network!

Pink links are sites that share the same FTP server, there is one hosting provider in particular that exposes an FTP server on each of the hidden services it hosts - that makes it trivial to determine if a site is hosted by that hosting provider.

Likewise, the massive blue ring all around this graph is SSH servers. Freedom Hosting II is one of the largest hosting providers on the dark web - by my estimates they host between 10 and 20% of ALL stable hidden services, and I know that because every single one of their sites has an exposed SSH server, meaning we can link them. These kinds of software correlations also exist for SMTP servers, although they are rarer.

The Freedom Hosting case is particularly important because they have been having hosting problems recently and completely dropped off the dark web for most of September, only reappearing a couple of weeks ago - I now have a job that regularly tracks the uptime of various Freedom Hosting servers, because when it disappears it takes a lot of varied content with it - and that's important for reasons I will come to at the end of this presentation.

Finally, it may be tricky to see, but there are a few small clusters of red links on this graph - these are hosts we have been able to link together because they expose an Apache mod_status page, and that page lists all the other hidden services hosted by the same server - we have found 11 distinct clusters of services like this, and this tells us something interesting, especially combined with the SSH and FTP results I just mentioned...

It means that the dark web is actually, physically, much smaller than previously estimated - it looks like at least 50% of dark web sites are hosted by only 20-50 entities - whether these are dark web hosting providers, or dedicated operators, or simply 1 cartel running 5-10 different front sites. This has huge implications for estimating the amount of crime and other activity that is facilitated using anonymity systems. 4 drug marketplaces might only be run by 1 group - in which case is that 4 drug sites, or 1 black market group? How we count these things has large political implications. Something I'll revisit soon.

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

I now want to move on to HTTPS/TLS certificates. Back in July I spent some time analyzing how dark web sites were using (and mostly abusing) HTTPS. It's worth noting here that there is very little reason, security-wise, for a dark web site to opt for an HTTPS certificate - hidden services provide end-to-end, forward-secure encryption - by layering TLS on top, the only thing you are really buying is identity verification - and this is why we see only 7 sites with legitimate HTTPS certificates - these are privacy charities like Privacy International, with Facebook and Blockchain also having legitimate HTTPS certificates.

But this map shows many more than 7 sites, and this is because there is an awful lot of HTTPS misconfiguration going on - the bright cluster in the top right is Let's Encrypt certificates, often signed for clearnet domains and then served on dark web sites, and that is a trend you see across the rest: many of these services are accidentally exposing port 443 to the dark web and by doing so are leaking their clearnet certificates - most of the time this isn't that big a deal because the site is declared on the clearnet and the owner is clear, but there are definitely a few cases where this leak is purely accidental - my favorite is an anonymous hacktivist site which serves the HTTPS certificate for someone's personal blog.

There are also a few clusters on here with self-signed certificates - these are the most interesting, as we can often link seemingly unrelated sites using information that leaks into the self-signed certificate - things like emails, hostnames, IP addresses etc. can leak into certificates and be used to cluster sites - so you can see on here a few distinct clusters that are connected by unique TLS properties.
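
Pulling these certificate fields over Tor takes one pipeline - a rough sketch, assuming torsocks and openssl are installed and using a hypothetical address:

# Fetch whatever certificate the service presents on port 443, then print the subject and issuer fields -
# the places where clearnet hostnames and contact emails tend to leak.
torsocks openssl s_client -connect exampleonion.onion:443 </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer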

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

Now on to a fun story about Google Analytics IDs. Google Analytics, as you may know, is an analytics platform provided by Google, and to use it you paste some JavaScript into your site, and that JavaScript contains a unique ID. Similarly, if you are hosting AdSense ads on your site you get another piece of JavaScript with another unique ID.

We have detected a bunch of sites with these analytics and publisher IDs, the largest of which was a bunch of casino sites, some exclusively on the dark web, some with clearnet versions - but all of them with the same analytics IDs. These sites had different names and themes - some for roulette, some for poker etc. - but we managed to link more than 100 gambling sites, pretty much the entire dark web gambling category, to the same analytics ID, and thus the same operator.
Revisiting a point from earlier, this means that while it may look like there are 100 gambling sites on the dark web - and this is what a few studies have published - there aren't; there are 100 front sites for the same operation. This is a point I'm going to keep hitting on.

It wasn't just the casino sites - we also found a number of other sites we could link through these identifiers. I even found one example of a drug market which shared an analytics ID with a legitimate clearnet business... I'll let you draw your own conclusions about what that could mean.
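
Extracting these identifiers is just pattern matching on the page source - a small sketch with a hypothetical address (UA- followed by digits is the Google Analytics tracking ID format, pub- followed by 16 digits the AdSense publisher ID format):

# Pull any analytics or publisher IDs out of a page fetched over Tor, deduplicated.
curl --socks5-hostname 127.0.0.1:9050 -s exampleonion.onion | grep -Eo "UA-[0-9]+-[0-9]+|pub-[0-9]{16}" | sort -u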

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

Everyone's favourite cryptocurrency, bitcoin, is widely used on the dark web. This is actually my first attempt at graphing out sites which share the same bitcoin address - as you can see, it's pretty much a big orange blob - when you actually do a transitive reduction on this graph, what you get is a graph covered in orange circles, showing that there are many, many sites that share bitcoin addresses.

These connections span the range between obvious clones or duplicate sites and hidden relationships. One very common thing we saw was small marketplaces that reused bitcoin addresses for payment - which means you could try to buy an Apple laptop on one site and some cocaine on another site and end up sending bitcoin to the same bitcoin address - it also makes working out how much each dark web site is earning, or how many people are falling for bitcoin scams, very easy.
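
Detecting the reuse across a pile of scraped pages is, again, just pattern matching - a loose sketch (the character class is the Base58 alphabet used by legacy bitcoin addresses; assumes the scraped pages live under ./scraped-pages/):

# Count how many times each address-looking string appears across all scraped pages.
grep -rhEo "\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b" ./scraped-pages/ | sort | uniq -c | sort -rn | head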

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

The screenshot above shows two pastebin sites that I found, both French - I actually thought something was broken with the correlation when I first checked this, because while the first site clearly has a bitcoin address on its page, the second one does not. It turns out that the bitcoin address was hidden in the comments - it looks like one site copied the other and chose to comment out the address instead of removing it - the other option is that this site is run by the same owner who, for whatever reason, chose not to collect bitcoin on the second site.

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

So that was a very quick overview of a number of different correlations that you can do on the dark web - there are many, many more. Last weekend I released OnionScan 0.2, which greatly improves on the April release and also features a built-in correlation lab - OnionScan now does all of these correlations for you and allows you to sift through hundreds of thousands of correlations quickly and efficiently.

I want to really emphasize that these aren't academic thought exercises - you can download OnionScan onto a fresh Digital Ocean install, run it over a list of onions that you can pull from any number of clearnet sites, and within 1 or 2 days you will be able to deanonymize, or otherwise find out information about, a large portion of the dark web.

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

Casting an eye towards the future, I'd like to give you a sneak peek at this month's OnionScan report, which I will be releasing next week.

The new version of OnionScan doesn't just have preconfigured correlations - it also allows users to define their own. As part of that I have been running scans against various darknet markets, pulling down things like listings, vendors, shipment sources etc., and I am using this data to pull out a lot of information about how these markets are used.

The map above shows the countries vendors claim they ship from on the Valhalla market. This information is generally given to allow buyers to make decisions about where they buy from - for example, it is generally far riskier to ship across borders than inside a country - although some vendors and buyers are willing to do so.

As you can see from this image, the darker countries, where the most listings originate, are the USA and the UK - and though you can't really see it on this map, the Netherlands and Germany are also pretty dark.

The October OnionScan report will be out next week with much more analysis on the vendor makeup of some dark markets, and I will be expanding this research in the coming months.

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

As I have mentioned throughout the presentation, many of these findings have a big impact on how we think about classifying onion services.

You can find multiple studies, like the one from DeepLight earlier this year, which reported that greater than 50% of Onion Services contained illegal content.

However, such studies never take into account the data that I have talked about today - co-hosting analysis, duplication, group analysis - which means that their reports are, to put it frankly, wrong.

You cannot simply count hidden services and report the number. Crime sites are likely heavily over represented in such counts because they are the ones that are worth cloning, and they are likely the ones that have load balanced over multiple domains.

To make this a little better, I have started a project to manually categorize 6,000 onion domains, including performing proper duplicate and cloning analysis, and, if I can work out a way to do it properly, grouping analysis. I am about half way through, and it is slow going - but once it is done, I am hoping it can advance the conversation about the uses of the Dark Web.

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

Looking further into the future - how can we make the dark web better? Currently the Dark Web is more like GeoCities than an underground cyberpunk net.

There are technologies that are changing the balance. OnionShare and Ricochet use hidden services to remove metadata from filesharing and instant messaging respectively.

One I am particularly excited about is OpenBazaar, a peer to peer marketplace platform - it isn't yet anonymous, but there are plans to make it so.

Peer to peer tech is much less prone to the attacks I have described, because it doesn't have to contend with the large attack surface of general web platforms - and it can be designed with that in mind.

Personally, I think another trend evident in these new technologies is a move away from the browser. As awesome as the effort that has been put into the Tor Browser (and indeed any other browser) is, the fact remains that browsers are littered with 0-days - and when you are relying on tech to preserve your life or your freedom, you need tools with a much smaller attack surface than a browser.

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

This week Quebec has been rocked by news that local and provincial police forces have been spying on journalists, in an attempt to determine their sources.

Such behavior has no place in a free and just society - and while it is important to let the judicial process unfold, it is clear that power, in any form, will always seek to perpetuate itself.

Using these tools, you take that power away and distribute it among the people - this is by no means a free pass to be unethical - there are very few crimes in a free society that can be undertaken entirely online - the trafficking of guns and humans, and child exploitation, require a real world component, and they require trust - trust which can always be broken by a dedicated investigation task force - people are rarely infallible.

But by encouraging the adoption and betterment of these technologies I advocate for a freer world: a world where governments can't blanket-spy on an entire population, where whistleblowers are able to report unethical practices, where domestic violence victims can escape, and where queer people can explore their gender or sexuality without facing the judgement or abuse of an oppressive society.

In Summary, Anonymity is hard, our tools suck, our software sucks, people don't know the risks, and right now a large portion of the dark web really isn't that dark.

From simple exploits to large scale correlations, it is often trivial to uncover links and even direct deanonymizations of gun markets and illicit file sharing - and also, sadly, of political blogs and journalistic outlets.

Untangling the Dark Web: Unmasking Onion Services - Hackfest November 5th 2016

If this stuff interests you and you would like to read the full reports around these issues, you can find them, and many others, on mascherari.press. If you'd like to get involved in OnionScan, or would like my help exploring this space, you can email me@sarahjamielewis.com or follow me on twitter @SarahJamieLewis - I tweet about this stuff on a daily basis.

Thank You for listening.

If you would like to support further research and development of tools like OnionScan you can become a patron at Patreon

]]>
<![CDATA[Operation Hyperion: Netherlands Law Enforcement Troll Dark Market Vendors]]>The Dutch National Prosecution Service and police launched a Hidden Service on the darknet today. The launch was announced by the Dutch Public Prosecution Service (Openbaar Ministerie, OM), who claimed that it took place within the framework of ‘Operation Hyperion’.

The site lists a number of usernames for online dark

]]>
https://mascherari.press/operation-hyperion-netherlands-law-enforcement-troll-dark-market-buyers/077a8375-648a-44a9-b8e6-45e5982a603dMon, 31 Oct 2016 21:30:00 GMT

The Dutch National Prosecution Service and police launched a Hidden Service on the darknet today. The launch was announced by the Dutch Public Prosecution Service (Openbaar Ministerie, OM), who claimed that it took place within the framework of ‘Operation Hyperion’.

The site lists a number of usernames for online dark market vendors based in the Netherlands - it is common on dark markets for vendors to list their origin country, allowing buyers to select the best option for them, as this can often greatly reduce the risk of shipping-related security issues.

Among the vendors listed are vendors who sell across major darknet markets including AlphaBay, Hansa and Valhalla:

  • DutchCandyShop
  • FrankMatthews
  • Etos
  • DutchFarmerNL
  • DutchMagic
  • DutchDelights
  • FromAmsterdam
  • DUTCHRABBIT2
  • Partyflockcrew
  • DCDutchConnectionGroup
  • PartySquadNL
  • DrugsFromAmsterdam
  • QualityWhite

Of the claimed vendors that have been arrested, most appear to be from small or defunct markets (BMR, Evolution and Silk Road 1), suggesting that these arrests originated from evidence collected during the seizure of old markets, and are not the result of exploiting flaws in newer markets.

According to ICE, "Operation Hyperion resulted in a number of law enforcement leads on cases related to the buying and selling of illicit drugs and other goods on the Darknet."

However, we suspect that this is mostly a PR stunt and a little troll, based on the website's FAQ section where the agencies provide advice about what you should do if you have been falsely listed:

My username is falsely mentioned

If you think that this website mistakenly mentions your username on this website please get in touch with the Dutch police.

If you would like to support further research and development of tools like OnionScan you can become a patron at Patreon

]]>
<![CDATA[What We Can Learn from the ICE Darknet Investigations Feature]]>The U.S. Department of Homeland Security: Immigration and Customs Enforcement (ICE) recently published a feature on their darknet investigations. Overall the feature adds up to more PR than insight - however there are a few things we can take away from what ICE has published.

As always, words seem
]]>
https://mascherari.press/what-we-can-learn-from-the-ice-darknet-investigations-feature-2/337218ff-9179-4a27-a2b0-bd2001eee035Sun, 30 Oct 2016 21:56:19 GMT

The U.S. Department of Homeland Security: Immigration and Customs Enforcement (ICE) recently published a feature on their darknet investigations. Overall the feature adds up to more PR than insight - however there are a few things we can take away from what ICE has published.

What We Can Learn from the ICE Darknet Investigations Feature
As always, words seem to mean very little to government departments...

Takeaway 1: Investigations Often Begin (and End) in the Postal System

Cyber investigations sometimes begin as traditional in nature then progress into the cyber environment. HSI was one of the primary agencies on the Silk Road investigation that revealed large-scale illegal drug and contraband smuggling through the U.S. Postal Service.

Postal service tracking and interception has long been known to be a key law enforcement strategy in drug trafficking investigations.

Littered through the video, text and images are references to previous investigations nearly always starting with an intercepted package - this should come as no surprise: one of ICE's key mandates is preventing the smuggling of goods across the US border, so parcel inspection and the resulting investigations are right up their alley.

No doubt by now some areas of the governments involved in darknet operations are tracking the rise of new darkweb marketplaces and building intelligence - but without hard evidence, in the form of intercepted contraband, it is difficult to build any kind of case.

Takeaway 2: ICE does a huge amount of Image Analysis

While nothing is said in the text or the video, the images attached to the article tell an interesting story about the role of image analysis in ICE's overall approach to investigations.

At least 3 separate images (about 50% of the images not related to showing darkweb sites) appear to show different stages of image analysis - whether analyzing pictures directly, looking for camera artifacts, or analyzing metadata.

Pictures are often worth thousands of words, and with the various opsec mistakes made by vendors, there is a treasure trove of data out there waiting to be analyzed. Just last month we saw vendors making trivial mistakes when posting photos of their operations - it would seem this behavior has not gone unnoticed.

Takeaway 3: ICE wants us to believe the dark web is all bad.

At times the video attached to the article borders on bad satire. The below image says it all: while provided as an example of how seedy the darkweb is, it doesn't show a dark web site - in fact it doesn't appear to show an illegal site - live sex web cam sites are not illegal in most jurisdictions relevant to this discussion.

What We Can Learn from the ICE Darknet Investigations Feature
One has to wonder how ICE stumbled across this page when their main jurisdiction is customs enforcement...

So why would ICE be attempting to make this connection between the dark web and (legal) sex? I suspect because sex, especially in the US, is still seen as something to hide away in the shadows - much like drugs and rock 'n' roll (ok, maybe not so much rock 'n' roll) - and as such those links are intended to brand the darkweb as something that no regular citizen should ever find themselves dealing with.

Whatever the true intent of ICE, it must be seen as harmful - darknets have plenty of legitimate purposes - anonymous publishing, private instant messaging, simple file sharing, whistleblowing and many, many more.

Despite, and in some ways because of, the misleading PR, the feature itself is a small window into how ICE perceives itself, the work it does, and the kinds of work it needs to be doing in the future - and these kinds of insights are essential for defending darknets, and their applications, against future adversaries - be they government departments, private corporations or open source investigation toolkits.

]]>
<![CDATA[Building Dark Web Bots with OnionScan Custom Crawls]]>OnionScan 0.2 has been released! This article will take you through one of the newest and most powerful features - custom crawls.

]]>
https://mascherari.press/building-custom-dark-web-bots-with-onionscan-custom-crawls/2e133c15-329e-46da-b5ca-99797746cda9Sun, 30 Oct 2016 05:34:03 GMT

OnionScan 0.2 has been released! This article will take you through one of the newest and most powerful features - custom crawls.

You may already know that OnionScan comes packed with logic for pulling out many different types of identifiers from onion services, e.g. bitcoin addresses, PGP keys and email addresses.

However, many services publish data in non-standard formats, making it difficult for any tool to automatically process it.

OnionScan helps solve this problem by providing a way to define custom relationships for each site - these relationships then get imported into its Correlation Engine, letting them be discovered, sorted and correlated like any other identifier.

As an example, let's look at Hansa Market. If we were investigating this market we would likely want to know what listings were available, in what categories, and who was selling them. It turns out we can get all of this information from the /listing page of a product:

Building Dark Web Bots with OnionScan Custom Crawls

Before, we would have to build a custom web crawler to pull the data down, process it and put it into a form we can analyze. With OnionScan 0.2 we just need to define a small configuration file:

{
    "onion":"hansamkt2rr6nfg3.onion",
    "base":"/",
    "exclude":["/forums","/support", "/login","/register","?showFilters=true","/img", "/inc", "/css", "/link", "/dashboard", "/feedback", "/terms", "/message"],        
    "relationships":[{"name":"Listing", 
                     "triggeridentifierregex":"/listing/([0-9]*)/",
                      "extrarelationships":[
                            {
                              "name":"Title",
                              "type":"listing-title",
                              "regex":"<h2>(.*)</h2>"
                            },
                            {
                              "name":"Vendor",
                              "type":"username",
                              "regex":"<a href=\"/vendor/([^/]*)/\">"
                            },
                            {
                              "name":"Price",
                              "type":"price",
                              "regex":"<strong>(USD [^<]*)</strong>"
                            },
                            {
                              "name":"Category",
                              "type":"category",
                              "regex":"<li><a href=\"/category/[0-9]*/\">([^<]*)</a></li>",
                              "rollup": true
                            }
                      ]
                    }
                    ]
}

That's a lot to take in, so we will break it down.

The first two configurations specify the onion service we are targeting ("onion":"hansamkt2rr6nfg3.onion") and the base URL we want to start scanning from ("base":"/"). Some onion services only have useful data in subdirectories, e.g. /listings - in that case we could use base to tell OnionScan to ignore all other parts of the site.

The next configuration, exclude, tells OnionScan to exclude certain links like "/forums","/support","/login","/register" - these are links which we don't want to click on because they take us offsite or perform actions we don't want to take.

Finally, we have relationships, and this is where our custom crawl logic happens.

A relationship is defined by a name and a triggeridentifierregex - the regex is applied to the URL of the site, and when it matches, the rest of the rules in the relationship are triggered. In this case we tell OnionScan that URLs matching "/listing/([0-9]*)/" will trigger the Listing relationship. OnionScan will also treat the number in the URL (([0-9]*)) as a unique identifier for the relationship.

Next, each relationship can have extrarelationships - these are relationships that OnionScan will look for and assign to the unique identifier that we extracted above.

For example, in our configuration file we define 4 extra-relationships: Title, Vendor, Price and Category. Each extra-relationship has a name, a type - which OnionScan uses in its Correlation Engine - and a regular expression regex. The regular expression is used to extract the relationship from the page that we have previously triggered.

For the Hansa market example, we can see that from the /listing/ page for a product being sold we can grab the vendor's name by looking for a hyperlink with the structure <a href=\"/vendor/([^/]*)/\">. Similarly we can find the title, price and category of the listing by searching for similar structures.
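
Before baking a regex like this into the config, it can be worth sanity-checking it against a saved copy of the page - a quick sketch, assuming you have saved one listing page as listing.html:

# Print every vendor link the Vendor regex from the config above would capture.
grep -Po '<a href="/vendor/([^/]*)/">' listing.html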

The rollup parameter under Category is an instruction for OnionScan to calculate statistics on the different kinds of Categories we find, and then graph them in the Correlation Lab.

At this point we have told OnionScan how to read a marketplace listing from Hansa market, but how does OnionScan use it?

Placing the configuration above in a folder called service-configs, we can call OnionScan to scan the market with the following:

./onionscan -scans web --depth 1 --crawlconfigdir ./service-configs/ --webport 8080 --verbose hansamkt2rr6nfg3.onion

After letting OnionScan run for a while, you can navigate to localhost:8080 and enter hansamkt2rr6nfg3.onion into the search box.

Scrolling down the list of relationships you should eventually find something that looks like this:

Building Dark Web Bots with OnionScan Custom Crawls

As you can see, OnionScan has taken our small config file and transformed it into relationships capable of working with OnionScan's Correlation Engine. Each of those relationships we defined earlier is now searchable and can be correlated against anything else that OnionScan has found - for example, if we were to scan another marketplace or forum where a vendor had reused their name or product title, then we could find relationships across onion services!

The graph is generated because we told OnionScan to roll up the Category relationship that we defined earlier.

We hope that you find this feature as powerful as we do, and that users start maintaining and sharing configurations for all kinds of hidden services.

This is just the start! There are many more features that we want to add to OnionScan - come help us by joining the discussion on Github!

If you would like to support further research and development of tools like OnionScan you can become a patron at Patreon

]]>