Introduction to Adversarial Machine Learning
Practically every technology company now uses machine learning in its day-to-day operations. The statistical algorithms that were once reserved for academia are being picked up even by more traditional industries as software continues to eat the world.
However, in all the excitement, one element has been left behind on the academic shelf, and that, of course, is security.
While machine learning has made it into the security field - it is used for intrusion detection and malware classification, among other things - the security of the underlying algorithms has received little attention.
A quick search on using machine learning in your applications will turn up plenty of articles documenting best practices; however, few cover the techniques and design decisions you will need to adopt to keep your models safe.
This is a collection of resources, articles, tutorials and guidance to help you adopt secure practices for machine learning.
Introduction / General Overview
Adversarial machine learning is a research field that lies at the intersection of machine learning and computer security.
All machine learning algorithms and methods are vulnerable to some form of attack, and the threats they face can be organized into a small number of threat models.
The Machine Learning Threat Model
At the highest level, attacks on machine learning systems can be classified into one of two types: Evasion attacks and Poisoning attacks.
Evasion Attacks
The simplest kind of attack on a model is one which attempts to bypass the learning outcome.
For example, an attacker who wishes to send spam emails could first test a number of different email contents against the model to discover a way of getting their spam classified as innocuous.
Relatedly, an attacker may learn how to trigger positive matches in a given model and then flood it with such inputs, driving up the rate of false positives until the model is practically unusable.
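To make the exploratory, query-only nature of this attack concrete, here is a minimal sketch against a toy bag-of-words spam classifier built with scikit-learn; the training messages and the "benign" padding words are invented for illustration and are not part of any real filter.

```python
# A minimal sketch of an exploratory evasion probe against a toy spam filter.
# Everything here (messages, labels, padding words) is made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# A tiny spam/ham classifier (1 = spam, 0 = ham) standing in for the victim model.
messages = [
    "win a free prize now", "cheap pills limited offer", "claim your free money",
    "meeting moved to tuesday", "lunch tomorrow with the team", "quarterly report attached",
]
labels = [1, 1, 1, 0, 0, 0]
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

# The attacker only observes predictions: they keep appending innocuous-looking
# words to their spam message until the classifier labels it as ham
# (or they run out of padding words to try).
spam = "win a free prize now"
for word in ["meeting", "report", "team", "tuesday", "lunch"]:
    if model.predict([spam])[0] == 0:
        break
    spam += " " + word

print("evasive message:", spam)
print("still classified as spam?", bool(model.predict([spam])[0]))
```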
Poisoning Attacks
An attacker may instead focus on manipulating your training data in an attempt to influence the learning outcome.
For example, an attacker who knows that network traffic is being collected to train a classifier that detects anomalous traffic can send crafted traffic to that network, so that when the model is built it fails to classify the attacker's connections as out of the ordinary.
Some domains are especially susceptible to poisoning attacks, e.g. network intrusion detection, spam filtering and malware analysis - however, poisoning can occur in any area where training data isn't thoroughly validated and verified before being incorporated into the model.
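As a toy illustration of the effect, the sketch below flips the labels of a fraction of a synthetic training set and compares the resulting model against a clean baseline; the dataset, the logistic regression model and the 20% flip rate are arbitrary choices, and random label flipping is only a crude stand-in for a real, targeted poisoning campaign.

```python
# A minimal sketch of label-flip poisoning on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline trained on clean data.
clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The "attacker" flips the labels of 20% of the training points.
rng = np.random.default_rng(0)
flip = rng.choice(len(y_train), size=len(y_train) // 5, replace=False)
y_poisoned = y_train.copy()
y_poisoned[flip] = 1 - y_poisoned[flip]
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

print("clean test accuracy:   ", clean_model.score(X_test, y_test))
print("poisoned test accuracy:", poisoned_model.score(X_test, y_test))
```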
Beyond Evasion and Poisoning
Researchers have constructed a taxonomy of potential attacks on statistical systems. Using that taxonomy, we can classify an attack along three properties:
- Whether the attack is Causative, where the attacker is able to influence the training data; or Exploratory, where the attacker cannot influence the training data and can only probe the learning outcomes.
- Whether the attack targets Integrity, where the attacker attempts to learn how the model treats data and to construct samples which violate the goals of the model; Availability, where the attacker tries to overwhelm the system with false positives; or Privacy, where the attacker tries to extract information from the model itself.
- Whether the attack is Targeted, where the attack is focused on manipulating a small set of features or points (e.g. get my spam email through the classifier); or Indiscriminate, where the attack has a more flexible goal (e.g. get any spam email through the classifier).
Machine Learning Security Best Practices
As with all security, there is no one-stop patch - the answer to adversarial machine learning is layered defenses. However, there are a few techniques that can be used to minimize your risk footprint:
1. Understanding Training Data
Identifying the sources of your data and evaluating the likelihood of their compromise is a good first step towards understanding the level of risk associated with incorporating that data into your model.
2. Sanitizing Your Training Data
When it is not possible to check every source - or the problem domain itself carries a high level of risk, e.g. network intrusion detection - it is necessary to build checks into your training method.
A Reject on Negative Impact defense measures the impact of adding each new training sample, and discards samples which have a significantly negative impact on the learning outcomes.
This can be done by training two classifiers: one with the base training set, and another with the base training set plus the new data sample. Both classifiers are then subjected to a set of tests with known labels to determine their accuracy.
If the second classifier performs significantly worse than the first, the new training data can be discarded and marked as malicious.
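A minimal sketch of that procedure might look like the following; the choice of classifier, the held-out validation set with known labels, and the 1% accuracy-drop threshold are illustrative assumptions rather than part of any canonical specification.

```python
# A sketch of a Reject On Negative Impact (RONI) style filter.
# Assumes numpy feature arrays and a labelled hold-out set (X_val, y_val).
import numpy as np
from sklearn.linear_model import LogisticRegression


def roni_filter(X_base, y_base, candidates, X_val, y_val, threshold=0.01):
    """Return the candidate (x, y) samples whose inclusion does not
    significantly reduce accuracy on the held-out validation set."""
    baseline = LogisticRegression(max_iter=1000).fit(X_base, y_base)
    base_acc = baseline.score(X_val, y_val)

    accepted = []
    for x, y in candidates:
        # Retrain with the base set plus this single candidate sample.
        X_new = np.vstack([X_base, x.reshape(1, -1)])
        y_new = np.append(y_base, y)
        trial = LogisticRegression(max_iter=1000).fit(X_new, y_new)

        # Keep the sample only if validation accuracy does not drop by
        # more than the threshold; otherwise treat it as suspect.
        if base_acc - trial.score(X_val, y_val) <= threshold:
            accepted.append((x, y))
    return accepted
```

Retraining for every candidate is expensive, so in practice you would batch candidates, evaluate them on a schedule, or use a model that supports incremental updates.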
3. Examine Your Algorithms
Certain approaches, e.g. ANTIDOTE, combine existing algorithms with techniques from the field of robust statistics. These hardened algorithms assume that a small portion of the data is likely to be malicious and have built-in countermeasures to limit the impact of poisoning.
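The sketch below is not ANTIDOTE itself, but it shows the robust-statistics intuition behind such defenses: under contamination, a median-based estimate of "normal" behaviour moves far less than a mean-based one. The traffic volumes and the 10% contamination rate are made up for the example.

```python
# Mean vs. median as estimates of "normal" traffic volume under poisoning.
import numpy as np

rng = np.random.default_rng(0)
benign = rng.normal(loc=100.0, scale=5.0, size=900)   # legitimate traffic volumes
poison = np.full(100, 1000.0)                          # attacker-injected outliers
observed = np.concatenate([benign, poison])

print("true centre (benign mean):", benign.mean())
print("mean under poisoning:     ", observed.mean())      # dragged towards the outliers
print("median under poisoning:   ", np.median(observed))  # barely moves
```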
Wait! What about Evasion Attacks!?
Unfortunately, combating evasion is the same problem that lies at the heart of machine learning itself - better algorithms are, by definition, harder to evade.
The best that you can do is understand that this is a problem inherent to every machine learning approach and focus on building layered defenses such that a failure in one model doesn't mean a security disaster for your application.
Adversarial Machine Learning Literature
If you would like to dive into the academic research - and I encourage you to do so - I have collected a number of different articles below.
General
- Can Machine Learning Be Secure?
- Adversarial Machine Learning
- Explaining and Harnessing Adversarial Examples
- Adversarial Learning
- Pattern Recognition Systems Under Attack: Design Issues and Research Challenges
- Evasion attacks against machine learning at test time
- A Framework for Quantitative Security Analysis of Machine Learning
- Machine Learning Methods for Computer Security
- Understanding the Risk Factors of Learning in Adversarial Environments
Support Vector Machines
- Security Evaluation of Support Vector Machines in Adversarial Environments
- Poisoning Attacks against Support Vector Machines
- Support Vector Machines Under Adversarial Label Noise
- Adversarial Label Flips Attack on Support Vector Machines
Neural Networks / Deep Learning
- Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images
- The Limitations of Deep Learning in Adversarial Settings
- How to trick a neural network into thinking a panda is a vulture
Online Learning
- Online Learning with Adversarial Delays
- Online Learning under Delayed Feedback
- Online Anomaly Detection under Adversarial Impact
Classification
- Adversarial Classification
- Multiple Classifier Systems Under Attack
- Multiple Classifier Systems for Robust Classifier Design in Adversarial Environments
- Adversarial Pattern Classification Using Multiple Classifiers and Randomisation
- Bagging Classifiers for Fighting Poisoning Attacks in Adversarial Classification Tasks
- Security evaluation of pattern classifiers under attack
Clustering
Application Specific Resources
There are many domains in which knowledge of adversarial machine learning isn't just useful - it is necessary. Below is a collection of domain-specific papers, categorized by area.
Intrusion Detection
- Adversarial Attacks against Intrusion Detection Systems: Taxonomy, Solutions and Open Issues
- Attacks Against Intrusion Detection Networks: Evasion, Reverse Engineering and Optimal Countermeasures
Spam Filtering
Biometrics
A note on completeness and future work
Adversarial Machine Learning, much like machine learning in general, is a very new field and new papers are written and published all the time.
As work becomes available I will endeavor to keep this page up to date.
If you know of a paper related to #amlsec which isn't listed above, please get in touch - I would love to add it to the collection (and read it myself!)