Securing Machine Learning: Adversarial AI & Attacks (part 1)

In recent years, the surge in Machine Learning (ML) performance, fueled primarily by Deep Learning (DL), has elevated its practical significance across domains such as speech, image, and text processing. As ML techniques find applications in critical settings, such as the use of Convolutional Neural Networks (CNNs) for road sign recognition, concerns about the robustness and security of ML models intensify. The potential consequences, such as an autonomous vehicle failing to recognize a STOP sign, underscore how model failures translate into safety risks.

However, with the rapid adoption of ML in systems involved in autonomous decision-making, a new challenge emerges: adversarial attacks. These attacks, demonstrated through a variety of methods in recent years, exploit vulnerabilities in ML models with the aim of causing targeted misclassification.

Adversarial AI

Adversarial AI is a new and growing research branch that presents many complex problems across the fields of Artificial Intelligence and Machine Learning. As the adoption of AI solutions continues to expand, preventing adversarial learning attacks has become a priority in many industries. Adversarial attacks threaten to undermine the accomplishments of machine learning and put its further adoption at stake.

Adversarial Attacks fall into four main categories, which differ in the attacker's goal and in the information the attacker holds about the model.

The four categories of Adversarial Attacks

Evasion: In this type of attack, the attacker perturbs inputs at inference time to throw off the classifier and cause the model to misclassify them, for instance to avoid detection. This kind of attack takes advantage of the model’s flaws to, for example, elude detection systems or trick classification models.
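One of the simplest and best-known evasion techniques is the Fast Gradient Sign Method (FGSM), which nudges each input feature in the direction that increases the model’s loss. Below is a minimal sketch assuming a trained PyTorch classifier `model`, inputs `x` scaled to [0, 1], and true labels `y` (all hypothetical names, not from the project).

```python
# Minimal FGSM evasion sketch. `model`, `x`, and `y` are assumed,
# hypothetical names for a trained PyTorch classifier, an input batch
# in [0, 1], and the true labels.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Return adversarial copies of `x` crafted to raise the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Take one step in the direction that maximally increases the loss,
    # then clip back to the valid pixel range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

A small epsilon keeps the perturbation nearly invisible to a human while often being enough to flip the model’s prediction.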

Poisoning: In this type of attack, the attacker influences the model at training time by introducing perturbed (poisoned) data into the training set. The goal is the same as in evasion attacks, but the manipulation happens while the model is being built, typically to plant what is known as a “backdoor” that lets the attacker craft data the model will misclassify at inference time.
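As an illustration, a classic backdoor poisoning attack stamps a small trigger pattern onto a fraction of the training images and relabels them; a model trained on the poisoned set then tends to classify any input carrying the trigger as the attacker’s target class. A minimal sketch, assuming numpy arrays `X_train` (images in [0, 1], shape (N, H, W, C)) and integer labels `y_train` (hypothetical names):

```python
# Minimal backdoor-poisoning sketch on hypothetical arrays `X_train`
# (images in [0, 1], shape (N, H, W, C)) and integer labels `y_train`.
import numpy as np

def poison_dataset(X_train, y_train, target_class=0, rate=0.05, seed=0):
    """Stamp a small white trigger on a fraction of the training images
    and relabel them as `target_class`."""
    rng = np.random.default_rng(seed)
    X_p, y_p = X_train.copy(), y_train.copy()
    idx = rng.choice(len(X_p), size=int(rate * len(X_p)), replace=False)
    X_p[idx, -4:, -4:, :] = 1.0   # 4x4 white square in the bottom-right corner
    y_p[idx] = target_class       # attacker-chosen label
    return X_p, y_p
```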

Extraction: In this type of attack, the attacker aims to reconstruct the data on which the model was trained. This usually happens in a black-box scenario, where the attacker has no details about the model’s structure and parameters or the data it was trained on.
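Even in a black-box setting the attacker can probe the model through its prediction API. The sketch below illustrates the idea with a simple inversion-style search that hill-climbs the model’s confidence for a chosen class to recover a representative input; `query_model` is an assumed, hypothetical API returning class probabilities and is not part of the original text.

```python
# Minimal black-box reconstruction sketch (random hill climbing on the
# model's confidence). `query_model(x)` is an assumed API returning a
# vector of class probabilities for a single input.
import numpy as np

def invert_class(query_model, target_class, shape=(28, 28), steps=5000,
                 step_size=0.1, seed=0):
    """Start from noise and keep random perturbations that raise the
    model's confidence for `target_class`, yielding a rough picture of
    what that class looks like to the model."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=shape)
    best = query_model(x)[target_class]
    for _ in range(steps):
        candidate = np.clip(x + step_size * rng.normal(size=shape), 0.0, 1.0)
        score = query_model(candidate)[target_class]
        if score > best:              # keep only improvements
            x, best = candidate, score
    return x
```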

Inference: In this type of attack, the attacker probes the model by exploiting its vulnerabilities in order to extract knowledge that was not meant to be shared, for example whether a given record was part of the training set. These attacks do not corrupt the model in any way.
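A well-known instance is membership inference. A very simple baseline exploits the fact that models tend to be more confident on samples they were trained on; the sketch below assumes the same hypothetical `query_model` API as above and a threshold that would need tuning in practice.

```python
# Minimal membership-inference sketch based on confidence thresholding.
# `query_model(x)` is an assumed API returning class probabilities; the
# threshold is a hypothetical value, tuned in a real attack.
import numpy as np

def is_likely_member(query_model, x, threshold=0.95):
    """Guess that `x` was in the training set if the model is unusually
    confident about it (overfitted models memorise training samples)."""
    return float(np.max(query_model(x))) >= threshold
```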

The AI4PublicPolicy project primarily focused on Poisoning and Evasion attacks, which are the most prevalent according to ‘Adversa: Report on The Road to Secure and Trusted AI.’ These types of attacks can be addressed without altering the model’s architecture or retraining it. In the context of the project’s use case and the goal of reusable policies, it is crucial to achieve robustness against attacks without changing a model’s architecture.
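One family of defenses that fits this constraint is input preprocessing placed in front of an already-trained model, for example feature squeezing by bit-depth reduction. The sketch below is only a minimal illustration of that idea under those assumptions, not the project’s actual defense.

```python
# Minimal input-preprocessing defense sketch (feature squeezing by
# bit-depth reduction). It sits in front of an existing model, so neither
# the architecture nor the weights need to change.
import numpy as np

def squeeze_bit_depth(x, bits=4):
    """Quantise pixel values in [0, 1] to 2**bits levels before inference,
    removing much of the fine-grained adversarial perturbation."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels
```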

Stay tuned for the second part of our blog, where we will explore Adversarial Attacks in more detail.