Splunk Inc.

03/26/2024 | News release | Archived content

Adversarial Machine Learning & Attacks on AIs

There's a catch to Artificial Intelligence: it is vulnerable to adversarial attacks.

Any AI model can be reverse engineered and manipulated due to inherent limitations in its algorithms and training process. Improving the robustness and security of AI is key if the technology is to live up to the hype fueled by generative AI tools such as ChatGPT.

Enterprise organizations are rapidly adopting advanced generative AI agents for a wide range of business applications.

In this article, we will discuss how both the neural network training process and modern machine learning algorithms are vulnerable to adversarial attacks.

Defining adversarial ML

Adversarial machine learning (ML) is the umbrella term for any technique that misleads a neural network model or its training process in order to produce a malicious outcome.

From a cybersecurity perspective, adversarial AI can be considered a cyberattack vector. Adversarial techniques can be executed at several stages of the model lifecycle:

  • During training

  • In the testing stage

  • When the model is deployed


How neural networks train

Consider the general training process of a neural network model. It involves feeding input data to a set of interconnected layers representing mathematical equations. The parameters of these equations are updated iteratively during the training process such that an input correctly maps to its true output.

Once the model is trained on adequate data, it is evaluated on previously unseen test data. No training takes place at this stage; the model's performance is simply measured.
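
As a concrete illustration, here is a minimal sketch of that train-then-evaluate loop in PyTorch. The synthetic two-feature dataset, the tiny network and the hyperparameters are illustrative assumptions, not a reference implementation:

    import torch
    import torch.nn as nn

    # Synthetic data: two-feature inputs with binary labels (illustrative only)
    X_train, y_train = torch.randn(800, 2), torch.randint(0, 2, (800,))
    X_test,  y_test  = torch.randn(200, 2), torch.randint(0, 2, (200,))

    # A small stack of interconnected layers whose parameters will be learned
    model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    # Training: parameters are updated iteratively so inputs map to their labels
    for epoch in range(20):
        opt.zero_grad()
        loss = loss_fn(model(X_train), y_train)
        loss.backward()
        opt.step()

    # Evaluation on previously unseen data: no parameter updates happen here
    with torch.no_grad():
        accuracy = (model(X_test).argmax(dim=1) == y_test).float().mean().item()
    print(f"test accuracy: {accuracy:.2f}")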

Adversarial ML attack during the training stage

An adversarial ML attack during the training stage involves modifying the input data, its features, or the corresponding output labels.

Problem: Manipulating training data distributions

A model trained on sufficient data can capture the underlying data distribution with high accuracy. That training data may be drawn from a complex set of distributions.

An adversarial machine learning attack can be executed by manipulating the training data so that it only partially or incorrectly captures the behavior of this underlying distribution. For example, the training data may be made insufficiently diverse, or it may be altered or deleted.
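
A minimal sketch of what such a manipulation might look like, assuming a synthetic two-feature dataset and an attacker who can curate which samples make it into the training set:

    import torch

    # Clean data: the true rule is simply the sign of the first feature
    X = torch.randn(1000, 2)
    y = (X[:, 0] > 0).long()

    # Adversarial curation: keep only ~2% of the positive-class examples,
    # so the training set no longer reflects the true distribution
    keep = (y == 0) | (torch.rand(len(y)) < 0.02)
    X_skewed, y_skewed = X[keep], y[keep]
    print(f"positives after manipulation: {y_skewed.float().mean().item():.2%}")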

Problem: Altering training labels

The training labels may be intentionally altered during the training stage. Normally, the model weights or parameters are nudged along a trajectory toward a stable decision boundary.

When the output classes, categories or labels of the input data are altered, the trained weights converge toward the wrong decision boundary and therefore produce incorrect results.
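
A sketch of a simple label-flipping attack, assuming the attacker can silently invert a fraction of the training labels before training begins (the 15% budget here is illustrative):

    import torch

    y_train = torch.randint(0, 2, (1000,))          # original labels
    flip_rate = 0.15                                # assumed attacker budget

    # Silently invert a random subset of labels before training begins
    flip_mask = torch.rand(len(y_train)) < flip_rate
    y_poisoned = y_train.clone()
    y_poisoned[flip_mask] = 1 - y_poisoned[flip_mask]
    print(f"{flip_mask.sum().item()} of {len(y_train)} labels flipped")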

Problem: Injecting bad data

Incorrect and malicious samples may be injected into the training data. This can subtly shift the decision boundary so that the evaluation metrics remain within acceptable performance thresholds while the classification of specific inputs is significantly altered.
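
A sketch of this kind of injection, assuming the true decision rule depends on the sign of the first feature and the attacker appends a small batch of deliberately mislabeled points near the boundary:

    import torch

    # Clean data: the true decision rule is the sign of the first feature
    X_train = torch.randn(1000, 2)
    y_train = (X_train[:, 0] > 0).long()

    # Attacker-crafted points: placed just inside the positive region but
    # labeled negative, nudging the learned boundary while overall metrics
    # on clean test data can still look acceptable
    X_bad = torch.rand(30, 2) * torch.tensor([0.2, 2.0])   # first feature in [0, 0.2]
    y_bad = torch.zeros(30, dtype=torch.long)

    X_mixed = torch.cat([X_train, X_bad])
    y_mixed = torch.cat([y_train, y_bad])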

Adversarial impact on black box systems

Another important class of adversarial attack exploits an inherent problem in AI systems: most AI models are black boxes.

Black-box AI systems are highly nonlinear and therefore exhibit high sensitivity and instability. These models are developed from a set of inputs and their corresponding outputs. We do not (and cannot) know the inner workings of the system; we only observe that the model maps an input to its expected output.

White-box systems, on the other hand, are fully interpretable. We can understand how the model behaves, and we have access to the model parameters along with a complete understanding of their impact on the system's behavior.

Black-box system attacks

Adversaries cannot obtain knowledge of the model underlying a black-box AI system. However, they can use synthetic data that closely resembles the system's inputs and outputs to train a substitute model that emulates the behavior of the target model. This works because of the transferability property of AI models.

Transferability is the phenomenon whereby an adversary can construct adversarial data samples that exploit a model M1 using only knowledge of another model M2, as long as M2 can sufficiently perform the tasks that M1 is designed for.
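
The sketch below illustrates the idea, assuming the target is only reachable through a hypothetical target_predict function that returns labels; the attacker queries it, trains a local substitute, and can then craft attacks against the copy:

    import torch
    import torch.nn as nn

    def target_predict(x: torch.Tensor) -> torch.Tensor:
        """Stand-in for the black-box target: the attacker only sees labels."""
        return (x[:, 0] + 0.5 * x[:, 1] > 0).long()

    # 1. Query the target with synthetic inputs that resemble real ones
    X_query = torch.randn(2000, 2)
    y_query = target_predict(X_query)

    # 2. Train a local substitute model on the collected (input, label) pairs
    substitute = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
    opt = torch.optim.Adam(substitute.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(200):
        opt.zero_grad()
        loss_fn(substitute(X_query), y_query).backward()
        opt.step()

    # 3. Adversarial examples crafted against the substitute frequently
    #    transfer to the original target (the transferability property)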

White-box system attacks

In a white-box AI attack, adversaries have knowledge of the target model, including:

  • Its parameters

  • The algorithms used to train the model


A popular example involves adding small perturbations to the input data so that the model produces an incorrect output with high confidence. These perturbations reflect worst-case scenarios that exploit the sensitivity and nonlinear behavior of the neural network, which is then pushed into an incorrect decision class.
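
One widely cited instance of this idea is the fast gradient sign method (FGSM). The sketch below assumes an untrained toy model and an illustrative perturbation budget epsilon; it simply steps the input in the direction that increases the loss:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
    loss_fn = nn.CrossEntropyLoss()

    x = torch.randn(1, 2, requires_grad=True)   # the input to attack
    y_true = torch.tensor([1])                  # its correct label
    epsilon = 0.1                               # assumed perturbation budget

    # Compute the gradient of the loss with respect to the input itself
    loss = loss_fn(model(x), y_true)
    loss.backward()

    # Step the input in the direction that increases the loss
    x_adv = (x + epsilon * x.grad.sign()).detach()
    print(model(x_adv).argmax(dim=1))           # may now differ from y_true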

How to build robust AI systems

The same concepts of constructing adversarial examples and training on them can also be used to improve the robustness of an AI system. Adversarial examples can act as a regularizer during model training, imposing constraints that harden the model against extreme-case inputs designed to force a misclassification.

Adversarial training augments the training data so that, during the training process, the model is already exposed to a distribution of adversarial examples, including the kinds of perturbed inputs that would otherwise be used to exploit the model's vulnerabilities.
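
A minimal sketch of adversarial training, assuming FGSM-style perturbations and a synthetic dataset; each batch is augmented with perturbed copies of itself so the model trains on both:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    epsilon = 0.1                                # assumed perturbation budget

    X, y = torch.randn(512, 2), torch.randint(0, 2, (512,))

    for epoch in range(20):
        # Craft FGSM-style adversarial copies of the current batch
        X_req = X.clone().requires_grad_(True)
        loss_fn(model(X_req), y).backward()
        X_adv = (X_req + epsilon * X_req.grad.sign()).detach()

        # Train on the clean and adversarial data together
        opt.zero_grad()
        loss = loss_fn(model(torch.cat([X, X_adv])), torch.cat([y, y]))
        loss.backward()
        opt.step()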