Survey of Adversarial Attacks and Defenses

Introduction

As we integrate Artificial Intelligence (AI) systems more deeply into our everyday lives and crucial fields like cybersecurity, we encounter their immense potential and looming challenges. AI technologies, from face recognition to intrusion detection, are revolutionizing how we handle security by solving complex problems efficiently. Yet, one of the gravest threats to these systems is adversarial evasion. This type of cyber attack involves minor but strategic modifications—such as tweaks to malware code or network traffic—that deceive AI models into misclassifying harmful inputs as harmless.

The stakes are exceptionally high in cybersecurity. Unlike in other domains, where the impact might be less severe, the cybersecurity field operates under unique pressure: while defenders must constantly secure their systems, an attacker needs only one successful breach to cause catastrophic damage. This asymmetry makes understanding and countering adversarial attacks against AI in cybersecurity not just beneficial but essential.

Adversarial tactics are alarmingly effective in cybersecurity due to their focused and destructive objectives. They challenge both attackers, who must maintain the malicious functionality of their exploits, and defenders, who strive to detect and neutralize these threats without fail. Minor alterations in a manipulated sample can drastically affect the outcome, tipping the scales in favor of the attacker.

In this blog post, I delve into the latest research on adversarial attacks and defenses in cybersecurity. I focus on malware detection and classification, URL detection, network intrusion detection, and biometric authentication. By analyzing these papers, I aim to highlight key trends and propose directions for future research to enhance our defense mechanisms against such cunning cyber threats.

Background

In 2014, Szegedy et al. introduced the concept of adversarial examples, formulating their generation as a minimization problem. Adversarial attacks are commonly classified into three categories based on the knowledge of the target model they use to cause misclassification:

  • Gradient-based Attacks: These involve creating perturbations in the direction of the gradient of the target model, requiring detailed knowledge about the model’s gradient and architecture. They are white-box attacks, with examples like the Fast Gradient Sign Method, Carlini-Wagner Attack, and Projected Gradient Descent.
  • Score-based Attacks: These generate perturbations based on the confidence scores from the victim model without needing direct knowledge of the model’s gradient or architecture. These are gray-box attacks, exemplified by the Zeroth Order Optimization Attack.
  • Decision-based Attacks: These attacks use only the labels predicted by the target model, often exploiting the transferability property to craft adversarial examples that are effective against multiple models. These are black-box attacks, with techniques such as Generative Adversarial Network-based attacks and the Boundary Attack.

Each type of attack also includes query-efficient variants, which limit the number of queries to the model to minimize detection risk. While white-box attacks are more precise, black-box attacks are more common in real-world scenarios.
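
To make the gradient-based category concrete, here is a minimal sketch of the Fast Gradient Sign Method in PyTorch. It assumes white-box access to a differentiable classifier; `model`, `loss_fn`, the input tensor `x`, and the label `y` are placeholders, not artifacts from any of the surveyed papers.

```python
import torch

def fgsm_perturb(model, loss_fn, x, y, epsilon=0.03):
    """Craft an FGSM adversarial example: take one step of size epsilon in the
    direction of the sign of the loss gradient w.r.t. the input (white-box)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # Stepping along the gradient sign increases the loss, pushing the
    # sample toward misclassification while keeping the perturbation small.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.detach()
```

Score- and decision-based attacks follow the same intuition but must estimate this gradient direction from confidence scores or output labels alone.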

Rosenberg et al. developed a taxonomy for adversarial attacks in cybersecurity, categorizing them along four dimensions: threat model, attack type, perturbed features, and attack output. This taxonomy helps clarify the attacker’s knowledge, the attack’s objectives, the features being targeted, and the end result.

Defenses against these attacks are primarily of two types:

  • Detection-based Defense: Focuses on detecting adversarial examples by identifying unusual behaviors or interactions with the model.
  • Robustness-based Defense: Aims to increase the model’s robustness, making it harder for attackers to craft effective adversarial examples.

Literature Review

For this survey, I reviewed several recent papers addressing adversarial attacks and defenses in the cybersecurity domain, organized around the application areas defined by Rosenberg et al. in 2021. The four areas and the research within each are presented in the following subsections.

Malware Detection and Classification

Malware detection and classification are crucial to protecting computer systems. Modern defenses against malware use machine learning and deep learning models to detect anomalous activity and unsigned malware; well-known next-generation antivirus products that use such models include SentinelOne, Microsoft Defender ATP, and CrowdStrike. As a result, researchers are actively studying both adversarial attacks that cause malware to be misclassified and defenses that detect those attacks.

Gaspari et al. researched adversarial attacks on ransomware detection. They argued that current ransomware detectors rely on behavioral analysis techniques and are prone to evasion attacks. These detectors look for specific behavioral features such as changes in file entropy, writes that cover extended parts of a file, file deletion, processes that access many user files, processes writing to files of different types, and back-to-back writes. Although such features can detect basic ransomware, it is possible to create ransomware that sidesteps them, rendering the detectors ineffective. To demonstrate this, they proposed three novel attacks: process splitting, which distributes ransomware operations evenly across multiple processes so that each exhibits only a subset of the behaviors; functional splitting, which separates ransomware operations into functional groups so that each process performs only one function; and Mimicry, which models the ransomware’s features on benign processes so that each process is indistinguishable from a benign one. They also developed a proof-of-concept ransomware called “Cerberus” that implements the proposed attacks. They tested the attacks against detectors such as ShieldFS and RWGuard, as well as in a black-box setting against the Malwarebytes Anti-Ransomware detector, showing their feasibility. The Mimicry attack, in particular, could evade detection entirely in a black-box setting, while functional splitting and process splitting required more processes for complete evasion. Lastly, they trained a detector to recognize functional splitting attacks, which could quickly identify such ransomware.

Berger et al. focused their work on Android malware. They analyzed the differences and gaps between feature-space attacks and problem-space attacks: feature-space attacks manipulate machine learning features to cause misclassification while minimizing the number of perturbations, whereas problem-space attacks change the malware code itself. They argued that current machine learning models are vulnerable to evasion attacks and that state-of-the-art defenses rely on adversarial training with feature-space attacks, which do not reflect actual malware samples. To evaluate the robustness of each, they retrained classifiers on both feature-space and problem-space attacks. Their study focused on Android, using a dataset of 75,000 benign Android applications from AndroZoo and 5,700 malicious applications from Drebin. They tested three malware detection systems: Drebin trained with an SVM classifier, Drebin-DNN trained with a deep neural network classifier, and MaMaDroid trained with Random Forest, k-Nearest Neighbors, and Decision Tree classifiers. They found that feature-space attacks do not serve as reasonable proxies for problem-space malware evasion attacks and that robustness should be evaluated directly against problem-space attacks.
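
To make the distinction concrete, the sketch below shows roughly what a feature-space evasion looks like on a binary Android feature vector: "benign-looking" features are switched on greedily until a hypothetical classifier's malware score drops below its threshold. The classifier, feature indices, and threshold are illustrative assumptions; a problem-space attack would instead have to modify the APK itself while keeping it installable and functional.

```python
import numpy as np

def feature_space_evasion(x, clf, addable_indices, threshold=0.5, max_changes=10):
    """Greedily flip unused binary features (e.g. extra permissions or benign
    API calls), keeping the flip that lowers the malware score the most."""
    x_adv = x.copy()
    for _ in range(max_changes):
        candidates = []
        for i in addable_indices:
            if x_adv[i] == 0:                       # only additions, never removals,
                trial = x_adv.copy()                # so malicious behavior is preserved
                trial[i] = 1
                candidates.append((clf.predict_proba([trial])[0, 1], i))
        if not candidates:
            break
        best_score, best_i = min(candidates)
        x_adv[best_i] = 1
        if best_score < threshold:                  # now classified as benign
            break
    return x_adv
```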

Rashid and Such focused their research on adversarial query attacks in malware detection. They argued that machine learning models are vulnerable to adversarial query attacks, in which the attacker iteratively queries the model to craft inputs that cause misclassification. When such an attack operates in the feature space by modifying discrete binary feature vectors, existing defenses such as similarity detection become ineffective if the adversarial examples are generated in a different way. They proposed a new stateful defense against adversarial query attacks called “MalProtect,” which analyzes the sequence of queries using multiple threat indicators. These indicators assess query similarity, features shared across queries, the number of enabled features, and other variables. A score for each indicator is calculated based on whether it suggests an attack, and a decision model then aggregates the scores to predict whether an attack is occurring. Only if the prediction is that no attack is underway does the query reach the prediction model for further processing. This allows defenders to alter the response and provide the attacker with incorrect data, sabotaging the attack before it can cause misclassification. They evaluated their solution using Android malware from AndroZoo and Drebin and Windows malware from SLEIPNIR, comparing several stateful and non-stateful defenses. MalProtect reduced the evasion rates of black-box and gray-box attacks by 80%-98% across the Android and Windows datasets, outperforming the other defenses. Thus, employing multiple threat indicators and analysis techniques beyond similarity/outlier detection can significantly improve stateful defenses against adversarial query attacks.
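
As a simplified illustration of this kind of stateful defense (not the paper's actual implementation), the sketch below keeps a window of recent queries, aggregates two toy threat indicators into an attack score, and only forwards queries that look benign to the underlying model. The indicator weights and thresholds are arbitrary placeholders.

```python
from collections import deque
import numpy as np

class StatefulQueryDefense:
    """Score each incoming query against recent history before it reaches the model."""

    def __init__(self, window=100, attack_threshold=0.7):
        self.history = deque(maxlen=window)
        self.attack_threshold = attack_threshold

    def _attack_score(self, x):
        if not self.history:
            return 0.0
        past = np.stack(self.history)
        # Indicator 1: fraction of features shared with the most similar past query.
        similarity = (past == x).mean(axis=1).max()
        # Indicator 2: unusual number of enabled features relative to history.
        density_gap = abs(x.mean() - past.mean())
        return 0.7 * similarity + 0.3 * min(density_gap * 10, 1.0)

    def handle_query(self, x, model):
        score = self._attack_score(x)
        self.history.append(x)
        if score > self.attack_threshold:
            return "benign"            # deliberately misleading answer to the attacker
        return model.predict([x])[0]   # normal path: forward to the real classifier
```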

Android malware represents a growing threat as attackers continually evolve their techniques to bypass modern security measures.

URL Detection

The Internet is a vast network consisting of billions of web pages, which are accessed using Uniform Resource Locators (URLs). While most URLs are legitimate and lead users to genuine web pages, some are malicious, used by large-scale botnets or by attackers conducting phishing campaigns. Domain Generation Algorithms (DGAs) are used to create large numbers of domain names that are difficult to predict; bots throughout the network iteratively try to communicate with these domains to find the actual Command and Control (C&C) server. This strategy is effective because defenders cannot find and take down malicious domains faster than new ones are deployed. Researchers are therefore working on ways to predict which URLs might be malicious, generated by a DGA, or part of a large botnet.

Casino et al. focused on detecting algorithmically generated domains (AGDs). They highlighted the need for accurate botnet detection methods so that take-down operations can be sped up and large-scale malware campaigns thwarted. The authors built a dataset called “HYDRAS,” consisting of 105 of the latest and most popular DGA families spanning over 95 million domains. They proposed a novel feature set that includes lexical and statistical features computed over the collected domains, along with English gibberish detectors. Using a Random Forest classifier with the proposed features, they achieved very high accuracy (an F1 score above 99%) when distinguishing benign domains from DGA-generated malicious ones, outperforming state-of-the-art classifiers. They argue that a comprehensive dataset that accurately represents malicious domains allows botnet domains to be detected efficiently in real time, enabling faster take-downs.
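
To give a rough sense of the lexical and statistical features such detectors rely on, the sketch below computes a few simple per-domain features and trains a Random Forest on a handful of toy labeled domains. The features and data are illustrative placeholders and far simpler than the HYDRAS pipeline.

```python
import math
from collections import Counter
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def domain_features(domain):
    """Simple lexical/statistical features for a second-level domain string."""
    name = domain.split(".")[0].lower()
    counts = Counter(name)
    entropy = -sum(c / len(name) * math.log2(c / len(name)) for c in counts.values())
    digit_ratio = sum(ch.isdigit() for ch in name) / len(name)
    vowel_ratio = sum(ch in "aeiou" for ch in name) / len(name)
    return [len(name), entropy, digit_ratio, vowel_ratio]

# Toy training data: (domain, label) with 1 = DGA-generated, 0 = benign.
samples = [("google.com", 0), ("wikipedia.org", 0),
           ("xjw9qkzp3f.net", 1), ("qpvz81mmtr.com", 1)]
X = np.array([domain_features(d) for d, _ in samples])
y = np.array([label for _, label in samples])

clf = RandomForestClassifier(n_estimators=100).fit(X, y)
print(clf.predict([domain_features("kq7zx0plvm.info")]))  # expected: [1]
```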

Suryotrisongko et al. also focused their research on detecting malware that uses DGAs. They argued that traditional threat intelligence approaches like blocklists are ineffective against DGAs and that current cyber threat intelligence (CTI) sharing platforms need to support sharing classifier models among organizations. They proposed a model to detect DGA-based malicious domains using seven statistical features, trained a random forest classifier, and evaluated it on 55 DGA families, comparing its performance with other state-of-the-art detectors such as CharBot. They also sought to improve trust in model sharing by blending explainable AI (XAI) techniques such as SHAP, LIME, and Anchors with open-source intelligence (OSINT) sources such as Google Safe Browsing and AlienVault OTX to validate the model’s predictions. Their model achieved an accuracy of 96.3% in detecting DGAs, outperforming other state-of-the-art detectors. They also proposed a computable CTI paradigm that allows models validated through XAI and OSINT to be shared between organizations, improving automation, reducing manual analysis, and enabling organizations to equip themselves with the most accurate detectors. In short, they delivered a more precise detector for botnets that use DGAs for evasion and showed how XAI and OSINT can improve trust when sharing models for cyber threat intelligence.
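
On the XAI side, the sketch below (continuing the toy Random Forest and `domain_features` helper from the previous example, and assuming the third-party `shap` package) shows how SHAP values could surface which features drove a "DGA" verdict, which an analyst could then cross-check against OSINT sources before trusting a shared model. It is a hedged illustration, not the authors' pipeline.

```python
import shap  # third-party package: pip install shap

# Explain an individual prediction of the toy Random Forest from the previous sketch.
explainer = shap.TreeExplainer(clf)
test_point = [domain_features("kq7zx0plvm.info")]
shap_values = explainer.shap_values(test_point)

# shap_values holds per-feature contributions to the prediction; features with large
# positive contributions toward the "DGA" class (e.g. high entropy, low vowel ratio)
# explain the verdict and can be cross-checked against OSINT feeds by an analyst.
print(shap_values)
```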

Apruzzese et al. researched phishing website detectors and approached adversarial attacks differently. They argued that adversarial machine learning and its defenses tend to focus on unrealistic threat models, crafting adversarial examples in the feature space; while such attacks are possible, they are not always physically realizable in the problem space. The authors instead studied low-cost, pragmatic attacks that real attackers are more likely to use, proposing a realistic threat model for cheap website-space perturbations that a typical phisher might apply. They also defined an “evasion space” by dissecting the architecture of phishing website detectors into the website space, where the attacker crafts the page; the preprocessing space, which involves feature extraction; the machine-learning space, where the features are analyzed; and the output space, which contains the classifier’s decision. They evaluated the robustness of 18 machine learning-based phishing website detectors against 12 attacks they labeled realistic, with varying costs, using webpage datasets from Zenodo and δPhish. They found that some detectors are resilient to the cheaper attacks, but slightly more complex methods can evade a larger number of detectors. Moreover, the greatest threat to detectors comes from the cheap website-space attacks: they induce small but significant degradations in most detectors and can be developed quickly due to their low cost. The authors highlighted the need to focus on attacks that are more likely in the real world and provided benchmark results demonstrating the impact of realistic attacks on machine learning-based phishing website detectors.

Network Intrusion Detection Systems

Network Intrusion Detection Systems (NIDS) are an essential component of network security, used to identify any potential attacks within a network. These systems monitor all traffic passing through a defined strategic point and alert administrators if they encounter any activity that is classified as malicious. While earlier versions of NIDS relied on rules to identify malicious traffic, newer versions have been equipped with Machine Learning (ML) and Artificial Intelligence (AI) models trained to recognize a wide range of network attacks and malicious traffic.

Recent research has focused on both attacking and defending these AI-powered models. Mohammadian et al. developed a novel adversarial attack against deep learning-based NIDS. They proposed a white-box attack that manipulates the feature space of a trained deep neural network. Their approach enumerates feature combinations, uses a saliency map of the trained model to rank them, and selects the best combinations for the attack. Adversarial samples are then generated by perturbing these features to cause misclassification. Their testing revealed that the method effectively created adversarial samples for over 18% of samples in CIC-IDS2017, 15% in CIC-IDS2018, and 14% in CIC-DDoS2019. They also found that increasing the number of perturbed features and the magnitude of the perturbations improved the effectiveness of the attack.
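
A minimal sketch of the saliency-map idea behind this kind of white-box feature ranking: the gradient of the target class score with respect to each input feature indicates how strongly that feature influences the decision, so features can be ranked by gradient magnitude. The `model` and flow-feature tensor `x` are placeholders, not the authors' implementation.

```python
import torch

def rank_features_by_saliency(model, x, target_class):
    """Rank input features by the magnitude of the gradient of the target
    class score with respect to each feature (most influential first)."""
    x = x.clone().detach().requires_grad_(True)
    score = model(x.unsqueeze(0))[0, target_class]   # logit of the class of interest
    score.backward()
    saliency = x.grad.abs()
    return torch.argsort(saliency, descending=True)
```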

Sharon et al. also focused on developing adversarial attacks against NIDS. Their approach differs from Mohammadian et al.’s in that they proposed a black-box attack operating in the problem space. They designed a timing-based adversarial network traffic reshaping attack called “TANTRA” that uses a Long Short-Term Memory (LSTM) model. The model is trained to learn benign network traffic behavior from a short history of benign packets and then reshapes the attack’s malicious traffic, including its interpacket delays, to resemble benign traffic. The modified malicious traffic is then sent to the target network, bypassing the NIDS. The attack requires no knowledge of or access to the NIDS and its classifier, only an ongoing connection to the target network. To evaluate the attack, they used eight common network intrusion attacks from the Kitsune and CIC-IDS2017 datasets and tested against three state-of-the-art anomaly-based NIDS (an autoencoder, KitNET, and Isolation Forest). On average, they achieved a 99.99% success rate in evading detection across all attacks. They also discovered that changing only the timestamps, not the packet content, was sufficient to avoid detection. Lastly, they proposed training the NIDS on reshaped traffic as a viable defense against their attack.

Kotak and Elovici focused on identifying unauthorized IoT devices on a network, which can increase security risks due to unpatched vulnerabilities. They proposed a novel black-box, template-based approach that uses heat maps, such as Class Activation Mapping (CAM) and Grad-CAM++, to identify the essential features in traffic packets for classifying traffic as benign or malicious. They then use the heat maps to craft adversarial examples, replacing the less critical features of traffic from a benign device that is authorized on the system with data from a malicious IoT device, so that the malicious device mimics the authorized traffic. They tested the attack using a public IoT network traffic dataset (IoT Trace), additional IoT traffic generated in their lab, and six variants of payload-based fully connected network (FCN) models for IoT device identification, with four convolutional neural network (CNN) and FCN surrogate models used to craft the adversarial examples. Their heat map-based adversarial attacks fooled the target models with up to 100% success and were effective against all model variants, demonstrating the transferability of the attack. By demonstrating this evasive adversarial attack, they highlighted vulnerabilities in ML-based IoT device identification.

Adversarial attacks on Network Intrusion Detection Systems can significantly undermine network security, allowing malicious activities to go undetected.

Biometric Systems

Biometric authentication is a critical field in cybersecurity that uses classifiers and detectors to authenticate the identity of authorized personnel. These models are trained to detect and identify individual voices, faces, and fingerprints and to predict whether a provided sample matches the samples they were trained on. Adversarial attacks against image and face recognition were among the first in this domain and have been extensively researched, so researchers are now focusing on speech and voice recognition and how these systems can be manipulated through adversarial attacks.

Abdullah et al. focused their research on adversarial attacks against Automatic Speech Recognition (ASR) and Automatic Voice Identification (AVI) systems. They argue that current implementations of these systems are vulnerable to adversarial attacks and that all prior attacks required white-box knowledge of the model, a large number of queries, or produced poor-quality audio whose distortion is perceptible to humans. Current ASR and AVI systems rely heavily on components of speech that are not necessary for human comprehension. As a solution, they proposed a query-efficient black-box attack that requires no knowledge of the model or its features, is transferable, and operates in near real time. They remove low-intensity components from a benign audio sample using signal processing techniques such as the Discrete Fourier Transform (DFT), which exposes frequency information, and Singular Spectrum Analysis (SSA), which decomposes an arbitrary time series into components derived from eigenvectors. They then optimize the distortion threshold using binary search to minimize the impact on audio quality, causing ASR systems to mistranscribe the audio and AVI systems to misidentify speakers by perturbing phonemes such as vowels while the audio remains comprehensible to humans. The attack has two variants: the word-level attack, which perturbs entire words to induce misclassification, and the phoneme-level attack, which perturbs only small, targeted parts of the audio signal. They tested their attacks on the Google Speech API, Facebook Wit.ai, DeepSpeech, CMU Sphinx, and Microsoft Azure Speaker Identification, using datasets such as the TIMIT corpus of phonetically diverse English sentences spoken by 630 speakers and a dataset of the 1,000 most common English words. The word-level attack caused 50% mistranscription with minimal distortion, while the phoneme-level attack was most effective when perturbing vowels, which ASR systems mistranscribed 88% of the time compared with less than 60% for other phonemes. They also showed that their attacks transferred between models with up to 100% success and could evade existing detection methods. Finally, they proposed adversarial training on the perturbed samples as a defense against their attack.
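
The signal-processing core of this attack can be illustrated with a short, hedged sketch: take the DFT of the audio, zero out frequency components whose magnitude falls below a threshold, and reconstruct the waveform. The fixed `keep_fraction` here is an arbitrary placeholder; the actual attack tunes the threshold per sample via binary search and also applies SSA.

```python
import numpy as np

def remove_low_intensity_components(audio, keep_fraction=0.05):
    """Zero out DFT components whose magnitude falls below a fraction of the
    loudest component, then reconstruct the waveform. The result sounds nearly
    unchanged to a human but can shift the features an ASR/AVI model relies on."""
    spectrum = np.fft.rfft(audio)
    threshold = keep_fraction * np.abs(spectrum).max()
    spectrum[np.abs(spectrum) < threshold] = 0
    return np.fft.irfft(spectrum, n=len(audio))
```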

Xue et al. also researched adversarial attacks against intelligent audio systems. They argue that existing adversarial attacks against these systems are not imperceptible and can easily be identified as potential attacks by humans, or they require specific conditions such as background noise; such attacks are either less stealthy or require white-box access to the model to generate stealthy evasion examples. The authors proposed a targeted, imperceptible black-box attack against intelligent audio systems called “Echo,” which integrates adversarial noise into the natural reverberations of audio samples so that the perturbed samples are misclassified as a specific target class. Their goal differs from Abdullah et al.’s: they want the audio samples not merely to be misidentified, but to be misidentified as a particular target. The attack consists of four modules: the inconspicuous reverberation fusion module, which merges adversarial perturbations into the physical reverberation features of voices; the fast adversarial perturbation generation module, which attacks the target system efficiently; the targeted black-box attack module, which uses a novel neuro-evolution method they call “Newav” to search for gradient information about the targeted system; and the robust over-the-air attack module, which incorporates optimization processes to improve Echo’s robustness under over-the-air conditions. To evaluate their solution, they tested the attack on the state-of-the-art X-Vector model trained on the VoxCeleb2, Mini LibriSpeech, and VCTK datasets for speaker recognition, and on VGG_BN19 and ResNet34 trained on Google Speech Commands V2 for speech command recognition. Echo achieved an average success rate of 98.86% on speaker recognition and 99.24% on speech command attacks. Its adversarial examples were generated in less than 0.4 ms and remained robust at distances of up to 4 m, making them suitable for real-time attacks. They also tested current defenses such as filtering and compression, which were ineffective against Echo, as was a defense that preprocesses the audio to remove the adversarial noise.

Adversarial techniques targeting biometric systems manipulate recognition algorithms to misidentify or fail to detect unauthorized users. Understanding these attacks is crucial for developing robust defenses.

Conclusion

Reflecting on the various research papers I’ve reviewed, several key insights and trends emerge that shape our understanding of cybersecurity in the realm of adversarial attacks.

Firstly, the distinction between problem-space and feature-space attacks is particularly striking. While feature-space attacks may be more straightforward to execute and achieve high evasion rates, problem-space attacks present a more realistic threat scenario. This distinction highlights a significant challenge for current defenses, which are often developed against feature-space manipulations and may not effectively counteract more sophisticated problem-space attacks.

The diversity of attack techniques discussed in the literature—from process splitting and Mimicry to timing perturbations and distorting reverberations—demonstrates the complex nature of adversarial tactics. These methods enable attackers to seamlessly blend malicious activities within benign processes or subtly alter communication patterns, illustrating the advanced strategies employed to evade detection.

Another critical aspect is attack transferability. The ability to develop attacks on one system and successfully apply them to another without access to the underlying model underscores the potential for widespread vulnerabilities through black-box attacks. This calls for defenses that are not only robust but also adaptable across different systems and architectures.

Moreover, while many studies advocate retraining models on adversarial examples as a defense mechanism, this approach alone is not enough, given the breadth of attack vectors adversaries employ. Real-world adversarial attacks involve a dynamic array of strategies that a single method, such as adversarial training, may not fully mitigate.

Moving forward, exploring comprehensive defense strategies beyond conventional retraining is imperative. Future research should delve into developing multi-faceted, stateful defenses that can detect and counteract complex attack patterns. Additionally, harnessing the power of explainable AI and open-source intelligence can enhance the security of machine learning models and facilitate secure model sharing among organizations. As we navigate this evolving landscape, these areas will likely be pivotal in fortifying our defenses against increasingly sophisticated cyber threats.

Stay informed about the latest in cybersecurity by regularly reviewing recent research and participating in related discussions. This can help you better understand and mitigate potential threats.

References
