For AI security, the current focus is on detecting either manipulated machine learning models (e.g., those with Trojan backdoors) or malicious input samples (e.g., those carrying backdoor triggers or perturbed to mislead classifiers). The goal is to develop generalizable detection techniques that are model-agnostic, in the sense that the defender can treat the model as a black box without knowing its implementation details, a setting we believe is more realistic given how deep learning is typically deployed in practice. The project further investigates privacy issues in machine learning and data analytics in general, developing techniques that apply differential privacy to graph data analysis and federated learning to prevent inference attacks while striking a good balance between privacy and utility.
The goal of this project is to develop new attacks and defenses for ML/DL models, including poisoning and inference vulnerabilities. We study DL models including CNNs, GNNs, and hypernetworks, in both centralized and distributed settings (e.g., federated learning). Below are the directions we are exploring under this theme:
- Detection of backdoored ML models and samples: We have constructed strong detection signals by measuring model sensitivity to candidate triggers, classifying model activation patterns, weights, and gradients, analyzing the frequencies of gradients with respect to input samples, and employing various search algorithms to find the trigger. Our detection system won second place and the data-efficiency award in the Trojan Detection Challenge at NeurIPS 2022. The next phase of the project will integrate our findings and detection signals into a unified, black-box, architecture-agnostic, data-free method for detecting Trojan models, identifying their target label, and synthesizing their trigger’s location and pattern. We also pursue a complementary strategy that sanitizes input samples without first determining whether the model is backdoored. The idea is to monitor how the prediction confidence changes when the input is repeatedly perturbed by random noise (a minimal sketch of this confidence-monitoring idea appears after this list). Extensive empirical evaluations show that our approach significantly outperforms state-of-the-art defenses and remains highly stable under different settings, even when the classifier architecture, the training process, or the hyperparameters change.
- Defense against adversarial examples: We propose zero-knowledge adversarial training, an adaptive form of adversarial training that does not depend on specific adversarial examples: adversarial examples are replaced with random noise perturbations while training DL models. Building on this idea, we propose a GAN-based zero-knowledge adversarial training defense, dubbed ZK-GanDef, that combines adversarial training and feature learning. It sets up a competition game between two DL models, a classifier and a discriminator, and we show analytically that the solution of this game yields a classifier that usually makes correct predictions while relying only on perturbation-invariant features (a minimal training-loop sketch appears after this list). We conducted extensive experiments evaluating the prediction accuracy of ZK-GanDef on the MNIST, Fashion-MNIST, and CIFAR-10 datasets. Compared to the state-of-the-art approaches (CLP and CLS), ZK-GanDef achieves the highest test accuracy in classifying different white-box adversarial examples, by a significant margin.
- Privacy-preserving federated learning: Recent attacks show that clients’ training data samples can be extracted from the shared gradients even under federated learning. Local differential privacy (LDP) is one of the viable options for protecting the privacy of training data in federated learning systems. However, applying LDP to FL is challenging. First, noise must be applied repeatedly over the training rounds, which degrades the privacy guarantees in proportion to the number of rounds. Second, existing LDP-preserving mechanisms suffer from the curse of dimensionality: increasing the number of features leads to either loose privacy protection or poor utility. We explore ways to mitigate the impact of a growing feature dimension and design a Scalable Randomized Response (ScalableRR) mechanism. ScalableRR integrates dimensionality, data utility, and privacy guarantees into a unified framework that is jointly optimized for better randomization probabilities. The key idea that enables ScalableRR to maintain a good privacy-utility balance for high-dimensional inputs is to vary the randomization probabilities across the binary-level representation of every feature, rather than using uniform randomization probabilities (a minimal bit-level sketch appears after this list).
- Understanding the interplay between privacy, fairness, and utility: Group fairness is a critical property of a decision model: it ensures that the model’s decisions are not biased against any group of data points defined by sensitive features (e.g., gender, race). Although existing work sheds light on the interplay between fairness and privacy, it does not quantify what level of noise yields a given level of privacy and fairness. In this work, we propose a method that achieves DP and group fairness at the same time, taking a step forward by providing guarantees for both group fairness and DP protection. Unlike the trade-off between utility and privacy or between utility and fairness, the connection between privacy and fairness is still an open question: some argue that fairness is a generalization of privacy and can be achieved through the Lipschitz property of DP, while others argue that exact fairness cannot be achieved with DP and only approximate fairness is possible under DP. We try to resolve this conflict by proposing mechanisms that are theoretically proven to achieve (ϵ, δ)-DP and τ-fairness. Our key idea is to separate the DP noise added to each group and analyze the deviation effect of these noises from a functional-analysis point of view to derive the theoretical guarantee for τ-fairness (a minimal group-wise noise sketch appears after this list). Since the fairness guarantee is derived from the DP noise, our proposed mechanism pays only a minimal cost for fairness, resulting in a marginal loss in performance compared with purely privacy-preserving mechanisms.
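The following is a minimal sketch of the confidence-monitoring idea from the backdoor-detection item above: the model is queried as a black box while the input is repeatedly perturbed with random noise, and the statistics of the prediction confidence are recorded. The noise scale `sigma`, the number of trials, and the decision rule are illustrative placeholders, not the actual parameters of our detector; in practice the statistics would be compared against those collected on clean calibration samples.

```python
import numpy as np
import torch
import torch.nn.functional as F

def confidence_under_noise(model, x, sigma=0.1, n_trials=50):
    """Record the classifier's confidence in its original prediction while
    the input is repeatedly perturbed with random Gaussian noise.
    Only forward passes are used, so the model is treated as a black box."""
    model.eval()
    with torch.no_grad():
        base_probs = F.softmax(model(x.unsqueeze(0)), dim=1)
        base_label = int(base_probs.argmax(dim=1))
        confs = []
        for _ in range(n_trials):
            noisy = x + sigma * torch.randn_like(x)
            probs = F.softmax(model(noisy.unsqueeze(0)), dim=1)
            confs.append(probs[0, base_label].item())
    # How the confidence shifts (or refuses to shift) under noise is the
    # detection signal; the threshold on these statistics would be
    # calibrated on clean samples rather than fixed here.
    return base_label, float(np.mean(confs)), float(np.std(confs))
```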
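Below is a minimal training-loop sketch of the GAN-style, zero-knowledge adversarial training idea from the adversarial-examples item. It assumes the classifier exposes a `features`/`head` split, and the perturbation budget `eps` and equal loss weighting are illustrative choices rather than the ZK-GanDef configuration.

```python
import torch
import torch.nn as nn

def zk_gan_training_step(classifier, discriminator, opt_c, opt_d, x, y, eps=0.1):
    """One step of a GAN-style, zero-knowledge adversarial training scheme:
    the classifier learns to predict labels from randomly perturbed inputs,
    while a discriminator tries to tell, from the classifier's features,
    whether the input was perturbed.  Training the classifier to fool the
    discriminator pushes it toward perturbation-invariant features."""
    ce, bce = nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss()
    clean_tag = torch.zeros(x.size(0), 1, device=x.device)
    noisy_tag = torch.ones(x.size(0), 1, device=x.device)

    # Random, attack-agnostic perturbation in place of adversarial examples.
    x_noisy = (x + eps * torch.empty_like(x).uniform_(-1, 1)).clamp(0, 1)

    # Discriminator update: distinguish clean features from perturbed ones.
    with torch.no_grad():
        f_clean, f_noisy = classifier.features(x), classifier.features(x_noisy)
    d_loss = bce(discriminator(f_clean), clean_tag) + bce(discriminator(f_noisy), noisy_tag)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Classifier update: classify correctly and make perturbed features
    # indistinguishable from clean ones (i.e., fool the discriminator).
    f_noisy = classifier.features(x_noisy)
    c_loss = ce(classifier.head(f_noisy), y) + bce(discriminator(f_noisy), clean_tag)
    opt_c.zero_grad(); c_loss.backward(); opt_c.step()
    return float(c_loss), float(d_loss)
```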
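The next sketch illustrates the bit-level randomization idea behind ScalableRR from the federated-learning item: each feature is encoded in fixed-point binary form and every bit is flipped with its own probability. The bit width and the flip probabilities below are illustrative assumptions, not the jointly optimized values of ScalableRR.

```python
import numpy as np

def bitlevel_randomized_response(feature, n_bits=8, flip_probs=None, rng=None):
    """Perturb one feature (assumed scaled to [0, 1]) by flipping each bit
    of its fixed-point binary representation independently, with a
    non-uniform flip probability per bit position."""
    rng = rng or np.random.default_rng()
    if flip_probs is None:
        # Lower flip rate for high-order bits (preserves utility),
        # higher flip rate for low-order bits (stronger randomization).
        flip_probs = np.linspace(0.05, 0.45, n_bits)

    # Fixed-point encoding, most significant bit first.
    level = int(round(feature * (2 ** n_bits - 1)))
    bits = np.array([(level >> (n_bits - 1 - i)) & 1 for i in range(n_bits)])

    # Independent randomized response on every bit.
    flips = rng.random(n_bits) < flip_probs
    noisy_bits = np.where(flips, 1 - bits, bits)

    # Decode back to a perturbed feature value.
    noisy_level = sum(int(b) << (n_bits - 1 - i) for i, b in enumerate(noisy_bits))
    return noisy_level / (2 ** n_bits - 1)
```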
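Finally, a minimal group-wise noise sketch for the privacy-and-fairness item: per-group statistics are released with separate Laplace noise so the effect of the noise on each group can be analyzed independently. The statistic (positive-prediction rate), the sensitivity bound, and the budget split are illustrative assumptions and not the calibrated mechanism from the work described above.

```python
import numpy as np

def private_group_rates(predictions, groups, epsilon, rng=None):
    """Compute each group's positive-prediction rate and release it with
    group-specific Laplace noise, mirroring the idea of adding and
    analyzing separate DP noise per group."""
    rng = rng or np.random.default_rng()
    noisy_rates = {}
    for g in np.unique(groups):
        mask = (groups == g)
        rate = predictions[mask].mean()
        # Illustrative sensitivity: one record changes the group rate by at
        # most 1/|group|; the exact bound depends on the neighboring notion.
        sensitivity = 1.0 / mask.sum()
        noisy_rates[g] = rate + rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    # The demographic-parity gap estimated from these noisy rates can then
    # be bounded using the tail behavior of the Laplace noise.
    return noisy_rates
```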
G. Liu, A. Khreishah, F. Sharadgah, and I. Khalil, “An Adaptive Black-Box Defense Against Trojan Attacks (TrojDef),” IEEE Transactions on Neural Networks and Learning Systems, pp. 1-15, 2022. PDF
K. Tran, P. Lai, H. Phan, I. Khalil, Y. Ma, A. Khreishah, M. Thai, and X. Wu, “Heterogeneous Randomized Response for Differential Privacy in Graph Neural Networks,” IEEE BigData, 2022. PDF
Y. Shen, Y. Han, Z. Zhang, M. Chen, T. Yu, M. Backes, Y. Zhang, and G. Stringhini, “Finding MNEMON: Reviving Memories of Node Embeddings,” ACM Conference on Computer and Communications Security (CCS) 2022: 2643-2657. PDF
G. Liu, I. Khalil, and A. Khreishah, “Using Single-Step Adversarial Training to Defend Iterative Adversarial Examples,” CODASPY, 2021, pp. 17-27.
G. Liu, I. Khalil, A. Khreishah, A. Algosaibi, A. Aldalbahi, M. Alnaeem, A. Alhumam, and M. Anan, “ManiGen: A Manifold Aided Black-Box Generator of Adversarial Examples,” IEEE Access 8: 197086-197096 (2020).
H. Sun, X. Xiao, I. Khalil, Y. Yang, Z. Qin, W. Wang, T. Yu, “Analyzing Subgraph Statistics from Extended Local Views with Decentralized Differential Privacy,” ACM SIGSAC Conference on Computer and Communications Security (CCS) 2019: 703-717.
G. Liu, I. Khalil, and A. Khreishah, “ZK-GanDef: A GAN Based Zero Knowledge Adversarial Training Defense for Neural Networks,” IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) 2019: 64-75.
G. Liu, I. Khalil, and A. Khreishah, “Using Intuition from Empirical Properties to Simplify Adversarial Training Defense,” IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) Workshops 2019: 58-61.
G. Liu, I. Khalil, and A. Khreishah, “GanDef: A GAN Based Adversarial Training Defense for Neural Network Classifier,” IFIP International Conference on ICT Systems Security and Privacy Protection (IFIP SEC) 2019: 19-32.
Z. Qin, T. Yu, Y. Yang, I. Khalil, X. Xiao, and K. Ren, “Generating Synthetic Decentralized Social Graphs with Local Differential Privacy,” ACM SIGSAC Conference on Computer and Communications Security (CCS), 2017.
Z. Qin, Y. Yang, T. Yu, I. Khalil, X. Xiao, and K. Ren, “Heavy Hitter Estimation over Set-Valued Data with Local Differential Privacy,” ACM SIGSAC Conference on Computer and Communications Security (CCS) 2016: 192-203, Vienna, Austria, Oct 24-28, 2016.
We are currently exploring ways to make AI models robust and reliable, and we actively explore private and trustworthy learning approaches for DL models. In the future, we plan to study the interplay among fairness, privacy, robustness, and utility, and to propose approaches that jointly optimize these potentially conflicting requirements for different application domains. In the long term, we plan to explore uncertainty estimation and explainability of DL models, as well as availability attacks that could lead to poor behavior during training and inference.