Deepfake Detection and Explainability

This project focuses on detecting manipulated and AI-generated images that contribute to misinformation on social media. As deepfake technologies rapidly advance, synthetic images can closely resemble authentic photographs, posing serious risks such as identity fraud, copyright violations, security threats, and social manipulation. Detecting such content is increasingly challenging, particularly when manipulations are subtle, localized, and difficult for humans to perceive.

Our research addresses three fundamental challenges in modern deepfake detection:

  1. Generalization and Robustness
    We develop detection models that remain effective across unseen generative models and datasets, while maintaining robustness against adversarial manipulations designed to bypass verification systems.
  2. Subtle Manipulation Detection
    We target realistic image edits involving small, localized modifications, achieving strong performance on fine-grained manipulations that are often missed by existing methods.
  3. Explainability and Trust
    Beyond binary classification, we emphasize why an image is flagged as a deepfake. Our methods provide human-interpretable explanations aligned with visual evidence, improving transparency and user trust.
 

Key Contributions:

  1. Multimodal Deepfake Detection
    Deepfake artifacts extend beyond the visual domain. We propose multimodal frameworks that jointly leverage visual, textual, and frequency-domain cues, capturing complementary signals that single-modality methods often overlook.
    • CAMME employs cross-attention over multimodal embeddings, significantly improving transferability and performance on fully synthetic images (a cross-attention sketch follows this list).
    • CapsFake introduces dynamic routing through multimodal capsules, enabling low-level artifact cues across modalities to reach agreement on high-level forgery representations (a routing sketch follows this list). This approach achieves state-of-the-art results on subtle image edits and demonstrates strong robustness to adversarial perturbations.
  2. Generalization to Unseen Domains
    We address real-world deployment challenges by focusing on inter-domain robustness. Our methods align multimodal representations to maintain high performance under distribution shifts caused by new generative tools and real-world transformations such as blur, noise, and compression.
  3. Deepfake Explainability
    We propose a novel reasoning framework that explains detection decisions using coherent, image-grounded narratives. Our PRPO (Paragraph-level Relative Policy Optimization) method aligns multi-paragraph explanations at test time, producing reasoning that reliably matches visual evidence and improves detection accuracy (a simplified sketch follows this list).
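
To make the cross-attention fusion concrete, here is a minimal PyTorch sketch in which image tokens attend to text tokens and the pooled result feeds a real/fake classifier. The module names, dimensions, and mean-pooling choice are illustrative assumptions for this sketch, not the actual CAMME architecture.

```python
# Illustrative cross-attention fusion of image and text embeddings.
# All names and dimensions are assumptions, not the CAMME implementation.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Fuses two modality embeddings with multi-head cross-attention:
    image tokens attend to text tokens, and the pooled result feeds a
    binary real/fake classifier head."""

    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.classifier = nn.Linear(dim, 2)  # real vs. fake logits

    def forward(self, img_tokens, txt_tokens):
        # Queries come from the image; keys/values come from the text,
        # so each image token gathers complementary textual evidence.
        fused, _ = self.attn(query=img_tokens, key=txt_tokens, value=txt_tokens)
        fused = self.norm(fused + img_tokens)   # residual connection
        pooled = fused.mean(dim=1)              # average over tokens
        return self.classifier(pooled)

# Toy usage with random embeddings (batch=2, 16 image / 8 text tokens).
model = CrossAttentionFusion()
img = torch.randn(2, 16, 256)
txt = torch.randn(2, 8, 256)
logits = model(img, txt)
print(logits.shape)  # torch.Size([2, 2])
```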
 
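CapsFake's dynamic routing builds on routing-by-agreement. Below is a minimal sketch of a generic routing step in the style of Sabour et al. (2017), where low-level capsule "votes" that agree with the emerging high-level forgery capsule are iteratively upweighted. The shapes and iteration count are illustrative assumptions, not CapsFake's exact routing.

```python
# A minimal dynamic-routing step in the spirit of routing-by-agreement
# (Sabour et al., 2017). Generic sketch, not CapsFake's exact routing.
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    """Squashing nonlinearity: keeps direction, maps norm into [0, 1)."""
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """u_hat: prediction vectors with shape
    (batch, num_low_capsules, num_high_capsules, dim).
    Low-level capsules 'vote' for high-level forgery capsules; routing
    strengthens votes that agree with the emerging consensus."""
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)  # routing logits
    for _ in range(num_iters):
        c = F.softmax(b, dim=2)                      # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)     # weighted sum of votes
        v = squash(s)                                # high-level capsule outputs
        # Agreement: dot product between each vote and the output capsule.
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)
    return v

# Toy usage: 32 low-level capsules voting for 2 high-level (real/fake)
# capsules of dimension 16.
votes = torch.randn(4, 32, 2, 16)
caps_out = dynamic_routing(votes)
print(caps_out.shape)          # torch.Size([4, 2, 16])
print(caps_out.norm(dim=-1))   # capsule lengths ~ class confidence
```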

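PRPO is described above only at a high level; the sketch below illustrates one plausible core step, computing group-relative advantages per paragraph across several sampled explanations so that each paragraph is scored against the same paragraph slot in the other candidates. The reward values and normalization scheme are assumptions for illustration, not the paper's actual formulation.

```python
# Hedged sketch of paragraph-level group-relative advantages, in the
# spirit of the PRPO idea above; rewards and normalization are assumed.
import statistics

def paragraph_relative_advantages(samples):
    """samples: list of candidate explanations, each a list of
    per-paragraph reward scores (e.g., grounding scores against the
    image evidence). Returns per-paragraph advantages normalized
    across the group of candidates. Assumes all candidates have the
    same number of paragraphs."""
    num_paras = len(samples[0])
    advantages = [[0.0] * num_paras for _ in samples]
    for p in range(num_paras):
        group = [s[p] for s in samples]            # same slot, all candidates
        mu = statistics.mean(group)
        sigma = statistics.pstdev(group) or 1.0    # avoid divide-by-zero
        for i, s in enumerate(samples):
            advantages[i][p] = (s[p] - mu) / sigma
    return advantages

# Toy usage: 3 sampled explanations, 2 paragraphs each, with
# hypothetical grounding rewards in [0, 1].
rewards = [[0.9, 0.4], [0.6, 0.7], [0.3, 0.8]]
for cand, adv in zip(rewards, paragraph_relative_advantages(rewards)):
    print(cand, "->", [round(a, 2) for a in adv])
```
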
Impact and Vision:

This project advances deepfake detection from artifact-based binary classification toward robust, explainable, and generalizable multimodal systems. By integrating detection, localization, and reasoning, our work delivers practical and trustworthy solutions for media authentication. It lays the foundation for next-generation AI forensic systems capable of keeping pace with rapidly evolving generative technologies.

Team

  • Tuan Nguyen, Postdoctoral Researcher
  • Naseem Khan, Researcher
  • Mohamed Hefeeda, Principal Scientist