Autumn 2025
Time & Location
Monday 15:00-17:45 @ College of Electronics Information and Applied Science, Room 211-2
Objectives
Understand how to analyze existing AI models from multiple perspectives
Gain insights into the design, functionality, and potential vulnerabilities of AI models
Gain hands-on experience with AI reverse engineering
Structure
The course consists of three parts:
Lecture: The lecturer will introduce the basic concepts of AI reverse engineering.
Seminar: Each student will present a review of recent research papers on AI reverse engineering.
Project: Each student will work on a small project regarding AI reverse engineering.
Week 1 2025.09.01 Introduction
Week 2 2025.09.08 Model Architecture Analysis
Week 3 2025.09.15 Interpretability & Feature Analysis
Week 4 2025.09.22 Paper Reviews
Week 5 2025.09.29 Paper Reviews
Week 6 2025.10.06 No Lecture
Week 7 2025.10.13 Paper Reviews
Week 8 2025.10.20 Paper Reviews (Remote)
Week 9 2025.10.27 Model Extraction Attack (see the sketch after this schedule)
Week 10 2025.11.03 Data Inference Attack
Week 11 2025.11.10 Paper Reviews
Week 12 2025.11.17 Paper Reviews
Week 13 2025.11.24 Paper Reviews
Week 14 2025.12.01 Paper Reviews
Week 15 2025.12.08 Project Presentation (Final)
Week 16 2025.12.15 Project Presentation (Final)
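For students who have not implemented these attacks before, the following minimal Python sketch illustrates the Week 9 and Week 10 lecture topics on synthetic data with scikit-learn. It is illustrative only and not part of the official course materials: the victim/surrogate model choices, the query budget, and the naive confidence-threshold membership test are all assumptions made for brevity.

# Minimal sketch (illustrative, not course material): a hard-label model
# extraction attack (Week 9) followed by a naive confidence-threshold
# membership inference test (Week 10). Requires numpy and scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Victim: trained normally; the attacker may only call predict()/predict_proba().
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_query, y_train, _ = train_test_split(X, y, test_size=0.5, random_state=0)
victim = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
victim.fit(X_train, y_train)

# Extraction: label attacker-chosen queries with the victim's hard labels,
# then train a surrogate that imitates the black box.
surrogate = DecisionTreeClassifier(random_state=0)
surrogate.fit(X_query, victim.predict(X_query))

# Fidelity: agreement between surrogate and victim on fresh inputs.
X_fresh = np.random.default_rng(1).normal(size=(1000, 20))
fidelity = (surrogate.predict(X_fresh) == victim.predict(X_fresh)).mean()
print(f"surrogate/victim agreement: {fidelity:.2%}")

# Data (membership) inference: training points typically receive higher
# confidence than unseen points, so a simple threshold already leaks membership.
conf_members = victim.predict_proba(X_train).max(axis=1)
conf_outsiders = victim.predict_proba(X_query).max(axis=1)
print(f"mean top-class confidence: members {conf_members.mean():.3f}, "
      f"non-members {conf_outsiders.mean():.3f}")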
Part I: Analysis & Optimization
Week 4 2025.9.22
RepViT: Revisiting Mobile CNN From ViT Perspective, CVPR 2024, Muhammad Talha
DeepVID: Deep Visual Interpretation and Diagnosis for Image Classifiers via Knowledge Distillation, TVCG 2019, Pham Thanh Trung
AGAIN: Adversarial Training with Attribution Span Enlargement and Hybrid Feature Fusion, CVPR 2023, Tufail Hafiz Zahid
B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable, NeurIPS 2024, 김명철
Reverse Engineering Learned Optimizers: Mechanisms and Interpretations, NeurIPS 2021, 강은애
Towards Automated Circuit Discovery for Mechanistic Interpretability, NeurIPS 2023, 이슬찬
Week 5 2025.9.29
A ConvNet for the 2020s, CVPR 2022, 이태화
Scalable Image Coding for Humans and Machines, IEEE TIP 2022, 임달홍
This Looks Like That: Deep Learning for Interpretable Image Recognition, NeurIPS 2019, 옥윤승
Reverse-Engineering the Retrieval Process in GenIR Models, SIGIR 2025, 임준원
What Matters in Transformers? Not All Attention is Needed, arXiv 2024, 곽교린
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models, ICML 2023, Faizan Rao
Week 7 2025.10.13
Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks, CVPR 2020, 최윤정
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization, NeurIPS 2024, 한지윤
VanillaNet: the Power of Minimalism in Deep Learning, NeurIPS 2023, 조수현
Vision Transformers Need Registers, ICLR 2024, 진일성
LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models, NAACL 2024, 김민환
On the Faithfulness of Vision Transformer Explanations, CVPR 2024, 이소연
Week 8 2025.10.20
Not All Adapters Matter: Selective Adapter Freezing for Memory-Efficient Fine-Tuning of Language Models, NAACL 2025, 박은주
Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas, ICML 2025, 김하늘
Quantized Feature Distillation for Network Quantization, AAAI 2023, 이정현
WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts, CVPR 2024, 안용현
CRAFT: Concept Recursive Activation FacTorization for Explainability, CVPR 2023, 조은기
What Does BERT Look At? An Analysis of BERT’s Attention, arXiv 2019, 장하록
Subspace Optimization for Large Language Models with Convergence Guarantees, ICML 2025, Bold Chinguun
Part II: Security & Threats
Week 11 2025.11.10
Locating and Editing Factual Associations in GPT, NeurIPS 2022, 김원진
A Method to Facilitate Membership Inference Attacks in Deep Learning Models, NDSS 2025, 최용진
Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning, CVPR 2025, 김정현
Transferable Structural Sparse Adversarial Attack Via Exact Group Sparsity Training, CVPR 2024, 김영웅
LoRA-Leak: Membership Inference Attacks Against LoRA Fine-tuned Language Models, arXiv 2025, 손소현
Implicit Bias Injection Attacks against Text-to-Image Diffusion Models, CVPR 2025, 나웅재
Week 12 2025.11.17
Membership Inference Attacks against Large Vision-Language Models, NeurIPS 2024, 이민재
Assessing Prompt Injection Risks in 200+ Custom GPTs, ICLR 2024 Workshop, 윤희균
Text Embeddings Reveal (Almost) As Much As Text, EMNLP 2023, 최용빈
AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models, ICLR 2024, Jin Zhengxun
In-Context Unlearning: Language Models as Few Shot Unlearners, ICML 2024 (Poster), 강명구
Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework, EMNLP 2025, 박재윤
Week 13 2025.11.24
Fully Exploiting Every Real Sample: SuperPixel Sample Gradient Model Stealing, CVPR 2024, 이찬
Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment, ECCV 2024, 최수용
Prompt Inversion Attack against Collaborative Inference of Large Language Models, IEEE S&P 2025, 김민식
FedMIA: An Effective Membership Inference Attack Exploiting “All for One” Principle in Federated Learning, CVPR 2025, Afsana Kabir Sinthia
Towards General Visual-Linguistic Face Forgery Detection, CVPR 2025, 김민국
Shadowcast: Stealthy Data Poisoning Attacks against Vision-Language Models, NeurIPS 2024, 윤수용
Week 14 2025.12.01
Towards Data-Free Model Stealing in a Hard Label Setting, CVPR 2022, 김석원
"Yes, My LoRD." Guiding Language Model Extraction with Locality Reinforced Distillation, ACL 2025, 원영섭
Marich: A Query-efficient Distributionally Equivalent Model Extraction Attack using Public Data, NeurIPS 2023, 박인성
Extracting Training Data from Diffusion Models, USENIX Security 2023, 정호연
Private Image Generation with Dual-Purpose Auxiliary Classifier, CVPR 2023, 지성훈
Re-thinking Model Inversion Attacks Against Deep Neural Networks, CVPR 2023, 박지우
Universal and Transferable Adversarial Attacks on Aligned Language Models, arXiv 2023, 고경택
Grades will be assigned according to the following weights; a worked example follows the list.
Presentation 60%
Project 30%
Attendance 10%
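As a worked example (assuming each component is scored on a 0-100 scale, which the syllabus does not state), a student scoring 90 on presentations, 80 on the project, and 100 on attendance would receive:

Final grade = 0.6 × 90 + 0.3 × 80 + 0.1 × 100 = 54 + 24 + 10 = 88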