Autumn 2025
Time & Location
Monday 15:00-17:45 @ College of Electronics Information and Applied Science, Room 211-2
Objectives
Understand how to analyze existing AI models from multiple perspectives
Gain insights into the design, functionality, and potential vulnerabilities of AI models
Gain hands-on experience with AI reverse engineering
Structure
The course consists of three parts:
Lecture: The lecturer will introduce the basic concepts of AI reverse engineering.
Seminar: Each student will present a review of recent research papers on AI reverse engineering.
Project: Each student will carry out a small project on AI reverse engineering.
Week 1 2025.09.01 Introduction
Week 2 2025.09.08 Model Architecture Analysis
Week 3 2025.09.15 Interpretability & Feature Analysis
Week 4 2025.09.22 Paper Reviews
Week 5 2025.09.29 Paper Reviews
Week 6 2025.10.06 No Lecture
Week 7 2025.10.13 Paper Reviews
Week 8 2025.10.20 Paper Reviews (Remote)
Week 9 2025.10.27 Model Extraction Attack (see the toy sketch after the schedule)
Week 10 2025.11.03 Data Inference Attack (see the toy sketch at the end of Part II)
Week 11 2025.11.10 Paper Reviews
Week 12 2025.11.17 Paper Reviews
Week 13 2025.11.24 Paper Reviews
Week 14 2025.12.01 Paper Reviews
Week 15 2025.12.08 Project Presentation (Final)
Week 16 2025.12.15 Project Presentation (Final)
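For students meeting the Week 9 topic for the first time, the sketch below illustrates the core idea of a model extraction attack: with nothing but black-box query access, an attacker labels self-generated inputs with a victim model's predictions and trains a surrogate that imitates it. The dataset, victim, and surrogate here are illustrative assumptions made for this page, not the setup of any assigned paper.

# Minimal model-extraction sketch (illustrative assumptions throughout):
# the attacker can only call victim.predict(), yet trains a surrogate
# whose decisions closely track the victim's.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Victim model: trained by its owner; the attacker never sees this data.
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
victim = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Attack: generate queries, harvest the victim's labels, fit a surrogate.
queries = np.random.default_rng(1).normal(size=(2000, 20))
stolen_labels = victim.predict(queries)  # black-box query access only
surrogate = LogisticRegression(max_iter=1000).fit(queries, stolen_labels)

# Fidelity: agreement between surrogate and victim on held-out inputs.
fidelity = accuracy_score(victim.predict(X_test), surrogate.predict(X_test))
print(f"surrogate-victim agreement: {fidelity:.2%}")

Real attacks differ mainly in how queries are chosen and in what the API exposes (hard labels vs. confidence scores); the sketch uses the weakest setting, hard labels only.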
Part I: Analysis & Optimization
Week 4 2025.09.22
RepViT: Revisiting Mobile CNN From ViT Perspective, CVPR 2024, Muhammad Talha
DeepVID: Deep Visual Interpretation and Diagnosis for Image Classifiers via Knowledge Distillation, TVCG 2019, Pham Thanh Trung
AGAIN: Adversarial Training with Attribution Span Enlargement and Hybrid Feature Fusion, CVPR 2023, Tufail Hafiz Zahid
B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable, NeurIPS 2024, 김명철
Reverse Engineering Learned Optimizers Reveals Known and Novel Mechanisms, NeurIPS 2021, 강은애
Towards Automated Circuit Discovery for Mechanistic Interpretability, NeurIPS 2023, 이슬찬
Week 5 2025.09.29
A ConvNet for the 2020s, CVPR 2022, 이태화
Scalable Image Coding for Humans and Machines, IEEE TIP 2022, 임달홍
This Looks Like That: Deep Learning for Interpretable Image Recognition, NeurIPS 2019, 옥윤승
Reverse-Engineering the Retrieval Process in GenIR Models, SIGIR 2025, 임준원
What Matters in Transformers? Not All Attention is Needed, arXiv 2024, 곽교린
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models, ICML 2023, Faizan Rao
Week 7 2025.10.13
Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks, CVPR Workshops 2020, 최윤정
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization, NeurIPS 2024, 한지윤
VanillaNet: The Power of Minimalism in Deep Learning, NeurIPS 2023, 조수현
Vision Transformers Need Registers, ICLR 2024, 진일성
LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models, NAACL 2024, 김민환
On the Faithfulness of Vision Transformer Explanations, CVPR 2024, 이소연
Week 8 2025.10.20
Not All Adapters Matter: Selective Adapter Freezing for Memory-Efficient Fine-Tuning of Language Models, NAACL 2025, 박은주
Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas, ICML 2025, 김하늘
Quantized Feature Distillation for Network Quantization, AAAI 2023, 이정현
WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts, CVPR 2024, 안용현
CRAFT: Concept Recursive Activation FacTorization for Explainability, CVPR 2023, 조은기
What Does BERT Look At? An Analysis of BERT’s Attention, arXiv 2019, 장하록
Subspace Optimization for Large Language Models with Convergence Guarantees, ICML 2025, Bold Chinguun
Part II: Security & Threats
Week 11 2025.11.10
Locating and Editing Factual Associations in GPT, NeurIPS 2022, 김원진
A Method to Facilitate Membership Inference Attacks in Deep Learning Models, NDSS 2025, 최용진
Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning, CVPR 2025, 김정현
Transferable Structural Sparse Adversarial Attack Via Exact Group Sparsity Training, CVPR 2024, 김영웅
LoRA-Leak: Membership Inference Attacks Against LoRA Fine-tuned Language Models, arXiv 2025, 손소현
Implicit Bias Injection Attacks against Text-to-Image Diffusion Models, CVPR 2025, 나웅재
Week 12 2025.11.17
Membership Inference Attacks against Large Vision-Language Models, NeurIPS 2024, 이민재
Assessing Prompt Injection Risks in 200+ Custom GPTs, ICLR 2024 Workshop, 윤희균
Text Embeddings Reveal (Almost) As Much As Text, EMNLP 2023, 최용빈
AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models, ICLR 2024, Jin Zhengxun
In-Context Unlearning: Language Models as Few Shot Unlearners, ICML 2024 (Poster), 강명구
Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework, EMNLP 2025, 박재윤
Week 13 2025.11.24
Week 14 2025.12.01
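Several of the Week 11-12 readings build on membership inference, so a minimal sketch of its simplest variant may help: an overfit model assigns systematically higher confidence to its training points, so thresholding confidence already leaks membership. The dataset, target model, and deliberately overfit setup below are illustrative assumptions, not any paper's actual protocol.

# Minimal membership-inference sketch: score each sample by the model's
# confidence on its true label, then check how well that score separates
# training members from non-members. Illustrative assumptions throughout.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_member, X_nonmember, y_member, y_nonmember = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Target model: intentionally allowed to memorize its training set.
target = RandomForestClassifier(n_estimators=200, random_state=0)
target.fit(X_member, y_member)

def confidence(model, X, y):
    # Probability the model assigns to the true label of each sample;
    # values near 1.0 on training data are a sign of memorization.
    proba = model.predict_proba(X)
    return proba[np.arange(len(y)), y]

scores = np.concatenate([confidence(target, X_member, y_member),
                         confidence(target, X_nonmember, y_nonmember)])
is_member = np.concatenate([np.ones(len(X_member)), np.zeros(len(X_nonmember))])

# AUC of confidence as a membership signal (0.5 would mean no leakage).
print(f"membership-inference AUC: {roc_auc_score(is_member, scores):.3f}")

Published attacks typically sharpen this raw confidence signal with shadow models or likelihood-ratio calibration, but the underlying leakage mechanism is the same.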
Grades will be assigned according to the following weights:
Presentation 60%
Project 30%
Attendance 10%