[Paper Review] CoA: Towards Real Image Dehazing via Compression-and-Adaptation (CVPR 2025)

Wooks_learning

Wooks_ 2026. 3. 27. 18:27

In this post I review "CoA: Towards Real Image Dehazing via Compression-and-Adaptation", presented at CVPR 2025.

The paper targets efficiency and adaptability at the same time in real-world image dehazing, and its divide-and-conquer approach struck me as novel.

The review is organized as follows.

1. Introduction
2. Rethinking Real Image Dehazing
3. The Proposed Method
- 3.1 MoC: Model Compression in Synthetic Domain
- 3.2 BiA: Bilevel Adaptation to Real Domain
4. Exploring Algorithmic Property
5. Experimental Results
6. Conclusion

1. Introduction

Image dehazing is the task of restoring scene information lost to haze, using an atmospheric optical model and data-driven learning.

Recent learning-based methods perform remarkably well in the synthetic domain, but real-world dehazing faces two fundamental problems.

- Efficiency: in resource-constrained environments such as edge devices, the model must be lightweight and fast.
- Adaptability: real haze varies widely by scene (daytime, nighttime, dusty, underwater, and so on), so a fixed model struggles to handle every situation.

Existing methods do not satisfy both at once. Efficiency-oriented methods can run in real time but lack scene adaptability, while adaptability-focused methods perform well but are too computationally expensive to deploy in practice.

To address both, the paper proposes a computational flow called Compression-and-Adaptation (CoA).

2. Rethinking Real Image Dehazing

The paper formulates the core goal of real image dehazing as follows.

$$\min_{\theta_{real}} f(\theta_{real} \mid \theta_{syn}), \quad \text{s.t.} \begin{cases} \kappa(\theta_{real}) < \kappa(\theta_{syn}) & \text{(efficiency)} \\ \zeta(\theta_{real}) > \zeta(\theta_{syn}) & \text{(adaptability)} \end{cases}$$

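In other words, the goal is to find real-domain parameters $\theta_{real}$ that are both more efficient and more adaptable than the parameters $\theta_{syn}$ learned in the synthetic domain. The constraint on $\kappa$ encodes efficiency and the constraint on $\zeta$ encodes adaptability.

To solve this, the paper adopts a divide-and-conquer strategy.

- Phase 1 (MoC): model compression in the synthetic domain, which secures efficiency
- Phase 2 (BiA): bilevel adaptation in the real domain, which secures adaptability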

3. The Proposed Method

3.1 MoC: Model Compression in Synthetic Domain

Figure 1. Comparison of the effect of the MoC phase.

The core problem in MoC is transferring the dehazing capability of a large-scale parameter space into a compact student model.

To do this, it uses the following composite loss function.

$$\min_{\theta_{syn}^s} \mathcal{L}_{syn}(\theta_{syn}^s) + \mathcal{L}_a(\theta_{syn}^s, \theta_{syn}^t)$$

- $\theta_{syn}^t$: pre-trained teacher model (frozen)
- $\theta_{syn}^s$: the student model being trained
- $\mathcal{L}_{syn}$: supervised loss (L1 norm + SSIM + perceptual loss)
- $\mathcal{L}_a$: context alignment loss

The important term here is the context alignment loss ($\mathcal{L}_a$).

$$\mathcal{L}_a = \sum_{i=0}^{L-1} w_i \cdot (T_i - S_i)^2$$

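Here $T_i$ and $S_i$ are the teacher and student feature maps at layer $i$. The loss compares them directly at every layer, so the student learns to follow the teacher's internal representations as well. Rather than matching only the final output, it performs fine-grained feature alignment layer by layer.

Looking at the actual code, $w_i$ is not a plain hyperparameter but a dynamic attention weight: the cosine similarity between teacher and student is computed at each layer and then normalized with a softmax. The weights therefore adjust automatically to the current training state.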
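The weighting described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's implementation: the function names are hypothetical, feature maps are flattened to short lists, the per-layer term is a mean squared difference, and the softmax is assumed to be taken directly over the raw cosine similarities (the post does not say whether a temperature or a sign flip is involved).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two flattened feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def context_alignment_loss(teacher_feats, student_feats):
    """Sum over layers of w_i times the mean squared difference T_i - S_i.

    ASSUMPTION: w_i = softmax_i(cos(T_i, S_i)); the real code may differ.
    """
    pairs = list(zip(teacher_feats, student_feats))
    weights = softmax([cosine_similarity(t, s) for t, s in pairs])
    per_layer = [
        sum((tk - sk) ** 2 for tk, sk in zip(t, s)) / len(t) for t, s in pairs
    ]
    loss = sum(w * d for w, d in zip(weights, per_layer))
    return loss, weights


# Toy features for L = 2 layers (hypothetical values, not from the paper).
teacher = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5]]
student = [[0.9, 0.1, 1.8], [0.5, -0.5, 0.5]]
loss, weights = context_alignment_loss(teacher, student)
print(f"weights={weights}, loss={loss:.4f}")
```

Because the weights are recomputed from the current features on every call, they change as the student changes during training, which is the "dynamic" behaviour the paragraph describes.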

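On closer inspection, the loss is also not a plain MSE. It combines a global EMD (Earth Mover's Distance) loss, a local patch EMD loss, and a Gaussian loss that compares mean and variance, so the alignment extends to the distribution level.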
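To make the three terms concrete, here is a rough sketch on flattened 1-D feature values. It relies on one known fact: for two equal-size 1-D samples, the EMD (Wasserstein-1 distance) is the mean absolute difference of the sorted values. Everything else is an assumption for illustration, including the function names, the patch size, the equal weighting of the terms, and the exact form of the Gaussian term; the repository may define these differently.

```python
def emd_1d(a, b):
    """Wasserstein-1 (earth mover's) distance between equal-size 1-D samples."""
    if len(a) != len(b):
        raise ValueError("samples must have the same length")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def patch_emd(a, b, patch):
    """Average EMD over consecutive non-overlapping chunks of length `patch`."""
    starts = range(0, len(a), patch)
    dists = [emd_1d(a[i:i + patch], b[i:i + patch]) for i in starts]
    return sum(dists) / len(dists)


def mean_var(values):
    """Mean and (population) variance of a list of floats."""
    mean = sum(values) / len(values)
    var = sum((x - mean) ** 2 for x in values) / len(values)
    return mean, var


def gaussian_loss(a, b):
    """Squared gap between the means plus squared gap between the variances."""
    (mean_a, var_a), (mean_b, var_b) = mean_var(a), mean_var(b)
    return (mean_a - mean_b) ** 2 + (var_a - var_b) ** 2


def distribution_loss(a, b, patch=2):
    # ASSUMPTION: the three terms are simply added with equal weight.
    return emd_1d(a, b) + patch_emd(a, b, patch) + gaussian_loss(a, b)


# Same values in different positions (hypothetical toy features).
teacher_feat = [0.0, 1.0, 2.0, 3.0]
student_feat = [3.0, 2.0, 1.0, 0.0]
print("global EMD:", emd_1d(teacher_feat, student_feat))
print("patch EMD :", patch_emd(teacher_feat, student_feat, 2))
print("gaussian  :", gaussian_loss(teacher_feat, student_feat))
```

In this toy case the global EMD and the Gaussian term are both zero, because the two vectors contain the same values overall, while the patch EMD is nonzero because the values sit in different places. That is one plausible reason for combining a global term with a local one.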

In Figure 1, "Naive for Student", which simply trains the small architecture without MoC, removes haze poorly, whereas the student trained with MoC effectively inherits the teacher's dehazing ability.

3.2 BiA: Bilevel Adaptation to Real Domain

This phase adapts the student model compressed by MoC to real-world scenes. The real domain has no ground truth (clean images), so ordinary supervised learning is not possible.

Why is bilevel needed?

The simplest approach is to feed in real hazy images and train the model so that its output looks clear. However, this single-level approach suffers from catastrophic forgetting: the model gradually loses the dehazing capability it learned in the synthetic domain.

To address this, the paper introduces a bilevel programming structure.

$$\min_{\theta_{rea}^s} \mathcal{L}_{rea}(\theta_{rea}^s, \theta_{syn}^s(\theta_{rea}^s); \mathcal{D}_{rea})$$

$$\text{s.t.} \quad \theta_{syn}^s(\theta_{rea}^s) \in \arg\min_{\theta_{syn}^s} \Psi(\theta_{rea}^s, \theta_{syn}^s)$$

- Upper level: $\theta_{rea}$, which optimizes real-domain performance (the objective)
- Lower level: $\theta_{syn}$, which preserves the synthetic capability (the constraint)

The key idea is that adaptation to the real domain is allowed, but only within the range where synthetic dehazing ability is not lost.

Figure 2. Ablation on the necessity of bilevel modeling.

Figure 2 makes the difference clear. With only the lower level, dehazing is incomplete; with only the upper level, unnatural color distortion appears. Natural results are obtained only when both levels are applied.

Bilevel Adaptive Learning (EMA-based)

$$\theta_{syn}^s(t+1) = \theta_{syn}^s(t) - \eta_{syn}^b \frac{\partial(\mathcal{L}_{rea} + \mathcal{L}_1)}{\partial \theta_{syn}^s}$$

$$\theta_{rea}^s(t+1) = \alpha \theta_{rea}^s(t) + (1-\alpha)\theta_{syn}^s(t+1)$$

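The smoothing property of the EMA (Exponential Moving Average) lets $\theta_{rea}$ absorb the training trajectory of $\theta_{syn}$ in a stable way. With $\alpha = 0.95$, each step keeps 95% of the existing $\theta_{rea}$ and takes in only 5% new information from the freshly updated $\theta_{syn}$. Real-domain information is accumulated gradually, without abrupt changes.

Here $\mathcal{L}_1$ is an L1-norm supervised loss that acts as an anchor, preventing the student from overfitting to the real domain.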
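The two update rules above can be traced on a toy problem. The sketch below is only an illustration under stated assumptions: parameters are plain lists of floats, `bilevel_ema_step` and `toy_grad` are hypothetical names, the learning rate is arbitrary, and a simple quadratic gradient stands in for the real gradient of $\mathcal{L}_{rea} + \mathcal{L}_1$. Only $\alpha = 0.95$ comes from the post.

```python
def bilevel_ema_step(theta_syn, theta_rea, grad_fn, lr=0.1, alpha=0.95):
    """One step: gradient update on theta_syn, then EMA update of theta_rea."""
    grads = grad_fn(theta_syn)
    theta_syn_next = [p - lr * g for p, g in zip(theta_syn, grads)]
    theta_rea_next = [
        alpha * r + (1.0 - alpha) * s for r, s in zip(theta_rea, theta_syn_next)
    ]
    return theta_syn_next, theta_rea_next


def toy_grad(theta, target=1.0):
    # Gradient of 0.5 * (p - target)^2: a hypothetical stand-in for
    # d(L_rea + L_1) / d(theta_syn), which needs a real network and data.
    return [p - target for p in theta]


theta_syn, theta_rea = [0.0], [0.0]
first_syn, first_rea = bilevel_ema_step(theta_syn, theta_rea, toy_grad)
print("after 1 step  :", first_syn, first_rea)

for _ in range(300):
    theta_syn, theta_rea = bilevel_ema_step(theta_syn, theta_rea, toy_grad)
print("after 300 steps:", theta_syn, theta_rea)
```

After the first step `theta_syn` has already moved a tenth of the way to the target, while `theta_rea` has moved only 5% of that distance. Over many steps `theta_rea` follows the same trajectory with a lag, which is the smoothing effect the EMA is used for.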

L_rea: CLIP-based Real Domain Loss

Since the real domain has no ground truth, CLIP's semantic understanding is used as the supervision signal.

$$\mathcal{L}_{rea} = \frac{e^{\cos(\Phi_{image}(I_R), \Phi_{text}(T_H))}}{\sum_{i \in \{H,C\}} e^{\cos(\Phi_{image}(I_R), \Phi_{text}(T_i))}}$$

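Text prompts that distinguish hazy images from clear images are learned first, and then a contrastive loss is applied so that the model's output is classified toward "clear image" in CLIP space. As written, $\mathcal{L}_{rea}$ is the softmax probability assigned to the hazy prompt $T_H$, so minimizing it pushes the output toward the clear prompt $T_C$. The core of this design is that a semantic judgment can be turned into a loss signal without any pixel-level supervision.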
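The formula can be evaluated by hand on toy vectors to see how it behaves. In the sketch below the "embeddings" are hand-written 2-D vectors, not outputs of CLIP's image or text encoders, and `clip_real_loss` is a hypothetical name. CLIP normally scales similarities by a learned temperature before the softmax; the equation in this post shows no such factor, so none is applied here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def clip_real_loss(image_emb, text_hazy_emb, text_clear_emb):
    """Softmax probability that the image matches the hazy prompt T_H."""
    sim_hazy = cosine_similarity(image_emb, text_hazy_emb)
    sim_clear = cosine_similarity(image_emb, text_clear_emb)
    peak = max(sim_hazy, sim_clear)  # subtract the max for numerical stability
    exp_hazy = math.exp(sim_hazy - peak)
    exp_clear = math.exp(sim_clear - peak)
    return exp_hazy / (exp_hazy + exp_clear)


# Hypothetical 2-D stand-ins for the prompt and image embeddings.
text_hazy = [1.0, 0.0]
text_clear = [0.0, 1.0]
still_hazy_output = [0.9, 0.1]
dehazed_output = [0.1, 0.9]

loss_hazy = clip_real_loss(still_hazy_output, text_hazy, text_clear)
loss_clear = clip_real_loss(dehazed_output, text_hazy, text_clear)
print(f"output near hazy prompt : {loss_hazy:.3f}")
print(f"output near clear prompt: {loss_clear:.3f}")
```

An output that sits closer to the hazy prompt yields a value above 0.5 and one closer to the clear prompt yields a value below 0.5, so gradient descent on this quantity favours outputs that CLIP reads as clear.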

4. Exploring Algorithmic Property

CoA has two key algorithmic properties.

4.1 Stability across Various Synthetic Domains

Figure 3. Qualitative comparison of stability and flexibility.

CoA improves performance consistently whichever synthetic domain is used: RESIDE, Haze4K, or THaze. In particular, it reaches near-optimal performance on THaze, which contains diverse scenes, and it improves by 14.2% on average across the four metrics.

4.2 Flexibility with Different Dehazing Models

CoA is model-agnostic: it is not tied to a specific teacher model. Applied to MSBDN, DehazeFormer, and DEA, it reduces the parameter count by 94.06%, 86.74%, and 73.89% respectively, while performance actually improves.

5. Experimental Results

5.1 Quantitative Comparison

Below are the metric comparisons on three real-world datasets: RTTS, URHI, and FATTAL.

Figure 4. FADE metric comparison.

Figure 5. PM2.5 metric comparison.

Figure 6. Entropy metric comparison.

Across all four no-reference metrics (FADE, PM2.5, Entropy, and BIQME), CoA achieves optimal or near-optimal performance in almost every case.

5.2 Qualitative Comparison (Daytime / Dusty)

Figure 7. Comparison on daytime and dusty scenes.

Existing supervised methods (SGID, C2P, Dehamer, DEA) are limited in both haze removal and generalization, and RIDCP shows color distortion in sandstorm scenes. CoA, by contrast, preserves fine texture and produces natural, realistic dehazing results.

5.3 Adaptability: Nighttime Haze

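Figure 8. Comparison on nighttime haze scenes.

Nighttime dehazing is very hard because of overexposure and scattering from artificial light sources. Existing methods produce overly dark or distorted results as a consequence, while CoA removes haze, suppresses noise, and preserves detail at the same time.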

5.4 Efficiency Comparison

Method	Params (M)	FLOPs (G)	Time @ 1920×1080 (ms)
SGID	13.87	108.40	878.94
Dehamer	132.40	48.91	295.98
C2P	7.17	352.90	2531.43
RIDCP	28.72	144.43	1588.26
KANet	55.25	4.42	205.40
DEA	3.65	32.20	190.78
CoA (Ours)	1.69	2.67	52.52

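CoA records the lowest values for both parameter count and FLOPs. It is also by far the fastest at high resolution (1920×1080): 52.52 ms, against 190.78 ms for the next-fastest method, DEA. The gap reportedly widens further as the resolution increases.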
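The size of these gaps is easier to read as ratios. The short script below recomputes them from the numbers in the table above; the variable and function names are mine, and the frames-per-second figure is just the reciprocal of the reported latency, not a number from the paper.

```python
# (params in millions, FLOPs in G, time in ms at 1920x1080), from the table above.
EFFICIENCY = {
    "SGID": (13.87, 108.40, 878.94),
    "Dehamer": (132.40, 48.91, 295.98),
    "C2P": (7.17, 352.90, 2531.43),
    "RIDCP": (28.72, 144.43, 1588.26),
    "KANet": (55.25, 4.42, 205.40),
    "DEA": (3.65, 32.20, 190.78),
    "CoA (Ours)": (1.69, 2.67, 52.52),
}


def ratios_vs(baseline, table=EFFICIENCY):
    """How many times larger each method is than `baseline`, per column."""
    base = table[baseline]
    return {
        name: tuple(value / ref for value, ref in zip(values, base))
        for name, values in table.items()
        if name != baseline
    }


ratios = ratios_vs("CoA (Ours)")
# Sort by the runtime ratio so the closest competitor comes first.
for name, (params, flops, time_ms) in sorted(ratios.items(), key=lambda kv: kv[1][2]):
    print(f"{name:8s} params x{params:6.2f}  FLOPs x{flops:7.2f}  time x{time_ms:6.2f}")

coa_fps = 1000.0 / EFFICIENCY["CoA (Ours)"][2]
print(f"CoA throughput at 1920x1080: {coa_fps:.1f} frames/s")
```

Every ratio comes out above 1, so CoA is the smallest and fastest entry in all three columns. Even the closest method on runtime, DEA, takes roughly 3.6 times as long per frame.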

6. Conclusion

The paper proposes CoA (Compression-and-Adaptation), which tackles the two core challenges of real image dehazing, efficiency and adaptability, with a divide-and-conquer strategy.

The key contributions can be summarized as follows.

- Phase 1 (MoC): knowledge distillation from teacher to student in the synthetic domain. Fine-grained feature transfer through the context alignment loss transplants the teacher's dehazing capability into a compact model.
- Phase 2 (BiA): real-domain adaptation with a bilevel programming structure. The CLIP-based loss provides semantic supervision without ground truth, and the EMA-based update adapts stably to the real domain while preserving the synthetic capability.
- Domain-irrelevant stability and model-agnostic flexibility: performance improves consistently whichever synthetic domain is used, and the method applies to various teacher models in a plug-and-play manner.

Personally, the part I found most interesting is the use of CLIP as the real-domain loss. In an unsupervised setting with no ground truth, turning the semantic judgment "is this image hazy or clear?" into a loss signal is a very clever design.

One limitation is that on remote sensing haze images (hyperspectral), haze characteristics differ across spectral bands, so non-uniform fog occlusion remains. Multi-band integration is left as a direction for future work.

Paper: arXiv:2504.05590
Code: https://github.com/YanZhang-zy/CoA
