Model Immunization from a Condition Number Perspective
Department of Computer Science, Purdue University
ICML 2025 (Oral)
Summary: We provide a theoretical framework for model immunization, showing that the condition number of the feature covariance matrix governs how quickly a linear probe can be fine-tuned; making this conditioning poor for a harmful task and good for benign ones immunizes the model against harmful fine-tuning while preserving benign performance.
Preliminaries
Condition Number & Gradient Descent
The condition number of a matrix $\mathbf{S}$ is defined as
\[
\kappa(\mathbf{S}) = \|\mathbf{S}\|_2 \|\mathbf{S}^{\dagger}\|_2 = \frac{\sigma_{\mathbf{S}}^{\mathrm{max}}}{\sigma_{\mathbf{S}}^{\mathrm{min}}}
\]
where $\dagger$ denotes the pseudoinverse and $\sigma_{\mathbf{S}}^{\mathrm{max}}$, $\sigma_{\mathbf{S}}^{\mathrm{min}}$ are the largest and smallest singular values of $\mathbf{S}$. For a strongly convex loss $\mathcal{L}$ whose Hessian has extreme singular values $\sigma^{\mathrm{max}}$ and $\sigma^{\mathrm{min}}$, gradient descent converges as:
\[
\|\mathbf{w}_t - \mathbf{w}^\ast\|^2 \leq \left(1 - \frac{\sigma^{\mathrm{min}}}{\sigma^{\mathrm{max}}}\right)^t \|\mathbf{w}_0 - \mathbf{w}^\ast\|^2
\]
Larger condition number $\to$ slower convergence: the contraction factor $1 - \sigma^{\mathrm{min}}/\sigma^{\mathrm{max}} = 1 - 1/\kappa$ approaches $1$ as $\kappa$ grows.
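To see this effect concretely, here is a minimal numpy sketch (our illustration, not from the paper) that runs gradient descent on two quadratics with prescribed spectra; the `psd_with_spectrum` helper and the spectra are hypothetical stand-ins:

```python
# Minimal sketch: gradient descent on 0.5 * w^T S w converges at a rate
# governed by kappa(S); the well-conditioned problem shrinks ||w|| far faster.
import numpy as np

rng = np.random.default_rng(0)

def cond(S):
    """kappa(S) = sigma_max / sigma_min via singular values."""
    s = np.linalg.svd(S, compute_uv=False)
    return s.max() / s.min()

def psd_with_spectrum(spectrum):
    """Random PSD matrix with the given eigenvalues (hypothetical helper)."""
    Q, _ = np.linalg.qr(rng.normal(size=(len(spectrum), len(spectrum))))
    return Q @ np.diag(np.asarray(spectrum)) @ Q.T

for spectrum in ([1.0, 0.9, 0.8], [1.0, 0.1, 0.01]):
    S = psd_with_spectrum(spectrum)
    w = rng.normal(size=3)                 # the optimum is w* = 0
    eta = 1.0 / max(spectrum)              # step size 1 / sigma_max
    for _ in range(50):
        w = w - eta * (S @ w)              # gradient of 0.5 w^T S w is S w
    print(f"kappa = {cond(S):7.1f}   ||w_50 - w*|| = {np.linalg.norm(w):.2e}")
```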
Transfer Learning via Linear Probing
Given a pre-trained feature extractor $f_{\theta}$, linear probing learns a classifier $h_{\mathbf{w}}$ over a target dataset $\mathcal{D}$:
\[
\min_{\mathbf{w}} \mathcal{L}(\mathcal{D}, \mathbf{w}, \theta) = \sum_{(\mathbf{x}, \mathbf{y}) \in \mathcal{D}} \ell(h_{\mathbf{w}} \circ f_{\theta}(\mathbf{x}), \mathbf{y})
\]
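As a concrete instance, a hedged numpy sketch of linear probing in the linear-feature, $\ell_2$-loss setting analyzed below (all shapes and data are illustrative stand-ins):

```python
# Linear probing sketch: the extractor theta is frozen; only the head w is fit.
# Here f_theta(x) = theta^T x and ell is the squared loss.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 16))    # one example per row
theta = rng.normal(size=(16, 8))  # frozen (pre-trained) feature extractor
Y = rng.normal(size=(200, 1))     # targets for the transfer task
Z = X @ theta                     # features f_theta(x) for every example

# min_w ||Z w - Y||_2^2, solved here in closed form instead of gradient descent.
w, *_ = np.linalg.lstsq(Z, Y, rcond=None)
print("probe loss:", float(np.sum((Z @ w - Y) ** 2)))
```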
Immunization with Condition Number
Goal & Setting
The goal is to learn a pre-trained model $g_{\omega} \circ f_{\theta^{\mathrm{I}}}$ such that fine-tuning it on a harmful task is difficult, while fine-tuning on benign tasks remains easy and pre-training performance is maintained. We focus on linear probing with gradient descent.
- $\mathcal{D}_{\mathrm{H}}$: Harmful task dataset
- $\mathcal{D}_{\mathrm{P}}$: Pre-training (benign) task dataset
- $\theta^{\mathrm{I}}$: Immunized feature parameters
- $\mathbf{w},\omega$: Classifier head parameters (for the harmful and pre-training tasks, respectively)
- $\kappa(\cdot)$: Condition number
Definition (Immunized Model):
- (a) $\kappa(\nabla^2_{\mathbf{w}}\mathcal{L}(\mathcal{D}_{\mathrm{H}}, \mathbf{w}, \theta^{\mathrm{I}})) \gg \kappa(\nabla^2_{\mathbf{w}}\mathcal{L}(\mathcal{D}_{\mathrm{H}}, \mathbf{w}, \mathbf{I}))$
- (b) $\kappa(\nabla^2_{\omega}\mathcal{L}(\mathcal{D}_{\mathrm{P}}, \omega, \theta^{\mathrm{I}})) \leq \kappa(\nabla^2_{\omega}\mathcal{L}(\mathcal{D}_{\mathrm{P}}, \omega, \mathbf{I}))$
- (c) $\min_{\omega,\theta} \mathcal{L}(\mathcal{D}_{\mathrm{P}}, \omega, \theta) \approx \min_{\omega} \mathcal{L}(\mathcal{D}_{\mathrm{P}}, \omega, \theta^{\mathrm{I}})$

In words: with the immunized features, linear probing on the harmful task is much worse conditioned than with raw features (a), the benign task is conditioned no worse (b), and pre-training performance is preserved (c).
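Conditions (a) and (b) can be checked numerically. In the linear probing setting with the $\ell_2$ loss (made precise in the next subsection), the head Hessian is $\theta^\top \mathbf{K} \theta$, which reduces to the data covariance $\mathbf{K}$ when $\theta = \mathbf{I}$. A minimal sketch, with stand-in data matrices and a hypothetical threshold `ratio` standing in for "$\gg$":

```python
# Numerically check conditions (a) and (b) of the definition. With theta = I
# the head Hessian theta^T K theta reduces to K itself, so we compare against
# kappa(K). The factor `ratio` is a hypothetical stand-in for ">>".
import numpy as np

def kappa(S):
    s = np.linalg.svd(S, compute_uv=False)
    return s.max() / s.min()

def check_immunized(X_H, X_P, theta_I, ratio=10.0):
    K_H, K_P = X_H.T @ X_H, X_P.T @ X_P
    cond_a = kappa(theta_I.T @ K_H @ theta_I) >= ratio * kappa(K_H)  # (a)
    cond_b = kappa(theta_I.T @ K_P @ theta_I) <= kappa(K_P)          # (b)
    return cond_a, cond_b
```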
Hessian for Linear Probing
For $f_{\theta}(\mathbf{x}) = \mathbf{x}^\top\theta$ and $\ell_2$ loss, the Hessian is:
\[
\mathbf{H}_{\mathrm{H}}(\theta) = \nabla^2_{\mathbf{w}}\mathcal{L}(\mathcal{D}_{\mathrm{H}}, \mathbf{w}, \theta) = \theta^\top \mathbf{K}_{\mathrm{H}} \theta
\]
where $\mathbf{K}_{\mathrm{H}} = \mathbf{X}_{\mathrm{H}}^\top \mathbf{X}_{\mathrm{H}}$ is the (uncentered) covariance of the harmful-task data.
Proposition: The singular values of the Hessian are
\[
\sigma_i = \sum_{j=1}^{D_{\mathrm{in}}} \left(\sigma_{\theta,i} (\mathbf{u}_{\theta,i}^\top \mathbf{q}_j) \sqrt{\gamma_j} \right)^2
\]
where $\sigma_{\theta,i}, \mathbf{u}_{\theta,i}$ are the $i$-th singular value and (left) singular vector of $\theta$, and $\gamma_j, \mathbf{q}_j$ are the $j$-th eigenvalue and eigenvector of $\mathbf{K}_{\mathrm{H}}$.
Insight: Immunization strength depends on how the singular vectors of $\theta$ align with the eigenvectors of the data covariance matrices.
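The sketch below (our illustration) makes this concrete: holding the singular values of $\theta$ fixed and only rotating its singular vectors relative to the eigenvectors of $\mathbf{K}$ changes $\kappa(\theta^\top \mathbf{K} \theta)$:

```python
# Alignment illustration: kappa(theta^T K theta) changes when theta's singular
# vectors rotate relative to K's eigenvectors, even with sigma(theta) fixed.
import numpy as np

rng = np.random.default_rng(2)
d = 4
gamma = np.array([4.0, 2.0, 1.0, 0.5])        # eigenvalues of K
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # eigenvectors of K
K = Q @ np.diag(gamma) @ Q.T

sigma_theta = np.diag([2.0, 1.5, 1.0, 0.5])   # fixed singular values of theta
U_aligned = Q                                  # singular vectors match K's
U_random, _ = np.linalg.qr(rng.normal(size=(d, d)))
for name, U in [("aligned", U_aligned), ("random", U_random)]:
    theta = U @ sigma_theta                    # theta = U diag(sigma), V = I
    s = np.linalg.svd(theta.T @ K @ theta, compute_uv=False)
    print(f"{name}: kappa(H) = {s.max() / s.min():.1f}")
```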
Algorithm for Immunizing a Model
Optimization Objective
\[
\min_{\omega, \theta} \mathcal{R}_{\mathrm{ill}}(\mathbf{H}_{\mathrm{H}}(\theta)) + \mathcal{R}_{\mathrm{well}}(\mathbf{H}_{\mathrm{P}}(\theta)) + \mathcal{L}(\mathcal{D}_{\mathrm{P}},\omega, \theta)
\]
Algorithm
\[
\begin{aligned}
&\textbf{Input:}~ \mathcal{D}_{\mathrm{P}} = (\mathbf{X}_{\mathrm{P}}, \mathbf{Y}_{\mathrm{P}}),~ \mathbf{X}_{\mathrm{H}},~ \mathcal{L},~ \eta,~ \lambda_{\mathrm{P}},~ \lambda_{\mathrm{H}},~ \theta_0,~ \omega_0 \\
&\mathbf{K}_{\mathrm{P}} = \mathbf{X}_{\mathrm{P}}^\top \mathbf{X}_{\mathrm{P}} \\
&\mathbf{K}_{\mathrm{H}} = \mathbf{X}_{\mathrm{H}}^\top \mathbf{X}_{\mathrm{H}} \\
&\text{For } t = 0, 1, \dots, T-1: \\
&\quad \omega_{t+1} = \omega_t - \eta \nabla_{\omega} \mathcal{L}(\omega_t, \theta_t; \mathcal{D}_{\mathrm{P}}) \\
&\quad \mathbf{H}_{\mathrm{P}}(\theta_t) = \theta_t^\top \mathbf{K}_{\mathrm{P}} \theta_t \\
&\quad \mathbf{H}_{\mathrm{H}}(\theta_t) = \theta_t^\top \mathbf{K}_{\mathrm{H}} \theta_t \\
&\quad \theta_{t+1} = \theta_t - \eta \nabla_{\theta} \mathcal{L}(\omega_t, \theta_t; \mathcal{D}_{\mathrm{P}}) \\
&\qquad - \eta \lambda_{\mathrm{P}} \mathbf{K}_{\mathrm{P}}^{-1} \nabla_{\theta} \mathcal{R}_{\mathrm{well}}(\mathbf{H}_{\mathrm{P}}(\theta_t)) \\
&\qquad - \eta \lambda_{\mathrm{H}} \mathbf{K}_{\mathrm{H}}^{-1} \nabla_{\theta} \mathcal{R}_{\mathrm{ill}}(\mathbf{H}_{\mathrm{H}}(\theta_t)) \\
&\textbf{Output:}~ \theta^{\mathrm{I}} = \theta_T
\end{aligned}
\]
Regularizer for Maximizing Condition Number:
\[
\mathcal{R}_{\mathrm{ill}}(\mathbf{S}) = \frac{1}{\frac{1}{2k} \|\mathbf{S}\|_F^2 - \frac{1}{2}(\sigma_{\mathbf{S}}^{\mathrm{min}})^2}
\]
Regularizer for Minimizing Condition Number:
\[
\mathcal{R}_{\mathrm{well}}(\mathbf{S}) = \frac{1}{2}\|\mathbf{S}\|_2^2 - \frac{1}{2p} \|\mathbf{S}\|_F^2
\]
Here $k$ and $p$ denote the number of singular values of $\mathbf{S}$, so the scaled Frobenius terms are averages of the squared singular values and both regularizers are non-negative.
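Putting the loop and the two regularizers together, here is a hedged PyTorch sketch of the update rule (our illustration under the linear-probing, $\ell_2$-loss setting above, with random initialization and untuned hyperparameters; not the authors' released code):

```python
import torch

def r_ill(S):
    # 1 / ((1/2k)||S||_F^2 - (1/2) sigma_min^2), k = number of singular values.
    s = torch.linalg.svdvals(S)
    return 1.0 / (0.5 * s.square().mean() - 0.5 * s.min() ** 2)

def r_well(S):
    # (1/2)||S||_2^2 - (1/2p)||S||_F^2, p = number of singular values.
    s = torch.linalg.svdvals(S)
    return 0.5 * s.max() ** 2 - 0.5 * s.square().mean()

def immunize(X_P, Y_P, X_H, eta=1e-4, lam_P=1.0, lam_H=1.0, T=500):
    """One possible rendering of the pseudocode; inputs are torch tensors."""
    K_P, K_H = X_P.T @ X_P, X_H.T @ X_H
    K_P_inv, K_H_inv = torch.linalg.inv(K_P), torch.linalg.inv(K_H)
    d = X_P.shape[1]
    theta = torch.randn(d, d, requires_grad=True)              # theta_0
    omega = torch.randn(d, Y_P.shape[1], requires_grad=True)   # omega_0
    for _ in range(T):
        # Pre-training loss with a linear extractor and linear head (ell_2).
        loss = (X_P @ theta @ omega - Y_P).square().sum()
        g_omega, g_theta = torch.autograd.grad(loss, (omega, theta))
        # Gradients of the condition-number regularizers w.r.t. theta.
        g_well, = torch.autograd.grad(r_well(theta.T @ K_P @ theta), theta)
        g_ill, = torch.autograd.grad(r_ill(theta.T @ K_H @ theta), theta)
        with torch.no_grad():
            omega -= eta * g_omega
            theta -= eta * (g_theta + lam_P * K_P_inv @ g_well
                                    + lam_H * K_H_inv @ g_ill)
    return theta.detach(), omega.detach()
```

In practice $\eta$, $\lambda_{\mathrm{P}}$, $\lambda_{\mathrm{H}}$, and $T$ need tuning; the $\mathbf{K}^{-1}$-preconditioned regularizer steps mirror the pseudocode above.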
Theoretical Guarantees
Upper Bound:
The regularizer $\mathcal{R}_{\mathrm{well}}(\mathbf{S})$ is an upper bound on $\log \kappa(\mathbf{S})$, and $\mathcal{R}_{\mathrm{ill}}(\mathbf{S})$ is an upper bound on $1/\log \kappa(\mathbf{S})$:
\[
\mathcal{R}_{\mathrm{well}}(\mathbf{S}) \geq \log \kappa(\mathbf{S}), \quad \mathcal{R}_{\mathrm{ill}}(\mathbf{S}) \geq \frac{1}{\log \kappa(\mathbf{S})}
\]
Thus, minimizing $\mathcal{R}_{\mathrm{well}}$ pushes $\log \kappa$ down, while minimizing $\mathcal{R}_{\mathrm{ill}}$ pushes $1/\log \kappa$ down, i.e., drives $\kappa$ up.
Differentiability:
If the minimum (or maximum) singular value of $\mathbf{S}$ is unique, then $\mathcal{R}_{\mathrm{ill}}(\mathbf{S})$ and $\mathcal{R}_{\mathrm{well}}(\mathbf{S})$ are differentiable, and their gradients have closed forms.
Monotonicity:
For a suitable step size, gradient descent on $\mathcal{R}_{\mathrm{ill}}$ (resp. $\mathcal{R}_{\mathrm{well}}$) monotonically increases (resp. decreases) the condition number of $\mathbf{H}(\theta) = \theta^\top \mathbf{K} \theta$:
\[
\kappa(\mathbf{H}(\theta_{t+1})) > \kappa(\mathbf{H}(\theta_t)) \;\; \text{when descending on } \mathcal{R}_{\mathrm{ill}}, \qquad \kappa(\mathbf{H}(\theta_{t+1})) < \kappa(\mathbf{H}(\theta_t)) \;\; \text{when descending on } \mathcal{R}_{\mathrm{well}}.
\]
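A quick hedged check of the monotonicity claim (our illustration, with a small fixed step size standing in for the "suitable" one): plain gradient descent on $\mathcal{R}_{\mathrm{ill}}(\theta^\top \mathbf{K} \theta)$ alone should print an increasing condition number.

```python
# Monotonicity demo: descending on R_ill alone should drive kappa(H) upward
# step by step, provided the step size is small enough.
import torch

def r_ill(S):  # as defined above, with k = number of singular values
    s = torch.linalg.svdvals(S)
    return 1.0 / (0.5 * s.square().mean() - 0.5 * s.min() ** 2)

torch.manual_seed(0)
X = torch.randn(64, 8)
K = X.T @ X / 64                      # scaled covariance of stand-in data
theta = torch.randn(8, 8, requires_grad=True)
for t in range(201):
    H = theta.T @ K @ theta
    if t % 50 == 0:
        s = torch.linalg.svdvals(H)
        print(f"step {t:3d}: kappa(H) = {(s.max() / s.min()).item():.1f}")
    g, = torch.autograd.grad(r_ill(H), theta)
    with torch.no_grad():
        theta -= 1e-2 * g
```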
Experiments
Evaluation Metric
We introduce the relative immunization ratio (RIR) to quantify immunization effectiveness:
\[
\mathrm{RIR} = \frac{\kappa(\mathbf{H}_{\mathrm{H}}(\theta_{\mathrm{I}}))/\kappa(\mathbf{H}_{\mathrm{H}}(\mathbf{I}))}{\kappa(\mathbf{H}_{\mathrm{P}}(\theta_{\mathrm{I}}))/\kappa(\mathbf{H}_{\mathrm{P}}(\mathbf{I}))}
\]
Successful immunization yields $\mathrm{RIR} \gg 1$: the conditioning of the harmful task degrades much more than that of the benign task, making harmful fine-tuning slow while benign performance is preserved.
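A hedged numpy sketch of computing RIR from data matrices (using that $\mathbf{H}(\mathbf{I}) = \mathbf{K}$ for a square identity extractor; inputs are stand-ins):

```python
# RIR: ratio of condition-number ratios of the head Hessians with the
# immunized features (theta_I) versus identity features (where H(I) = K).
import numpy as np

def kappa(S):
    s = np.linalg.svd(S, compute_uv=False)
    return s.max() / s.min()

def rir(X_H, X_P, theta_I):
    K_H, K_P = X_H.T @ X_H, X_P.T @ X_P
    harmful = kappa(theta_I.T @ K_H @ theta_I) / kappa(K_H)  # H_H(I) = K_H
    benign = kappa(theta_I.T @ K_P @ theta_I) / kappa(K_P)   # H_P(I) = K_P
    return harmful / benign
```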
House Price Regression
We first demonstrate our method on a house price regression task. Our approach achieves the highest RIR (356.20 ± 5.49), significantly outperforming baselines:
- Baseline A: Only maximizes harmful task condition number (RIR = 1.24)
- Baseline B: Uses bi-level optimization (RIR = 2.00)
- Baseline C: Directly optimizes condition number difference (RIR = 92.58)
To demonstrate how a large condition number affects convergence, we analyze the norm ratio $\|\mathbf{w}_t - \mathbf{w}^\star\|_2^2 / \|\mathbf{w}_0 - \mathbf{w}^\star\|_2^2$, which measures how the classifier weights $\mathbf{w}_t$ at step $t$ approach the optimal weights $\mathbf{w}^\star$ during fine-tuning. We use exact line search to ensure fair comparison across methods. Our method achieves immunization while maintaining good performance on pre-training tasks.
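For intuition, a hedged sketch of this diagnostic (our illustration with stand-in data): gradient descent with exact line search on a least-squares probe, reporting the norm ratio.

```python
# Convergence diagnostic: gradient descent with exact line search on a
# least-squares probe, tracking ||w_t - w*||^2 / ||w_0 - w*||^2.
import numpy as np

rng = np.random.default_rng(3)
Z = rng.normal(size=(200, 16))        # probe features (stand-in data)
y = rng.normal(size=200)
H = Z.T @ Z
w_star, *_ = np.linalg.lstsq(Z, y, rcond=None)

w0 = rng.normal(size=16)
w = w0.copy()
for t in range(100):
    g = Z.T @ (Z @ w - y)
    alpha = (g @ g) / (g @ H @ g)     # exact line search for a quadratic
    w = w - alpha * g
ratio = np.sum((w - w_star) ** 2) / np.sum((w0 - w_star) ** 2)
print("norm ratio after 100 steps:", ratio)
```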

Figure: norm-ratio convergence curves. Pre-training task: our method accelerates convergence; harmful task: our method significantly slows convergence.
MNIST Classification
We evaluate on MNIST digit classification, where each digit pair forms a task. Our method achieves robust immunization across all digit pairs with RIR = 70.04 ± 3.28, while maintaining good performance on benign tasks. The heatmaps below show the $\log$ RIR for each digit pair.

Figure: $\log$ RIR heatmaps over digit pairs for Baseline A, Baseline B, Baseline C, and Ours (robust immunization across all digit pairs).
Deep Neural Networks
We further evaluate our method on ImageNet pre-trained models (ResNet-18 and ViT) with transfer tasks. We report the test accuracy on the benign task ($\mathcal{D}_{\mathrm{P}}$ = ImageNet1K) after immunization:
- ResNet-18: Test accuracy drops from 68.24% (initialization) to 62.36% (Cars as $\mathcal{D}_{\mathrm{H}}$) and 65.01% (Country211 as $\mathcal{D}_{\mathrm{H}}$).
- ViT: Test accuracy increases from 81.78% to 82.79% (Cars) and 83.17% (Country211).
To show the effect of immunization, we then fine-tune the immunized models on the harmful tasks and report the test accuracy. These results suggest it is possible to immunize a non-linear model against a harmful task without losing performance on the benign task, and in some cases even improving it.
Tables: fine-tuning test accuracy on the harmful tasks with ResNet-18 and ViT.
Citation
@inproceedings{zheng2025model,
title={Model Immunization from a Condition Number Perspective},
author={Zheng, Amber Yijia* and Bai, Cedar Site* and Bullins, Brian and Yeh, Raymond A},
booktitle={Proc. ICML},
year={2025}
}