Model Immunization from a Condition Number Perspective
Department of Computer Science, Purdue University
ICML 2025 (Oral)
Summary: We provide a theoretical framework for model immunization, showing that the condition number of the feature covariance matrix governs how quickly a linear probe can be fine-tuned; making this conditioning poor for a harmful task and good for benign ones immunizes the model against harmful fine-tuning while preserving benign performance.
Preliminaries
Condition Number & Gradient Descent
The condition number of a matrix $\mathbf{S}$ is defined as
\[
\kappa(\mathbf{S}) = \|\mathbf{S}\|_2 \|\mathbf{S}^{\dagger}\|_2 = \frac{\sigma_{\mathbf{S}}^{\mathrm{max}}}{\sigma_{\mathbf{S}}^{\mathrm{min}}}
\]
where $\dagger$ denotes the pseudoinverse and $\sigma_{\mathbf{S}}^{\mathrm{max}}$, $\sigma_{\mathbf{S}}^{\mathrm{min}}$ are the largest and smallest singular values of $\mathbf{S}$. For a strongly convex loss $\mathcal{L}$ whose Hessian has extreme singular values $\sigma^{\mathrm{max}}$ and $\sigma^{\mathrm{min}}$, gradient descent converges as:
\[
\|\mathbf{w}_t - \mathbf{w}^\ast\|^2 \leq \left(1 - \frac{\sigma^{\mathrm{min}}}{\sigma^{\mathrm{max}}}\right)^t \|\mathbf{w}_0 - \mathbf{w}^\ast\|^2
\]
Larger condition number $\to$ slower convergence: the contraction factor $1 - \sigma^{\mathrm{min}}/\sigma^{\mathrm{max}} = 1 - 1/\kappa$ approaches $1$ as $\kappa$ grows.
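To see this effect concretely, here is a minimal numpy sketch (our illustration, not from the paper) that runs gradient descent on two quadratics with prescribed spectra; the `psd_with_spectrum` helper and the spectra are hypothetical stand-ins:

```python
# Minimal sketch: gradient descent on 0.5 * w^T S w converges at a rate
# governed by kappa(S); the well-conditioned problem shrinks ||w|| far faster.
import numpy as np

rng = np.random.default_rng(0)

def cond(S):
    """kappa(S) = sigma_max / sigma_min via singular values."""
    s = np.linalg.svd(S, compute_uv=False)
    return s.max() / s.min()

def psd_with_spectrum(spectrum):
    """Random PSD matrix with the given eigenvalues (hypothetical helper)."""
    Q, _ = np.linalg.qr(rng.normal(size=(len(spectrum), len(spectrum))))
    return Q @ np.diag(np.asarray(spectrum)) @ Q.T

for spectrum in ([1.0, 0.9, 0.8], [1.0, 0.1, 0.01]):
    S = psd_with_spectrum(spectrum)
    w = rng.normal(size=3)                 # the optimum is w* = 0
    eta = 1.0 / max(spectrum)              # step size 1 / sigma_max
    for _ in range(50):
        w = w - eta * (S @ w)              # gradient of 0.5 w^T S w is S w
    print(f"kappa = {cond(S):7.1f}   ||w_50 - w*|| = {np.linalg.norm(w):.2e}")
```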
Transfer Learning via Linear Probing
Given a pre-trained feature extractor $f_{\theta}$, linear probing learns a classifier $h_{\mathbf{w}}$ over a target dataset $\mathcal{D}$:
\[
\min_{\mathbf{w}} \mathcal{L}(\mathcal{D}, \mathbf{w}, \theta) = \sum_{(\mathbf{x}, \mathbf{y}) \in \mathcal{D}} \ell(h_{\mathbf{w}} \circ f_{\theta}(\mathbf{x}), \mathbf{y})
\]
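As a concrete instance, a hedged numpy sketch of linear probing in the linear-feature, $\ell_2$-loss setting analyzed below (all shapes and data are illustrative stand-ins):

```python
# Linear probing sketch: the extractor theta is frozen; only the head w is fit.
# Here f_theta(x) = theta^T x and ell is the squared loss.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 16))    # one example per row
theta = rng.normal(size=(16, 8))  # frozen (pre-trained) feature extractor
Y = rng.normal(size=(200, 1))     # targets for the transfer task
Z = X @ theta                     # features f_theta(x) for every example

# min_w ||Z w - Y||_2^2, solved here in closed form instead of gradient descent.
w, *_ = np.linalg.lstsq(Z, Y, rcond=None)
print("probe loss:", float(np.sum((Z @ w - Y) ** 2)))
```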
Immunization with Condition Number
Goal & Setting
The goal is to learn a pre-trained model $g_{\omega} \circ f_{\theta^{\mathrm{I}}}$ such that fine-tuning it on a harmful task is difficult, while fine-tuning on benign tasks remains easy and pre-training performance is maintained. We focus on linear probing with gradient descent.
- $\mathcal{D}_{\mathrm{H}}$: Harmful task dataset
- $\mathcal{D}_{\mathrm{P}}$: Pre-training (benign) task dataset
- $\theta^{\mathrm{I}}$: Immunized feature parameters
- $\mathbf{w},\omega$: Classifier head parameters (for the harmful and pre-training tasks, respectively)
- $\kappa(\cdot)$: Condition number
Definition (Immunized Model):
- (a) $\kappa(\nabla^2_{\mathbf{w}}\mathcal{L}(\mathcal{D}_{\mathrm{H}}, \mathbf{w}, \theta^{\mathrm{I}})) \gg \kappa(\nabla^2_{\mathbf{w}}\mathcal{L}(\mathcal{D}_{\mathrm{H}}, \mathbf{w}, \mathbf{I}))$
- (b) $\kappa(\nabla^2_{\omega}\mathcal{L}(\mathcal{D}_{\mathrm{P}}, \omega, \theta^{\mathrm{I}})) \leq \kappa(\nabla^2_{\omega}\mathcal{L}(\mathcal{D}_{\mathrm{P}}, \omega, \mathbf{I}))$
- (c) $\min_{\omega,\theta} \mathcal{L}(\mathcal{D}_{\mathrm{P}}, \omega, \theta) \approx \min_{\omega} \mathcal{L}(\mathcal{D}_{\mathrm{P}}, \omega, \theta^{\mathrm{I}})$

In words: with the immunized features, linear probing on the harmful task is much worse conditioned than with raw features (a), the benign task is conditioned no worse (b), and pre-training performance is preserved (c).
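Conditions (a) and (b) can be checked numerically. In the linear probing setting with the $\ell_2$ loss (made precise in the next subsection), the head Hessian is $\theta^\top \mathbf{K} \theta$, which reduces to the data covariance $\mathbf{K}$ when $\theta = \mathbf{I}$. A minimal sketch, with stand-in data matrices and a hypothetical threshold `ratio` standing in for "$\gg$":

```python
# Numerically check conditions (a) and (b) of the definition. With theta = I
# the head Hessian theta^T K theta reduces to K itself, so we compare against
# kappa(K). The factor `ratio` is a hypothetical stand-in for ">>".
import numpy as np

def kappa(S):
    s = np.linalg.svd(S, compute_uv=False)
    return s.max() / s.min()

def check_immunized(X_H, X_P, theta_I, ratio=10.0):
    K_H, K_P = X_H.T @ X_H, X_P.T @ X_P
    cond_a = kappa(theta_I.T @ K_H @ theta_I) >= ratio * kappa(K_H)  # (a)
    cond_b = kappa(theta_I.T @ K_P @ theta_I) <= kappa(K_P)          # (b)
    return cond_a, cond_b
```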
Hessian for Linear Probing
For $f_{\theta}(\mathbf{x}) = \mathbf{x}^\top\theta$ and $\ell_2$ loss, the Hessian is:
\[
\mathbf{H}_{\mathrm{H}}(\theta) = \nabla^2_{\mathbf{w}}\mathcal{L}(\mathcal{D}_{\mathrm{H}}, \mathbf{w}, \theta) = \theta^\top \mathbf{K}_{\mathrm{H}} \theta
\]
where $\mathbf{K}_{\mathrm{H}} = \mathbf{X}_{\mathrm{H}}^\top \mathbf{X}_{\mathrm{H}}$ is the (uncentered) covariance of the harmful-task data.
Proposition: The singular values of the Hessian are
\[
\sigma_i = \sum_{j=1}^{D_{\mathrm{in}}} \left(\sigma_{\theta,i} (\mathbf{u}_{\theta,i}^\top \mathbf{q}_j) \sqrt{\gamma_j} \right)^2
\]
where $\sigma_{\theta,i}, \mathbf{u}_{\theta,i}$ are the $i$-th singular value and (left) singular vector of $\theta$, and $\gamma_j, \mathbf{q}_j$ are the $j$-th eigenvalue and eigenvector of $\mathbf{K}_{\mathrm{H}}$.
Insight: Immunization strength depends on how the singular vectors of $\theta$ align with the eigenvectors of the data covariance matrices.
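The sketch below (our illustration) makes this concrete: holding the singular values of $\theta$ fixed and only rotating its singular vectors relative to the eigenvectors of $\mathbf{K}$ changes $\kappa(\theta^\top \mathbf{K} \theta)$:

```python
# Alignment illustration: kappa(theta^T K theta) changes when theta's singular
# vectors rotate relative to K's eigenvectors, even with sigma(theta) fixed.
import numpy as np

rng = np.random.default_rng(2)
d = 4
gamma = np.array([4.0, 2.0, 1.0, 0.5])        # eigenvalues of K
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # eigenvectors of K
K = Q @ np.diag(gamma) @ Q.T

sigma_theta = np.diag([2.0, 1.5, 1.0, 0.5])   # fixed singular values of theta
U_aligned = Q                                  # singular vectors match K's
U_random, _ = np.linalg.qr(rng.normal(size=(d, d)))
for name, U in [("aligned", U_aligned), ("random", U_random)]:
    theta = U @ sigma_theta                    # theta = U diag(sigma), V = I
    s = np.linalg.svd(theta.T @ K @ theta, compute_uv=False)
    print(f"{name}: kappa(H) = {s.max() / s.min():.1f}")
```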
Algorithm for Immunizing a Model
Optimization Objective
\[
\min_{\omega, \theta} \mathcal{R}_{\mathrm{ill}}(\mathbf{H}_{\mathrm{H}}(\theta)) + \mathcal{R}_{\mathrm{well}}(\mathbf{H}_{\mathrm{P}}(\theta)) + \mathcal{L}(\mathcal{D}_{\mathrm{P}},\omega, \theta)
\]
Algorithm
\[
\begin{aligned}
&\textbf{Input:}~ \mathcal{D}_{\mathrm{P}} = (\mathbf{X}_{\mathrm{P}}, \mathbf{Y}_{\mathrm{P}}),~ \mathbf{X}_{\mathrm{H}},~ \mathcal{L},~ \eta,~ \lambda_{\mathrm{P}},~ \lambda_{\mathrm{H}},~ \theta_0,~ \omega_0 \\
&\mathbf{K}_{\mathrm{P}} = \mathbf{X}_{\mathrm{P}}^\top \mathbf{X}_{\mathrm{P}} \\
&\mathbf{K}_{\mathrm{H}} = \mathbf{X}_{\mathrm{H}}^\top \mathbf{X}_{\mathrm{H}} \\
&\text{For } t = 0, 1, \dots, T-1: \\
&\quad \omega_{t+1} = \omega_t - \eta \nabla_{\omega} \mathcal{L}(\omega_t, \theta_t; \mathcal{D}_{\mathrm{P}}) \\
&\quad \mathbf{H}_{\mathrm{P}}(\theta_t) = \theta_t^\top \mathbf{K}_{\mathrm{P}} \theta_t \\
&\quad \mathbf{H}_{\mathrm{H}}(\theta_t) = \theta_t^\top \mathbf{K}_{\mathrm{H}} \theta_t \\
&\quad \theta_{t+1} = \theta_t - \eta \nabla_{\theta} \mathcal{L}(\omega_t, \theta_t; \mathcal{D}_{\mathrm{P}}) \\
&\qquad - \eta \lambda_{\mathrm{P}} \mathbf{K}_{\mathrm{P}}^{-1} \nabla_{\theta} \mathcal{R}_{\mathrm{well}}(\mathbf{H}_{\mathrm{P}}(\theta_t)) \\
&\qquad - \eta \lambda_{\mathrm{H}} \mathbf{K}_{\mathrm{H}}^{-1} \nabla_{\theta} \mathcal{R}_{\mathrm{ill}}(\mathbf{H}_{\mathrm{H}}(\theta_t)) \\
&\textbf{Output:}~ \theta^{\mathrm{I}} = \theta_T
\end{aligned}
\]
Regularizer for Maximizing Condition Number:
\[
\mathcal{R}_{\mathrm{ill}}(\mathbf{S}) = \frac{1}{\frac{1}{2k} \|\mathbf{S}\|_F^2 - \frac{1}{2}(\sigma_{\mathbf{S}}^{\mathrm{min}})^2}
\]
Regularizer for Minimizing Condition Number:
\[
\mathcal{R}_{\mathrm{well}}(\mathbf{S}) = \frac{1}{2}\|\mathbf{S}\|_2^2 - \frac{1}{2p} \|\mathbf{S}\|_F^2
\]
Here $k$ and $p$ denote the number of singular values of $\mathbf{S}$, so the scaled Frobenius terms are averages of the squared singular values and both regularizers are non-negative.
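Putting the loop and the two regularizers together, here is a hedged PyTorch sketch of the update rule (our illustration under the linear-probing, $\ell_2$-loss setting above, with random initialization and untuned hyperparameters; not the authors' released code):

```python
import torch

def r_ill(S):
    # 1 / ((1/2k)||S||_F^2 - (1/2) sigma_min^2), k = number of singular values.
    s = torch.linalg.svdvals(S)
    return 1.0 / (0.5 * s.square().mean() - 0.5 * s.min() ** 2)

def r_well(S):
    # (1/2)||S||_2^2 - (1/2p)||S||_F^2, p = number of singular values.
    s = torch.linalg.svdvals(S)
    return 0.5 * s.max() ** 2 - 0.5 * s.square().mean()

def immunize(X_P, Y_P, X_H, eta=1e-4, lam_P=1.0, lam_H=1.0, T=500):
    """One possible rendering of the pseudocode; inputs are torch tensors."""
    K_P, K_H = X_P.T @ X_P, X_H.T @ X_H
    K_P_inv, K_H_inv = torch.linalg.inv(K_P), torch.linalg.inv(K_H)
    d = X_P.shape[1]
    theta = torch.randn(d, d, requires_grad=True)              # theta_0
    omega = torch.randn(d, Y_P.shape[1], requires_grad=True)   # omega_0
    for _ in range(T):
        # Pre-training loss with a linear extractor and linear head (ell_2).
        loss = (X_P @ theta @ omega - Y_P).square().sum()
        g_omega, g_theta = torch.autograd.grad(loss, (omega, theta))
        # Gradients of the condition-number regularizers w.r.t. theta.
        g_well, = torch.autograd.grad(r_well(theta.T @ K_P @ theta), theta)
        g_ill, = torch.autograd.grad(r_ill(theta.T @ K_H @ theta), theta)
        with torch.no_grad():
            omega -= eta * g_omega
            theta -= eta * (g_theta + lam_P * K_P_inv @ g_well
                                    + lam_H * K_H_inv @ g_ill)
    return theta.detach(), omega.detach()
```

In practice $\eta$, $\lambda_{\mathrm{P}}$, $\lambda_{\mathrm{H}}$, and $T$ need tuning; the $\mathbf{K}^{-1}$-preconditioned regularizer steps mirror the pseudocode above.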
Theoretical Guarantees
Upper Bound:
The regularizer $\mathcal{R}_{\mathrm{well}}(\mathbf{S})$ is an upper bound on $\log \kappa(\mathbf{S})$, and $\mathcal{R}_{\mathrm{ill}}(\mathbf{S})$ is an upper bound on $1/\log \kappa(\mathbf{S})$:
\[
\mathcal{R}_{\mathrm{well}}(\mathbf{S}) \geq \log \kappa(\mathbf{S}), \quad \mathcal{R}_{\mathrm{ill}}(\mathbf{S}) \geq \frac{1}{\log \kappa(\mathbf{S})}
\]
Thus, minimizing $\mathcal{R}_{\mathrm{well}}$ pushes $\log \kappa$ down, while minimizing $\mathcal{R}_{\mathrm{ill}}$ pushes $1/\log \kappa$ down, i.e., drives $\kappa$ up.
Differentiability:
If the minimum (or maximum) singular value of $\mathbf{S}$ is unique, then $\mathcal{R}_{\mathrm{ill}}(\mathbf{S})$ and $\mathcal{R}_{\mathrm{well}}(\mathbf{S})$ are differentiable, and their gradients have closed forms.
Monotonicity:
For a suitable step size, gradient descent on $\mathcal{R}_{\mathrm{ill}}$ (resp. $\mathcal{R}_{\mathrm{well}}$) monotonically increases (resp. decreases) the condition number of $\mathbf{H}(\theta) = \theta^\top \mathbf{K} \theta$:
\[
\kappa(\mathbf{H}(\theta_{t+1})) > \kappa(\mathbf{H}(\theta_t)) \;\; \text{when descending on } \mathcal{R}_{\mathrm{ill}}, \qquad \kappa(\mathbf{H}(\theta_{t+1})) < \kappa(\mathbf{H}(\theta_t)) \;\; \text{when descending on } \mathcal{R}_{\mathrm{well}}.
\]
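A quick hedged check of the monotonicity claim (our illustration, with a small fixed step size standing in for the "suitable" one): plain gradient descent on $\mathcal{R}_{\mathrm{ill}}(\theta^\top \mathbf{K} \theta)$ alone should print an increasing condition number.

```python
# Monotonicity demo: descending on R_ill alone should drive kappa(H) upward
# step by step, provided the step size is small enough.
import torch

def r_ill(S):  # as defined above, with k = number of singular values
    s = torch.linalg.svdvals(S)
    return 1.0 / (0.5 * s.square().mean() - 0.5 * s.min() ** 2)

torch.manual_seed(0)
X = torch.randn(64, 8)
K = X.T @ X / 64                      # scaled covariance of stand-in data
theta = torch.randn(8, 8, requires_grad=True)
for t in range(201):
    H = theta.T @ K @ theta
    if t % 50 == 0:
        s = torch.linalg.svdvals(H)
        print(f"step {t:3d}: kappa(H) = {(s.max() / s.min()).item():.1f}")
    g, = torch.autograd.grad(r_ill(H), theta)
    with torch.no_grad():
        theta -= 1e-2 * g
```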
Experiments
Evaluation Metric
We introduce the relative immunization ratio (RIR) to quantify immunization effectiveness:
\[
\mathrm{RIR} = \frac{\kappa(\mathbf{H}_{\mathrm{H}}(\theta_{\mathrm{I}}))/\kappa(\mathbf{H}_{\mathrm{H}}(\mathbf{I}))}{\kappa(\mathbf{H}_{\mathrm{P}}(\theta_{\mathrm{I}}))/\kappa(\mathbf{H}_{\mathrm{P}}(\mathbf{I}))}
\]
Successful immunization yields $\mathrm{RIR} \gg 1$: the conditioning of the harmful task degrades much more than that of the benign task, making harmful fine-tuning slow while benign performance is preserved.
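A hedged numpy sketch of computing RIR from data matrices (using that $\mathbf{H}(\mathbf{I}) = \mathbf{K}$ for a square identity extractor; inputs are stand-ins):

```python
# RIR: ratio of condition-number ratios of the head Hessians with the
# immunized features (theta_I) versus identity features (where H(I) = K).
import numpy as np

def kappa(S):
    s = np.linalg.svd(S, compute_uv=False)
    return s.max() / s.min()

def rir(X_H, X_P, theta_I):
    K_H, K_P = X_H.T @ X_H, X_P.T @ X_P
    harmful = kappa(theta_I.T @ K_H @ theta_I) / kappa(K_H)  # H_H(I) = K_H
    benign = kappa(theta_I.T @ K_P @ theta_I) / kappa(K_P)   # H_P(I) = K_P
    return harmful / benign
```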
House Price Regression
We first demonstrate our method on a house price regression task. Our approach achieves the highest RIR (356.20 ± 5.49), significantly outperforming baselines:
- Baseline A: Only maximizes harmful task condition number (RIR = 1.24)
- Baseline B: Uses bi-level optimization (RIR = 2.00)
- Baseline C: Directly optimizes condition number difference (RIR = 92.58)
To demonstrate how a large condition number affects convergence, we analyze the norm ratio $\|\mathbf{w}_t - \mathbf{w}^\star\|_2^2 / \|\mathbf{w}_0 - \mathbf{w}^\star\|_2^2$, which measures how the classifier weights $\mathbf{w}_t$ at step $t$ approach the optimal weights $\mathbf{w}^\star$ during fine-tuning. We use exact line search to ensure fair comparison across methods. Our method achieves immunization while maintaining good performance on pre-training tasks.
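For intuition, a hedged sketch of this diagnostic (our illustration with stand-in data): gradient descent with exact line search on a least-squares probe, reporting the norm ratio.

```python
# Convergence diagnostic: gradient descent with exact line search on a
# least-squares probe, tracking ||w_t - w*||^2 / ||w_0 - w*||^2.
import numpy as np

rng = np.random.default_rng(3)
Z = rng.normal(size=(200, 16))        # probe features (stand-in data)
y = rng.normal(size=200)
H = Z.T @ Z
w_star, *_ = np.linalg.lstsq(Z, y, rcond=None)

w0 = rng.normal(size=16)
w = w0.copy()
for t in range(100):
    g = Z.T @ (Z @ w - y)
    alpha = (g @ g) / (g @ H @ g)     # exact line search for a quadratic
    w = w - alpha * g
ratio = np.sum((w - w_star) ** 2) / np.sum((w0 - w_star) ** 2)
print("norm ratio after 100 steps:", ratio)
```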

Figure: norm-ratio convergence curves. Pre-training task: our method accelerates convergence; harmful task: our method significantly slows convergence.
MNIST Classification
We evaluate on MNIST digit classification, where each digit pair forms a task. Our method achieves robust immunization across all digit pairs with RIR = 70.04 ± 3.28, while maintaining good performance on benign tasks. The heatmaps below show the $\log$ RIR for each digit pair.

Figure: $\log$ RIR heatmaps over digit pairs for Baseline A, Baseline B, Baseline C, and Ours (robust immunization across all digit pairs).
Deep Neural Networks
We further evaluate our method on ImageNet pre-trained models (ResNet-18 and ViT) with transfer tasks. We report the test accuracy on the benign task ($\mathcal{D}_{\mathrm{P}}$ = ImageNet1K) after immunization:
- ResNet-18: Test accuracy drops from 68.24% (initialization) to 62.36% (Cars as $\mathcal{D}_{\mathrm{H}}$) and 65.01% (Country211 as $\mathcal{D}_{\mathrm{H}}$).
- ViT: Test accuracy increases from 81.78% to 82.79% (Cars) and 83.17% (Country211).
To show the effect of immunization, we then fine-tune the immunized models on the harmful tasks and report the test accuracy. These results suggest it is possible to immunize a non-linear model against a harmful task without losing performance on the benign task, and in some cases even improving it.
Tables: fine-tuning test accuracy on the harmful tasks with ResNet-18 and ViT.
Citation
@inproceedings{zheng2025model,
title={Model Immunization from a Condition Number Perspective},
author={Zheng, Amber Yijia* and Bai, Cedar Site* and Bullins, Brian and Yeh, Raymond A},
booktitle={Proc. ICML},
year={2025}
}