Abstract
We propose a parameter-efficient fine-tuning approach that reduces compute requirements by 4x while maintaining 97% of full fine-tuning performance on domain-specific benchmarks. Our method combines adaptive rank selection with gradient-aware layer freezing.
Key Contributions
- Adaptive LoRA rank selection — dynamically adjusts rank per layer based on gradient magnitude during training
- Layer-wise freezing scheduler — progressively freezes converged layers to redirect compute to under-trained parameters
- Domain benchmark suite — released evaluation suite covering legal, medical, and financial domains
Takeaways
The conference provided excellent networking with teams from DeepMind, Meta FAIR, and several university labs working on similar efficiency problems. Led to two follow-up collaborations.