Base LLMs generate generic code. When your team has internal SDKs, specific API patterns, and coding conventions, the model needs domain knowledge it was never trained on.
Approach
- Curated training dataset from internal repos (~50K code samples with docstrings)
- QLoRA fine-tuning with Unsloth for memory-efficient training on a single A100
- Custom evaluation harness testing function correctness, API usage accuracy, and style compliance
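
The three checks in the evaluation harness could be sketched roughly as below. This is a minimal illustration using only the Python standard library, not the project's actual harness: `acme_sdk`, the `clamp` sample, the snake_case and docstring style rules, and every function name here are hypothetical stand-ins. In practice, model-generated code should be executed in a sandbox rather than with a bare `exec`.

```python
import ast
import re

# Assumed style rule for this sketch: snake_case names and a docstring per function.
SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")


def called_names(source: str) -> set[str]:
    """Return the dotted name of every call in `source`, e.g. 'acme_sdk.connect'."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            names.add(ast.unparse(node.func))
    return names


def passes_tests(source: str, func_name: str, cases) -> bool:
    """Function correctness: run the generated function against (args, expected) pairs."""
    namespace = {}
    try:
        exec(source, namespace)  # untrusted output: sandbox this in a real harness
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        return False


def style_ok(source: str) -> bool:
    """Style compliance: every function is snake_case and has a docstring."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            if not SNAKE_CASE.match(node.name) or ast.get_docstring(node) is None:
                return False
    return True


def evaluate(source: str, func_name: str, cases, required_calls: set[str]) -> dict:
    """Score one generated sample on the three axes."""
    try:
        ast.parse(source)
    except SyntaxError:
        return {"correct": False, "api_usage": False, "style": False}
    return {
        "correct": passes_tests(source, func_name, cases),
        # API usage accuracy: all required SDK calls appear in the sample.
        "api_usage": required_calls <= called_names(source),
        "style": style_ok(source),
    }


def pass_rate(results: list[dict], key: str) -> float:
    """Fraction of samples that passed the check named by `key`."""
    return sum(r[key] for r in results) / len(results)


# Hypothetical generated sample and test cases.
SAMPLE = '''
def clamp(value, low, high):
    """Limit value to the range [low, high]."""
    return max(low, min(value, high))
'''
CASES = [((5, 0, 3), 3), ((-1, 0, 3), 0)]
result = evaluate(SAMPLE, "clamp", CASES, required_calls=set())
```

Here `result` holds one boolean per check, and `pass_rate` aggregates a list of such results into a single number per axis. Checking API usage statically through the AST means a sample can be scored on SDK usage even when the SDK itself is not importable in the evaluation environment.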
Results So Far
- 73% pass rate on internal API usage tests (vs 12% for base Llama 3)
- 3x faster inference with vLLM serving + speculative decoding