Domain-Specific LLM Fine-Tuning with Unsloth

Fine-tuned open-weight LLMs for enterprise use cases — legal document summarization, code review, and customer support triage.

Approach

Base Models — Llama 3 8B, Mistral 7B, Phi-3 Mini
Method — QLoRA (4-bit quantization + Low-Rank Adaptation) via Unsloth
Data — curated instruction datasets (5K-20K examples per domain)
Evaluation — custom benchmarks + human preference ranking
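
The QLoRA setup listed above might look roughly like the sketch below. The checkpoint name, LoRA rank, target modules, sequence length, and prompt template are illustrative assumptions, not the values used in this project. The Unsloth calls (FastLanguageModel.from_pretrained and FastLanguageModel.get_peft_model) are imported lazily, so the formatting helper can be run without a GPU.

```python
# Sketch only: hyperparameters and the prompt template are assumed,
# not taken from the project.

LORA_KWARGS = {
    "r": 16,                 # adapter rank (assumed)
    "lora_alpha": 16,        # scaling factor (assumed)
    "lora_dropout": 0.0,
    "bias": "none",
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
}

# Hypothetical instruction template; real datasets may use a chat template.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"


def format_example(example: dict, eos_token: str = "") -> str:
    """Render one instruction/response pair into a single training string."""
    return PROMPT_TEMPLATE.format(
        instruction=example["instruction"].strip(),
        response=example["response"].strip(),
    ) + eos_token


def load_qlora_model(model_name: str = "unsloth/llama-3-8b-bnb-4bit",
                     max_seq_length: int = 2048):
    """Load a 4-bit base model and attach LoRA adapters (needs a CUDA GPU)."""
    from unsloth import FastLanguageModel  # lazy import: requires unsloth + GPU

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=model_name,
        max_seq_length=max_seq_length,
        load_in_4bit=True,  # QLoRA: frozen base weights are stored in 4-bit
    )
    # Only the low-rank adapter matrices are trainable after this call.
    model = FastLanguageModel.get_peft_model(model, **LORA_KWARGS)
    return model, tokenizer
```

The formatted strings would then be passed to a supervised fine-tuning loop; which trainer the project used is not stated here.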

Why Unsloth

Unsloth’s fused kernels and memory optimizations let us fine-tune 8B models on a single A100 in under 2 hours, compared with 5+ hours using vanilla PEFT. The 4-bit training path kept VRAM usage under 24 GB.
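
A back-of-envelope check makes the VRAM figure plausible. The sketch below counts only the storage for the base weights, using a nominal 8 billion parameters. It ignores quantization constants, LoRA adapter weights, gradients, optimizer state, and activations, so it is a rough lower bound rather than a measured number.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory for the model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS_8B = 8_000_000_000  # nominal "8B"; the exact count differs slightly

four_bit = weight_memory_gb(PARAMS_8B, 4)       # 4.0 GB
sixteen_bit = weight_memory_gb(PARAMS_8B, 16)   # 16.0 GB

print(f"4-bit weights:  {four_bit:.1f} GB")
print(f"16-bit weights: {sixteen_bit:.1f} GB")
print(f"Left for everything else under 24 GB: {24 - four_bit:.1f} GB")
# "under 2 hours" vs "5+ hours" implies a speedup of at least 5 / 2.
print(f"Implied speedup: at least {5 / 2:.1f}x")
```

With 16-bit weights, two thirds of a 24 GB budget would be spent before any training state is allocated; at 4 bits the weights take about one sixth of it.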

Results

Model	Task	Accuracy	vs Base
Llama 3 8B	Legal Summarization	91.3%	+18.7%
Mistral 7B	Code Review	87.5%	+22.1%
Phi-3 Mini	Support Triage	94.0%	+15.3%
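
The table does not say how accuracy was scored or whether the "vs Base" column is in percentage points or relative percent. The sketch below assumes exact-match scoring, which suits a classification-style task such as support triage, and computes the gain both ways. The labels are toy values invented for illustration, not project data.

```python
def accuracy(predictions: list[str], references: list[str]) -> float:
    """Exact-match accuracy in percent."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("empty evaluation set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


def gain_points(tuned: float, base: float) -> float:
    """Absolute gain in percentage points."""
    return tuned - base


def gain_relative(tuned: float, base: float) -> float:
    """Relative gain as a percentage of the base score."""
    return 100.0 * (tuned - base) / base


# Toy triage labels (hypothetical).
refs = ["billing", "outage", "billing", "account", "outage"]
base_preds = ["billing", "account", "outage", "account", "outage"]    # 3 of 5
tuned_preds = ["billing", "outage", "billing", "account", "account"]  # 4 of 5

base_acc = accuracy(base_preds, refs)
tuned_acc = accuracy(tuned_preds, refs)
print(f"base {base_acc:.1f}%, tuned {tuned_acc:.1f}%")
print(f"gain: {gain_points(tuned_acc, base_acc):+.1f} points, "
      f"{gain_relative(tuned_acc, base_acc):+.1f}% relative")
```

The two conventions diverge quickly, so a results table is easier to read when the column header names the unit.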

Deployment

Models were exported to GGUF format for llama.cpp inference and served via vLLM behind a FastAPI gateway with streaming support.
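
A streaming gateway of this kind could be sketched as follows, assuming vLLM is running its OpenAI-compatible server and the gateway relays the /v1/completions stream. The upstream URL, the /generate route, the request schema, and the model name "legal-summarizer" are hypothetical. FastAPI, pydantic, and httpx are imported lazily so the parsing helper runs on the standard library alone.

```python
import json
from typing import Iterable, Iterator


def iter_completion_text(sse_lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming lines ("data: {...}")."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        text = chunk["choices"][0].get("text", "")
        if text:
            yield text


def create_app(vllm_url: str = "http://localhost:8000/v1/completions",
               model: str = "legal-summarizer"):  # hypothetical names
    """Build the gateway app (requires fastapi, pydantic, and httpx)."""
    import httpx
    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse
    from pydantic import BaseModel

    class Prompt(BaseModel):
        prompt: str
        max_tokens: int = 256

    app = FastAPI()

    @app.post("/generate")
    async def generate(req: Prompt):
        body = {"model": model, "prompt": req.prompt,
                "max_tokens": req.max_tokens, "stream": True}

        async def relay():
            # Forward each upstream fragment to the client as it arrives.
            async with httpx.AsyncClient(timeout=None) as client:
                async with client.stream("POST", vllm_url, json=body) as resp:
                    async for line in resp.aiter_lines():
                        for text in iter_completion_text([line]):
                            yield text

        return StreamingResponse(relay(), media_type="text/plain")

    return app
```

Keeping the parser separate from the web framework means it can be unit-tested against recorded stream lines without starting either server.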