Domain-Specific LLM Fine-Tuning with Unsloth

Project at Acme AI

Fine-tuned open-weight LLMs for three enterprise use cases: legal document summarization, code review, and customer support triage.

Approach

Why Unsloth

Unsloth’s fused kernels and memory optimizations let us fine-tune 8B models on a single A100 in under 2 hours, versus 5+ hours with vanilla PEFT. The 4-bit training path kept VRAM usage under 24 GB.
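A minimal sketch of what the 4-bit LoRA setup could look like with Unsloth's `FastLanguageModel`. The checkpoint name, sequence length, LoRA rank, and target modules are illustrative assumptions, not the settings used in this project; `FineTuneConfig`, `load_kwargs`, `peft_kwargs`, and `load_model_for_training` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    # All defaults below are assumptions for illustration only.
    model_name: str = "unsloth/llama-3-8b-bnb-4bit"  # assumed pre-quantized checkpoint
    max_seq_length: int = 2048
    lora_rank: int = 16
    lora_alpha: int = 16
    target_modules: tuple = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )


def load_kwargs(cfg: FineTuneConfig) -> dict:
    """Arguments for FastLanguageModel.from_pretrained (4-bit path)."""
    return {
        "model_name": cfg.model_name,
        "max_seq_length": cfg.max_seq_length,
        "dtype": None,          # let Unsloth pick bf16/fp16 for the GPU
        "load_in_4bit": True,   # 4-bit weights keep VRAM usage low
    }


def peft_kwargs(cfg: FineTuneConfig) -> dict:
    """Arguments for FastLanguageModel.get_peft_model (LoRA adapters)."""
    return {
        "r": cfg.lora_rank,
        "lora_alpha": cfg.lora_alpha,
        "target_modules": list(cfg.target_modules),
        "lora_dropout": 0,
        "bias": "none",
        "use_gradient_checkpointing": "unsloth",
    }


def load_model_for_training(cfg: FineTuneConfig):
    # Imported lazily: unsloth requires a CUDA GPU at import time.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(**load_kwargs(cfg))
    model = FastLanguageModel.get_peft_model(model, **peft_kwargs(cfg))
    return model, tokenizer
```

On a GPU machine, `load_model_for_training(FineTuneConfig())` would return a LoRA-wrapped model and tokenizer that can be handed to a supervised fine-tuning loop such as TRL's `SFTTrainer`.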

Results

Model      | Task                | Accuracy | vs. Base
Llama 3 8B | Legal Summarization | 91.3%    | +18.7%
Mistral 7B | Code Review         | 87.5%    | +22.1%
Phi-3 Mini | Support Triage      | 94.0%    | +15.3%

Deployment

Models were exported to GGUF format for llama.cpp inference and served via vLLM behind a FastAPI gateway with streaming support.
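A rough sketch of how a streaming gateway of this kind might be wired up, assuming vLLM is running its OpenAI-compatible server and emitting server-sent events on `/v1/completions`. The URL, model name, `/generate` route, and the helper names `extract_text` and `create_app` are assumptions for illustration, not the project's actual code.

```python
import json
from typing import Optional

# Assumed values: adjust to the real vLLM deployment.
VLLM_URL = "http://localhost:8000/v1/completions"
MODEL_NAME = "my-finetuned-model"  # hypothetical served model name


def extract_text(sse_line: str) -> Optional[str]:
    """Pull the generated text out of one SSE line from the completions stream.

    Returns None for blank lines, non-data lines, and the [DONE] sentinel.
    """
    if not sse_line.startswith("data:"):
        return None
    payload = sse_line[len("data:"):].strip()
    if not payload or payload == "[DONE]":
        return None
    return json.loads(payload)["choices"][0]["text"]


def create_app():
    # Imported lazily so the parsing helper can be used without these packages.
    import httpx
    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse
    from pydantic import BaseModel

    class GenerateRequest(BaseModel):
        prompt: str
        max_tokens: int = 256

    app = FastAPI()

    @app.post("/generate")
    async def generate(req: GenerateRequest):
        body = {
            "model": MODEL_NAME,
            "prompt": req.prompt,
            "max_tokens": req.max_tokens,
            "stream": True,  # ask vLLM to stream chunks as they are generated
        }

        async def token_stream():
            # Relay each text chunk to the client as soon as vLLM emits it.
            async with httpx.AsyncClient(timeout=None) as client:
                async with client.stream("POST", VLLM_URL, json=body) as resp:
                    async for line in resp.aiter_lines():
                        text = extract_text(line)
                        if text:
                            yield text

        return StreamingResponse(token_stream(), media_type="text/plain")

    return app
```

With FastAPI and httpx installed, `app = create_app()` could be served with an ASGI server such as uvicorn. Keeping the SSE parsing in a plain function makes that part easy to unit test without a running model server.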