Sandeep Yadav

Hi, I'm @sandeep,

building production-grade AI systems

with Evals + LLMOps + HITL

AI Engineer · New Delhi, India

4+ years building everything from scratch.

From fine-tuning LLMs to shipping agentic workflows handling real-world traffic — I've owned the full lifecycle.

One day, I stopped chasing titles. I started chasing clarity.

What drives me doesn't fit on a resume.

Building systems that last.

12+ open-source contributions. 3 production ML pipelines. Millions of inference requests served.

Training, experiment tracking, inference at scale, and the tooling that holds it all together.

Bigger problems. Harder systems. End-to-end.

Ready for what's next.

Claude Code Power User

High-Agency · AI-Fluency

Building production systems with AI-assisted development — using Claude Code for architecture design, complex refactors, and shipping full-stack features from terminal to deployment.

  • Built this entire portfolio site with Claude Code as an AI pair programmer
  • Custom MCP servers, multi-agent workflows, and agentic tool chains
  • Deep expertise in prompt engineering and AI-native development patterns

Open Source

Featured Models

Work Experience

LLM Application Development

RAG pipelines, prompt engineering, multi-model orchestration, production inference

Model Fine-Tuning & Training

LoRA/QLoRA, domain adaptation, dataset curation, distributed training

Agentic Workflows

Multi-agent systems, tool use, HITL handoff, orchestration with LangGraph

ML Infrastructure & MLOps

Feature stores, experiment tracking, model registry, CI/CD for ML

Production Inference

vLLM, quantization, batching strategies, latency optimization, autoscaling

Technical Leadership

Architecture design, code review, mentoring, cross-team collaboration

AI Engineer

Acme AI · Jan 2024 – Present

Building LLM-powered applications and agentic workflows. Deploying inference pipelines on AWS.

Multi-Agent Document Understanding

Extracting structured data from unstructured documents using specialized LLM agents.

LangGraph · GPT-4 · FastAPI
Shipped 3 production LLM apps serving 100K+ daily users
Reduced inference latency by 40% with custom vLLM deployment
PyTorch · vLLM · LangGraph · AWS SageMaker

ML Engineer

DataCorp · Mar 2022 – Dec 2023

Designed ML pipelines and built real-time feature stores serving 50M+ predictions/day.

Built real-time feature store serving 50M+ predictions/day
Reduced model training time by 60% with distributed training
MLflow · DVC · Ray · Kubernetes

Software Engineer

TechStart · Jun 2020 – Feb 2022

Full-stack development with Python and React. Led migration of monolith to microservices on Kubernetes.

Led monolith-to-microservices migration on Kubernetes
Built CI/CD pipelines reducing deploy time from hours to minutes
Python · React · Docker · K8s

Skills & Stack

Soft Skills

Technical Writing · System Design · Team Leadership · Mentoring · Cross-functional Collaboration · Agile / Scrum

Spoken Languages

English: Fluent
Hindi: Native

Tech Stack

ML / AI

PyTorch · HuggingFace · LangChain · LangGraph · Unsloth · vLLM · ONNX · LoRA / QLoRA · RAG · Agents

MLOps & Data

MLflow · DVC · Weights & Biases · Ray · Airflow · Kubeflow · Feature Stores · Vector DBs

Programming

Python · TypeScript · Go · SQL · Bash · C++

Infra & Cloud

Docker · Kubernetes · AWS · GCP · Terraform · GitHub Actions · FastAPI · gRPC

Education & Certifications

Education

M.S. Computer Science

2020
Stanford University

Focus on Machine Learning and Natural Language Processing.

B.Tech. Computer Science

2018
IIT Delhi

Graduated with honors. Thesis on deep learning for medical imaging.

AI Product Management Bootcamp

2024
Maven

Led by Dr. Marily Nika (ex-Google PM). Completed capstone project.

Awards & Certifications

Best AI Application — HuggingFace Hackathon

2024
HuggingFace

Built a multi-agent document understanding pipeline in 48 hours.

Top 10 Open Source Contributors

2023
GitHub

Recognized for sustained contributions to ML ecosystem projects.

Outstanding Graduate Thesis Award

2018
IIT Delhi

Publications

  1. Efficient Multi-Agent Architectures for Document Understanding

    Sandeep Yadav, Alice Park, Bob Liu

    arXiv preprint, 2024
  2. Scaling Retrieval-Augmented Generation for Enterprise Knowledge Bases

    Sandeep Yadav, Carol Zhang

    NeurIPS Workshop, 2023
  3. Low-Rank Adaptation Strategies for Domain-Specific LLMs

    Alice Park, Sandeep Yadav, David Kim

    EMNLP, 2023

Speaking & Appearances

Presentation

Building Reliable LLM Applications in Production

AI Engineer Summit 2024 · Oct 2024

Presentation

Fine-Tuning at Scale: Lessons from the Trenches

MLOps Community Meetup · Jul 2024

Podcast

The Practical Guide to RAG Systems

The ML Podcast · May 2024

Resources & Roadmaps

Curated paths and guides I maintain for the community.

Getting Started with LLMs

LLM Fundamentals

From transformers to RLHF — the essential building blocks.

Prompt Engineering Guide

Systematic techniques for reliable LLM outputs.

Fine-Tuning Playbook

When, why, and how to fine-tune open-weight models.

ML Engineering in Production

ML System Design

Patterns for building maintainable ML-powered products.

Inference Optimization

Quantization, batching, and serving at scale.

Monitoring & Evaluation

Keeping models honest after deployment.

Curated Lists

What People Say

One of the most thoughtful engineers I've worked with. Takes complex ML problems and delivers clean, production-ready solutions.

Jane Smith · Engineering Manager, Acme AI

Their open-source contributions to our inference pipeline saved us weeks of work. Clear code, excellent documentation.

Alex Chen · Staff Engineer, DataCorp

Rare combination of deep ML knowledge and strong engineering fundamentals. Ships reliable systems, not just notebooks.

Sam Patel · CTO, TechStart

Frequently Asked Questions

Are you open to freelance or consulting work?

Yes — I take on select projects involving LLM applications, ML infrastructure, and AI strategy. Reach out via email to discuss.

What's your tech stack for most projects?

Python + PyTorch for ML, HuggingFace for models, FastAPI for serving, Docker + K8s for deployment, and AWS for cloud infrastructure.

Do you contribute to open source?

Actively. I contribute to HuggingFace Transformers and vLLM, and I maintain a few tools of my own. Check the Open Source section above.

How do I book time with you?

Use the Calendly link on the contact page, or send me an email. I typically respond within 48 hours.

Let's Talk

Have a project in mind, want to collaborate, or just want to say hi? I'd love to hear from you.