Sandeep Yadav

AI Engineer

New Delhi, India

4+ Years Building

12+ OSS Contributions

3 Production Pipelines

1M+ Inference Requests

4+ years building everything from scratch.

From fine-tuning LLMs to shipping agentic workflows handling real-world traffic — I've owned the full lifecycle.

One day, I stopped chasing titles. I started chasing clarity.

What drives me doesn't fit on a resume.

Building systems that last.

12+ open-source contributions. 3 production ML pipelines. Millions of inference requests served.

Training, experiment tracking, inference at scale, and the tooling that holds it all together.

Bigger problems. Harder systems. End-to-end.

Ready for what's next.

Start Here

New here? These are the best places to begin.

My Most Popular OSS Project

A HuggingFace Transformers contribution that powers thousands of inference pipelines.

How I Approach ML Systems

A deep dive into building reliable, production-grade ML infrastructure.

Read My Latest Writing

Thoughts on practical AI engineering and shipping models to production.

2K+ GitHub Stars

50+ OSS Contributions

15+ Projects Shipped

10K+ Blog Readers

Experience

Jan 2024 – Present · AI Engineer · Acme AI

Building LLM-powered applications and agentic workflows. Deploying inference pipelines on AWS.

Shipped 3 production LLM apps serving 100K+ daily users
Reduced inference latency by 40% with custom vLLM deployment

Mar 2022 – Dec 2023 · ML Engineer · DataCorp

Designed ML pipelines and real-time feature stores for high-volume prediction serving.

Built real-time feature store serving 50M+ predictions/day
Reduced model training time by 60% with distributed training

Jun 2020 – Feb 2022 · Software Engineer · TechStart

Full-stack development with Python and React, plus platform work on Kubernetes.

Led monolith-to-microservices migration on Kubernetes
Built CI/CD pipelines reducing deploy time from hours to minutes

Open Source

huggingface/transformers · Contributor · Python

Added efficient batch decoding for streaming inference pipelines.

★ 120K ⑂ 24K

vllm-project/vllm · Contributor · Python

Implemented custom sampling strategies for domain-specific generation.

★ 35K ⑂ 5.2K

sandeepyadav1478/ml-pipeline-kit · Author · Python

Opinionated ML pipeline toolkit for rapid experimentation and deployment.

★ 1.2K ⑂ 180

Speaking

Oct 2024 · Building Reliable LLM Applications in Production · AI Engineer Summit 2024 · talk

Jul 2024 · Fine-Tuning at Scale: Lessons from the Trenches · MLOps Community Meetup · talk

May 2024 · The Practical Guide to RAG Systems · The ML Podcast · podcast

Skills

ML / AI

PyTorch, HuggingFace, LangChain, LangGraph, Unsloth, vLLM, ONNX, LoRA / QLoRA, RAG, Agents

MLOps & Data

MLflow, DVC, Weights & Biases, Ray, Airflow, Kubeflow, Feature Stores, Vector DBs

Programming

Python, TypeScript, Go, SQL, Bash, C++

Infra & Cloud

Docker, Kubernetes, AWS, GCP, Terraform, GitHub Actions, FastAPI, gRPC

Soft Skills

Technical Writing, System Design, Team Leadership, Mentoring, Cross-functional Collaboration, Agile / Scrum

Spoken Languages

English (Fluent), Hindi (Native)

Education

2020 · M.S. Computer Science · Stanford University

Focus on Machine Learning and Natural Language Processing.

2018 · B.Tech. Computer Science · IIT Delhi

Graduated with honors. Thesis on deep learning for medical imaging.

2024 · AI Product Management Bootcamp · Maven

Led by Dr. Marily Nika (ex-Google PM). Completed capstone project.

Awards

2024 · Best AI Application, HuggingFace Hackathon · HuggingFace

Built a multi-agent document understanding pipeline in 48 hours.

2023 · Top 10 Open Source Contributors · GitHub

Recognized for sustained contributions to ML ecosystem projects.

2018 · Outstanding Graduate Thesis Award · IIT Delhi

Testimonials

"One of the most thoughtful engineers I've worked with. Takes complex ML problems and delivers clean, production-ready solutions."

— Jane Smith, Engineering Manager, Acme AI

"Their open-source contributions to our inference pipeline saved us weeks of work. Clear code, excellent documentation."

— Alex Chen, Staff Engineer, DataCorp

"Rare combination of deep ML knowledge and strong engineering fundamentals. Ships reliable systems, not just notebooks."

— Sam Patel, CTO, TechStart

FAQ

Are you open to freelance or consulting work?

Yes — I take on select projects involving LLM applications, ML infrastructure, and AI strategy. Reach out via email to discuss.

What's your tech stack for most projects?

Python + PyTorch for ML, HuggingFace for models, FastAPI for serving, Docker + K8s for deployment, and AWS for cloud infrastructure.

Do you contribute to open source?

Actively. I contribute to HuggingFace Transformers and vLLM, and I maintain a few of my own tools. Check the Open Source section above.

How do I book time with you?

Use the Calendly link on the contact page, or send me an email. I typically respond within 48 hours.