Sandeep Yadav

AI Engineer building production-grade intelligent systems — from model training and fine-tuning to agentic workflows, ML infrastructure, and scalable inference.
https://sandeepyadav1478.github.io/

Agentic Pipelines, Code Gen & RAG Evaluation
https://sandeepyadav1478.github.io/works/now-building/
Three active projects at the intersection of LLM applications — multi-agent document processing, domain-adapted code generation, and systematic RAG quality measurement.
Wed, 01 Apr 2026 00:00:00 GMT

RLHF, Sparse MoE & Rust for Inference
https://sandeepyadav1478.github.io/works/now-learning/
Deepening expertise in three areas — alignment techniques for LLMs, sparse Mixture of Experts scaling, and systems-level inference serving with Rust.
Wed, 01 Apr 2026 00:00:00 GMT

ML Systems Design, KV-Cache Research & Staff Engineering
https://sandeepyadav1478.github.io/works/now-reading/
Books and papers shaping how I think about production ML — system design principles, efficient long-context inference, and technical leadership beyond the IC track.
Wed, 01 Apr 2026 00:00:00 GMT

Multi-Agent Document Understanding
https://sandeepyadav1478.github.io/works/multi-agent-document-understanding/
Building multi-agent systems that decompose complex documents into structured knowledge using specialized LLM agents for extraction, reasoning, and validation.
Sun, 15 Mar 2026 00:00:00 GMT

Prompt Engineering for Production Systems
https://sandeepyadav1478.github.io/works/prompt-engineering-workshop/
Workshop on writing reliable, testable prompts for production LLM applications — covering structured outputs, guardrails, and prompt versioning.
Thu, 05 Mar 2026 00:00:00 GMT

Deep Dive: RLHF & Alignment Techniques
https://sandeepyadav1478.github.io/works/learning-rlhf/
Studying reinforcement learning from human feedback — from reward modeling to PPO and DPO, understanding how modern LLMs are aligned to human preferences.
Sun, 01 Mar 2026 00:00:00 GMT

Exploring Mixture of Experts Architectures
https://sandeepyadav1478.github.io/works/mixture-of-experts-study/
Research notes on Mixture of Experts — how sparse activation enables scaling model capacity without proportional compute, from Switch Transformer to Mixtral.
Sun, 15 Feb 2026 00:00:00 GMT

DeepAgents — Multi-Agent Orchestration Research
https://sandeepyadav1478.github.io/works/deepagents-research/
Contributing to DeepAgents, a framework for building hierarchical multi-agent systems with planning, tool use, and memory.
Sun, 01 Feb 2026 00:00:00 GMT

Domain-Specific Code Generation with Llama 3
https://sandeepyadav1478.github.io/works/llama3-domain-finetuning/
Fine-tuning Llama 3 on proprietary codebases for domain-specific code generation — internal APIs, conventions, and patterns the base model doesn't know.
Sun, 01 Feb 2026 00:00:00 GMT

Vector Database Benchmarks — Qdrant vs Pinecone vs Weaviate
https://sandeepyadav1478.github.io/works/vector-db-benchmarks/
Comprehensive benchmark comparing vector databases for production RAG workloads — latency, recall, cost, and operational complexity.
Sun, 25 Jan 2026 00:00:00 GMT

Learning Rust for High-Performance Inference
https://sandeepyadav1478.github.io/works/rust-inference-servers/
Learning Rust with a focus on building high-performance ML inference servers — async runtimes, zero-copy deserialization, and ONNX runtime bindings.
Tue, 20 Jan 2026 00:00:00 GMT

Multi-Agent RAG System with LangGraph
https://sandeepyadav1478.github.io/works/langgraph-agent-framework/
Production agentic RAG system using LangGraph for multi-step reasoning over enterprise knowledge bases. Handles 10K+ queries/day with sub-2s latency.
Sat, 10 Jan 2026 00:00:00 GMT

Open-Source RAG Evaluation Framework
https://sandeepyadav1478.github.io/works/rag-evaluation-framework/
Building an open-source framework to systematically evaluate RAG pipeline quality — retrieval relevance, answer faithfulness, and end-to-end correctness.
Sat, 10 Jan 2026 00:00:00 GMT

Production LLM Inference with vLLM
https://sandeepyadav1478.github.io/works/vllm-inference-optimization/
How we optimized LLM serving latency by 3x using vLLM's continuous batching, PagedAttention, and quantized model deployment.
Fri, 05 Dec 2025 00:00:00 GMT

Domain-Specific LLM Fine-Tuning with Unsloth
https://sandeepyadav1478.github.io/works/llm-fine-tuning-unsloth/
Fine-tuned Llama 3 and Mistral models for domain-specific tasks using Unsloth + QLoRA, achieving 40% faster training with 60% less VRAM.
Thu, 20 Nov 2025 00:00:00 GMT

Evaluating RAG Systems — Beyond Vibes
https://sandeepyadav1478.github.io/works/rag-evaluation-talk/
Conference talk on systematic RAG evaluation using RAGAS metrics, human preference ranking, and automated regression testing.
Sat, 18 Oct 2025 00:00:00 GMT

HuggingFace Transformers — Core Contributions
https://sandeepyadav1478.github.io/works/huggingface-transformers-contrib/
Contributed model implementations and training optimizations to HuggingFace's Transformers library, used by 100K+ developers worldwide.
Mon, 15 Sep 2025 00:00:00 GMT

Reproducible ML Pipelines with DVC
https://sandeepyadav1478.github.io/works/dvc-data-pipelines/
A practical guide to building reproducible, version-controlled ML data pipelines using DVC, from dataset versioning to automated retraining.
Sun, 10 Aug 2025 00:00:00 GMT

embed-cache — Persistent Embedding Cache
https://sandeepyadav1478.github.io/works/embeddings-cache-library/
Python library that caches OpenAI/Cohere embedding API calls to SQLite, cutting costs by 80% for iterative RAG development.
Sat, 12 Jul 2025 00:00:00 GMT

ML Experiment Tracking Platform with MLflow
https://sandeepyadav1478.github.io/works/mlflow-experiment-tracking/
Built a centralized MLflow-based experiment tracking and model registry platform serving 15+ ML engineers across 3 teams.
Sun, 15 Jun 2025 00:00:00 GMT

Open Source Mentorship — First-Time Contributors Program
https://sandeepyadav1478.github.io/works/open-source-mentorship/
Organized and led a 4-week open source mentorship program helping 20+ developers make their first meaningful contributions to ML/AI projects.
Thu, 15 May 2025 00:00:00 GMT

AWS Machine Learning — Specialty
https://sandeepyadav1478.github.io/works/aws-ml-specialty/
AWS professional certification covering ML workloads — SageMaker, model training, feature engineering, and ML solution architecture.
Sat, 10 May 2025 00:00:00 GMT

LLM Monitoring Dashboard with W&B
https://sandeepyadav1478.github.io/works/wandb-model-monitoring/
Real-time LLM monitoring system tracking token costs, latency distributions, hallucination rates, and model drift using Weights & Biases.
Sun, 20 Apr 2025 00:00:00 GMT

Migrating 50 Services to Kubernetes — A Retrospective
https://sandeepyadav1478.github.io/works/k8s-migration-retrospective/
What went right, what broke, and what we'd do differently migrating a monolith-era fleet to Kubernetes over six months.
Thu, 20 Mar 2025 00:00:00 GMT

Real-Time Feature Store Architecture
https://sandeepyadav1478.github.io/works/feature-store-design/
Designed a dual-layer feature store with offline batch features in Parquet/S3 and online real-time features in Redis, serving 50M+ predictions/day.
Mon, 10 Feb 2025 00:00:00 GMT

NeurIPS 2024 — Spotlight Poster Presentation
https://sandeepyadav1478.github.io/works/neurips-2024-conference/
Presented poster on efficient fine-tuning methods for domain-specific LLMs at NeurIPS 2024 in Vancouver.
Tue, 10 Dec 2024 00:00:00 GMT

MLOps Community Meetup — Speaker & Organizer
https://sandeepyadav1478.github.io/works/mlops-community-meetup/
Organized and spoke at the monthly MLOps Community meetup in San Francisco on production LLM monitoring patterns.
Wed, 20 Nov 2024 00:00:00 GMT

taskr — Developer Task Runner CLI
https://sandeepyadav1478.github.io/works/cli-task-runner/
A fast, opinionated task runner for monorepos — parallel execution, dependency graphs, and smart caching. Written in Go.
Fri, 08 Nov 2024 00:00:00 GMT

LangChain AI Agents Hackathon — 2nd Place
https://sandeepyadav1478.github.io/works/ai-agents-hackathon-2024/
Built an autonomous code review agent in 48 hours that analyzes PRs, suggests fixes, and auto-generates test cases. Won 2nd place out of 200+ teams.
Sun, 15 Sep 2024 00:00:00 GMT

Certified Kubernetes Application Developer (CKAD)
https://sandeepyadav1478.github.io/works/ckad-certification/
CNCF certification covering Kubernetes application design, deployment, configuration, and observability patterns.
Sun, 15 Sep 2024 00:00:00 GMT

SaaS Analytics Dashboard — Full-Stack Build
https://sandeepyadav1478.github.io/works/fullstack-saas-dashboard/
Self-hosted analytics dashboard with real-time event streaming, custom SQL queries, and team collaboration. React + FastAPI + PostgreSQL.
Mon, 22 Jul 2024 00:00:00 GMT

terraform-modules — Reusable Cloud Infrastructure
https://sandeepyadav1478.github.io/works/terraform-infra-modules/
Collection of production-tested Terraform modules for AWS — VPC, ECS, Lambda, IAM, and monitoring with security-first defaults.
Thu, 18 Apr 2024 00:00:00 GMT

AI Engineer at Acme AI
https://sandeepyadav1478.github.io/works/ai-engineer-role/
Building LLM-powered applications and agentic workflows. Leading fine-tuning, RAG, and inference optimization for enterprise AI products.
Mon, 15 Jan 2024 00:00:00 GMT

API Gateway Redesign — From Monolith to Microservices
https://sandeepyadav1478.github.io/works/api-gateway-redesign/
Redesigned the API gateway layer to support 200+ microservices with rate limiting, auth delegation, and circuit breakers.
Wed, 10 Jan 2024 00:00:00 GMT