<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Sandeep Yadav</title><description>AI Engineer building production-grade intelligent systems — from model training and fine-tuning to agentic workflows, ML infrastructure, and scalable inference.</description><link>https://sandeepyadav1478.github.io/</link><item><title>Agentic Pipelines, Code Gen &amp; RAG Evaluation</title><link>https://sandeepyadav1478.github.io/works/now-building/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/now-building/</guid><description>Three active projects at the intersection of LLM applications — multi-agent document processing, domain-adapted code generation, and systematic RAG quality measurement.</description><pubDate>Wed, 01 Apr 2026 00:00:00 GMT</pubDate></item><item><title>RLHF, Sparse MoE &amp; Rust for Inference</title><link>https://sandeepyadav1478.github.io/works/now-learning/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/now-learning/</guid><description>Deepening expertise in three areas — alignment techniques for LLMs, sparse Mixture of Experts scaling, and systems-level inference serving with Rust.</description><pubDate>Wed, 01 Apr 2026 00:00:00 GMT</pubDate></item><item><title>ML Systems Design, KV-Cache Research &amp; Staff Engineering</title><link>https://sandeepyadav1478.github.io/works/now-reading/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/now-reading/</guid><description>Books and papers shaping how I think about production ML — system design principles, efficient long-context inference, and technical leadership beyond the IC track.</description><pubDate>Wed, 01 Apr 2026 00:00:00 GMT</pubDate></item><item><title>Multi-Agent Document Understanding</title><link>https://sandeepyadav1478.github.io/works/multi-agent-document-understanding/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/multi-agent-document-understanding/</guid><description>Building multi-agent systems that decompose complex documents into structured knowledge using specialized LLM agents for extraction, reasoning, and validation.</description><pubDate>Sun, 15 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Prompt Engineering for Production Systems</title><link>https://sandeepyadav1478.github.io/works/prompt-engineering-workshop/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/prompt-engineering-workshop/</guid><description>Workshop on writing reliable, testable prompts for production LLM applications — covering structured outputs, guardrails, and prompt versioning.</description><pubDate>Thu, 05 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Deep Dive: RLHF &amp; Alignment Techniques</title><link>https://sandeepyadav1478.github.io/works/learning-rlhf/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/learning-rlhf/</guid><description>Studying reinforcement learning from human feedback — from reward modeling to PPO and DPO, understanding how modern LLMs are aligned to human preferences.</description><pubDate>Sun, 01 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Exploring Mixture of Experts Architectures</title><link>https://sandeepyadav1478.github.io/works/mixture-of-experts-study/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/mixture-of-experts-study/</guid><description>Research notes on Mixture of Experts — how sparse activation enables scaling model capacity without proportional compute, from Switch Transformer to Mixtral.</description><pubDate>Sun, 15 Feb 2026 00:00:00 GMT</pubDate></item><item><title>DeepAgents — Multi-Agent Orchestration Research</title><link>https://sandeepyadav1478.github.io/works/deepagents-research/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/deepagents-research/</guid><description>Contributing to DeepAgents, a framework for building hierarchical multi-agent systems with planning, tool use, and memory.</description><pubDate>Sun, 01 Feb 2026 00:00:00 GMT</pubDate></item><item><title>Domain-Specific Code Generation with Llama 3</title><link>https://sandeepyadav1478.github.io/works/llama3-domain-finetuning/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/llama3-domain-finetuning/</guid><description>Fine-tuning Llama 3 on proprietary codebases for domain-specific code generation — internal APIs, conventions, and patterns the base model doesn&apos;t know.</description><pubDate>Sun, 01 Feb 2026 00:00:00 GMT</pubDate></item><item><title>Vector Database Benchmarks — Qdrant vs Pinecone vs Weaviate</title><link>https://sandeepyadav1478.github.io/works/vector-db-benchmarks/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/vector-db-benchmarks/</guid><description>Comprehensive benchmark comparing vector databases for production RAG workloads — latency, recall, cost, and operational complexity.</description><pubDate>Sun, 25 Jan 2026 00:00:00 GMT</pubDate></item><item><title>Learning Rust for High-Performance Inference</title><link>https://sandeepyadav1478.github.io/works/rust-inference-servers/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/rust-inference-servers/</guid><description>Learning Rust with a focus on building high-performance ML inference servers — async runtimes, zero-copy deserialization, and ONNX runtime bindings.</description><pubDate>Tue, 20 Jan 2026 00:00:00 GMT</pubDate></item><item><title>Multi-Agent RAG System with LangGraph</title><link>https://sandeepyadav1478.github.io/works/langgraph-agent-framework/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/langgraph-agent-framework/</guid><description>Production agentic RAG system using LangGraph for multi-step reasoning over enterprise knowledge bases. Handles 10K+ queries/day with sub-2s latency.</description><pubDate>Sat, 10 Jan 2026 00:00:00 GMT</pubDate></item><item><title>Open-Source RAG Evaluation Framework</title><link>https://sandeepyadav1478.github.io/works/rag-evaluation-framework/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/rag-evaluation-framework/</guid><description>Building an open-source framework to systematically evaluate RAG pipeline quality — retrieval relevance, answer faithfulness, and end-to-end correctness.</description><pubDate>Sat, 10 Jan 2026 00:00:00 GMT</pubDate></item><item><title>Production LLM Inference with vLLM</title><link>https://sandeepyadav1478.github.io/works/vllm-inference-optimization/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/vllm-inference-optimization/</guid><description>How we optimized LLM serving latency by 3x using vLLM&apos;s continuous batching, PagedAttention, and quantized model deployment.</description><pubDate>Fri, 05 Dec 2025 00:00:00 GMT</pubDate></item><item><title>Domain-Specific LLM Fine-Tuning with Unsloth</title><link>https://sandeepyadav1478.github.io/works/llm-fine-tuning-unsloth/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/llm-fine-tuning-unsloth/</guid><description>Fine-tuned Llama 3 and Mistral models for domain-specific tasks using Unsloth + QLoRA, achieving 40% faster training with 60% less VRAM.</description><pubDate>Thu, 20 Nov 2025 00:00:00 GMT</pubDate></item><item><title>Evaluating RAG Systems — Beyond Vibes</title><link>https://sandeepyadav1478.github.io/works/rag-evaluation-talk/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/rag-evaluation-talk/</guid><description>Conference talk on systematic RAG evaluation using RAGAS metrics, human preference ranking, and automated regression testing.</description><pubDate>Sat, 18 Oct 2025 00:00:00 GMT</pubDate></item><item><title>HuggingFace Transformers — Core Contributions</title><link>https://sandeepyadav1478.github.io/works/huggingface-transformers-contrib/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/huggingface-transformers-contrib/</guid><description>Contributed model implementations and training optimizations to HuggingFace&apos;s Transformers library, used by 100K+ developers worldwide.</description><pubDate>Mon, 15 Sep 2025 00:00:00 GMT</pubDate></item><item><title>Reproducible ML Pipelines with DVC</title><link>https://sandeepyadav1478.github.io/works/dvc-data-pipelines/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/dvc-data-pipelines/</guid><description>A practical guide to building reproducible, version-controlled ML data pipelines using DVC, from dataset versioning to automated retraining.</description><pubDate>Sun, 10 Aug 2025 00:00:00 GMT</pubDate></item><item><title>embed-cache — Persistent Embedding Cache</title><link>https://sandeepyadav1478.github.io/works/embeddings-cache-library/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/embeddings-cache-library/</guid><description>Python library that caches OpenAI/Cohere embedding API calls to SQLite, cutting costs by 80% for iterative RAG development.</description><pubDate>Sat, 12 Jul 2025 00:00:00 GMT</pubDate></item><item><title>ML Experiment Tracking Platform with MLflow</title><link>https://sandeepyadav1478.github.io/works/mlflow-experiment-tracking/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/mlflow-experiment-tracking/</guid><description>Built a centralized MLflow-based experiment tracking and model registry platform serving 15+ ML engineers across 3 teams.</description><pubDate>Sun, 15 Jun 2025 00:00:00 GMT</pubDate></item><item><title>Open Source Mentorship — First-Time Contributors Program</title><link>https://sandeepyadav1478.github.io/works/open-source-mentorship/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/open-source-mentorship/</guid><description>Organized and led a 4-week open source mentorship program helping 20+ developers make their first meaningful contributions to ML/AI projects.</description><pubDate>Thu, 15 May 2025 00:00:00 GMT</pubDate></item><item><title>AWS Machine Learning — Specialty</title><link>https://sandeepyadav1478.github.io/works/aws-ml-specialty/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/aws-ml-specialty/</guid><description>AWS professional certification covering ML workloads — SageMaker, model training, feature engineering, and ML solution architecture.</description><pubDate>Sat, 10 May 2025 00:00:00 GMT</pubDate></item><item><title>LLM Monitoring Dashboard with W&amp;B</title><link>https://sandeepyadav1478.github.io/works/wandb-model-monitoring/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/wandb-model-monitoring/</guid><description>Real-time LLM monitoring system tracking token costs, latency distributions, hallucination rates, and model drift using Weights &amp; Biases.</description><pubDate>Sun, 20 Apr 2025 00:00:00 GMT</pubDate></item><item><title>Migrating 50 Services to Kubernetes — A Retrospective</title><link>https://sandeepyadav1478.github.io/works/k8s-migration-retrospective/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/k8s-migration-retrospective/</guid><description>What went right, what broke, and what we&apos;d do differently migrating a monolith-era fleet to Kubernetes over six months.</description><pubDate>Thu, 20 Mar 2025 00:00:00 GMT</pubDate></item><item><title>Real-Time Feature Store Architecture</title><link>https://sandeepyadav1478.github.io/works/feature-store-design/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/feature-store-design/</guid><description>Designed a dual-layer feature store with offline batch features in Parquet/S3 and online real-time features in Redis, serving 50M+ predictions/day.</description><pubDate>Mon, 10 Feb 2025 00:00:00 GMT</pubDate></item><item><title>NeurIPS 2024 — Spotlight Poster Presentation</title><link>https://sandeepyadav1478.github.io/works/neurips-2024-conference/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/neurips-2024-conference/</guid><description>Presented poster on efficient fine-tuning methods for domain-specific LLMs at NeurIPS 2024 in Vancouver.</description><pubDate>Tue, 10 Dec 2024 00:00:00 GMT</pubDate></item><item><title>MLOps Community Meetup — Speaker &amp; Organizer</title><link>https://sandeepyadav1478.github.io/works/mlops-community-meetup/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/mlops-community-meetup/</guid><description>Organized and spoke at the monthly MLOps Community meetup in San Francisco on production LLM monitoring patterns.</description><pubDate>Wed, 20 Nov 2024 00:00:00 GMT</pubDate></item><item><title>taskr — Developer Task Runner CLI</title><link>https://sandeepyadav1478.github.io/works/cli-task-runner/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/cli-task-runner/</guid><description>A fast, opinionated task runner for monorepos — parallel execution, dependency graphs, and smart caching. Written in Go.</description><pubDate>Fri, 08 Nov 2024 00:00:00 GMT</pubDate></item><item><title>LangChain AI Agents Hackathon — 2nd Place</title><link>https://sandeepyadav1478.github.io/works/ai-agents-hackathon-2024/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/ai-agents-hackathon-2024/</guid><description>Built an autonomous code review agent in 48 hours that analyzes PRs, suggests fixes, and auto-generates test cases. Won 2nd place out of 200+ teams.</description><pubDate>Sun, 15 Sep 2024 00:00:00 GMT</pubDate></item><item><title>Certified Kubernetes Application Developer (CKAD)</title><link>https://sandeepyadav1478.github.io/works/ckad-certification/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/ckad-certification/</guid><description>CNCF certification covering Kubernetes application design, deployment, configuration, and observability patterns.</description><pubDate>Sun, 15 Sep 2024 00:00:00 GMT</pubDate></item><item><title>SaaS Analytics Dashboard — Full-Stack Build</title><link>https://sandeepyadav1478.github.io/works/fullstack-saas-dashboard/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/fullstack-saas-dashboard/</guid><description>Self-hosted analytics dashboard with real-time event streaming, custom SQL queries, and team collaboration. React + FastAPI + PostgreSQL.</description><pubDate>Mon, 22 Jul 2024 00:00:00 GMT</pubDate></item><item><title>terraform-modules — Reusable Cloud Infrastructure</title><link>https://sandeepyadav1478.github.io/works/terraform-infra-modules/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/terraform-infra-modules/</guid><description>Collection of production-tested Terraform modules for AWS — VPC, ECS, Lambda, IAM, and monitoring with security-first defaults.</description><pubDate>Thu, 18 Apr 2024 00:00:00 GMT</pubDate></item><item><title>AI Engineer at Acme AI</title><link>https://sandeepyadav1478.github.io/works/ai-engineer-role/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/ai-engineer-role/</guid><description>Building LLM-powered applications and agentic workflows. Leading fine-tuning, RAG, and inference optimization for enterprise AI products.</description><pubDate>Mon, 15 Jan 2024 00:00:00 GMT</pubDate></item><item><title>API Gateway Redesign — From Monolith to Microservices</title><link>https://sandeepyadav1478.github.io/works/api-gateway-redesign/</link><guid isPermaLink="true">https://sandeepyadav1478.github.io/works/api-gateway-redesign/</guid><description>Redesigned the API gateway layer to support 200+ microservices with rate limiting, auth delegation, and circuit breakers.</description><pubDate>Wed, 10 Jan 2024 00:00:00 GMT</pubDate></item></channel></rss>