Work

Production systems with architecture decisions, trade-offs, and results.

Production ML monitoring system that detects embedding drift and model degradation in real time

Code

Multi-model LLM router that reduces cost and latency by routing each query to a local or cloud model based on its estimated complexity

Code

Embedding cache with ML-driven eviction policies and real-time performance optimization

Code
