AI/ML · infra & backend · SRE mindset
RL researcher and product engineer. I find problems, build, ship, and keep models honest in production. 2× first-authored IEEE papers in RL; shipped production RL for robotics safety certification; now SRE on a 21M+ DAU platform, where the autonomous incident-response agent I built cut MTTR by 53%.
Engineer across AI/ML and infrastructure, with an SRE mindset. Product engineer end-to-end — ideate, build, ship, operate.
I care about the boring parts of production ML: drift, train/serve skew, cache hit rates, MTTR — the things that decide whether a model is actually useful once it leaves the notebook.
MS Computer Engineering (accelerated), University of Texas at Arlington.
Empirical RL for autonomous robot navigation. Simulations, reward structures, ablations, results.
A DQN-based navigation agent trained against a physics-informed simulator. Reward engineering and ablation studies across sim conditions; failure-mode analysis to characterize where the policy breaks.
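For orientation, a minimal sketch of the update at the core of this kind of DQN agent: TD targets from a frozen target network, Huber loss on the online network. PyTorch is used purely for illustration; the dimensions, architecture, and hyperparameters are placeholders, not the project's.

```python
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 8, 4, 0.99   # placeholder sizes

def make_qnet() -> nn.Module:
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

q_net, target_net = make_qnet(), make_qnet()
target_net.load_state_dict(q_net.state_dict())   # frozen copy, synced periodically
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_step(s, a, r, s_next, done):
    """One gradient step on a batch: s (B, obs_dim), a (B,) long, r/done (B,) float."""
    with torch.no_grad():
        # TD target: y = r + gamma * max_a' Q_target(s', a'), zeroed at terminals
        y = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a) actually taken
    loss = nn.functional.smooth_l1_loss(q_sa, y)           # Huber loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```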
NVIDIA Sionna as the RF-propagation physics layer for an RL agent optimizing navigation paths. Higher-fidelity simulation narrows the sim-to-real gap that usually makes policy transfer brittle in robotics.
ML infrastructure projects on github.com/msmichellesamson. Each explores one silent-failure mode in production ML: drift, skew, cache misses, cost.
Can a heuristic router approximate RouteLLM? Send ~50% of queries to a cheaper model with negligible quality loss, without the deployment burden of a learned router.
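A minimal sketch of what such a heuristic router could look like, assuming cheap lexical features (prompt length, hard-task keywords, code-like content). Every feature, weight, and threshold here is invented for illustration.

```python
import re

# Keyword hints that a prompt likely needs the stronger model (invented list).
HARD_HINTS = ("prove", "derive", "step by step", "debug", "optimize", "refactor")

def complexity_score(query: str) -> float:
    q = query.lower()
    score = min(len(q.split()) / 100, 1.0)                # longer prompts skew harder
    score += 0.3 * sum(hint in q for hint in HARD_HINTS)  # task-difficulty keywords
    score += 0.2 * bool(re.search(r"\bdef\b|[{};]", q))   # code-ish content
    return score

def route(query: str, threshold: float = 0.5) -> str:
    # Below threshold -> cheap model; at or above -> strong model.
    return "cheap-model" if complexity_score(query) < threshold else "strong-model"

print(route("what's the capital of France?"))        # cheap-model
print(route("debug this function: def f(x): ..."))   # strong-model
```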
Detecting when an embedding model's output distribution has shifted underneath you — before downstream retrieval quietly degrades and nobody attributes it to drift for weeks.
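One label-free way to pose this, sketched below: compare a live batch of embeddings against a frozen reference batch with an MMD statistic. The kernel bandwidth and alert threshold are placeholders; a real deployment would calibrate the threshold by bootstrapping the reference set.

```python
import numpy as np

def rbf_mmd2(x: np.ndarray, y: np.ndarray, sigma: float) -> float:
    # Biased MMD^2 estimate with an RBF kernel.
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma**2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
ref = rng.normal(size=(256, 64))           # frozen reference embeddings
live = rng.normal(size=(256, 64)) + 0.2    # live batch with a small mean shift

mmd2 = rbf_mmd2(ref, live, sigma=np.sqrt(64))   # sqrt(dim) bandwidth heuristic
if mmd2 > 0.01:   # placeholder threshold; calibrate via bootstrap on ref
    print("embedding distribution shift detected")
```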
Predicting LLM quality degradation from inference-side signals alone, with no labels and no human feedback, before users notice: an observability problem for LLMs that classical APM never solved.
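A hedged sketch of the monitoring shape this implies: track a label-free proxy such as mean token log-probability per response, and alert when a rolling window sinks below a healthy-era baseline. The signal choice, window size, and drop threshold are illustrative, not the project's.

```python
from collections import deque
from statistics import fmean

class QualityProxyMonitor:
    """Alert when a rolling mean of a label-free proxy drops below baseline."""

    def __init__(self, baseline_logprob: float, window: int = 200, drop: float = 0.5):
        self.baseline = baseline_logprob     # healthy-era mean token logprob
        self.drop = drop                     # how far below baseline is alarming
        self.window = deque(maxlen=window)

    def observe(self, mean_token_logprob: float) -> bool:
        # Record one response; return True once degradation is suspected.
        self.window.append(mean_token_logprob)
        if len(self.window) < self.window.maxlen:
            return False                     # not enough data yet
        return fmean(self.window) < self.baseline - self.drop
```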
Predictive precomputation for RAG embedding caches. Cache hit rate is the lever for both latency and carbon cost — every hit is one fewer GPU forward pass.
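The simplest version of predictive precomputation, sketched under the assumption that query traffic has a heavy head: pre-embed the most frequent historical queries before peak hours. embed() and the cache dict are stand-ins for whatever backs the real system.

```python
from collections import Counter

def warm_cache(history: list[str], cache: dict, embed, top_k: int = 1000) -> int:
    """Pre-embed the top_k most frequent past queries; return number warmed."""
    warmed = 0
    for query, _count in Counter(history).most_common(top_k):
        if query not in cache:
            cache[query] = embed(query)   # one GPU forward pass now, not at peak
            warmed += 1
    return warmed
```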
Learned eviction vs. LRU/LFU for embedding caches, where “value” isn't just access frequency but semantic overlap with future queries. Benchmarks against Belady's MIN.
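One hypothetical form such a value function could take: score each cached entry by recency plus cosine similarity to a centroid of recent query embeddings, and evict the minimum. The weighting is a placeholder; the actual project benchmarks learned variants against LRU/LFU and Belady's MIN.

```python
import numpy as np

def eviction_scores(entries: dict[str, np.ndarray], last_access: dict[str, float],
                    recent_centroid: np.ndarray, now: float,
                    w_recency: float = 0.5) -> dict[str, float]:
    """Higher score = more worth keeping. Evict min(scores, key=scores.get)."""
    scores = {}
    for key, emb in entries.items():
        recency = 1.0 / (1.0 + now - last_access[key])      # LRU-like component
        sim = float(emb @ recent_centroid /
                    (np.linalg.norm(emb) * np.linalg.norm(recent_centroid) + 1e-9))
        scores[key] = w_recency * recency + (1 - w_recency) * sim
    return scores
```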
A sandbox for studying train/serve skew — the silent failure where feature pipelines diverge between training and serving, and model quality degrades with no exception thrown.
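The failure mode in miniature: the "same" feature implemented twice, once in training and once in serving, quietly disagreeing with no exception thrown. Both tokenizers below are invented examples of the bug class the sandbox reproduces.

```python
import re

def n_tokens_train(text: str) -> int:
    return len(re.findall(r"\w+", text.lower()))   # regex word tokenizer (training)

def n_tokens_serve(text: str) -> int:
    return len(text.lower().split())               # whitespace tokenizer (serving)

sample = "state-of-the-art RAG pipeline"
print(n_tokens_train(sample), n_tokens_serve(sample))   # 6 vs 3: same "feature", skewed
```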
SRE · ML engineering · RL research.
21M+ DAU · 27K+ RPS · globally distributed
Designed and delivered an autonomous Incident Response Agent end-to-end, built on a subagentic blackboard architecture (specialized agents per tool/skill). Shipped ahead of Google's ADK announcement, leading a team of two engineers; cut MTTR by 53% (73 → 34 min) and eliminated ~500 hrs/yr of toil. Separately, an observability audit surfaced $80K/yr in cloud savings across GCP, Kubernetes, and Terraform IaC.
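For the curious, a toy sketch of the blackboard pattern named above: specialized agents read shared incident state and post findings, and a thin controller fans out across them. Agent names and data shapes are invented; this is the pattern, not the production system.

```python
from dataclasses import dataclass, field

@dataclass
class Blackboard:
    incident: dict                                 # shared state all agents can read
    findings: list = field(default_factory=list)   # accumulated agent observations

class LogsAgent:
    def run(self, bb: Blackboard) -> None:
        if "error_spike" in bb.incident.get("symptoms", []):
            bb.findings.append(("logs", "error spike correlated with deploy"))

class MetricsAgent:
    def run(self, bb: Blackboard) -> None:
        if bb.incident.get("p99_ms", 0) > 500:
            bb.findings.append(("metrics", "p99 latency breach"))

def respond(incident: dict) -> list:
    bb = Blackboard(incident)
    for agent in (LogsAgent(), MetricsAgent()):   # one specialized agent per skill
        agent.run(bb)
    return bb.findings

print(respond({"symptoms": ["error_spike"], "p99_ms": 812}))
```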
Built the RL model behind a first-of-its-kind AI safety certification product for robotic systems. Designed dataset labeling and model-tuning pipelines (scikit-learn + TensorFlow); deployed production inference via FastAPI.
Shipped a Kotlin/Compose Android app to Google Play end-to-end: MVVM, Coroutines, REST APIs (Node.js/Express), Firebase (Firestore/Auth/Crashlytics), GitHub Actions CI/CD. Product engineering with a real release cadence.
Implemented Deep Q-Networks for autonomous robot navigation using CUDA, NVIDIA Sionna, and differentiable ray tracing. Produced two first-authored IEEE publications. Led a cross-functional ROS project and mentored undergraduates.
I'm actively exploring roles at AI/ML-and-systems companies — research residencies, engineering teams building models, ML infrastructure. If any of that is your world, say hi.