Building intelligent systems at scale
Senior AI/ML Engineer with 9+ years of experience building agentic AI systems and large-scale ML infrastructure. Currently leading the Agent Composer platform at Contextual AI, transforming RAG pipelines into modular multi-agent systems used in production. Co-author of 14 papers across NeurIPS, ICML, ACL, and Nature.
9+ yrs
Engineering Experience
200K+
Monthly Requests Handled
14
Research Publications
99.98%
System Reliability
const juan = {
  role: "Technical Lead / Staff AI Engineer",
  stack: [
    "Python", "React", "Next.js", "TypeScript", "SQL",
    "Temporal", "GCP", "AWS", "LLMs", "Docker", "Redis"
  ],
  publications: [
    "NeurIPS (Best Paper)", "ICML", "ACL", "Nature"
  ],
  reliability: 99.98,
  monthlyRequests: 200_000,
  ships: true
}
Career
Professional Experience
9+ years of experience building production ML systems, from early-stage startups to industry-defining platforms.
Leading the design and development of modular, graph-based agent frameworks for scalable AI composition.
Key Achievements
- Designed modular, graph-based agent framework using DDD, enabling scalable composition across research and sales teams
- Operated production agentic systems at scale: 200K+ monthly requests, 99.98% reliability with strict SLAs
- Migrated orchestration to Temporal, improving evaluation success from 60% to 98% and increasing throughput 10x
- Built core platform modules (evaluation, query, feedback), accelerating internal developer velocity
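The graph-based composition described above can be illustrated with a minimal sketch. This is not the actual Agent Composer API; every name here (`AgentNode`, `AgentGraph`, the lambda steps) is hypothetical and only shows the general pattern of wiring agent steps into a traversable graph:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentNode:
    """One step in the graph; `run` transforms shared state."""
    name: str
    run: Callable[[dict], dict]

@dataclass
class AgentGraph:
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)  # node name -> downstream node names

    def add(self, node, downstream=()):
        self.nodes[node.name] = node
        self.edges[node.name] = list(downstream)
        return self

    def execute(self, start, state):
        # Simple depth-first traversal; a production system would hand
        # scheduling to a durable orchestrator rather than recurse inline.
        state = self.nodes[start].run(state)
        for nxt in self.edges[start]:
            state = self.execute(nxt, state)
        return state

# A two-node RAG-style pipeline: retrieve documents, then generate.
graph = (
    AgentGraph()
    .add(AgentNode("retrieve", lambda s: {**s, "docs": ["d1", "d2"]}), ["generate"])
    .add(AgentNode("generate", lambda s: {**s, "answer": f"{len(s['docs'])} docs used"}))
)
result = graph.execute("retrieve", {"query": "example"})
```

Separating nodes from edges is what makes the composition scalable across teams: new agents plug in without touching existing ones.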
Tech lead for Dynabench, the collaborative AI evaluation platform used by industry leaders.
Built end-to-end analytics platforms and ML solutions for manufacturing optimization.
Delivered predictive analytics and computer vision solutions for construction management.
Depth by Domain · Toolkit
Technical Expertise & Skills
Agentic AI Platforms
At Contextual AI, I lead the Agent Composer team, designing modular, graph-based agent frameworks that convert RAG pipelines into scalable, observable distributed systems.
Domain-Driven Design with composable agent workflows enabling scalable composition across research, sales, and applied teams.
Temporal-based orchestration improving evaluation success from 60% to 98% with 10x throughput scaling.
Distributed logging, tracing, and metrics reducing MTTR and enabling data-driven reliability engineering.
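Temporal workflows need a running server, so as a rough stand-in for the durability it provides, here is a plain-Python sketch of retry-with-backoff semantics, the kind of behavior a workflow engine gives evaluation activities out of the box. The `flaky_eval` activity and all parameters are made up for illustration:

```python
import time

def run_with_retries(activity, max_attempts=3, base_delay=0.01):
    """Retry a transiently failing activity with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return activity()
        except Exception:
            if attempt == max_attempts:
                raise  # exhausted: surface the failure to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))

calls = {"n": 0}

def flaky_eval():
    # Hypothetical evaluation step that fails twice before succeeding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "eval ok"

outcome = run_with_retries(flaky_eval)  # recovers after two transient failures
```

Moving this kind of retry logic out of application code and into an orchestrator is precisely what lifts success rates on long-running evaluations.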
Programming
Distributed Systems & Infra
AI / ML
Observability
Research
Selected Publications
Best Paper Award (Co-author)
Recognized for outstanding contribution to the field of machine learning evaluation and data-centric AI systems.
Nature Publication
Forthcoming publication in Nature on AI systems and their applications in scientific research.
Adversarial Nibbler: A Data-Centric Challenge
Advancing fairness-aware and data-centric evaluation practices for modern machine learning systems.
BabyLM Challenge Proceedings
Sample-efficient language modeling and robust evaluation methodologies for language model training.
Data-Centric AI Workshop
LSH methods for data deduplication in an artificial Wikipedia dataset
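As a rough illustration of the LSH-based deduplication idea behind that workshop entry, here is a minimal MinHash sketch in plain Python. The parameters (shingle size, number of hash functions) and sample texts are invented; real pipelines would use a tuned banding scheme over large corpora:

```python
import hashlib

def shingles(text, k=3):
    """Split text into overlapping k-token shingles."""
    tokens = text.split()
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def minhash(sh, num_hashes=64):
    """MinHash signature: per seeded hash, the minimum value over all shingles."""
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16) for s in sh)
        for seed in range(num_hashes)
    ]

def est_jaccard(a, b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    sa, sb = minhash(shingles(a)), minhash(shingles(b))
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)

# Near-duplicates score high; unrelated texts score near zero.
dup = est_jaccard("the quick brown fox jumps over the lazy dog",
                  "the quick brown fox jumps over the lazy cat")
diff = est_jaccard("the quick brown fox jumps over the lazy dog",
                   "completely unrelated text about language models here")
```

Because matching signature slots approximate Jaccard similarity, near-duplicate documents can be bucketed and dropped without pairwise comparison of the full corpus.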
14 Publications Total
Spanning evaluation, NLP, computer vision, and AI systems across top-tier venues.