a.danks@gmail.com
Open to hybrid/remote
H-1B transfer eligible

Andrew Danks

Platform & Data Infrastructure architect with 12+ years of experience scaling distributed systems, orchestration and streaming platforms, cloud infrastructure, and ML systems

Sr Staff Software Engineer, Affirm

  • Tech Lead, Batch & Streaming Data Platforms — Led core data infrastructure teams serving ~1000 engineers. Owned strategy and cross-functional alignment for Kafka, Spark, Temporal, Airflow, Flink
    • Established company-wide paved paths, design constraints & guardrails for safe, scalable platform adoption.
    • Stepped into management calibration; reviewed promo packets and drove multiple promotions at levels from Junior through Staff
    • Reduced team on-call load by 50% via self-serve automation and leadership accountability reporting
    • Overhauled company-wide System Design interview template for consistent, higher-signal evaluation
  • Temporal Platform — Designed, secured VP + eng-lead alignment, and delivered production-grade Temporal platform to GA, powering stateful, durable Agentic/LLM harnesses and Capital pipelines
  • Kubernetes Migration — Designed & led cutover of 2000+ critical jobs to 17 EKS clusters (peak 500 TiB memory). Cut environment provisioning from 2 months to 1 week, standardized DevEx, and made deploys 50% faster
  • Platform Reliability & Risk Mitigation
    • Architected usea1→usea2 multi-region deployment of financial pipelines to provide EC2 control plane redundancy
    • Built a data-quality platform that blocks pipeline regressions, safeguarding billions from financial discrepancies.
    • Designed an SLA tracking system to ensure platform and financial pipelines meet contractual SLAs
  • Platform Modernization & Cost Efficiency
    • Luigi and Celery → Temporal: Secured org alignment and designed an adapter for zero-code migrations, dramatically improving reliability, o11y, and developer velocity. Presented at Replay 2026
    • Kinesis → Kafka: Managed project and designed zero-data-loss Kafka consumer cutover. $3M/year savings
    • Automated detection of over-provisioned Spark workloads. $2M/year savings
    • Unified Spark platform on autoscaling EKS stack (deprecating EMR+mrjob) — improved testability, reduced toil

Senior Software Engineer, Yelp

  • Search Suggest + ML — Led re-architecture of autocomplete ranking with a new ML platform and an XGBoost classifier for contextual filters. Improved click-through rate while maintaining low latency.
  • Realtime ML model for Store Visits — Improved Flink app throughput by >1000 msgs/sec for online ML model classifying customer visits from location pings. Featured at Flink Forward 2019
  • Chain Detection — Built Spark + ML system to detect retail chains at scale. Presented at PyBay 2018

Previous Roles

  • Software Engineer Intern, Yelp (Jun 2013 – Aug 2013, San Francisco)
  • Software Engineer Intern, Marin Software (May 2012 – Aug 2013, San Francisco)
  • Research Assistant, Computational Linguistics Group (Sep 2013 – Jun 2014, Univ. of Toronto)

Skills

Python, Java, Kotlin, Kubernetes, Spark, Flink, Temporal, Kafka, Terraform, Airflow, AWS (EKS, ECS, Aurora), Machine Learning, Elasticsearch

Education

Computer Science & Mathematics, B.Sc., University of Toronto