Zyte

Zyte is a leading provider of a Full Stack Web Scraping API and World Class Data Extraction Services. With an AI-powered web scraping platform, Zyte offers fast and reliable data extraction solutions for over 2,000 companies and 1 million developers worldwi...

Professional Services
251-1K
Founded 2010
$3M raised

Description

  • Lead and manage the Core & MLOps squad, including roadmap, prioritization, delivery, mentoring, and enforcement of high engineering standards.
  • Design, evolve, and own the core platform infrastructure (container orchestration, GPU scheduling/autoscaling, and distributed compute) that powers Zyte at scale.
  • Own and operate the model platform including model registry, experiment tracking, training orchestration, evaluation framework, serving infrastructure, and model monitoring.
  • Build and maintain the Golden Path: reference repositories, scaffold CLI, opinionated CI/CD pipelines, runtime contracts (health/metrics/tracing/SLOs), templates, and production-ready defaults.
  • Provide turnkey serving patterns (online and batch), drift/quality monitoring, and rollback playbooks, and integrate managed AI capabilities with cost and data-governance guardrails.
  • Develop and maintain operators, sidecars, internal SDKs/libraries, and high-performance clients that enforce platform contracts and reliability defaults.
  • Partner with product engineering, Prod Ops, and Security to drive platform adoption, rollout plans, and cross-team integrations.
  • Run observability and billing pipelines (logging/metrics/tracing; metering/events/cost tracking) and lead efforts in supply-chain security (SBOM, image signing).
  • Champion SRE practices including SLIs/SLOs, incident management, reliability enablement, and cost governance across the platform.

Requirements

  • 5+ years of experience building distributed systems.
  • 3+ years of experience in MLOps/ML platform engineering or equivalent impact.
  • Strong knowledge of Linux/OS internals (process model, cgroups/namespaces), networking (TCP/IP, HTTP/2), concurrency, and performance profiling.
  • Deep understanding of Kubernetes (knowledge of Mesos is a bonus).
  • Proficiency in developing high-performance services in Java, Rust, Go, or C++, plus strong Python skills (experience with Vert.x and Netty is a bonus).
  • Experience with GPU infrastructure including scheduling, containerization, and optimization.
  • Proven track record designing and operating model platforms in production (registry, training, serving, monitoring).
  • Demonstrated success leading technical teams and implementing organization-wide platform solutions.
  • Preferred: experience with streaming and workflow tools (Kafka, Argo, Temporal, Airflow), eBPF-based observability or perf tooling (io_uring), cost optimization for ML/AI, multi-tenant quotas/fairness, hands-on Golden Path authoring, and SRE practices (SLIs/SLOs, incident management).

Benefits

  • Fully remote work, with the freedom and flexibility to work from wherever you do your best work.
  • Be part of a self-motivated, progressive, multi-cultural, globally distributed team.
  • Opportunity to work with cutting-edge open-source technologies and tools.
  • A supportive environment that fosters new ideas and helps bring them to market.

