Zyte

Zyte is a leading provider of a full-stack web scraping API and data extraction services. With its AI-powered web scraping platform, Zyte offers fast and reliable data extraction solutions for over 2,000 companies and 1 million developers worldwi...

Professional Services
251-1K
Founded 2010
$3M raised

Description

  • Lead and manage the Core & MLOps squad, including roadmap, prioritization, delivery, mentoring, and enforcement of high engineering standards.
  • Design, evolve, and own the core platform infrastructure (container orchestration, GPU scheduling/autoscaling, and distributed compute) that powers Zyte at scale.
  • Own and operate the model platform including model registry, experiment tracking, training orchestration, evaluation framework, serving infrastructure, and model monitoring.
  • Build and maintain the Golden Path: reference repositories, scaffold CLI, opinionated CI/CD pipelines, runtime contracts (health/metrics/tracing/SLOs), templates, and production-ready defaults.
  • Provide turnkey serving patterns (online and batch), drift/quality monitoring, rollback playbooks, and integrate managed AI capabilities with cost and data-governance guardrails.
  • Develop and maintain operators, sidecars, internal SDKs/libraries, and high-performance clients that enforce platform contracts and reliability defaults.
  • Partner with product engineering, Prod Ops, and Security to drive platform adoption, rollout plans, and cross-team integrations.
  • Run observability and billing pipelines (logging/metrics/tracing; metering/events/cost tracking) and lead efforts in supply-chain security (SBOM, image signing).
  • Champion SRE practices including SLIs/SLOs, incident management, reliability enablement, and cost governance across the platform.

Requirements

  • 5+ years of experience building distributed systems.
  • 3+ years of experience in MLOps/ML platform engineering or equivalent impact.
  • Strong knowledge of Linux/OS internals (process model, cgroups/namespaces), networking (TCP/IP, HTTP/2), concurrency, and performance profiling.
  • Deep understanding of Kubernetes (knowledge of Mesos is a bonus).
  • Proficiency in developing high-performance services in Java, Rust, Go, or C++, plus strong Python skills (experience with Vert.x and Netty is a bonus).
  • Experience with GPU infrastructure including scheduling, containerization, and optimization.
  • Proven track record designing and operating model platforms in production (registry, training, serving, monitoring).
  • Demonstrated success leading technical teams and implementing organization-wide platform solutions.
  • Preferred: experience with streaming and workflow tools (Kafka, Argo, Temporal, Airflow), eBPF-based observability or perf tooling (io_uring), cost optimization for ML/AI, multi-tenant quotas/fairness, hands-on Golden Path authoring, and SRE practices (SLIs/SLOs, incident management).

Benefits

  • Fully remote work with freedom and flexibility to work from where you do your best work.
  • Be part of a self-motivated, progressive, multi-cultural, globally distributed team.
  • Opportunity to work with cutting-edge open-source technologies and tools.
  • Supportive environment that fosters new ideas and helps bring them to market.


