Zyte

Zyte is a leading provider of a full-stack web scraping API and data extraction services. With its AI-powered web scraping platform, Zyte offers fast and reliable data extraction solutions for over 2,000 companies and 1 million developers worldwi...

Professional Services
251-1K
Founded 2010
$3M raised

Description

  • Design and evolve the core platform infrastructure including container orchestration, GPU scheduling/autoscaling, and distributed compute.
  • Own the model platform end-to-end: registry, experiment tracking, training orchestration, evaluation, serving, and monitoring.
  • Build and maintain the Golden Path: reference repositories, scaffold CLI, opinionated CI/CD pipelines, runtime contracts, and production-ready defaults.
  • Operate a secure, multi-tenant model registry and training platform with standardized experiment and evaluation harnesses.
  • Provide turnkey serving patterns (online and batch), drift and quality monitoring, and rollback/playbook procedures.
  • Integrate public and open-source AI capabilities as managed platform services with cost and data-governance guardrails.
  • Run the squad: set roadmap and priorities, drive delivery, mentor team members, and uphold high engineering standards.
  • Partner with product engineering, Prod Ops, and Security on platform adoption, rollout plans, observability, billing/metering, and supply-chain security.

Requirements

  • 5+ years of experience building distributed systems.
  • 3+ years in MLOps or ML platform engineering (or equivalent impact).
  • Deep understanding of Kubernetes (Mesos experience is a bonus).
  • Strong knowledge of Linux/OS internals (process model, cgroups/namespaces), networking (TCP/IP, HTTP/2), concurrency, and performance profiling.
  • Proficiency in developing high-performance services in Java, Rust, Go, or C++ (bonus: familiarity with Vert.x and Netty); strong Python skills.
  • Experience with GPU infrastructure including scheduling, containerization, and optimization.
  • Proven track record designing and operating model platforms in production (registry, training, serving, monitoring).
  • Demonstrated success leading technical teams and implementing organization-wide platform solutions.
  • Preferred: experience with streaming and workflows (Kafka, Argo, Temporal, Airflow or equivalents), cost optimization for ML/AI, and multi-tenant quotas/fairness.
  • Preferred: hands-on experience with SRE practices (SLIs/SLOs, incident management), eBPF-based observability, perf tooling, or io_uring.

Benefits

  • Completely remote, flexible work environment — work from where you do your best.
  • Join a progressive, multi-cultural, self-motivated global team.
  • Opportunity to work with cutting-edge open-source technologies and tools.
  • Support for fostering and bringing new ideas to market.
