Zyte

Zyte is a leading provider of a Full Stack Web Scraping API and World Class Data Extraction Services. With its AI-powered web scraping platform, Zyte offers fast and reliable data extraction solutions for over 2,000 companies and 1 million developers worldwi...

Professional Services
251-1K
Founded 2010
$3M raised

Description

  • Design and evolve the core platform infrastructure (container orchestration, GPU scheduling/autoscaling, and distributed compute).
  • Own the model platform including registry, experiment tracking, training orchestration, evaluation, serving, and monitoring.
  • Build and maintain the Golden Path: reference repositories, scaffold CLI, opinionated CI/CD pipelines, runtime contracts (health/metrics/tracing/SLOs), and production-ready defaults.
  • Operate a secure, multi-tenant model registry and training platform with standardized experiment/evaluation harnesses.
  • Provide turnkey serving patterns for online and batch inference, including drift/quality monitoring and rollback playbooks.
  • Integrate public and open-source AI capabilities as managed platform services with cost and data-governance guardrails.
  • Run the squad: set the roadmap and priorities, drive delivery, mentor engineers, and uphold high engineering standards and platform thinking.
  • Partner with product engineering, Prod Ops, and Security on adoption, rollout plans, observability, billing/cost tracking, and supply-chain security.

Requirements

  • 5+ years of experience building distributed systems and 3+ years in MLOps/ML platform engineering (or equivalent impact).
  • Proven track record designing and operating model platforms in production (registry, training, serving, monitoring).
  • Deep understanding of Linux/OS internals (process model, cgroups/namespaces), networking (TCP/IP, HTTP/2), concurrency, and performance profiling.
  • Strong knowledge of Kubernetes (Mesos experience a bonus) and GPU infrastructure provisioning, scheduling, containerization, and optimization.
  • Proficiency developing high-performance services in Java, Rust, Go, or C++ (bonus: Vert.x/Netty); strong Python skills.
  • Demonstrated success leading technical teams and implementing organization-wide platform solutions.
  • Experience with observability and reliability practices (logging/metrics/tracing pipelines, SLIs/SLOs, incident management) and cost governance for ML/AI workloads.
  • Familiarity with streaming and workflow tools (Kafka, Argo, Temporal, Airflow) and experience with multi-tenant quotas/fairness.
  • Hands-on experience authoring Golden Paths (service templates, CI/CD blueprints, CLI scaffolds) and supply-chain security practices (SBOM, image signing).
  • Preferred: experience with eBPF-based observability, perf tooling, or io_uring, and with cost-optimization strategies for ML workloads.

Benefits

  • Completely remote company with the freedom and flexibility to work from where you do your best work.
  • Be part of a self-motivated, progressive, multi-cultural team that fosters and nourishes new ideas.
  • Opportunity to work with cutting-edge open-source technologies and tools.
  • Support for bringing new ideas to market and a culture focused on innovation.

Interested in this position?

Apply directly on the company website

Similar Roles

Senior Python Developer: Databricks AI Platform, Alerting & Monitoring

Xenon7 Internet Software & Services

Senior Python Developer at Xenon7 responsible for building automated, scalable Databricks environments for AI/ML workloads and engineering a Python-based AWS monitoring and alerting ecosystem to ensure platform reliability at scale.

AWS Databricks Docker JIRA MLflow MLOps Python

Engineer - HPC Platform

Xenon7 Internet Software & Services

HPC Platform Engineer at a global healthcare leader supporting the Cardiometabolic Research therapeutic area by enabling, operating, and evolving scalable high-performance computing platforms to accelerate scientific and analytical workloads.

Agile Ansible Bash Chef Docker GitHub Kubernetes Python Terraform

HE - Azure Platform Engineer - 233

Thaloz 51-250 Internet Software & Services

Senior Azure Platform Engineer to lead design, deployment, and operation of production workloads on Azure Kubernetes Service (AKS), enabling a secure, scalable Platform-as-a-Service and accelerating time-to-market through repeatable AKS bootstrapping, CI/CD enablement, and platform automation.

Agile Azure CI/CD Docker Envoy Flux Git GitHub GitOps Helm Kanban Kubernetes Linux Microservices MongoDB PostgreSQL Prometheus REST API Scrum Shell Scripting SonarQube Terraform TLS YAML

Forward Deployed Engineer

Nice Côte d'Azur Hotels, Restaurants & Leisure

NiCE is hiring an individual-contributor AI engineer to architect, deliver, and own production-ready conversational/agentic AI systems that drive enterprise customer outcomes and company growth across industries.

Go React TypeScript