RUNWARE

RUNWARE provides an affordable API that enables AI developers to efficiently run image, video, and custom generative AI models without the need for extensive infrastructure or machine learning expertise.

Industry: Internet Software & Services
Company size: 1-10 employees
Founded: 2023

Description

  • Build and scale infrastructure for real-time AI inference across GPU fleets, bare-metal servers, and containerized production systems.
  • Help evolve the platform toward more elastic, on-demand infrastructure that can respond to customer traffic and model demand.
  • Improve the performance, reliability, and resilience of request entrypoints, inference services, queues, storage, load balancers, and networking.
  • Automate infrastructure operations, including provisioning, configuration, CI/CD, deployment safety, progressive rollouts, and rapid rollback.
  • Build and maintain the observability stack needed to detect issues early, understand capacity, and resolve problems before they affect customers.
  • Lead production operations, incident response, debugging, and post-incident improvements.
  • Strengthen infrastructure security and compliance through patching, secrets management, access controls, hardening, auditability, and documentation.

Requirements

  • Strong experience as a DevOps Engineer, SRE, Infrastructure Engineer, Platform Engineer, or in a similar role running production systems at scale.
  • Deep Linux knowledge and confidence debugging real production issues across networking, storage, performance, services, and system behavior.
  • Hands-on experience building automation, Infrastructure-as-Code, CI/CD pipelines, and deployment workflows.
  • Experience operating high-availability, low-latency, or high-throughput platforms where reliability and performance directly affect customers.
  • Strong networking fundamentals across TCP/IP, DNS, load balancing, routing, firewalls, proxies, TLS, and HTTP.
  • A calm and pragmatic approach under pressure, with strong communication, good judgment, and a bias toward automation over manual toil.
  • Experience operating GPU infrastructure for AI/ML inference, including NVIDIA drivers, CUDA, container runtimes, GPU monitoring, capacity planning, and workload isolation (bonus).
  • Familiarity with inference serving and optimization frameworks such as vLLM, TensorRT, Triton, or similar (bonus).

Benefits

  • Remote-first work environment, with the option to work from home anywhere the company can employ you.
  • Flexible hours outside core collaboration blocks.
  • Generous paid time off, including vacation, sick days, and public holidays.
  • Meaningful stock options.
  • Paid family leave, including maternity, paternity, and caregiver time.
  • Twice-yearly company retreats in inspiring locations.
  • Built-in downtime after major release cycles to unplug and recharge.
