Staff Software Engineer - Inference & Performance

1 week ago
Full-time
Lead
Software Development
RUNWARE

RUNWARE provides an affordable API that enables AI developers to efficiently run image, video, and custom generative AI models without the need for extensive infrastructure or machine learning expertise.

Internet Software & Services
1-10 employees
Founded 2023

Description

  • Own end-to-end inference performance across the platform, with responsibility for latency, throughput, and reliability targets.
  • Lead the architecture and design of core inference systems, including request routing, async execution, queuing, GPU scheduling, and result delivery.
  • Drive the platform toward sub-second inference where feasible by identifying bottlenecks across networking, services, storage, and GPU execution.
  • Make high-impact architectural decisions with performance, scalability, and operational simplicity as primary concerns.
  • Partner with ML and model teams to ensure models are production-ready from a performance perspective, including cold starts, batching, memory usage, and concurrency.
  • Define performance budgets, SLAs, and success metrics, and ensure they are measured, visible, and continuously improved.
  • Lead deep-dive investigations into latency spikes, throughput degradation, and other system-level performance issues.
  • Influence and mentor engineers across teams on performance engineering, distributed systems thinking, and operational excellence.
  • Improve tooling, observability, and profiling capabilities to make performance issues easier to detect and reason about.
  • Advocate for pragmatic engineering best practices around testing, benchmarking, rollouts, and documentation.

Requirements

  • Extensive software engineering experience with a strong focus on backend and systems development in PHP, Python, Go, Rust, or similar languages.
  • Proven experience building and operating high-performance, low-latency distributed systems in production.
  • Deep understanding of asynchronous processing, queues, concurrency models, and backpressure.
  • Strong intuition for performance trade-offs across CPU, GPU, networking, storage, and application layers.
  • Experience making and defending critical architectural decisions in complex systems.
  • Hands-on experience troubleshooting real production issues under load, including latency, saturation, and cascading failures.
  • Familiarity with modern cloud infrastructure, CI/CD, and observability stacks, including metrics, tracing, and profiling.
  • Ability to communicate clearly and influence across teams in a remote-first environment.
  • Strong mentorship mindset and a desire to raise the technical bar across the organisation.
  • Experience working on AI/ML inference platforms, GPU-backed workloads, or performance-critical compute systems (nice to have).
  • Knowledge of model optimisation techniques such as batching, quantisation, warm starts, and memory management (nice to have).
  • Experience with infrastructure-as-code and DevOps practices (nice to have).
  • Background in startups or fast-paced environments where speed, ownership, and pragmatism matter (nice to have).
  • Prior ownership of latency or throughput SLOs at scale (nice to have).
  • Must have an existing right to work in the UK; visa sponsorship is not available at this time.

Benefits

  • Remote-first setup with the ability to work from home anywhere the company can employ you.
  • Flexible hours outside core collaboration blocks.
  • Generous paid time off, including vacation, sick days, and public holidays.
  • Meaningful stock options to share in the upside you create.
  • Paid family leave, including maternity, paternity, and caregiver time.
  • Twice-yearly company retreats in inspiring locations.
  • Core hours for collaborative work, with the rest of your schedule under your control.
