Senior Site Reliability Engineer, AI Research

2 months ago
Full-time
Senior
DevOps and Infrastructure
Algolia

Algolia provides a hosted search platform that leverages AI to enhance user experience and developer engagement, enabling enterprises and developers to deliver fast, relevant search results across websites and mobile applications.

Internet Software & Services
251-1K
Founded 2012
$334M raised

Description

  • Support and evolve the reliability of platforms used by the AI Research team.
  • Ensure production services meet expectations for availability, latency, and operational readiness.
  • Design infrastructure and operational patterns that balance iteration speed with production safeguards.
  • Work closely with researchers and engineers as an advisor on infrastructure, reliability, and operations.
  • Participate in team planning and execution from early exploration through production rollout.
  • Help researchers self-serve infrastructure safely and effectively.
  • Build and maintain Kubernetes-based services on Google Cloud Platform using infrastructure-as-code and GitOps.
  • Own and improve CI/CD pipelines for Go-based services and some Python-based services.
  • Design and operate observability systems, including tools such as Datadog.
  • Participate in a light on-call rotation and respond to incidents while improving systems over time.

Requirements

  • Strong experience operating cloud-first infrastructure.
  • Hands-on experience running production services on Kubernetes.
  • Proficiency with infrastructure-as-code, especially Terraform, and CI/CD systems.
  • Experience supporting production services written in Go; Python experience is a plus.
  • Solid grounding in service reliability, incident response, and operational best practices.
  • Comfort working in ambiguous environments where problems are not always well defined.
  • Experience supporting mission-critical internal platforms is preferred.
  • Exposure to research or experimentation-heavy environments is preferred.
  • Familiarity working alongside researchers or highly specialized domain experts is preferred.
  • AI, ML, or deep learning experience is not required.
  • Model training, tuning, or ML framework expertise such as PyTorch or JAX is not required.

Benefits

  • Remote-friendly work culture with flexibility to work remotely or in a hybrid model.
  • Australia-based role with occasional off-hours collaboration as needed.
  • High-impact work that directly enables new AI-powered capabilities for customers.
  • High agency to help shape what gets built and how it is built.
  • Opportunity to collaborate with experienced SREs, engineers, and PhD researchers.
  • Growth in research-adjacent infrastructure and platform reliability expertise.
