Senior Site Reliability Engineer, AI Research

3 hours, 1 minute ago
Full-time
Senior
DevOps and Infrastructure
Algolia

Algolia provides a hosted search platform that leverages AI to enhance user experience and developer engagement, enabling enterprises and developers to deliver fast, relevant search results across websites and mobile applications.

Internet Software & Services
251-1K employees
Founded 2012
$334M raised

Description

  • Support and improve the reliability of the platforms used by the AI Research team.
  • Ensure production services meet expectations for availability, latency, and operational readiness.
  • Design infrastructure and operational patterns that balance iteration speed with production safeguards.
  • Work closely with researchers and engineers as an advisor on infrastructure, reliability, and operations.
  • Participate in team planning and execution from early exploration through production rollout.
  • Help researchers self-serve infrastructure safely and effectively.
  • Build and maintain Kubernetes-based services on Google Cloud Platform using infrastructure-as-code and GitOps.
  • Own and improve CI/CD pipelines for Go-based services and some Python-based services.
  • Design and operate observability systems using tools such as Datadog.
  • Participate in a light on-call rotation and respond to incidents while improving systems over time.

Requirements

  • Strong experience operating cloud-first infrastructure.
  • Hands-on experience running production services on Kubernetes.
  • Proficiency with infrastructure-as-code, especially Terraform, and CI/CD systems.
  • Experience supporting production services written in Go; Python experience is a plus.
  • Solid grounding in service reliability, incident response, and operational best practices.
  • Comfort working in ambiguous environments where problems are not always well defined.
  • Experience supporting mission-critical internal platforms is preferred.
  • Exposure to research or experimentation-heavy environments is preferred.
  • Familiarity with working alongside researchers or highly specialized domain experts is preferred.
  • AI, ML, or deep learning experience is not required.
  • Model training, tuning, or ML framework expertise such as PyTorch or JAX is not required.

Benefits

  • Remote-friendly culture with the flexibility to work fully remotely or in a hybrid model.
  • Australia-based role with occasional off-hours collaboration as needed.
  • High-impact work that directly enables new AI-powered capabilities for customers.
  • High agency to help shape what gets built and how it is built.
  • Opportunity to collaborate with experienced SREs, engineers, and PhD researchers.
  • Growth in research-adjacent infrastructure and platform reliability expertise.
