Senior Site Reliability Engineer (FinOps) - Platform

13 hours, 1 minute ago
Full-time
Senior
DevOps and Infrastructure
Elastic

Elastic is a leading platform for search-powered solutions, providing real-time insights and making data usable for developers and enterprises worldwide.

Internet Software & Services
1K-5K employees
Founded 2010

Description

  • Lead technical initiatives that automate system engineering and improve the reliability of Elastic’s global infrastructure.
  • Develop and maintain software, tooling, and automation to support platform growth and increasing scale demands.
  • Design, build, scale, and mature the multi-cloud platform used to host internal and external services.
  • Respond to major incidents and help prevent recurring customer impact through problem management.
  • Collaborate with engineers to identify, implement, and deliver solutions that improve platform reliability.
  • Promote operational excellence, collaboration, and an inclusive working environment across the team.
  • Participate in a follow-the-sun on-call rotation to support production systems.

Requirements

  • Background in software engineering with the ability to collaborate closely with other engineers on technical solutions.
  • Experience with public cloud and managed Kubernetes services is advantageous.
  • Experience working in distributed teams or remotely is desirable.
  • Experience operating a SaaS product in a public cloud, ideally using Infrastructure-as-Code tools such as Crossplane or Terraform.
  • Experience building or operating Kubernetes-at-scale infrastructure across multiple cloud providers.
  • Ability to write non-trivial programs in Go or another programming language.
  • Experience working with containerized services and container tooling such as Docker.
  • Experience leading or improving alerting, incident management, and metrics systems such as the Elastic Stack, Graphite, Prometheus, or InfluxDB.
  • Professional Linux system administration experience on distributed systems at scale.
  • Experience with the Elastic Stack is preferred.
  • Experience coaching, mentoring, and uplifting other team members is preferred.

Benefits

  • Competitive pay based on the work you do, not your previous salary.
  • Health coverage for you and your family in many locations.
  • Flexible locations and schedules for many roles.
  • Generous vacation days each year.
  • Company match of up to $2,000 (or local currency equivalent) for financial donations and service.
  • Up to 40 hours each year for volunteer projects.
  • Minimum of 16 weeks of parental leave.
