Elastic

Elastic is a leading platform for search-powered solutions, providing real-time insights and making data usable for developers and enterprises worldwide.

Internet Software & Services

Information Technology

1K-5K (3335)

Founded 2010

72 open positions

Site Reliability Engineer (Hosted Infra) - Platform

10 hours, 58 minutes ago

United States

Full-time

Senior

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Ansible Argo CD Docker Go Kubernetes Linux Prometheus Puppet Terraform Ubuntu

Description

Engineer software and internal tools to automate large-scale systems and reduce operational toil.
Optimize host reliability and lifecycle management across multiple cloud providers.
Build alerting and monitoring systems that improve incident prevention and observability.
Scale global infrastructure and evolve infrastructure management processes to support growing demand.
Participate in code reviews, planning, knowledge sharing, and team mentoring.
Take part in a balanced SRE on-call rotation, including incident response, runbooks, postmortems, and reliability improvements.
Contribute documentation such as software designs, architecture decisions, runbooks, and postmortems.
Communicate project status clearly, surface blockers early, and follow through on action items.

Requirements

Experience building software in Go.
Experience reviewing code and giving constructive feedback.
Production experience operating large-scale cloud compute environments with hundreds of hosts or more through automated workflows.
Deep experience with Linux systems and OS-level debugging in the terminal.
Experience running containerized workloads in production.
A customer-first, systems-thinking approach focused on root causes rather than symptoms.
Comfort working across time zones in both real-time and asynchronous collaboration.
Ability to create clear, maintainable documentation such as designs, runbooks, architecture diagrams, and postmortems.
A sensible approach to using AI tools to reduce operational burden without adding unnecessary complexity.
Preferred: production experience with Terraform, Puppet, Ansible, Argo CD, Argo Workflows, CUE, Docker, Kubernetes, Ubuntu, or Ubuntu Live Patch.
Preferred: on-call experience during incidents using observability tools such as Elastic Stack, Graphite, Prometheus, or Influx.
Preferred: hands-on experience engineering solutions with the Elastic Stack.

Benefits

Base salary with a typical starting range of $143,100 to $175,000 USD.
Eligibility to participate in Elastic's stock program.
Company-matched 401(k) with dollar-for-dollar matching up to 6% of eligible earnings.
Health coverage for you and your family in many locations.
Flexible locations and schedules for many roles.
Generous vacation days each year.
Up to 40 hours each year for volunteer projects.
Minimum of 16 weeks of parental leave.

Similar Roles

SRE Lead at a top European iGaming solution provider, responsible for building and maintaining the observability cloud infrastructure and platform while improving deployment processes and system reliability.

Ukraine Full-time Lead Site Reliability Engineer (SRE)

Argo CD AWS Azure Bash CI/CD Confluence Debian Docker EC2 Elasticsearch Fluentd GCP Git GitLab Grafana Helm Jenkins JIRA Kibana Kubernetes OpsGenie Prometheus Python

1 hour, 8 minutes ago

Senior Site Reliability Engineer (SRE)

The Investigo Group Professional Services

The Investigo Group is hiring a Senior Site Reliability Engineer to operate and mature its production Kubernetes and OpenShift platforms across secure on-premises and hybrid environments.

United Kingdom Full-time Senior Site Reliability Engineer (SRE)

Ansible Argo CD CI/CD Flux GitHub Actions GitOps Go Grafana Helm Juniper Kubernetes Linux Load Balancing Machine Learning OpenID Connect OpenShift OpenTelemetry Palo Alto Prometheus Python SAML Shell Scripting Terraform

7 hours, 21 minutes ago

Senior DevOps Engineer - Cloud Operations

Black Duck Inn 1K-5K Internet Software & Services

Black Duck Software is hiring a Sr. DevOps Engineer, Cloud Operations to own and operate global customer-facing SaaS and hosted infrastructure on Google Cloud Platform for enterprise applications.

United States Full-time Lead DevOps Engineer Site Reliability Engineer (SRE)

$136k-$168k

Argo CD Bash CI/CD DevSecOps DNS GCP GitHub Actions GitOps Go HashiCorp Vault Helm Java Kubernetes Load Balancing Microservices Python Terraform TLS

8 hours, 46 minutes ago

Senior AIOps Engineer, Incident Response [Remote-US]

Quanata 201-500 information technology & services

Quanata is hiring an experienced production operations and reliability leader to oversee production health, incident response, and operational support for its AI-driven insurance technology platform.

United States Full-time Senior AI Engineer Site Reliability Engineer (SRE)

$215k-$280k

AWS Confluence JIRA

18 hours, 22 minutes ago
