Elastic

Elastic is a leading platform for search-powered solutions, providing real-time insights and making data usable for developers and enterprises worldwide.

Industry: Internet Software & Services
Company size: 1K-5K employees
Founded: 2010

Description

  • Lead technical initiatives that automate network engineering work and keep the global Elastic infrastructure reliable.
  • Design, develop, and maintain software, tooling, and automations to grow and scale the global platform across multiple cloud providers.
  • Participate in coding, technical design, and solution delivery to improve platform resilience, security, and operational quality.
  • Operate and respond to major incidents, perform prioritized problem management to prevent repeated customer impact, and participate in a follow-the-sun on-call rotation.
  • Improve and own alerting, monitoring, and incident management processes and metrics to diagnose issues and quantify impacts for stakeholders.
  • Collaborate inclusively with cross-functional teams, uplift others through coaching and mentoring, and strengthen partner and team relationships in a distributed environment.
  • Drive operational excellence by prioritizing bug fixes, features, and reliability improvements for production systems.

Requirements

  • Experience operating a SaaS product in a public cloud and working with Infrastructure-as-Code tooling such as Crossplane or Terraform.
  • Experience operating hardware or software routers and practical experience with BGP configuration.
  • Competency in system and network administration, including professional Linux experience on distributed systems at scale and familiarity with Cilium networking on Kubernetes.
  • Background in software engineering and experience collaborating with engineers to identify, implement, and deliver solutions.
  • Experience with public cloud and managed Kubernetes services (advantageous).
  • Familiarity with containerized services (e.g., Docker) and building/operating Kubernetes-at-scale infrastructures across multiple clouds (preferred).
  • Proven experience leading or improving alerting, major incident management, and observability/metrics systems (e.g., Elastic Stack, Graphite, Prometheus, Influx).
  • Experience diagnosing, designing, or creating solutions using the Elastic Stack (preferred).
  • Demonstrated success working in distributed or remote teams and a customer-first SRE mindset focused on progress over perfection.
  • Ability to coach, mentor, and strengthen team members in a globally distributed, self-organizing environment.

Benefits

  • Typical starting base salary range: $130,900 to $165,500 USD.
  • Eligibility to participate in Elastic’s stock program (equity).
  • Company-matched 401(k) with dollar-for-dollar matching up to 6% of eligible earnings.
  • Health coverage for you and your family in many locations.
  • Flexible locations and schedules with the ability to craft your calendar.
  • Generous vacation days and paid time off policies.
  • Charitable donation matching up to $2,000 (or local equivalent), and up to 40 hours per year for volunteer projects.
  • Minimum of 16 weeks parental leave.
