Senior Site Reliability Engineer I

1 hour, 59 minutes ago
Full-time
Senior
DevOps and Infrastructure
instacart.careers

Instacart is a leading grocery technology company in North America that works with grocers and retailers to transform how people shop. They partner with over 1,000 national, regional, and local retail banners to facilitate online shopping, delivery, an...

Internet Software & Services
1K-5K

Description

  • Develop scalable infrastructure strategies that support high availability and align with product roadmaps.
  • Optimize infrastructure cost, risk, and performance with cloud providers.
  • Establish and lead incident management protocols and response plans.
  • Investigate the root causes of incidents, prevent recurrence, and coordinate with security teams on response readiness and security risks.
  • Monitor performance metrics, trends, SLOs, SLIs, and error budgets to identify reliability risks and propose improvements.
  • Lead cross-functional projects to optimize systems and reduce technical debt.
  • Collaborate with product and engineering teams to ensure system enhancements meet user requirements.
  • Design and deploy automation tools and maintain automation scripts and frameworks for deployment and operations.
  • Monitor automated systems for performance and reliability and quickly resolve issues in automated environments.
  • Provide technical guidance to junior colleagues and lead knowledge-sharing and training sessions on site reliability best practices.

Requirements

  • Proven experience in programming.
  • Strong knowledge of incident management processes and tools.
  • Exemplary troubleshooting and problem-solving skills.
  • Ability to work under pressure and prioritize tasks during high-stress situations.
  • Experience scaling application infrastructure for high availability.
  • Proficiency in Ruby or Go (preferred).
  • Experience with cloud platforms such as AWS, GCP, or Azure (preferred).
  • Experience with containerization tools such as Docker or Kubernetes (preferred).
  • Experience assessing risk for foundational infrastructure changes (preferred).
  • Experience monitoring system performance and analyzing trends (preferred).

Benefits

  • Highly market-competitive compensation.
  • Remote Flex First work policy with the flexibility to work from home, an office, or a coffee shop.
  • Eligible for a new hire equity grant.
  • Eligible for annual equity refresh grants.
  • Base salary range of $155,000 to $195,500 USD depending on location.
