Senior Site Reliability Engineer

2 hours, 56 minutes ago
Full-time
Senior
DevOps and Infrastructure
UJET

UJET is redefining customer experience through its advanced cloud contact center platform, which integrates AI-powered solutions to enhance support across voice and digital channels while offering features like intelligent workforce management and virt...

Professional Services
251-1K employees
Founded 2015
$101M raised

Description

  • Lead efforts to improve system reliability, scalability, and performance across critical services.
  • Define and implement SLIs, SLOs, and error budgets to guide engineering priorities.
  • Design and develop observability systems for metrics, logging, tracing, and alerting with minimal noise.
  • Lead complex incident response and act as incident commander when needed.
  • Conduct systemic postmortems and ensure corrective actions are completed.
  • Identify and eliminate operational toil through automation, tooling, and improved workflows.
  • Partner with product and platform teams on architecture decisions, production readiness, and failure recovery design.
  • Build reusable systems and paved roads that help teams operate services reliably.
  • Mentor other engineers and raise the organization’s operational maturity.

Requirements

  • 6-10+ years of experience in SRE, infrastructure, or backend systems engineering.
  • Experience owning reliability outcomes for complex, distributed systems.
  • Strong experience with cloud infrastructure such as AWS, GCP, or Azure and production-scale systems.
  • Deep understanding of observability, incident management, and system performance.
  • Proficiency in at least one programming language such as Go, Python, or Java, with a focus on automation and tooling.
  • Ability to influence how other teams work without direct managerial authority.
  • Strong decision-making during incidents, following a defined process and staying composed under pressure.
  • Experience building or scaling SRE practices, including SLOs, incident frameworks, or on-call models, is preferred.
  • Kubernetes or container orchestration experience is preferred.
  • Infrastructure as Code experience, such as with Terraform, is preferred.
  • Experience with high-growth or scaling systems is preferred.
  • Background in performance engineering or capacity planning is preferred.

Benefits

  • Annual US hiring range of $100,000 to $120,000.
  • Medical, dental, and vision coverage.
  • 401(k) plan.
  • Commuter benefits.
  • Remote or hybrid work options.


