Operations Team Lead (Production & Reliability)

3 weeks ago
Full-time
Lead
DevOps and Infrastructure
Complexio

Complexio connects your data, people, and systems into one intelligence layer. Ask questions in natural language and get answers from real operational data.

Description

  • Own production stability and availability across all live systems.
  • Lead operational readiness for new releases and manage safe production access and change coordination.
  • Own the full incident management lifecycle, including detection, response, communication, and postmortems.
  • Design and maintain sustainable on-call rotations, escalation paths, severity levels, and runbooks.
  • Define SLIs and SLOs for critical systems and improve visibility into reliability signals.
  • Track reliability metrics such as MTTR, incident frequency, and escalation trends.
  • Drive reliability roadmap initiatives and systemic fixes that prevent recurring incidents.
  • Lead and grow the Operations team by setting standards, KPIs, ownership, and accountability.
  • Raise the bar on operational discipline across both systems and team performance.

Requirements

  • Strong experience in SRE, DevOps, Infrastructure, or Production Engineering.
  • Prior experience leading technical teams.
  • Deep hands-on incident management experience.
  • Strong observability and reliability mindset.
  • Calm under pressure and clear in communication.
  • Systems thinker who fixes root causes rather than symptoms.
  • Experience building structured incident response and escalation processes is highly relevant.
  • Experience defining SLIs/SLOs, runbooks, or on-call practices is preferred.
