Sitetracker

Sitetracker provides a comprehensive platform for managing high-volume distributed projects, enabling real-time collaboration, automated reporting, and accurate forecasting to streamline the deployment of infrastructure projects.

Diversified Telecommunication Services

Telecommunication Services

251-1K employees (420)

Founded 2013

$183M raised

5 open positions

Site Reliability Engineer

15 hours, 19 minutes ago

Canada

Full-time

Lead

Site Reliability Engineer (SRE)

DevOps and Infrastructure

AWS, Bash, CloudFormation, EC2, GitHub Actions, Load Balancing, Terraform

Description

Define SLIs, SLOs, and error-budget policies for critical user journeys to guide reliability decisions.
Partner with engineers to transition the organization from reactive firefighting to a proactive reliability practice.
Lead production incident response as Incident Commander and run blameless postmortems with follow-up actions.
Build observability dashboards and actionable alerting that clearly explain system behavior and paging needs.
Design and maintain production-readiness and deploy-safety practices for engineering teams.
Evaluate infrastructure tooling and lead migrations when new approaches are justified by evidence.
Operate and debug AWS-based systems, including network and IAM issues, and support multi-region or regional rollout planning.
Mentor engineers through pair debugging, postmortem coaching, and runbook reviews to raise team capability.
Work with stakeholders and engineering teams to communicate downtime, infrastructure changes, and reliability trade-offs.
Use AI tools and log analysis to accelerate troubleshooting, operational improvements, and delivery.

Requirements

Staff or Senior Staff-level SRE experience is implied for this role.
Strong experience defining SLIs, SLOs, error budgets, and reliability practices.
Hands-on AWS experience across VPC, IAM, compute services such as ECS, EC2, and Lambda, managed data services, and load balancing.
Experience managing production incidents, incident command, and blameless postmortems.
Ability to build observability, dashboards, alerts, and clear runbooks.
Experience working with infrastructure managed through CloudFormation, bash scripts, and GitHub Actions.
Ability to debug production AWS issues at the network and IAM level without immediately escalating to AWS support.
Experience evaluating and leading infrastructure migrations, with familiarity or interest in Terraform, service mesh, and multi-region architectures.
Strong communication skills for writing postmortems, sharing downtime notices, and influencing roadmap decisions.
Comfort using AI tooling such as coding agents and log analysis tools to speed up engineering work.

Benefits

Salary range of $97,000 to $149,200 per year.
Remote work arrangement.
Opportunity to build a reliability practice and influence engineering standards from the ground up.
Autonomy to set reliability strategy and decide when to adopt new technologies.
Role impact across enterprise-scale platform reliability and expanding AI workloads.

Similar Roles

Site Reliability Engineer (Senior or Staff), Atlas

MongoDB, 1K-5K, Internet Software & Services

MongoDB is hiring a Senior Site Reliability Engineer for its Atlas team to help support, maintain, and grow a multi-cloud platform for customer-facing production workloads.

United States, Full-time, Senior, Site Reliability Engineer (SRE)

$127k-$249k

AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS

21 minutes ago

Site Reliability Engineer (Senior or Staff), Storage Layer Services (SLS)

MongoDB, 1K-5K, Internet Software & Services

MongoDB’s Storage Layer Services team is hiring a Site Reliability Engineer to help re-architect the cloud storage layer for Atlas and ensure the reliability and operational safety of its distributed storage infrastructure.

United States, Full-time, Senior, Site Reliability Engineer (SRE)

$144k-$248k

AWS, Azure, DNS, GCP, Go, Kubernetes, Linux, Python, TCP/IP, TLS

1 hour, 18 minutes ago

Manager, Software Engineering (Resilience Engineering)

Affirm, 1K-5K, Diversified Financial Services

Affirm is hiring an Engineering Manager to lead its Resilience Engineering team in building production load testing and chaos engineering capabilities that improve the safety and reliability of its production systems.

Canada, Full-time, Lead, Engineering Manager, Site Reliability Engineer (SRE)

$178k-$228k

AWS, Java, Kotlin, Kubernetes, Python

3 hours, 34 minutes ago

Senior Site Reliability Engineer

Civica, 1K-5K, Internet Software & Services

Civica is hiring a Senior Site Reliability Engineer to own the reliability, performance, security, and automation of the cloud platform supporting its public-sector SaaS products.

United Kingdom, Full-time, Senior, Site Reliability Engineer (SRE)

Ansible, AWS, Azure, CI/CD, CloudFormation, Datadog, ELK Stack, GCP, GitHub Actions, Go, Grafana, Jaeger, Java, Kubernetes, .NET, OpenSearch, OpenShift, Packer, Prometheus, Python, Terraform

15 hours, 19 minutes ago
