Kaseya

Kaseya provides integrated IT management and security solutions for managed service providers (MSPs) and small and midsize businesses (SMBs), enabling centralized IT operations, remote management, cybersecurity, and automation.

IT Services

Information Technology

1K-5K (3500)

Founded 2000

$567M raised

87 open positions

Site Reliability Engineer

2 weeks, 1 day ago

Canada

Full-time

Mid Level

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Ansible AWS Chef CloudFormation Datadog DevSecOps Elasticsearch Kibana Kubernetes MySQL PostgreSQL Puppet Secrets Management Serverless Terraform

Description

Define SLIs and SLOs, and monitor and enforce error budgets to maintain service reliability.
Lead incident response, troubleshooting, and blameless postmortems that drive permanent fixes.
Build and maintain automated deployment, configuration management, and infrastructure provisioning using Infrastructure as Code.
Manage cloud and hybrid infrastructure with Terraform or CloudFormation, balancing cost, scalability, and resilience.
Improve observability through proactive monitoring, alerting, and dashboards that surface issues early.
Partner with development teams to embed reliability into the SDLC, including deployment automation, capacity planning, and chaos engineering.
Reduce operational toil through automation and self-healing systems.
Support containerized and serverless workloads to keep production systems highly available and fault tolerant.
Stay current on SRE, cloud, and observability practices and bring improvements back to the team.

Requirements

4 to 5 years of AWS production experience.
Experience owning infrastructure as code with Terraform or CloudFormation, including state management.
AWS ECS production experience, or a strong Kubernetes background with willingness to ramp up.
Experience participating in an on-call rotation, including leading incidents and writing postmortems.
Working fluency with SLOs, SLIs, and error budgets in production.
Kubernetes production experience preferred.
Experience with observability tools such as Datadog, Dynatrace, CloudWatch, or Elasticsearch/Kibana preferred.
Experience with chaos engineering preferred.
Experience with AWS Lambda or other serverless workloads preferred.
Experience with Ansible, Chef, or Puppet preferred.
DevSecOps experience, such as vulnerability scanning, secrets management, or SOC 2 or ISO 27001 compliance, preferred.
Production database support experience with RDS, PostgreSQL, or MySQL preferred.
Open source contributions or a public technical portfolio preferred.

Benefits

Annual base salary of CAD $115,000 to CAD $130,000.
Final offer determined by experience, skills, and internal equity.
Equal employment opportunity across all protected characteristics.

Similar Roles

Staff Operations Engineer

Mozilla 251-1K Internet Software & Services

Mozilla is hiring a Staff Operations Engineer to lead the design, reliability, and evolution of hybrid-cloud and workplace infrastructure across teams.

Canada Full-time Lead Infrastructure Engineer Site Reliability Engineer (SRE)

$86k-$127k

Ansible DNS Linux Puppet Python TCP/IP Unix

3 hours, 38 minutes ago

Principal Site Reliability Engineer (SRE)

Symmetrio Professional Services

Symmetrio is recruiting a Principal Site Reliability Engineer for a rapidly growing healthcare technology company to own the reliability, scalability, security, and performance of a mission-critical SaaS platform used by healthcare providers across the United States.

United States Full-time Lead Site Reliability Engineer (SRE)

Active Directory AWS CI/CD Datadog Django Grafana Kubernetes Python Terraform Windows Server

3 hours, 53 minutes ago

Performance Test Engineer Lead

PartnerOne 51-250 Media

An enterprise performance engineering role at a cloud-focused organization, responsible for validating the scalability, stability, and production readiness of distributed systems across Azure and hybrid environments.

Egypt Full-time Lead QA Engineer Site Reliability Engineer (SRE)

Azure CI/CD Kubernetes PowerShell

4 hours, 8 minutes ago