66degrees

66degrees: Google Cloud Premier Partner shaping the Future of Work with AI and data solutions.

IT Services

Information Technology

251-1K (660)

24 open positions

Site Reliability Engineer

1 month ago

Canada

Mid Level

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Agile, Datadog, GCP, JIRA, Kanban, Kubernetes, Linux, Prometheus, Python, Scrum, SQL Server, Terraform

Description

Ensure near-zero downtime through monitoring, alerting, self-healing automation, and continuous improvement.
Build highly automated, available, and scalable systems using software and infrastructure principles.
Advise clients on DevOps and SRE practices, including deployment pipelines, high availability, service reliability, technical debt, and operational toil.
Take a proactive approach to client workloads by anticipating failures, automating tasks, ensuring availability, and supporting customer experience.
Work with clients, internal teams, and Google engineers to investigate and resolve infrastructure issues.
Manage a Jira queue of inbound requests across multiple clients while balancing and prioritizing work.
Contribute to documentation, open-source efforts, and operational improvements.
Support deployment and optimization of cloud workloads using Google Cloud technologies and related tooling.

Requirements

4+ years of cloud and infrastructure experience, including Linux, Windows, Kubernetes, databases, and networking services.
2+ years of full-time Google Cloud experience preferred.
Proficiency in Python is required.
Strong provisioning and configuration skills using Terraform.
Experience troubleshooting issues across systems, networks, and code.
Experience with 24x7x365 monitoring, incident response, and on-call support preferred.
Experience determining and negotiating error budgets, SLIs, SLOs, and SLAs with product owners.
Experience working with Agile methodologies (Scrum and Kanban) across the SDLC.
Ability to work independently and collaboratively across teams.
Strong communication skills in a heavily customer-facing role.
Bachelor’s degree in Computer Science, Computer Engineering, or related field, or equivalent work experience.
Microsoft Server and SQL Server experience is a plus but not required.

Similar Roles

Staff Operations Engineer

Mozilla 251-1K Internet Software & Services

Mozilla is hiring a Staff Operations Engineer to lead the design, reliability, and evolution of hybrid-cloud and workplace infrastructure across teams.

Canada, Full-time, Lead, Infrastructure Engineer, Site Reliability Engineer (SRE)

$86k-$127k

Ansible, DNS, Linux, Puppet, Python, TCP/IP, Unix

3 hours, 38 minutes ago

Principal Site Reliability Engineer (SRE)

Symmetrio Professional Services

Symmetrio is recruiting a Principal Site Reliability Engineer for a rapidly growing healthcare technology company to own the reliability, scalability, security, and performance of a mission-critical SaaS platform used by healthcare providers across the United States.

United States, Full-time, Lead, Site Reliability Engineer (SRE)

Active Directory, AWS, CI/CD, Datadog, Django, Grafana, Kubernetes, Python, Terraform, Windows Server

3 hours, 53 minutes ago

Performance Test Engineer Lead

PartnerOne 51-250 Media

An enterprise performance engineering role at a cloud-focused organization, responsible for validating the scalability, stability, and production readiness of distributed systems across Azure and hybrid environments.

Egypt, Full-time, Lead, QA Engineer, Site Reliability Engineer (SRE)

Azure, CI/CD, Kubernetes, PowerShell

4 hours, 8 minutes ago
