Backblaze

Backblaze is a pioneer in robust, scalable low-cost cloud backup and storage services, offering enterprise hot storage, low-cost backup and archive solutions. With the easiest way to back up all files, Backblaze provides unlimited, unthrottled, and unc...

IT Services

Information Technology

251-1K (393)

Founded 2007

5 open positions

Director of Site Reliability Engineering (SRE)

3 days, 17 hours ago

United States

Full-time

Executive

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Description

Lead and mentor a globally distributed SRE organization (15+ technical team members), including Sr. SRE and SRE Level 1 services teams.
Own the state of production and be accountable for production infrastructure and performance against key SLOs.
Provide 24/7 SRE services and centrally manage incident, change, and asset management processes.
Drive continuous improvement by leveraging operational data to prioritize work and enhance core operational competencies.
Manage demand forecasts and make strategic decisions regarding infrastructure expansion.
Manage the department budget, including the budget for operational tooling and observability.
Lead and coordinate strategic initiatives to evolve production support, incident/change/asset management, and related processes.
Recruit, coach, and develop team members to meet Backblaze and individual career objectives.
Build and maintain strong cross-functional relationships with Infrastructure Engineering, Customer Support, Data Center Operations, Supply Chain, Vendor Management, and Legal.
Represent Cloud Operations leadership as an engaged, visible leader and participate in contract renewal and vendor management cycles.

Requirements

Proven experience in a leadership role within MSP or Infrastructure-as-a-Service environments.
6+ years of management experience, with at least 3 years at the Director level.
5+ years of hands-on technical experience in a field related to the team’s focus.
Significant experience with cloud-scale data center systems, services, and managing mission-critical operations of complex global infrastructure.
Experience being accountable for production SLOs and measuring performance against those objectives.
Demonstrated experience with incident, change, and operational/process management and continuous improvement.
Strong analytical and data-driven decision-making skills and experience establishing department-level objectives/OKRs.
Experience managing department budgets and budgets for operational tooling and observability.
Excellent collaboration and communication skills with experience building high-performing, distributed teams.
Ability to travel domestically and internationally as needed; remote work within the continental United States is acceptable for candidates with experience managing teams remotely.
Six Sigma training and/or certification is a plus.

Benefits

RSU grants for full-time employees
Annual company bonus plan
Healthcare for family, including dental and vision
401(k) plan
Employee Stock Purchase Plan (ESPP)
Flexible vacation policy
Maternity and paternity leave
MacBook Pro plus a generous stipend to personalize your workstation

Similar Roles

Site Reliability Engineer

TextNow 51-250 Wireless Telecommunication Services

TextNow is hiring a remote Site Reliability Engineer in Canada to own infrastructure, monitoring, logging, CI/CD, and reliability for the systems supporting its free phone service platform.

Canada Full-time Senior Site Reliability Engineer (SRE)

$113k-$212k

Ansible AWS CI/CD GitHub System Design Terraform

6 hours, 28 minutes ago

Senior Application Engineer

Warner Music Group 5K-10K Media

Warner Music Group is hiring a Senior Application Engineer to support, improve, and modernize the software systems behind its global music operations.

Canada Full-time Senior Site Reliability Engineer (SRE) Software Engineer

$100k-$145k

Angular AWS CI/CD GitHub Actions Java Oracle PostgreSQL Python React SQL

6 hours, 43 minutes ago

Site Reliability Engineer - Backstage

Spotify Media

Site Reliability Engineer for Spotify’s Backstage team in New York City, focused on building and operating cloud infrastructure for an external developer portal and internal AI-driven coding workflows.

United States Full-time Mid Level Site Reliability Engineer (SRE)

$133k-$190k

AWS GCP Go Java LLM Microservices Python React Terraform TypeScript

7 hours, 58 minutes ago

Blockchain Site Reliability Engineer

InfStones 51-250 Internet Software & Services

InfStones is hiring a remote Blockchain Site Reliability Engineer in Dallas to ensure the reliability, availability, and performance of its blockchain node infrastructure.

United States Contract Senior Site Reliability Engineer (SRE)

Docker Ethereum Go Grafana JavaScript Kubernetes Linux Prometheus Python Rust Solana

8 hours, 43 minutes ago
