Backblaze

Backblaze is a pioneer in robust, scalable low-cost cloud backup and storage services, offering enterprise hot storage, low-cost backup and archive solutions. With the easiest way to back up all files, Backblaze provides unlimited, unthrottled, and unc...

IT Services
251-1K
Founded 2007

Description

  • Support the availability and durability of critical services across production environments.
  • Monitor service health using SLIs, SLOs, and error budgets, and escalate issues when thresholds are at risk.
  • Participate in on-call rotations, incident response, and post-incident reviews to improve service reliability.
  • Follow established ITIL/OSS processes for incident, change, problem, and capacity management.
  • Develop automation for common operational tasks to reduce manual intervention and toil.
  • Contribute to monitoring, logging, and alerting frameworks.
  • Work with CI/CD pipelines, configuration management, and infrastructure as code tools.
  • Write scripts to improve system reliability and operational efficiency.
  • Partner with engineering, product, and operations teams on resilient system design and operations.
  • Assist with capacity planning, disaster recovery exercises, vendor troubleshooting, documentation, and operational process improvement.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • 2–4 years of experience in site reliability, systems engineering, or operations.
  • Exposure to large-scale, production-grade systems.
  • Solid Linux systems administration and troubleshooting skills.
  • Familiarity with monitoring, alerting, incident response, and root cause analysis.
  • Proficiency in at least one scripting language such as Python, Bash, or Go.
  • Understanding of container technologies such as Docker and Kubernetes, and of microservices concepts.
  • Knowledge of incident response and operational best practices.
  • Experience in a SaaS, service provider, or distributed systems environment (preferred).
  • Familiarity with ITIL/OSS practices and SLOs/SLAs (preferred).
  • Experience with cloud platforms such as AWS, GCP, or Azure (preferred).
  • Ability to work independently, take ownership, and drive projects from problem discovery through resolution (preferred).

Benefits

  • Backblaze emphasizes a culture of learning, development, and growth.
  • The company supports diversity, equity, and inclusion and fosters a sense of belonging.
  • Backblaze is proud to be an Equal Opportunity Employer.
  • Candidates are encouraged to apply even if they do not meet every requirement.
