Veeam Software

Veeam Software is the global leader in backup, delivering modern data protection solutions for virtual environments, enterprises, small businesses, and service providers worldwide.

Internet Software & Services
1K-5K employees
Founded 2006
$500M raised

Description

  • Act as a technical authority and mentor senior engineers on reliability and resilience decisions.
  • Lead the definition and enforcement of SLIs, SLOs, and error budgets across engineering teams.
  • Collaborate with staff peers to align reliability strategy and shared standards.
  • Partner with development and product teams to design for failure and build resilient architecture from the start.
  • Drive company-wide adoption of observability best practices and tooling.
  • Ensure metrics, logs, and traces provide actionable insight across systems.
  • Lead complex incident responses, postmortems, and systemic reliability improvements.
  • Promote a blameless culture of learning and continuous improvement.
  • Lead initiatives in infrastructure as code, deployment automation, and resilience testing.
  • Influence chaos engineering practices and release validation frameworks, and partner with platform and security teams on production readiness.

Requirements

  • 8+ years of experience in a software engineering or SRE role, including technical leadership.
  • Demonstrated experience mentoring and guiding senior engineers.
  • Deep expertise in building distributed systems on public cloud platforms, with Azure preferred.
  • Strong programming skills in JavaScript, TypeScript, Go, Java, or C#.
  • Hands-on experience with observability tools such as Prometheus, Grafana, and OpenTelemetry.
  • Mastery of infrastructure automation tools such as Terraform or Pulumi.
  • Experience with Kubernetes for container orchestration.
  • Ability to communicate clearly across geographies and disciplines.
  • Experience leading SRE initiatives across multiple product teams is preferred.
  • Background in chaos engineering, incident learning, or performance and load testing is preferred.
  • Familiarity with global compliance standards such as ISO, SOC 2, GDPR, FedRAMP, or CMMC is preferred.

Benefits

  • 18 paid vacation days, plus 4 extra global VeeaMe Days for self-care and 24 paid volunteer hours each year.
  • Private medical coverage for you and up to four dependents.
  • Life, accident, and disability insurance with enhanced coverage.
  • Annual flexible wellbeing allowance for physical and mental wellness.
  • Free confidential counseling and coaching through an Employee Assistance Program, including legal and financial advice.
  • Meal, fuel, and transportation benefits based on work arrangement.
  • Daycare reimbursement and a safe cab facility for eligible employees.
  • Access to learning and growth resources, including LinkedIn Learning, O’Reilly, mentoring, workshops, and Global Day of Learning events.
