Veeam Software

Veeam Software is the global leader in backup, delivering modern data protection solutions for virtual environments, enterprises, small businesses, and service providers worldwide.

Internet Software & Services
1K-5K
Founded 2006
$500M raised

Description

  • Get up to speed on the full platform, including workloads, dependencies, and risk areas, through code, documentation, and conversations.
  • Work with subject matter experts to close knowledge gaps and create onboarding material for the team.
  • Write and maintain runbooks, architecture documentation, and operational guides.
  • Design highly available and fault-tolerant infrastructure on Azure, including Azure Government.
  • Define SLIs, SLOs, and error budgets for the platform.
  • Run incident response efforts and lead blameless postmortems that drive improvements.
  • Identify reliability risks across modern and legacy workloads and develop remediation plans that meet compliance constraints.
  • Define instrumentation needs, close observability gaps, and set alerting, telemetry, and monitoring standards.
  • Build automation to reduce toil and support fleet management.
  • Work with infrastructure-as-code, CI/CD, deployment automation, testing, canary releases, and release validation pipelines.
  • Collaborate across product, platform, security, legal, compliance, and operations teams.
  • Participate in on-call rotations and mentor other engineers on SRE practices.

Requirements

  • 7+ years in software engineering, including 3+ years in SRE, platform engineering, or a similar discipline.
  • Experience with government or sovereign cloud environments such as Azure Government or AWS GovCloud.
  • Experience working in regulated compliance environments such as FedRAMP, CMMC, IL2/IL4/IL5, PCI-DSS, SOX, HIPAA, or HITRUST.
  • Strong experience building and running production services on cloud infrastructure, with Azure preferred.
  • Ability to learn large, complex platforms quickly with limited guidance and restricted environment access.
  • Ability to investigate systems independently and produce clear documentation, risk assessments, and improvement plans.
  • Comfort working across engineering, product, security, compliance, and operations teams.
  • Programming experience in TypeScript/JavaScript, Go, Java, C#, or a similar language.
  • Experience with monitoring and observability tools such as Prometheus, Grafana, OpenTelemetry, or ELK Stack.
  • Experience with Kubernetes and with infrastructure-as-code tools such as Terraform, Terragrunt, or Pulumi.
  • Experience with CI/CD and GitOps tooling such as GitHub Actions, Azure DevOps, GitLab CI, ArgoCD, FluxCD, or Dagger.
  • Solid understanding of distributed systems, networking, and cloud-native architecture.
  • Clear written and verbal communication skills.
  • Preferred: Experience on B2B SaaS platforms in regulated or government markets.
  • Preferred: Background in chaos engineering, resilience testing, or performance/load testing.
  • Preferred: Experience building an SRE or reliability function from scratch.
  • Preferred: Experience across both modern cloud-native and legacy systems.
  • Preferred: Familiarity with AI-first development workflows and LLM-powered tools for infrastructure automation, code generation, and documentation.

Benefits

  • Unlimited paid time off, 12 paid holidays, 4 additional VeeaMe Days for self-care, and 24 paid volunteer hours annually.
  • Paid parental leave of 8 weeks for all parents and 16 weeks for birthing parents.
  • Medical, dental, and vision coverage starting on day one.
  • Mental health support, therapy sessions, and digital wellness tools through the Employee Assistance Program.
  • 401(k) retirement plan with company matching contributions.
  • Fertility, adoption, and surrogacy support through Maven.
  • Legal services, identity protection, and supplemental health insurance options.
  • Tax-advantaged spending accounts for healthcare, dependent care, and commuting.
  • Learning and development support through LinkedIn Learning, O’Reilly, mentoring, workshops, and the Global Day of Learning.
  • Competitive compensation with a performance-based bonus; U.S. salary ranges from $109,800 to $252,500, depending on geographic zone.
