Veeam Software

Veeam Software is the global leader in backup, delivering modern data protection solutions for virtual environments, enterprises, small businesses, and service providers worldwide.

Internet Software & Services
1K-5K employees
Founded 2006
$500M raised

Description

  • Get up to speed on Veeam Data Cloud workloads, dependencies, and operational workflows by reading code and documentation and by working with subject matter experts.
  • Write and maintain runbooks, incident guides, onboarding materials, and other operational documentation.
  • Participate in incident response, including triage, investigation, mitigation, and postmortems.
  • Help implement and maintain service level indicators, service level objectives, and error budgets.
  • Identify reliability issues and propose concrete improvements during incidents and reviews.
  • Support high availability and fault tolerance work on Azure, including Azure Government.
  • Implement monitoring improvements by adding instrumentation, alerting, and dashboards.
  • Contribute to toil reduction through automation and tooling improvements.
  • Participate in on-call rotations.
  • Work with engineering, security, compliance, and operations teams to deliver reliability improvements.

Requirements

  • 3+ years of experience in Software Engineering, including at least 1 year in SRE, Platform Engineering, or DevOps for cloud-hosted services.
  • Experience with cloud infrastructure on Azure or a comparable cloud provider.
  • Experience working in regulated or compliance-oriented environments such as government, financial, or healthcare.
  • Ability to read and understand code well enough to investigate system behavior independently.
  • Experience with monitoring and observability tools such as Prometheus, Grafana, OpenTelemetry, or the ELK stack.
  • Experience with IaC tools such as Terraform, Terragrunt, or Pulumi, and with Kubernetes.
  • Experience with CI/CD tools such as GitHub Actions, Azure DevOps, GitLab CI, or ArgoCD.
  • Strong programming skills in one or more languages such as TypeScript/JavaScript, Go, Java, or C#.
  • Solid understanding of distributed systems fundamentals and networking basics.
  • Clear written and verbal communication skills.
  • Preferred: experience in Government or Sovereign Cloud environments such as Azure Government or AWS GovCloud.
  • Preferred: background in SaaS platforms or multi-tenant systems.
  • Preferred: familiarity with chaos engineering, resilience testing, or load testing.
  • Preferred: exposure to building or improving reliability practices on a team.
  • Preferred: familiarity with AI-first development workflows using LLM-powered tools for automation, code generation, or documentation.

Benefits

  • Unlimited paid time off, 12 paid holidays, 4 global VeeaMe Days for self-care, and 24 paid volunteer hours annually.
  • Paid parental leave: 8 weeks for all parents and 16 weeks for birthing parents.
  • Medical, dental, and vision coverage starting on the first day.
  • Mental health support, therapy sessions, and digital wellness tools through the Employee Assistance Program.
  • 401(k) retirement plan with company matching contributions.
  • Fertility, adoption, and surrogacy support through Maven.
  • Legal services, identity protection, and supplemental health insurance options.
  • Tax-advantaged spending accounts for healthcare, dependent care, and commuting.
  • Professional development resources including mentorship, training, workshops, on-demand learning libraries, and learning events.
  • Competitive compensation with pay transparency, performance-based bonus, and role-based geographic salary ranges.
