Veeam Software

Veeam Software is a global leader in backup and modern data protection, offering solutions for virtual environments, enterprises, small businesses, and service providers worldwide.

Industry: Internet Software & Services
Company size: 1K-5K
Founded: 2006
Funding: $500M raised

Description

  • Own complex and escalated production issues from support and drive long-term fixes with engineering across code, configuration, and architecture changes.
  • Proactively identify and address risks uncovered during problem-solving and incident response.
  • Lead production efficiency initiatives and maintain processes, runbooks, and knowledge base integrity.
  • Define, build, and maintain production monitoring systems.
  • Continuously improve alerting to reduce noise and ensure alerts are actionable and backed by well-documented runbooks.
  • Define and maintain SLIs/SLOs for key services and use error budgets to inform operational and product decisions.
  • Turn manual operational processes into automation.
  • Own and drive the post-mortem review process and follow-up actions from incident analysis.
  • Collaborate with support as an escalation point and provide feedback and improvement recommendations.
  • Work with developers, product managers, and security professionals throughout design, rollout, patch delivery, and incident mitigation to ensure services are production-ready and fault-tolerant.

Requirements

  • 3–5 years of experience in software engineering, site reliability, production engineering, or senior technical support roles operating distributed systems.
  • Experience with log analysis and advanced troubleshooting.
  • Basic programming experience with languages such as JavaScript, Go, TypeScript, Java, or C#.
  • Experience deploying and troubleshooting systems on public cloud platforms, with Azure preferred.
  • Familiarity with observability tools such as Elastic, Prometheus, Grafana, and OpenTelemetry.
  • Understanding of distributed systems, networking, automation, and CI/CD.
  • Prior on-call or incident response experience preferred.
  • Background in automation, performance testing, or service scalability preferred.
  • Familiarity with compliance or security best practices preferred.

Benefits

  • Unlimited paid time off, 12 paid holidays, and 4 extra global VeeaMe Days for self-care.
  • 24 paid volunteer hours annually through Veeam Cares.
  • Paid parental leave: 8 weeks for all parents and 16 weeks for birthing parents.
  • Medical, dental, and vision coverage starting on day one.
  • Mental health support, therapy sessions, and digital wellness tools through the Employee Assistance Program.
  • 401(k) retirement plan with company matching contributions.
  • Fertility, adoption, and surrogacy support through Maven.
  • Legal services, identity protection, supplemental health insurance options, and tax-advantaged spending accounts for healthcare, dependent care, and commuting.
  • Competitive compensation with a performance-based bonus; pay ranges from $92,100 to $235,900 USD, depending on U.S. geographic zone.
  • Professional development resources including LinkedIn Learning, O’Reilly, mentoring, workshops, and learning events.
