Firstup

Firstup offers an intelligent communication platform that engages employees across their entire employment journey and provides insights that help organizations support, promote, and retain their workforce.

Professional Services

Industrials

251-1K employees (450)

Founded 2008

$47M raised

11 open positions

Director of Cloud Operations

3 months ago

United States

Full-time

Executive

Site Reliability Engineer (SRE)

DevOps and Infrastructure

AWS CI/CD CircleCI Datadog Kubernetes Microservices .NET Serverless Terraform

Description

Own the availability, performance, and resilience of the multi-region AWS platform.
Drive reliability improvements using SLIs/SLOs, error budgets, and proactive engineering practices.
Lead efforts to reduce MTTR and improve incident response across the organization.
Guide architecture decisions for microservices, Kubernetes (EKS), and serverless workloads.
Advance the observability strategy using Datadog to provide actionable insights across infrastructure and applications.
Establish and refine incident management practices, including on-call processes, escalation paths, and post-incident reviews.
Act as incident commander for critical events and participate in the on-call rotation.
Improve operational standards through automation, standardization, and modern best practices.
Drive cost optimization across AWS environments without sacrificing performance or reliability.
Lead, mentor, and support a distributed CloudOps team across the US and UK.
Oversee operations of a legacy .NET-based solution in private data centers in the US and Europe.

Requirements

10+ years of experience in cloud infrastructure, SRE, or DevOps roles.
Recent experience leading CloudOps or SRE teams.
Proven experience leading operational or platform transformations in a SaaS environment.
Experience operating multi-region, customer-facing systems at scale.
Strong hands-on experience with AWS multi-region architectures.
Hands-on experience with Kubernetes (EKS) and containerized environments.
Infrastructure as Code experience, with Terraform preferred.
Experience with CI/CD tooling such as CircleCI or similar.
Experience with observability platforms such as Datadog or equivalent tools.
Solid understanding of microservices and distributed systems design.
Familiarity with serverless architectures and modern cloud-native patterns.
Deep experience with incident management, on-call operations, and reliability engineering practices.
Strong understanding of SLO/SLI frameworks, monitoring strategies, and performance optimization.
Demonstrated ability to balance hands-on technical work with team leadership.
Collaborative and pragmatic leadership style, with the ability to influence across teams.
Passion for building and supporting high-performing teams.
Bias toward continuous improvement and measurable outcomes.

Benefits

Base salary range of $200,000 to $228,000.
Excellent PTO program.
Great health benefits.
Remote work arrangement.
Casual and friendly work environment.
Leadership team committed to personal and professional growth.
Inclusive, high-growth environment where ideas are rewarded.
Opportunity to make a direct impact on reliability, scalability, and customer experience.

Similar Roles

Site Reliability Engineer

VantageScore 11-50 Banks

Site Reliability Engineer at a growing engineering team, focused on DevSecOps for maintaining the reliability, security, and compliance of cloud infrastructure, APIs, and software supply chains.

United States Full-time Senior Site Reliability Engineer (SRE)

$150k

Agile AWS AWS CDK Bash CI/CD CloudFormation CodePipeline Datadog DevSecOps Docker EC2 GitHub Actions Grafana HashiCorp Vault Kong Kubernetes Microservices Python REST API Scrum Terraform

9 hours, 25 minutes ago

Application Site Reliability Engineer (SRE)

CXM Direct 51-250 Capital Markets

Application Site Reliability Engineer at a trading technology company, responsible for keeping .NET/C# Windows-based trading and back-office services highly reliable, observable, and resilient.

Chile Colombia Peru Uruguay Mexico Argentina Full-time Mid Level Site Reliability Engineer (SRE)

AWS Bash C# CI/CD Docker Grafana Kubernetes Microservices .NET OpenTelemetry PowerShell Prometheus Python Terraform Windows Server

9 hours, 40 minutes ago

Customer Reliability Engineer

iPiD 11-50 Internet Software & Services

iPiD is hiring a Customer Reliability Engineer to own production reliability, customer deployments, and operational excellence for its global KYP verification platform.

Romania Full-time Senior Customer Success Site Reliability Engineer (SRE)

Ansible CI/CD GitOps Helm Kubernetes Linux Microservices Terraform

9 hours, 55 minutes ago

Site Reliability Engineer

CSC Generation 251-1K Internet Software & Services

Backcountry is hiring a Site Reliability Engineer in Costa Rica to keep its ecommerce platform reliable, scalable, and observable across a multi-cloud environment.

Costa Rica Full-time Mid Level Site Reliability Engineer (SRE)

Ansible Argo CD AWS AWS CDK Bash CI/CD Docker GCP GitOps Grafana Helm Kubernetes Linux Node.js OpenSearch Prometheus Python Terraform TypeScript

1 day, 8 hours ago
