Technical Senior Manager - Site Reliability Engineering

1 week, 4 days ago
Full-time
Lead
DevOps and Infrastructure
Coalfire

Coalfire is a cybersecurity advisor that helps organizations avert threats, reduce risk, and turn security into a competitive advantage, fueling their success.

Internet Software & Services
251-1K
Founded 2001
$9M raised

Description

  • Spend about 70% of time on hands-on engineering work, including developing deployments, tooling, and automation scripts for client needs.
  • Spend about 30% of time on leadership responsibilities, including mentoring engineers, ensuring quality deliverables, and managing escalations.
  • Serve as the primary escalation contact for complex technical issues and resolve them promptly.
  • Monitor engineering quality standards and ensure alignment with internal protocols, compliance requirements, and project milestones.
  • Identify and mitigate risks with consulting and solutions architecture teams to address regulatory requirements and client expectations.
  • Coordinate day-to-day engineering activities, track progress, and adjust resources to meet project goals using Agile methodologies.
  • Help create and implement solutions that improve the practice.
  • Lead the definition, planning, and documentation of key managed services projects and initiatives.
  • Track outcomes against established goals and support continuous operational improvement.

Requirements

  • 9+ years of experience in systems engineering and architecture, including requirements definition, architecture development, systems integration, and testing.
  • 9+ years of cloud computing experience designing, implementing, operating, and automating environments in AWS, Azure, or GCP.
  • 9+ years of hands-on experience with Infrastructure-as-Code, especially Terraform and Ansible.
  • Proven track record of meeting SLAs for availability, response times, and service posture through effective escalation and collaboration.
  • Demonstrated success driving continuous improvement using KPIs and operational best practices.
  • Experience guiding Infrastructure-as-Code governance models and alignment with standards such as FedRAMP or similar security frameworks.
  • Proven experience managing teams of 6–8 contributors, including career development, goal setting, project oversight, and daily guidance.
  • Experience preparing teams for client-facing compliance audits with third-party auditors.
  • Familiarity with ticket management systems and managed services environments that require SLA adherence.
  • Deep knowledge of AWS, Azure, or GCP, plus Terraform, Ansible, GitLab, and CI/CD technologies.
  • Relevant professional cloud certification such as AWS Solutions Architect, Azure Solutions Architect, or GCP equivalent.
  • Preferred advanced cloud or specialty certifications such as AWS DevOps Engineer, AWS Security Specialty, or Azure Security Engineer.
  • Preferred CISSP or comparable cybersecurity certification.
  • Experience in technical consulting for external clients is preferred.
  • Exposure to 24x7 operational settings or large-scale, high-availability system support is preferred.
  • Demonstrated expertise with SSL, PKI, FIPS 140-2, and security baselines such as CIS Benchmarks and DISA STIG is preferred.
  • Additional hands-on work with Kubernetes, advanced threat detection, or enterprise endpoint security is preferred.

Benefits

  • $94,000 - $163,000 base salary range.
  • Eligibility for annual incentive, commission, and/or recognition programs.
  • Flexible work model with the ability to work from home or an office.
  • Paid parental leave.
  • Flexible time off.
  • Certification and training reimbursement.
  • Digital mental health and wellbeing support membership.
  • Comprehensive insurance options.
