Software Development Engineer III - Infrastructure

1 month ago
Full-time
Mid Level
DevOps and Infrastructure
HighLevel

HighLevel provides an all-in-one sales and marketing platform that agencies can white-label and resell. It offers tools and resources designed to help businesses consolidate their marketing efforts and achieve their growth objectives.

Internet Software & Services
251-1K employees
Founded 2018
$60M raised

Description

  • Participate in 24/7 on-call rotations for core infrastructure systems and handle incident response, including triage, mitigation, and recovery
  • Maintain and improve runbooks, operational procedures, and escalation paths
  • Reduce mean time to recovery (MTTR) and prevent repeat incidents through engineering solutions and systemic fixes
  • Improve reliability of core infrastructure components, including Kubernetes (GKE) clusters, cloud networking, load balancing, and edge services
  • Support capacity planning, scaling, and resilience testing for production systems
  • Execute security remediations across cloud and Kubernetes environments; support security incident response and post-incident reviews
  • Support enforcement of IAM least-privilege access, network security controls, and runtime security policies; partner on vulnerability management and remediation
  • Automate repetitive operational and security tasks and build tooling to improve incident response speed, operational visibility, and security posture enforcement
  • Support change management and governance for infrastructure/configuration changes and contribute to incident reviews, postmortems, and continuous improvement
  • Collaborate closely with Cloud Infrastructure, Platform, Data, and Security teams and mentor junior engineers while leading small reliability or security initiatives

Requirements

  • 4+ years of experience operating large-scale systems
  • Experience with GCP or other public cloud platforms
  • Production experience with Kubernetes (GKE)
  • Proven experience leading incident response or reliability initiatives
  • Strong understanding of reliability, security, and operational best practices
  • Comfortable working in on-call and incident response environments
  • Strong troubleshooting and communication skills, with experience supporting and operating production systems
  • Experience mentoring junior engineers and influencing peers
  • Nice to have: Familiarity with Cloudflare, networking, or edge security
  • Nice to have: Exposure to security tooling or vulnerability management
  • Nice to have: Scripting or automation experience (Python, Go, Bash, etc.)
  • Nice to have: Experience in compliance- or audit-driven environments (SOC2, ISO)

Benefits

  • Remote-first, global work environment
  • Opportunity to work on a high-scale, high-traffic production platform
  • Membership in a large team distributed across 15+ countries, with opportunities for growth and cross-functional collaboration
  • Opportunities for mentorship, leadership, and professional growth within infrastructure, SRE, and security domains
  • Equal opportunity employer with inclusive hiring practices
