Weekday

Weekday helps companies hire engineers who are vouched for by other software engineers, which lets the engineers who vouch earn passive income. It offers services such as drafting outreach messages, shortlisting candidates, and conducting reference checks. Backed by Y Combin...

Industry: Construction & Engineering
Company size: 11-50 employees
Founded: 2020

Description

  • Lead and manage SRE and infrastructure teams while fostering a culture of reliability and accountability.
  • Define and execute the infrastructure and reliability strategy aligned with business goals.
  • Oversee the design, deployment, and maintenance of scalable, highly available, and secure systems.
  • Establish and monitor SLAs, SLOs, and SLIs to ensure service performance and uptime.
  • Drive incident management processes, including root cause analysis, postmortems, and continuous improvement.
  • Collaborate with product and engineering teams to build reliability and scalability into the development lifecycle.
  • Champion automation, observability, and proactive monitoring to reduce downtime and improve system health.
  • Manage infrastructure costs, capacity planning, and resource optimization.
  • Mentor and develop engineering managers and senior engineers to build a leadership pipeline.
  • Ensure best practices in cloud infrastructure, DevOps, and security compliance are followed.

Requirements

  • 10–15 years of experience in software engineering, infrastructure, or SRE.
  • 3–5 years of experience in an Engineering Manager or other leadership role.
  • Proven expertise in Site Reliability Engineering principles, including reliability, scalability, and fault tolerance.
  • Strong experience with cloud platforms such as AWS, GCP, or Azure.
  • Deep understanding of infrastructure as code tools such as Terraform or CloudFormation.
  • Experience with CI/CD pipelines and containerization technologies including Docker and Kubernetes.
  • Demonstrated ability to lead and scale distributed engineering teams.
  • Strong problem-solving skills with a focus on system-level thinking and root cause analysis.
  • Experience with monitoring and observability tools such as Prometheus, Grafana, or the ELK stack.
  • Excellent stakeholder management and communication skills to influence cross-functional teams.
  • Preferred: experience managing large-scale, high-traffic production systems.
  • Preferred: background in DevOps transformation and cloud-native architecture.
  • Preferred: familiarity with security best practices and compliance frameworks.
