SupplyHouse.com

SupplyHouse.com is an online supplier of plumbing, heating, and HVAC supplies. It offers over 90,000 products at competitive prices, backed by a low-price guarantee and fast shipping, to support professionals in the trades.

Building Materials
251-1K
Founded 2004

Description

  • Design, build, and maintain scalable, reliable systems on Google Cloud Platform, including Compute Engine, GKE, Cloud Storage, and Cloud SQL.
  • Develop automation for infrastructure provisioning using Terraform, Ansible, or Deployment Manager.
  • Build and maintain observability platforms for monitoring, logging, and tracing using tools such as Google Cloud Operations Suite (formerly Stackdriver), Prometheus, or Grafana.
  • Manage incident response, conduct postmortems, and implement improvements to reduce recurring issues.
  • Partner with DevOps and engineering teams to improve CI/CD pipelines for resilient deployments.
  • Define and monitor SLAs, SLOs, and SLIs to support application availability and performance.
  • Implement disaster recovery and backup strategies across cloud services.
  • Continuously optimize GCP performance, capacity, and cost-efficiency.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • 3+ years of hands-on experience as a Site Reliability Engineer, DevOps Engineer, Systems Engineer, or Cloud Infrastructure Engineer.
  • Proven track record managing production-grade systems on Google Cloud Platform (GCP) or other cloud providers.
  • Strong understanding of Linux/Unix system administration, networking, and troubleshooting.
  • Experience implementing Infrastructure as Code (IaC) using Terraform, Ansible, or Deployment Manager.
  • Familiarity with containerization and orchestration technologies such as Docker and Kubernetes (GKE).
  • Experience with monitoring and observability tools such as Google Cloud Operations Suite, Prometheus, Grafana, Datadog, or ELK.
  • Experience defining and monitoring SLAs, SLOs, and SLIs for uptime and performance.
  • Proven ability to handle incident response, conduct postmortems, and drive root cause analysis.
  • Proficiency in at least one scripting language such as Python, Bash, or Go.
  • Hands-on experience building or managing CI/CD pipelines using Jenkins, GitLab CI, or Cloud Build.
  • Strong background in configuration management and release automation.
  • Knowledge of IAM, network security, and cloud compliance controls.
  • Familiarity with disaster recovery, backups, and high-availability design.
  • Strong written and verbal communication skills in English.
  • Preferred: Proven ability to optimize infrastructure performance and cost, particularly within GCP; FinOps experience is a plus.
  • Preferred: Background in capacity planning, load testing, and horizontal scaling of distributed systems.
  • Preferred: Certification as a Google Cloud Professional Cloud DevOps Engineer, Google Cloud Professional Cloud Architect, Associate Cloud Engineer, or Kubernetes CKA/CKAD.
  • Preferred: Experience implementing blue-green deployments, canary rollouts, and progressive delivery strategies.
  • Preferred: Experience working cross-functionally with software development, QA, and security teams.
  • Preferred: Ability to mentor junior engineers and establish best practices for monitoring, deployment, and incident response.

Benefits

  • Base salary of $29,000–$36,000 USD per year.
  • Comprehensive and affordable medical, dental, vision, and life insurance options.
  • Competitive Provident Fund contributions.
  • Paid time off and holidays.
  • Mental health support and wellbeing program.
  • Company-provided equipment and a one-time $250 USD work-from-home stipend.
  • $750 USD annual professional development budget.
  • Ownership for All program that shares company growth and accomplishments with team members.
