Intermediate Site Reliability Engineer, Cloud Cost Utilization

3 days, 4 hours ago
Full-time
Mid Level
DevOps and Infrastructure
GitLab

GitLab is a DevOps platform that brings together automation, AI workflows, and collaboration tools for software development.

Internet Software & Services
1K-5K employees
Founded 2014

Description

  • Design and maintain cloud resource tagging and labeling strategies across GCP and AWS to support accurate cost attribution.
  • Develop tooling and pipelines to ingest, normalize, and report on cloud billing data using the FOCUS specification.
  • Automate cost anomaly detection, forecasting, and alerting for infrastructure spend.
  • Contribute to observability and monitoring stacks, including Prometheus, the Grafana LGTM stack (Loki, Grafana, Tempo, Mimir), and ELK (Elasticsearch, Logstash, Kibana), to surface cost efficiency signals.
  • Partner with Finance and Engineering leadership to support cloud cost forecasting for planning and budget discussions.
  • Act as a subject matter expert for cloud cost attribution, tagging strategy, and FOCUS adoption across GitLab Infrastructure.
  • Collaborate with Finance and Compliance teams on audits, certifications, and financial reporting needs related to cloud infrastructure usage.
  • Contribute to infrastructure-as-code efforts using Terraform and Ansible to embed cost controls and tagging requirements into provisioning workflows.
  • Improve cloud billing data quality and develop standards and workflows that help teams understand the real cost of the services they run.
  • Work through technical and organizational ambiguity and connect infrastructure data with business context to help teams act on cost signals.

Requirements

  • Hands-on experience with cloud cost management in GCP and/or AWS, including billing data, pricing models, and optimization approaches.
  • Familiarity with, or interest in adopting, the FinOps FOCUS specification for multi-cloud cost analysis.
  • Experience designing or implementing cloud resource tagging and labeling strategies and improving adoption across teams.
  • Comfort working across technical and business functions, including Engineering, Finance, and other stakeholders.
  • Experience with infrastructure as code, including Terraform and Ansible.
  • Familiarity with observability tooling, including Grafana, and an understanding of how reliability and cost signals can be connected.
  • Ability to explain technical cost data clearly to non-engineering audiences and support informed decision-making.
  • A self-directed approach to work, with comfort operating in a fully remote and asynchronous environment.
  • All team members are expected to incorporate AI into their daily workflows to drive efficiency, innovation, and impact.
  • Candidates with varying levels of experience are welcome, and applicants are encouraged to apply even if they do not meet every requirement.

Benefits

  • Benefits to support health, finances, and well-being.
  • Flexible Paid Time Off.
  • Team Member Resource Groups.
  • Equity Compensation and Employee Stock Purchase Plan.
  • Growth and Development Fund.
  • Parental leave.
  • Home office support.
