Intermediate Site Reliability Engineer, Environment Automation

3 hours, 8 minutes ago
Full-time
Junior
DevOps and Infrastructure
GitLab

GitLab: The comprehensive DevOps platform revolutionizing software development with automation, AI workflows, and essential tools for efficient collaboration.

Internet Software & Services
1K-5K
Founded 2014

Description

  • Contribute to the design and evolution of infrastructure automation for provisioning, upgrading, and operating many GitLab environments.
  • Help debug and resolve production issues across Kubernetes clusters, GitLab components, and cloud services.
  • Assist in building and maintaining deployment and orchestration tools such as Helm Charts, omnibus-gitlab configurations, and multi-tenant workflows.
  • Automate operational tasks across the environment lifecycle, including provisioning, configuration updates, upgrades, and routine maintenance.
  • Help build and refine the observability stack for multi-tenant GitLab environments.
  • Assist in responding to platform alerts and incidents, including troubleshooting and documenting findings.
  • Support infrastructure changes, capacity expansions, and new service rollouts for Dedicated and other managed environments.
  • Develop and maintain scripts, automation tools, and infrastructure-as-code workflows.
  • Participate, with support, in the on-call rotation for production GitLab environments.
  • Document operational tasks, runbooks, and lessons learned to improve repeatability and reduce toil.

Requirements

  • Experience working as an SRE or in a similar role operating production infrastructure.
  • Hands-on experience with Go (Golang) is required.
  • Hands-on experience running Kubernetes-based workloads in production.
  • Familiarity with Terraform and Ansible.
  • Solid understanding of Git-based workflows and infrastructure-as-code practices.
  • Experience working in distributed systems or cloud-based production environments, ideally in SaaS or managed service settings.
  • Comfort participating in incident response and on-call rotations under guidance.
  • A proactive mindset focused on automation and documentation.
  • Comfort working asynchronously across distributed teams.
  • Interest in automating the lifecycle of many environments or tenants in parallel, even if not yet at large scale.

Benefits

  • Benefits to support health, finances, and well-being.
  • Flexible Paid Time Off.
  • Equity Compensation and Employee Stock Purchase Plan.
  • Growth and Development Fund.
  • Parental leave.
  • Home office support.
  • Team Member Resource Groups.
