hatch I.T.

hatch I.T. is a specialized technology recruiting firm that supports emerging tech startups in growing their engineering, data, and product teams. They focus on candidate-centric recruiting, bridging the gap between startups and the local tech communit...

Professional Services

Industrials

11-50 (30)

Founded 2012

5 open positions

Site Reliability Engineer (SRE)

1 month ago

United States

Full-time

Mid Level

DevOps and Infrastructure

Ansible AWS Azure Chef CI/CD Datadog Docker Java Kubernetes Linux Microservices OpenTelemetry PostgreSQL Puppet Python Shell Scripting Terraform

Description

Ensure high availability, scalability, and performance of production systems.
Implement and maintain SLIs, SLOs, and SLAs for critical services.
Conduct capacity planning and performance tuning.
Automate infrastructure provisioning using infrastructure-as-code tools such as Terraform, Terragrunt, and Ansible.
Develop automation to reduce manual operations and improve deployment workflows.
Build and maintain CI/CD pipelines to support rapid and reliable deployments.
Design and maintain monitoring, logging, and alerting systems using Datadog.
Participate in on-call rotations and lead incident response efforts.
Perform root-cause analysis and write postmortems to prevent recurring issues.
Manage cloud infrastructure (AWS, Azure) and container orchestration platforms (Kubernetes, ECS).
Optimize system architecture for reliability and fault tolerance.
Implement best practices for security, networking, and service resilience.
Work with development teams to design reliable microservices and distributed systems.
Advocate for SRE principles and operational excellence across engineering teams.
Mentor engineers on reliability practices, tooling, and automation strategies.

Requirements

Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
3–7 years of experience in SRE, DevOps, or Systems Engineering roles.
Strong proficiency with Linux systems and shell scripting.
Experience with cloud platforms such as AWS and Azure.
Hands-on experience with Kubernetes/ECS and container technologies such as Docker.
Proficiency in at least one of the following programming languages: Python or Java.
Experience with CI/CD pipelines and DevOps tooling.
Strong understanding of distributed systems, networking, and security fundamentals.
Strong analytical and problem-solving skills.
Excellent communication and cross-team collaboration skills.
Ability to thrive in fast-paced, high-stakes environments.
A mindset focused on continuous improvement and operational excellence.
Experience with observability stacks such as OpenTelemetry (preferred).
Knowledge of database management, especially PostgreSQL (preferred).
Experience with configuration management tools such as Ansible, Chef, or Puppet (preferred).
Familiarity with zero-downtime deployments and chaos engineering practices (preferred).

Benefits

Competitive pay.
Medical, dental, and vision insurance.
401(k) plan with company match for benefit-eligible employees.
PTO (Personal Time Off) for full-time employees.
Sick time for full-time employees.
Remote work environment.

Similar Roles

SRE - DevOps Engineer - Argentina

Coderio 51-250 Internet Software & Services

Coderio is hiring a remote DevOps/SRE Engineer in Argentina to ensure the stability, scalability, and efficient operation of the infrastructure that supports its global digital solutions.

Argentina Full-time Mid Level Site Reliability Engineer (SRE)

Argo CD CI/CD Flux GitHub Actions GitOps Helm Jenkins Kubernetes OpenShift Terraform

1 hour, 17 minutes ago

Senior Site Reliability Engineer

Cribl 251-1K IT Services

Cribl is hiring a Senior Site Reliability Engineer in Poland to help build and operate the telemetry infrastructure and observability platform that supports its cloud products and enterprise customers.

Poland Full-time Senior Site Reliability Engineer (SRE)

Ansible AWS Azure CI/CD Grafana JavaScript Kibana Linux New Relic Node.js PagerDuty Prometheus Splunk Terraform TypeScript

2 hours, 10 minutes ago

Site Reliability Engineer

Kaseya 1K-5K IT Services

Kaseya is hiring a Site Reliability Engineer to own the reliability, automation, and production stability of its AWS-based services used by thousands of MSPs worldwide.

Canada Full-time Mid Level Site Reliability Engineer (SRE)

$85k-$96k

Ansible AWS Chef CloudFormation Datadog DevSecOps Elasticsearch Kibana Kubernetes MySQL PostgreSQL Puppet Secrets Management Serverless Terraform

3 hours, 16 minutes ago

Site Reliability Engineer

Alpaca 51-250 Capital Markets

Alpaca is hiring a Site Reliability Engineer to keep its brokerage platform reliable and operable across cloud, Kubernetes, observability, messaging, and database systems, with a strong focus on PostgreSQL reliability on the trading-critical path.

Europe Full-time Mid Level Site Reliability Engineer (SRE)

DNS GitOps Go Kafka Kubernetes Linux Load Balancing PostgreSQL Python RabbitMQ Secrets Management TLS

7 hours, 14 minutes ago
