hatch I.T.

hatch I.T. is a specialized technology recruiting firm that supports emerging tech startups in growing their engineering, data, and product teams. They focus on candidate-centric recruiting, bridging the gap between startups and the local tech communit...

Professional Services
11-50
Founded 2012

Description

  • Ensure high availability, scalability, and performance of production systems.
  • Implement and maintain SLIs, SLOs, and SLAs for critical services.
  • Conduct capacity planning and performance tuning.
  • Automate infrastructure provisioning using infrastructure-as-code tools such as Terraform, Terragrunt, and Ansible.
  • Develop automation to reduce manual operations and improve deployment workflows.
  • Build and maintain CI/CD pipelines to support rapid and reliable deployments.
  • Design and maintain monitoring, logging, and alerting systems using Datadog.
  • Participate in on-call rotations and lead incident response efforts.
  • Perform root-cause analysis and write postmortems to prevent recurring issues.
  • Manage cloud infrastructure on AWS and Azure and container orchestration platforms such as Kubernetes and ECS.
  • Optimize system architecture for reliability and fault tolerance.
  • Implement best practices for security, networking, and service resilience.
  • Work with development teams to design reliable microservices and distributed systems.
  • Advocate for SRE principles and operational excellence across engineering teams.
  • Mentor engineers on reliability practices, tooling, and automation strategies.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
  • 3–7 years of experience in SRE, DevOps, or Systems Engineering roles.
  • Strong proficiency with Linux systems and shell scripting.
  • Experience with cloud platforms such as AWS and Azure.
  • Hands-on experience with Kubernetes/ECS and container technologies such as Docker.
  • Proficiency in at least one programming language: Python or Java.
  • Experience with CI/CD pipelines and DevOps tooling.
  • Strong understanding of distributed systems, networking, and security fundamentals.
  • Strong analytical and problem-solving skills.
  • Excellent communication and cross-team collaboration skills.
  • Ability to thrive in fast-paced, high-stakes environments.
  • A mindset focused on continuous improvement and operational excellence.
  • Experience with observability stacks such as OpenTelemetry (preferred).
  • Knowledge of database management, especially PostgreSQL (preferred).
  • Experience with configuration management tools such as Ansible, Chef, or Puppet (preferred).
  • Familiarity with zero-downtime deployments and chaos engineering practices (preferred).

Benefits

  • Competitive pay.
  • Medical, dental, and vision insurance.
  • 401(k) plan with company match for benefit-eligible employees.
  • PTO (Personal Time Off) for full-time employees.
  • Sick time for full-time employees.
  • Remote work environment.
