Site Reliability Engineer

Full-time
Mid Level
DevOps and Infrastructure
DEUNA

DEUNA

DEUNA is a payment orchestrator that optimizes transaction acceptance, boosts conversion rates, and minimizes fraud, offering over 80 payment methods through a single integration.

Diversified Financial Services
51-250
Founded 2020

Description

  • Design, define, and maintain observability and monitoring for AWS infrastructure.
  • Define and track SLIs, SLOs, and SLAs for critical systems.
  • Improve system uptime, latency, and fault tolerance across the platform.
  • Provide internal libraries and toolsets to developers for diagnostics and debugging.
  • Manage scaling, performance, and resilience efforts related to system reliability.
  • Collaborate with technical teams on capacity planning, load testing, and scaling policies.
  • Improve production operations by defining and evolving deployment strategies.
  • Conduct disaster recovery testing and failure drills to validate system resilience.

Requirements

  • Experience with observability tools such as Prometheus, Grafana, OpenTelemetry, or AWS CloudWatch.
  • Experience designing dashboards, alerts, and log aggregation pipelines.
  • Deep understanding of AWS services including ECS, Lambda, RDS, and CodePipeline.
  • Strong proficiency in the Go programming language.
  • Skilled at defining SLIs, SLOs, and error budgets, and at improving MTTR.
  • Experience conducting failure drills with tools such as Chaos Monkey or Gremlin.
  • Excellent communication and collaboration skills.
  • Adaptability to thrive in dynamic, fast-paced environments.
  • Strong time management and task prioritization skills.
  • Proficiency in English.

Benefits

  • Vacation and additional PTO.
  • Remote work from anywhere.
  • Financial support for health insurance, internet, and a cell phone line.
  • Stock options.
  • Learning and development platform.
  • Multidisciplinary, diverse, and dynamic team.
  • Growth and career path.

Interested in this position?

Apply directly on the company website

Similar Roles

Staff Platform Site Reliability Specialist (Observability & Kubernetes)

Everbridge 1K-5K Internet Software & Services

Everbridge is hiring a Staff Platform Site Reliability Specialist to own and evolve its enterprise observability platform and Kubernetes environment across a large-scale cloud-native AWS and GCP infrastructure.

AWS GCP Grafana Kubernetes Terraform

Senior SRE Engineer

Stellar Cyber 51-250 Professional Services

Stellar Cyber is hiring a Senior Site Reliability Engineer to strengthen the reliability, scalability, and operational excellence of its cloud-based cybersecurity platform.

Apache Spark Argo CD AWS Azure Bash Bitbucket CI/CD Elasticsearch GCP GitHub Actions Grafana Helm Kafka Kubernetes MongoDB Prometheus Python Redis Terraform

Senior Site Reliability Engineer

Airalo 51-250 Airlines

Airalo is hiring a Senior Site Reliability Engineer in its fully remote Engineering team to help scale and improve the reliability of the global eSIM platform used by millions of travellers.

Agile AWS Datadog GitHub Actions Go Java Kubernetes OpenTelemetry Prometheus Python Scrum Terraform