CI&T

CI&T is a global digital technology agency that helps leading companies grow with agility through advanced technologies, with a team of 2,000 experts worldwide.

Internet Software & Services
5K-10K
Founded 1995

Description

  • Perform reliability, performance, and availability analysis for the application.
  • Monitor deployment issues and respond to performance or security problems as they arise.
  • Capture learnings from incidents to help prevent similar future issues.
  • Manage the task backlog proactively based on monitoring and application analysis.
  • Identify improvement opportunities and propose effective collaborative solutions.
  • Communicate clearly with teams responsible for different application journeys to align needs and priorities.
  • Stay current on cloud computing and DevOps/SRE trends, best practices, and emerging technologies.
  • Support troubleshooting efforts and continuous improvements through dashboards, tracers, and observability insights.

Requirements

  • Experience as a Site Reliability Engineer and familiarity with SRE metrics.
  • Experience monitoring backend Java applications.
  • Strong experience with FinOps practices and cloud cost management.
  • Experience with observability tools such as Datadog, Grafana, Prometheus, and Thanos.
  • Experience with AWS-based platforms such as ECS and EKS, and/or Kubernetes and Docker.
  • Experience with Linux.
  • Knowledge of GitHub, Jenkins, and Splunk is desirable.
  • Experience with CI/CD tools such as GitHub Actions, AWS CodeBuild, and AWS CodePipeline.
  • Experience with infrastructure as code using Terraform.
  • Strong analytical and problem-solving skills, with adaptability in a dynamic environment.
  • Experience with performance testing and stress testing.
  • Understanding of chaos engineering principles and failure injection scenarios.
  • Ability to troubleshoot efficiently and drive continuous improvement.
  • Preferred: experience monitoring mobile applications on Android and iOS.
  • Preferred: knowledge of Google Analytics and Firebase Crashlytics.
  • Preferred: programming experience with Java, shell scripting, Go, or Python.

Benefits

  • Health and dental insurance.
  • Meal and food allowances.
  • Childcare assistance.
  • Extended parental leave.
  • Wellhub (Gympass) and TotalPass fitness and wellness partnerships.
  • Profit sharing (PLR).
  • Life insurance.
  • Continuous learning platform (CI&T University).
  • Discount club and free online health, mental health, and well-being platform.
  • Parenting and pregnancy support courses.
  • Partnerships with online learning platforms and language-learning resources.

Similar Roles

Director of Cloud Operations

Firstup 251-1K Professional Services

Firstup is hiring a Director of Cloud Operations to lead the reliability, scalability, and efficiency of its globally distributed SaaS cloud platform across AWS, while partnering with engineering, security, and product teams.

AWS CI/CD CircleCI Datadog Kubernetes Microservices .NET Serverless Terraform
6 hours, 9 minutes ago

Site Reliability Engineer (SRE)

hatch I.T. 11-50 Professional Services

CardioOne is hiring a remote Site Reliability Engineer to partner with engineering teams in keeping its healthcare platform reliable, scalable, secure, and high-performing as the company grows.

Ansible AWS Azure Chef CI/CD Datadog Docker Java Kubernetes Linux Microservices OpenTelemetry PostgreSQL Puppet Python Shell Scripting Terraform
6 hours, 25 minutes ago

Staff Site Reliability Engineer

Caseware 251-1K Internet Software & Services

Caseware is hiring a Staff Site Reliability Engineer in Romania to help build and scale its AI platform by keeping AWS, Kubernetes, and GitOps-based production systems reliable, observable, and automated.

AWS AWS CDK CI/CD Docker GitHub GitHub Actions GitOps Kubernetes Linux Load Balancing Microservices Terraform
6 hours, 40 minutes ago

Senior Infrastructure Engineer - Postgres

ClickHouse 51-250 IT Services

ClickHouse is hiring a Senior SRE / Senior Infrastructure Engineer to own reliability, automation, and operations for its multi-cloud Postgres integration and cloud data platform as it scales globally.

AWS Azure CI/CD GCP Go Grafana Kubernetes OpenTelemetry PostgreSQL Prometheus Terraform
18 hours, 9 minutes ago
