CI&T

CI&T is a global digital technology agency empowering agile growth for leading companies through advanced technologies with a team of 2000 experts worldwide.

Internet Software & Services
5K-10K
Founded 1995

Description

  • Design, implement, and evolve CI/CD pipelines for .NET and Next.js applications to support fast, secure, and traceable releases.
  • Manage and improve container infrastructure with Docker and Kubernetes, including deployments, autoscaling, and resource management.
  • Implement and maintain the product observability stack, including metrics, logs, traces, and operational dashboards.
  • Build and maintain SRE dashboards covering SLIs, SLOs, and error budgets.
  • Configure proactive alerts and runbooks for incident response.
  • Collaborate with developers on code instrumentation standards, including structured logs and distributed traces.
  • Work with AWS and infrastructure security practices to support a reliable production environment.
  • Support QA in running automated tests in ephemeral, container-isolated environments.
  • Contribute to engineering culture through runbooks, post-mortems, and continuous process improvement.
  • Investigate and resolve incidents with urgency and clear communication across technical and business teams.

Requirements

  • Solid experience with CI/CD tools such as GitHub Actions, GitLab CI, Azure DevOps, or equivalent.
  • Strong hands-on experience with Docker and Kubernetes in production, including deployments, services, ingress, HPA, and namespaces.
  • Experience with AWS, especially EKS, ECR, Secrets Manager, IAM, and WAF.
  • Knowledge of observability tools such as Datadog, Grafana, Prometheus, OpenTelemetry, or similar.
  • Experience building operational dashboards focused on availability, latency, errors, and saturation, using models such as RED, USE, or the Four Golden Signals.
  • Familiarity with infrastructure as code tools such as Terraform, Pulumi, or CDK.
  • Knowledge of database monitoring for connection health, slow queries, and locks.
  • Understanding of infrastructure security practices such as secrets rotation, least privilege, and network policies.
  • Ability to read and understand .NET/C#, TypeScript, and Next.js code to support instrumentation and troubleshooting.
  • Preferred: experience with service mesh technologies such as Istio or Linkerd.
  • Preferred: knowledge of distributed tracing tools such as Jaeger, Tempo, or Datadog APM.
  • Preferred: experience with incident management, including writing runbooks and operational playbooks.
  • Preferred: experience with performance and load testing tools such as k6 or Gatling, integrated into CI/CD pipelines.
  • Preferred: experience working in multi-tenant environments and isolating observability data by client.

Benefits

  • Health and dental insurance.
  • Meal and food allowance.
  • Childcare assistance.
  • Extended parental leave.
  • Gym and wellness partnerships through Wellhub (Gympass) and TotalPass.
  • Profit sharing (PLR).
  • Life insurance.
  • Continuous learning platform (CI&T University) and partnerships with online course and language-learning platforms.
