Caseware

CaseWare International Inc. provides cutting-edge software solutions for accounting firms, corporations, and governments, enabling users worldwide to work smarter and transform insights into impact.

Internet Software & Services
251-1K
Founded 1988

Description

  • Maintain reliable, high-performing AWS production systems.
  • Manage EKS clusters for configuration, scaling, and workload stability.
  • Set up and support Istio service mesh for traffic control and security.
  • Oversee GitOps workflows to ensure secure and consistent infrastructure changes.
  • Create automation tools and platform enhancements.
  • Design, implement, and manage monitoring, logging, and tracing solutions for AI workloads, microservices, and data pipelines.
  • Respond to incidents, perform root cause analysis, and recommend lasting solutions.
  • Work with developers and platform teams to improve deployments and system operations.
  • Support Nx-based monorepos to enable scalable developer workflows.
  • Participate in an on-call rotation.

Requirements

  • Experience as a Site Reliability Engineer or in a similar infrastructure-focused role.
  • Solid software engineering skills and practical experience operating modern cloud-native infrastructure.
  • Deep understanding of AWS production services such as EKS, EC2, IAM, networking, and load balancing.
  • Professional Kubernetes experience, including EKS, autoscaling, networking, RBAC, and cluster operations.
  • Hands-on experience with service meshes, specifically Istio.
  • Experience with GitHub, GitHub Actions, and modern CI/CD workflows.
  • Experience working with monorepos, especially Nx.
  • Understanding of GitOps practices, with Flux CD experience preferred.
  • Strong grasp of Linux systems, networking, containers, and Docker.
  • Familiarity with infrastructure-as-code tools such as CDK and Terraform.
  • Knowledge of SLOs, error budgets, incident management, and production readiness best practices.
  • Strong English communication and collaboration skills.
  • Excellent analytical thinking and problem-solving abilities.
  • Bias toward ownership, clarity, and operational excellence.

Benefits

  • Fully remote work from Romania.
  • Flexible work options.
  • Generous time-off policies.
  • Competitive compensation and comprehensive benefits.
  • Performance bonuses.
  • Opportunities for career growth.
  • Collaborative, inclusive team culture with knowledge sharing.
  • Work on international projects with a diverse global team.
