Drivetrain

Drivetrain is a strategic finance platform that helps finance teams streamline financial planning, track actuals in real time, speed up reporting, and make informed decisions.

Capital Markets
11-50 employees
Founded 2021
$15M raised

Description

  • Architect, manage, and continuously optimize highly available cloud infrastructure across AWS and GCP.
  • Design, deploy, and manage scalable Kubernetes clusters and standardized deployment configurations.
  • Implement and maintain service mesh technologies to secure, control, and observe service-to-service communication.
  • Build, maintain, and optimize CI/CD pipelines with automated testing and security gates.
  • Write, review, and maintain Terraform modules to provision and manage cloud resources.
  • Develop Python scripts and tooling to automate maintenance, backups, scaling, and recovery tasks.
  • Design and enhance monitoring, logging, and alerting systems across the observability stack.
  • Own incident response, facilitate blameless postmortems, and define and enforce SLIs, SLOs, and SLAs.
  • Collaborate with software engineers to design applications for deployability, scalability, and resilience.
  • Identify system bottlenecks, contribute to process improvements, build developer tooling, and maintain documentation.

Requirements

  • 5+ years of hands-on experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles, preferably in a fast-paced SaaS environment.
  • Deep experience with AWS services including EC2, EKS, RDS, VPC, IAM, and S3.
  • Deep experience with GCP services including GKE, Compute Engine, Cloud SQL, IAM, and Cloud Storage.
  • Expert-level knowledge of Docker and Kubernetes, including advanced deployment strategies and lifecycle management.
  • Strong programming skills in Python and extensive experience with Terraform.
  • Hands-on experience building dashboards and alerting systems with Prometheus, Grafana, and ELK/EFK stacks.
  • Solid understanding of cloud networking, including VPC peering, load balancing, and DNS.
  • Understanding of zero-trust security principles in a containerized environment.
  • Experience with configuration management tools like Kustomize is preferred.
  • Experience with service mesh technologies such as Istio or Linkerd is preferred.
