Parallel Domain

Parallel Domain is a synthetic data platform that helps machines see the world through 3D simulation and generative AI. Their API offers flexibility in data capture, enabling the development, training, and testing of autonomous systems efficiently and ...

Aerospace & Defense

Industrials

51-250 (85)

Founded 2017

$44M raised

2 open positions

Principal Site Reliability Engineer

2 hours, 57 minutes ago

Canada

Full-time

Lead

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Argo CD, AWS, Bash, CI/CD, DNS, Docker, Elasticsearch, GitHub Actions, GitOps, Grafana, Helm, Jenkins, Kubernetes, Linux, Load Balancing, Packer, Prometheus, Python, Terraform, Windows Server

Description

Own and evolve AWS infrastructure to improve platform performance, availability, and support future enterprise deployment models.
Operate EKS clusters across production regions, including node pool strategy, AMI lifecycle management, autoscaling, and workload health.
Support and manage the GitOps deployment pipeline using infrastructure-as-code across multiple clusters.
Design and maintain complex networking components, including VPCs, cross-region connectivity, DNS, and load balancing.
Lead infrastructure deprecation and migration efforts with minimal disruption to services.
Own SLO measurement infrastructure and enable proactive issue triage before customer impact occurs.
Lead incident investigations, root cause analysis, and postmortems to drive systemic reliability fixes.
Design and improve automated remediation systems to reduce mean time to recovery.
Review platform architecture decisions through a security-conscious lens and own cloud IAM governance across accounts and services.
Support compliance-adjacent work, including audit readiness, partner certification requirements, and customer security questionnaires.

Requirements

5+ years of experience in SRE, DevOps, or infrastructure engineering roles.
Strong infrastructure-as-code experience, including Terraform modules, state management, and multi-environment patterns.
Deep AWS experience with services including EKS, EC2, IAM, S3, Storage Gateway, VPC networking, Transit Gateway, CloudFront, KMS, and IRSA.
Strong Kubernetes expertise, including cluster operations, node pools, probes, cordoning, pod scheduling, RBAC, Helm, and node autoscaling.
Experience with GitOps workflows and CI/CD tooling such as Argo CD, GitHub Actions, or Jenkins.
Solid networking fundamentals, including CIDR design, security groups, DNS, load balancing, VPNs, and cross-region connectivity.
Experience with monitoring and observability tools such as Prometheus, Grafana, and Elasticsearch.
Comfort with Python and Bash for tooling and automation.
Familiarity with Linux and Windows environments; operational experience with Windows Server is a meaningful advantage.
Preferred: experience with Karpenter, Windows-based workloads on EKS, GPU workloads on Kubernetes, NVIDIA and DirectX device plugins, AWS Storage Gateway or Transfer Family, Envoy Gateway, container-optimized OS images such as Bottlerocket, machine image builds with Packer, and cloud cost optimization at scale.

Benefits

Remote full-time work arrangement.
Opportunity to work on high-impact infrastructure for customer-critical autonomous vehicle simulation workloads.
High-trust, high-autonomy role with real influence over infrastructure architecture and cross-team process.
Work on technically challenging systems such as multi-region GPU scheduling and Windows workloads on Kubernetes.

Similar Roles

Senior Site Reliability Engineer

Zeta Global 1K-5K Media

Zeta Global is hiring a Senior Site Reliability Engineer to help build and operate scalable observability and reliability systems for high-throughput distributed services processing millions of transactions daily.

United States Full-time Senior Site Reliability Engineer (SRE)

$140k-$170k

Argo CD, AWS, Docker, GitOps, Go, Grafana, Honeycomb, Jenkins, Kubernetes, Microservices, OpenTelemetry, Prometheus, Python, Terraform

12 minutes ago

Senior SRE Engineer / DevOps

Margo Bank Professional Services

Senior SRE Engineer / DevOps position at a consulting team in Warsaw focused on developing an internal developer platform and establishing CI/CD standards across multiple teams.

Poland Contract Senior DevOps Engineer Site Reliability Engineer (SRE)

Bash, CI/CD, DevSecOps, Git, Kubernetes, Python

12 minutes ago

Senior Site Reliability Engineer (SRE)

KOMOJU Internet Software & Services

KOMOJU is hiring a Site Reliability Engineer to own the reliability, performance, and developer experience of its cloud-based payment platform supporting merchants across cross-border integrations.

Japan Full-time Junior Site Reliability Engineer (SRE)

AWS, CI/CD, CircleCI, Datadog, GitHub Actions, Go, Jenkins, Python, Ruby, Ruby on Rails, Shopify, TCP/IP, Terraform

27 minutes ago

DevOps & Site Reliability Engineer

Oowlish 51-250 Internet Software & Services

Oowlish is hiring a DevOps & Site Reliability Engineer to support an AI-focused SaaS startup by maintaining, optimizing, and scaling the infrastructure behind its platform for high availability, performance, and reliability.

Mexico Argentina Brazil Colombia Full-time Mid Level Site Reliability Engineer (SRE)

AWS, Azure, Azure Pipelines, Bash, CI/CD, CircleCI, Datadog, Docker, GCP, Grafana, Helm, Jenkins, Kubernetes, New Relic, Prometheus

42 minutes ago
