SpaceX

SpaceX designs, manufactures, and launches advanced rockets and spacecraft with the aim of revolutionizing space technology and enabling human life on other planets.

Aerospace & Defense
10K-50K
Founded 2002

Description

  • Build, install, manage, scale, and optimize Kubernetes and RKE clusters in production environments using Ansible, Terraform, and related technologies.
  • Collaborate with engineers to gather requirements, evaluate options, design, plan, deploy, and support Kubernetes-based software platforms.
  • Design and maintain highly resilient, high-performance, scalable, and robust systems.
  • Recommend, justify, and implement improvements using an accepted change control process.
  • Work with internal business units to solve problems and deliver creative solutions in a timely, proactive manner.
  • Define, document, and follow standards and best practices for systems design, testing, and implementation.
  • Foster collaboration and cross-training to build Kubernetes expertise across the team.
  • Drive scripting, self-service, and automation to reduce administrative overhead and toil.
  • Participate in an on-call rotation for urgent after-hours support when necessary.

Requirements

  • Bachelor’s degree in Computer Science or a STEM discipline and 5+ years of systems engineering experience, or 7+ years of systems engineering experience in lieu of a degree.
  • Experience deploying and supporting Linux servers in physical and virtualized environments, including via automation.
  • Experience with the Linux shell and configuring/extending Linux instances, including kernel modules, cgroups, PKI, iptables, and network interfaces.
  • Experience supporting and scaling containerized applications in Linux environments.
  • Experience using automation frameworks such as Ansible and Terraform to manage provisioning and post-provisioning infrastructure lifecycles and Kubernetes installations.
  • Willingness to work extended hours and weekends as needed.
  • Must meet ITAR export-control requirements as a U.S. citizen/national, lawful permanent resident, refugee, asylee, or otherwise eligible for required U.S. Department of State authorization.
  • Experience with high-availability, fault-tolerant, performance-tuned systems and metrics/monitoring is preferred.
  • Experience with Git, Subversion, and Git-based workflows such as pull requests is preferred.
  • Strong understanding of Linux container runtimes is preferred.
  • Experience with Infrastructure as Code, CI/CD, and GitOps tools such as AWX/Tower, Vagrant, Puppet, Redfish, Jenkins, cloud-init, and ArgoCD is preferred.
  • Experience writing test automation for Kubernetes deployments and automation processes is preferred.
  • Experience with Python or Golang and RESTful API integrations is preferred.
  • Experience troubleshooting Kubernetes internals and plugins such as CNI, CRI, CSI, Docker, CRI-O, Ceph, Cilium, MetalLB, Istio, and rook-ceph is preferred.
  • Experience developing Kubernetes extensions such as webhooks, controllers, operators, and sidecars is preferred.
  • Experience building monitoring and alerting workflows with Prometheus, Grafana, and InfluxDB or similar tools is preferred.
  • Experience with templating tools and formats such as Jinja, Jsonnet, YAML, and Helm is preferred.

Interested in this position?

Apply directly on the company website

Similar Roles

Senior Site Reliability Engineer

Remote 251-1K Professional Services

Remote is hiring a Senior SRE to own reliability and platform work for its fully remote global HR platform, helping translate ambiguous infrastructure challenges into robust solutions.

AWS Bash CI/CD Docker Elixir GitHub Actions GitLab CI Go Grafana Kubernetes Linux Node.js OpenTelemetry Prometheus Python Terraform

Senior Site Reliability Engineer

Remote 251-1K Professional Services

Remote is hiring a Senior SRE to own reliability and platform initiatives for its fully remote, async-first global engineering team.

AWS Bash CI/CD Docker Elixir GitHub Actions GitLab CI Go Grafana Kubernetes Linux Node.js OpenTelemetry Prometheus Python Terraform

Field Reliability Engineer - LATAM

Honeycomb.io 51-250 Internet Software & Services

Honeycomb is hiring a Platform Engineering professional to own managed services and infrastructure operations for customer-facing deployments across AWS and Kubernetes environments.

AWS Helm Kubernetes Microservices OpenTelemetry Serverless Terraform

Staff Reliability Engineer (Full Stack)

Feeld 51-250 Family Services

Feeld is hiring a Staff Reliability Engineer (Full Stack) to improve the reliability and operability of its production backend and mobile-integrated systems within a distributed Platform team.

Agile AWS CI/CD Node.js PostgreSQL React Native Redis TypeScript