Anduril Industries

Anduril Industries is an American defense technology firm that specializes in developing advanced autonomous systems for integrated awareness and security across land, sea, and air, utilizing its proprietary Lattice platform to enhance intelligence, su...

Aerospace & Defense
1K-5K
Founded 2017
$2.2B raised

Description

  • Manage and expand on-premises developer servers, Hardware-in-the-Loop systems, and other on-site compute resources.
  • Design, implement, and maintain highly available, fault-tolerant, and resilient autonomous systems.
  • Identify and eliminate performance bottlenecks to ensure low-latency, high-throughput, real-time operations.
  • Develop monitoring, logging, tracing, and alerting solutions that provide visibility into system health at scale.
  • Automate operational tasks including provisioning, deployment, testing, and recovery.
  • Scale services and infrastructure to support evolving mission demands, including distributed systems and edge deployments.
  • Work with security teams to integrate best practices into operational processes and infrastructure.
  • Create documentation, runbooks, and playbooks for operational procedures.
  • Integrate open-source, commercial, and internal tooling to improve software delivery.
  • Collaborate with Developer Platform, Networking, Security, and autonomy software teams in a fast-paced, multidisciplinary environment.

Requirements

  • Bachelor of Science degree in Computer Science, Engineering, or a related field, or equivalent work experience.
  • 5+ years of experience in Site Reliability Engineering, DevOps, or a similar role focused on security for mission-critical applications.
  • Strong proficiency in at least one modern programming language such as Python or Go.
  • Experience with automation tools such as Ansible, Puppet, or Terraform.
  • Deep expertise with Linux operating systems and strong command-line skills.
  • Knowledge of secure coding practices and experience implementing security controls in cloud and on-premises environments.
  • Solid understanding of networking fundamentals including TCP/IP, DNS, HTTP, and load balancing.
  • Proficiency with Docker and Kubernetes.
  • Strong analytical, problem-solving, and debugging skills.
  • Excellent communication skills and ability to work effectively in cross-functional teams.
  • Must be a U.S. Person due to access to U.S. export-controlled information or facilities.
  • Active U.S. security clearance.
  • Experience with edge computing, mesh networks, or highly distributed autonomous systems (preferred).
  • Experience with embedded Linux systems development and associated tools (preferred).
  • Experience troubleshooting and analyzing remotely deployed software systems (preferred).
  • Familiarity with monitoring, logging, and audit tooling such as auditd, journald, SELinux, or Splunk (preferred).
  • Prior experience in defense, aerospace, robotics, or other mission-critical domains (preferred).
  • Extensive experience with cloud platforms such as AWS, Azure, or GCP (preferred).

Benefits

  • U.S. salary range of $166,000 to $220,000.
  • Highly competitive equity grants are included in the majority of full-time offers.
  • Comprehensive, competitive benefits package available at little to no cost to employees.
  • Support for health, recovery, and future needs.
  • Full-time employee benefits with top-tier coverage.
