Armada

Armada is a full-stack edge infrastructure company that specializes in edge computing and AI solutions for remote and rugged environments. Through its Armada Edge Platform (AEP), the company provides compute, storage, connectivity, and AI/ML capabilities. Armada operates in over 100 countries and supports operations spanning thousands of connected assets and active users. The company focuses on simplifying data management and AI deployment in disconnected settings. Its product lineup includes Atlas, a tool for monitoring IoT devices; Galleon, ruggedized modular data centers for AI inference; Bridge, a platform for managing GPUs; and Marketplace, which offers hardware and software for remote operations. Armada serves sectors including oil and gas, the public sector, manufacturing, mining, logistics, and telecommunications, with the aim of improving safety, productivity, and automation in challenging environments.

Information Technology & Services
201-500
Founded 2022
$226M raised

Description

  • Lead the design of a globally scalable AI control plane for GPU, storage, and network orchestration.
  • Define architectural patterns for custom Kubernetes operators that manage AI training and inference workloads.
  • Own the long-term scalability, extensibility, and evolution of the GPU-as-a-Service (GPUaaS) platform.
  • Architect hard isolation strategies across kernel, hypervisor, and hardware layers.
  • Design secure multi-tenant execution models aligned with zero-trust networking principles.
  • Drive integration strategies for VAST, Weka, and DDN storage platforms.
  • Collaborate with hardware and networking vendors to optimize RDMA, GPUDirect, and RoCE v2 traffic patterns.
  • Design and evolve VXLAN- and BGP-EVPN-based networking architectures.
  • Design, develop, and maintain custom Kubernetes operators for GPU, storage, and infrastructure automation.
  • Implement CRDs, reconciliation logic, and lifecycle management for AI workloads.
  • Define platform SLOs, capacity planning models, and GPU availability targets.
  • Establish benchmarking standards including MLPerf and custom training/inference stress tests.
  • Lead post-incident reviews, root-cause analysis, and performance optimization initiatives.
  • Set engineering standards through design reviews, architecture documentation, and technical RFCs.
  • Mentor and grow L3/L4 engineers into strong platform owners.
  • Influence and collaborate across infrastructure, security, and product teams.
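
For readers less familiar with the operator pattern named in the responsibilities above, the "reconciliation logic" an operator implements can be sketched as a level-triggered diff between desired and observed state. This is a minimal, framework-free Python illustration, not Armada's implementation: the `GPUWorkloadSpec` resource, the `Plan` type, the `reconcile` function, and the `<name>-<index>` pod naming are all hypothetical. A production operator would typically be written in Go with Kubebuilder or Operator SDK and would act on the Kubernetes API rather than on in-memory sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPUWorkloadSpec:
    """Desired state from a hypothetical GPUWorkload custom resource (CRD)."""
    replicas: int
    gpus_per_replica: int


@dataclass(frozen=True)
class Plan:
    """Actions needed to converge observed state toward desired state."""
    create: list[str]
    delete: list[str]
    gpus_requested: int  # GPUs needed by the pods listed in `create`


def reconcile(name: str, spec: GPUWorkloadSpec, observed_pods: set[str]) -> Plan:
    """One level-triggered reconcile step: diff desired pods against observed pods.

    The result depends only on current state, not on the event that triggered
    the call, so re-running it after a crash or a missed event is safe.
    """
    if spec.replicas < 0 or spec.gpus_per_replica < 0:
        raise ValueError("replicas and gpus_per_replica must be non-negative")
    # Deterministic (hypothetical) pod names keep the diff stable across calls.
    desired = {f"{name}-{i}" for i in range(spec.replicas)}
    to_create = sorted(desired - observed_pods)
    to_delete = sorted(observed_pods - desired)
    return Plan(
        create=to_create,
        delete=to_delete,
        gpus_requested=len(to_create) * spec.gpus_per_replica,
    )


if __name__ == "__main__":
    spec = GPUWorkloadSpec(replicas=3, gpus_per_replica=8)
    # One replica is running, plus a stale pod left over from an older scale-up.
    print(reconcile("train", spec, {"train-0", "train-7"}))
    # Plan(create=['train-1', 'train-2'], delete=['train-7'], gpus_requested=16)
```

Because the function only compares states, calling it again once the cluster has converged yields an empty plan, which is the idempotence property reconcile loops rely on.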

Requirements

  • 10–15 years of experience in software, platform, or infrastructure engineering roles.
  • Demonstrated expertise designing and operating production-grade Kubernetes operators using Go (Kubebuilder / Operator SDK).
  • Deep understanding of Kubernetes internals, including etcd performance, API machinery, CRDs, controllers, and scheduling.
  • Proven experience building secure, multi-tenant platforms with strong isolation and zero-trust networking.
  • Strong hands-on knowledge of high-performance storage and networking, including POSIX semantics, CSI drivers, and InfiniBand / RoCE v2.
  • Experience designing infrastructure automation workflows using tools such as Ansible, Terraform, or equivalent.
  • Hands-on experience with observability and monitoring tools such as Prometheus, OpenTelemetry (OTEL), Grafana, Splunk, or similar.
  • Strong proficiency in Go and Python.
  • Excellent leadership, communication, and cross-functional collaboration skills.
  • Experience with AI serving frameworks such as vLLM, Ray Serve, or Triton Inference Server (preferred).
  • Familiarity with virtualization and lower-layer systems including VMware vSphere, OpenStack, KVM, or bare-metal provisioning (preferred).
  • Experience with GPU infrastructure, including NVIDIA DGX/HGX systems, GPU Operator, DCGM, Nsight, or performance profiling tools (preferred).
  • Exposure to distributed training systems such as PyTorch DDP, DeepSpeed, or large-scale training frameworks (preferred).

Benefits

  • Competitive base salary for India-based candidates.
  • Equity options included with compensation.
  • Opportunity to help build a transformative platform at a well-funded company.
  • Work on infrastructure deployed in over 60 countries across energy, defense, and other sectors.
  • Equal opportunity employer committed to a discrimination-free workplace.

Similar Roles

Senior Software Engineer, Protect

SoFi 1K-5K Capital Markets

SoFi is hiring a Senior Software Engineer for its Protect team to help build a next-generation insurance platform and shape the technical direction of a greenfield, high-impact business area.

AWS CI/CD Docker DynamoDB Git Java JavaScript Kafka Kotlin Kubernetes LLM Microservices PostgreSQL React Spring TypeScript
1 day, 2 hours ago

Software Engineer, Developer (Wallets and Onchain Tools)

Coinbase 1K-5K Capital Markets

Coinbase is hiring a software engineer for its CDP Wallets & Onchain Tools team to build developer-focused APIs, SDKs, and documentation that help accelerate crypto application development onchain.

Android Encryption Flutter GitHub Go iOS Microservices OpenAPI React React Native Solana TypeScript Unity
1 day, 2 hours ago

Software Engineer II

Veracyte 251-1K Pharmaceuticals

Veracyte is hiring a cloud engineering and application development professional for its Bioinformatics & Data Science Development team to build scalable cloud-native applications that support cancer diagnostics products and productionize research workflows.

Agile AWS AWS CDK CloudFormation Docker EC2 Kubernetes Machine Learning Microservices Node.js Python React REST API Scrum SQL Terraform Vue.js
1 day, 2 hours ago

Staff Software Engineer, C021 Security

Cribl 251-1K IT Services

Cribl is hiring a Staff Engineer for its C021 new product initiative to help design and build an emerging data platform that processes large volumes of streaming data in a fully remote environment.

Apache Spark AWS Azure Docker Druid Flink GCP JavaScript Kafka Kubernetes Linux LLM Node.js
1 day, 2 hours ago