Stellar Cyber

Stellar Cyber provides Next Gen SIEM security and Network Detection and Response (NDR) platforms with AI-driven threat analysis, helping lean security teams secure their environments effectively.

Professional Services
51-250 employees
Founded 2017
$80M raised

Description

  • Administer and maintain Kubernetes clusters and containerized workloads.
  • Manage cloud infrastructure across OCI, AWS, GCP, or Azure environments.
  • Develop and maintain CI/CD pipelines for reliable application deployments.
  • Implement and manage infrastructure as code using Terraform and Helm.
  • Build automation tooling and operational workflows using Python, Go, or Bash.
  • Drive observability improvements across monitoring, logging, tracing, and alerting.
  • Monitor, troubleshoot, and resolve production incidents, including participation in on-call rotations.
  • Support and optimize distributed data platforms such as Kafka, Elasticsearch, Spark, Redis, and MongoDB.
  • Improve platform reliability, scalability, and operational efficiency using SRE best practices.
  • Collaborate with platform, development, and operations teams across multiple time zones.
  • Perform Linux system administration and networking troubleshooting.
  • Contribute to incident response processes, postmortems, and reliability improvements.
  • Support GitOps and deployment workflows using tools such as ArgoCD and GitHub Actions.
  • Evaluate and implement AI-assisted operational tooling for auto-remediation, alert correlation, and operational intelligence.

Requirements

  • 5+ years of experience in DevOps, SRE, or Platform Engineering roles.
  • Strong expertise with Kubernetes, Docker, and container orchestration.
  • Hands-on experience managing production cloud environments.
  • Strong infrastructure as code experience with Terraform and Helm.
  • Experience with CI/CD tools and deployment automation.
  • Advanced troubleshooting skills in Linux systems, networking, and distributed systems.
  • Experience with observability platforms including Prometheus, Grafana, Loki, Alertmanager, and Elastic Stack.
  • Strong programming and scripting skills in Python, Bash, or Go.
  • Experience supporting high-availability production systems and on-call operations.
  • Knowledge of incident management and reliability engineering practices.
  • Familiarity with data platform technologies such as Kafka, Spark, Elasticsearch, Redis, or MongoDB.
  • Understanding of AI-driven operational tooling and automated remediation concepts.
  • Excellent communication, collaboration, and problem-solving skills.
  • Must reside on the East Coast.

Benefits

  • Base compensation range of USD 165,000-215,000 per year.
  • Total compensation includes bonus opportunity and equity.
  • Pre-IPO stock options.
  • Medical, dental, and vision coverage.
  • 401(k) plan.
  • Employee Assistance Program.
  • Paid time off.
  • Employee discount program.
