Resilient Co

Resilient Co is a technology consulting company that empowers businesses with smart solutions and diverse teams, offering resilient support in the dynamic tech industry.

Industry: Professional Services
Company size: 11-50 employees
Founded: 2020

Description

  • Design, build, and operate highly reliable systems within the Purple Platform ecosystem.
  • Enable product teams to self-serve, deploy, and operate applications securely and efficiently.
  • Translate platform guardrails and policies into a strong developer experience.
  • Build and maintain cloud-native platform capabilities and automation for application delivery.
  • Develop CI/CD workflows and declarative deployment processes for GitOps-based releases.
  • Implement observability, monitoring, and incident response practices across distributed systems.
  • Write internal tools, CLIs, templates, and plug-ins that improve engineering velocity.
  • Apply security, governance, and compliance controls in platform workflows.
  • Collaborate across engineering teams to support platform operations and reliability goals.

Requirements

  • Strong Kubernetes expertise, including workloads, scaling, networking, operators, and CRDs.
  • Advanced containerization experience, including Docker multi-stage builds and security hardening.
  • Hands-on experience implementing service mesh technologies such as Istio, as well as API gateways.
  • Experience with Infrastructure as Code using Terraform.
  • Ability to configure and troubleshoot MongoDB collections, Redis Cache, Azure Service Bus, and Azure Document Storage.
  • Strong background in C#, Python, and/or Node.js.
  • Ability to build reliable distributed applications and automation tools.
  • Experience building CI/CD pipelines and working with AI-assisted development tools.
  • Deep understanding of GitOps workflows and tooling such as Argo CD and Flux.
  • Experience with Helm, Kustomize, deployment manifests, and environment modeling.
  • Strong observability experience with Dynatrace, OpenTelemetry, App Insights, and Kusto (KQL).
  • Experience leading incident response, writing postmortems, and managing error budgets.
  • Familiarity with container scanning, SBOM tools, secure secret management, Vault/KMS, and managed identities.
  • Understanding of compliance frameworks relevant to healthcare systems.
  • Strong scripting skills in Bash, PowerShell, Python, or Go.
  • 3+ years of experience in large-scale, enterprise-grade cloud-native platforms.
  • Previous experience in SRE, Platform Engineering, DevOps, or Production Engineering roles.
  • Experience with self-service portals and cloud resource orchestration is preferred.
  • Familiarity with classification-driven policy models and governance automation is preferred.
  • Experience with Backstage or internal developer portals is a plus.
