Staff Reliability Engineer (Full Stack)

6 hours, 39 minutes ago
Full-time
Lead
DevOps and Infrastructure
Feeld

Feeld is a modern dating app that caters to open-minded individuals seeking fulfilling relationships. It provides a space for curious and open-minded humans to explore intimacy, embrace desires, and connect with like-minded people. Through Feeld, users...

Family Services
51-250
Founded 2014

Description

  • Own reliability outcomes for critical backend services and their integration with React Native mobile clients.
  • Lead incident response by coordinating mitigation, diagnosing root causes, communicating status, and driving resolution.
  • Build and improve monitoring and observability through dashboards, alerts, tracing, and logging.
  • Run blameless post-incident reviews and turn learnings into durable fixes, runbooks, automation, and process updates.
  • Improve engineering safety through guardrails, safer migrations, feature-flag practices, rollout strategies, and resilience patterns.
  • Partner with product, design, QA, and engineering to align delivery plans with operational risk and reliability needs.
  • Strengthen documentation and onboarding materials such as architecture notes, service ownership docs, runbooks, and working guides.
  • Mentor engineers through pairing, code reviews, incident shadowing, and coaching on production ownership.
  • Collaborate across squads to improve production ownership, reliability, and backend-to-mobile integration patterns.

Requirements

  • Significant experience building and operating production backend systems at scale, including debugging distributed systems and performance issues.
  • Strong TypeScript/Node.js backend experience, or equivalent, with comfort working across services and APIs.
  • Proven incident response leadership experience, including on-call participation, triage, mitigation, and root-cause analysis with follow-through.
  • Solid observability skills with practical experience in logging, metrics, tracing, dashboards, and actionable alerting.
  • Experience collaborating with mobile teams on backend-to-mobile integration concerns such as API compatibility, releases, and feature flags.
  • Demonstrated staff-level IC leadership through design reviews, technical direction, documentation, and cross-team alignment.
  • React Native experience or a strong understanding of mobile architecture patterns and release constraints is preferred.
  • AWS or similar cloud experience, plus familiarity with infrastructure as code, CI/CD, and production tooling is preferred.
  • Experience designing reliability programs built on SLOs and error budgets, along with incident processes and operational excellence improvements, is preferred.
  • Experience with PostgreSQL, Redis, and performance tuning in high-traffic systems is preferred.
  • Experience in a high-growth environment where prioritization and pragmatic trade-offs are essential is preferred.

Benefits

  • Flexible working hours.
  • Unlimited paid time off.
  • Fully remote work.
  • Home office budget.
  • Learning and development budget.
  • On-demand therapy sessions and mental health support via Spill.
  • In-person meetups.
  • Transparent, equitable compensation, with a baseline "freedom salary" of £60,000 per year applied to any role that would otherwise pay less.
  • Estimated market-competitive total cash compensation of £100,000 to £130,000 per year, depending on geographic location.
