Flip App

Flip is the employee app reshaping workplace communication by empowering every employee with a digital workspace for effective communication and workflow management.

Industry: Internet Software & Services
Company size: 51-250 employees
Founded: 2018

Description

  • Own critical reliability domains end-to-end within the Platform Squad.
  • Drive technical direction and architectural decisions for the platform.
  • Help evolve cloud infrastructure on Azure and Kubernetes for high throughput and high availability.
  • Define and improve the platform’s resilience strategy, including scaling, zero-downtime deployments, rollback mechanisms, and disaster recovery.
  • Improve the observability stack built around Loki, Grafana, Tempo, and Mimir.
  • Reduce infrastructure toil by making the IaC platform more self-service for engineering teams.
  • Lead platform-related major incidents and drive blameless post-mortems.
  • Coach teammates, run RFCs and design reviews, and mentor engineers within the squad.
  • Partner with the squad to shape the platform roadmap and direction.

Requirements

  • 5+ years of hands-on experience as an SRE, Platform Engineer, DevOps Engineer, Infrastructure Engineer, Cloud Engineer, or Backend Engineer with a strong infrastructure focus.
  • Proven track record of building and operating high-throughput, highly available production systems.
  • Deep production-level experience with Kubernetes on any hyperscaler.
  • Strong experience with modern observability stacks such as Prometheus, Mimir, VictoriaMetrics, Dash0, Loki, or ELK, plus a clear point of view on SLIs, SLOs, and error budgets.
  • Solid software development skills in Go or Python; Go is strongly preferred because the IaC runs on Pulumi in Go.
  • Hands-on experience with Infrastructure as Code tools such as Pulumi, OpenTofu, or Terraform, with GitOps tools such as ArgoCD, and with CI/CD pipeline design.
  • Demonstrated ability to lead complex infrastructure initiatives from design to production, including writing RFCs and driving architecture decisions.
  • Experience mentoring engineers and raising the technical bar within a team.
  • Comfortable owning major incidents end-to-end and turning learnings into systemic change.
  • Strong communication skills and business-fluent English.
  • Willingness to participate in on-call rotations.
  • Preferred: experience rolling out production-ready API gateways built on the Kubernetes Gateway API, such as Envoy Gateway.
  • Preferred: experience operating multi-cluster service meshes such as Cilium, Linkerd, or Istio.
  • Preferred: experience deploying and maintaining Kubernetes Operators such as Strimzi or CNPG.
  • Preferred: experience operating highly available PostgreSQL in production.

Benefits

  • Remote-first work with flexibility to work from home.
  • Occasional team events, workshops, or meetings in the Berlin or Stuttgart offices with plenty of notice.
  • E-Gym-Wellpass membership covered by the company.
  • Job bike leasing.
  • Regular team events and culture days.
  • Option to work abroad within the European Union.
  • Relaxed working atmosphere with highly motivated and committed colleagues.
