Staff Site Reliability Engineer Storage

3 hours, 27 minutes ago
Full-time
Lead
DevOps and Infrastructure
Qonto

Qonto

Qonto provides a comprehensive financial management solution for small and medium-sized enterprises and freelancers, offering services such as business accounts, invoicing, bookkeeping, expense management, and financing, all supported by dedicated cust...

Banks
1K-5K
Founded 2016
$703M raised

Description

  • Assess the resilience maturity of the Kafka and Redis stacks, identify key risks, and define an improvement roadmap within the first 3 months.
  • Deliver improvements in disaster recovery readiness, safe upgrades, alerting, and capacity planning for production systems.
  • Act as an internal consultant for backend and product engineering teams by leading design reviews and advising on efficient storage usage.
  • Respond to and lead high-severity incidents on critical stateful infrastructure, mitigating impact quickly and communicating clearly.
  • Build automation, tooling, and APIs to improve developer experience and support a platform engineering approach.
  • Work closely with the Storage Lead to strengthen resilience standards across core data infrastructure.
  • Help evolve storage systems toward a seamless Platform-as-a-Service experience for developers.
  • Translate infrastructure constraints into practical guidance for engineers and stakeholders.

Requirements

  • Strong hands-on experience operating distributed infrastructure and stateful systems at scale in production.
  • Significant experience with Kafka (especially Amazon MSK) and Redis (especially Amazon ElastiCache).
  • Strong knowledge of reliability fundamentals, including disaster recovery planning, incident management, observability, and capacity planning.
  • Track record of building infrastructure as a product through automation, infrastructure as code, tooling, or DBaaS-like solutions.
  • Ability to work independently in complex and evolving production environments and make safe decisions during incidents.
  • Strong communication skills and ability to explain infrastructure constraints clearly to technical stakeholders.
  • Experience with PostgreSQL, AWS, Terraform, and Kubernetes.
  • Rigorous, detail-oriented approach with a proactive mindset.
  • Nice to have: experience supporting banking-grade or highly regulated infrastructure.
  • Open to fully remote work in a distributed team environment.

Benefits

  • Full-time remote role.
  • Opportunity to work from Paris, Barcelona, Berlin, Milan, or Belgrade.
  • Unlimited access to AI tools.
  • Impactful ownership in a critical, high-stakes infrastructure role.
  • Autonomy and trust with clear goals and high standards.
  • Support from a hands-on manager focused on technical growth.
