Nice Côte d'Azur

Nice Côte d'Azur promotes tourism in the Nice Côte d'Azur region, offering resources for exploring the area and organizing weekend getaways that blend coastal and mountain experiences, while also providing interactive exploration games and multimedia c...

Hotels, Restaurants & Leisure
Founded 1960

Description

  • Act as a primary or escalation responder in a 24x7 on-call rotation.
  • Lead or support Major Incident response, including triage, mitigation, and resolution.
  • Coordinate incident work across Engineering, Infrastructure, Security, and Product teams.
  • Execute and improve runbooks, playbooks, and escalation paths.
  • Drive blameless post-incident reviews and track corrective actions.
  • Own service health monitoring across infrastructure, applications, and dependencies.
  • Design and maintain alerting strategies aligned with SLIs and SLOs.
  • Build dashboards and improve signal-to-noise to reduce alert fatigue.
  • Automate repetitive operational tasks to reduce manual toil.
  • Develop scripts and tools to support NOC/SRE workflows and enable self-healing or auto-remediation.
  • Support and troubleshoot Linux-based systems, cloud platforms, and Kubernetes/containerized environments.
  • Assist with capacity planning, availability reviews, and production release readiness.

Requirements

  • Strong Linux systems administration experience.
  • Experience with incident management and production support.
  • Familiarity with AWS, Azure, or GCP, with AWS preferred.
  • Experience with Docker and Kubernetes.
  • Scripting or programming experience in Python, Bash, Go, or similar languages.
  • Understanding of networking fundamentals such as DNS, TCP/IP, and load balancing.
  • Experience working in 24x7 NOC or production operations environments.
  • Ability to handle high-pressure incidents calmly and effectively.
  • Strong written and verbal communication skills for incident coordination.
  • Comfort working from runbooks and improving them when needed.
  • Experience defining SLIs and SLOs or operating against them (preferred).
  • Prior experience migrating from a traditional NOC to an SRE model (preferred).
  • Infrastructure as Code experience with Terraform, Ansible, or similar tools (preferred).
  • Exposure to security, compliance, or regulated environments (preferred).

Benefits

  • Fully remote position.
  • Opportunity to work for a global company with large-scale operational impact.
  • Individual contributor role with direct ownership in network operations.
