Lucidya

Lucidya provides a leading platform for customer experience management in the Arab World, utilizing AI-driven social media analytics and monitoring tools to enhance strategic decision-making and improve brand performance across various social channels.

Industry: Media
Company size: 51-250 employees
Founded: 2016
Funding: $7M raised

Description

  • Design and maintain highly available, fault-tolerant, and scalable infrastructure.
  • Proactively identify and eliminate single points of failure before they cause incidents.
  • Manage and continuously improve cloud workloads across AWS, GCP, or Azure.
  • Use Infrastructure as Code, such as Terraform, to standardize and scale infrastructure.
  • Operate, troubleshoot, and scale Kubernetes clusters in production.
  • Implement and refine monitoring and alerting systems using tools such as Prometheus, Grafana, Datadog, or ELK.
  • Respond to incidents, lead root cause analysis, and drive follow-up improvements.
  • Write scripts and build tooling to automate repetitive operational work.
  • Collaborate with DevOps and engineering teams to resolve performance bottlenecks and improve CI/CD reliability.
  • Help define and promote reliability best practices across the organization.

Requirements

  • Around 3 years of experience in SRE, DevOps, or infrastructure engineering.
  • Hands-on experience with cloud environments such as AWS, GCP, or Azure.
  • Production experience with Kubernetes and the ability to troubleshoot cluster issues.
  • Experience using Terraform or similar Infrastructure as Code tools.
  • Strong working knowledge of Docker and containerized workloads.
  • Ability to write automation scripts in Python, Bash, or similar languages.
  • Understanding of CI/CD pipelines such as Jenkins, GitHub Actions, or Bitbucket Pipelines.
  • Solid grasp of networking, load balancing, and high-availability design.
  • Experience implementing observability tools such as Prometheus, Grafana, Datadog, or ELK.
  • Ability to distinguish meaningful alerts from noise and focus on actionable signals.
  • Experience with RabbitMQ or Redis in production is a plus.
  • Familiarity with Ansible or AWX is a plus.
  • Exposure to multi-cloud or hybrid environments is a plus.
  • Cloud certifications in AWS or GCP, or Linux certifications, are a plus.
  • Background from ITI (Information Technology Institute) is a plus.

Interested in this position?

Apply directly on the company website

Similar Roles

Site Reliability Engineer

SupplyHouse.com 251-1K Building Materials

SupplyHouse.com is hiring a full-time Site Reliability Engineer in India to support the scalability, reliability, and performance of its cloud infrastructure and applications.

Ansible Bash CI/CD Datadog Docker GCP GitLab CI Go Grafana Jenkins Kubernetes Linux Network Security Prometheus Python Terraform Unix

Site Reliability Engineer

Obsidian Security 51-250 Internet Software & Services

Obsidian Security is hiring a Site Reliability Engineer in the UK to help ensure the reliability, scalability, and operational excellence of its multi-tenant SaaS platform for enterprise and financial customers.

Argo CD AWS Datadog GCP GitHub Actions GitOps Grafana Helm Kubernetes Microservices Prometheus

Senior Site Reliability Engineer (SRE) - (GCP)

Devsu 51-250 Internet Software & Services

Devsu is hiring a Site Reliability Engineer to own monitoring, observability, and reliability operations for systems running across on-premises infrastructure and Google Cloud Platform, with backup support for application incidents when needed.

Bash GCP Grafana Kubernetes Linux PagerDuty Prometheus Python

Senior Site Reliability Engineer

Anduril Industries 1K-5K Aerospace & Defense

Anduril Industries is hiring a Site Reliability Engineer for its Mission Autonomy team to support the reliability and operational excellence of autonomous systems used across cloud, hardware-in-the-loop, and air-gapped environments.

Ansible AWS Azure DNS Docker GCP Go HTTP Kubernetes Linux Load Balancing Puppet Python Splunk TCP/IP Terraform