Lucidya

Lucidya provides a customer experience management platform for the Arab World, using AI-driven social media analytics and monitoring tools to support strategic decision-making and improve brand performance across social channels.

Industry: Media
Company size: 51-250 employees
Founded: 2016
Funding: $7M raised

Description

  • Design and maintain highly available, fault-tolerant, and scalable infrastructure.
  • Proactively identify and eliminate single points of failure before they cause incidents.
  • Manage and continuously improve cloud workloads across AWS, GCP, or Azure.
  • Use Infrastructure as Code tools such as Terraform to standardize and scale infrastructure.
  • Operate, troubleshoot, and scale Kubernetes clusters in production.
  • Implement and refine monitoring and alerting systems using tools such as Prometheus, Grafana, Datadog, or ELK.
  • Respond to incidents, lead root cause analysis, and drive follow-up improvements.
  • Write scripts and build tooling to automate repetitive operational work.
  • Collaborate with DevOps and engineering teams to resolve performance bottlenecks and improve CI/CD reliability.
  • Help define and promote reliability best practices across the organization.

Requirements

  • Around 3 years of experience in SRE, DevOps, or infrastructure engineering.
  • Hands-on experience with cloud environments such as AWS, GCP, or Azure.
  • Production experience with Kubernetes and the ability to troubleshoot cluster issues.
  • Experience using Terraform or similar Infrastructure as Code tools.
  • Strong working knowledge of Docker and containerized workloads.
  • Ability to write automation scripts in Python, Bash, or similar languages.
  • Understanding of CI/CD pipelines and tools such as Jenkins, GitHub Actions, or Bitbucket Pipelines.
  • Solid grasp of networking, load balancing, and high-availability design.
  • Experience implementing observability tools such as Prometheus, Grafana, Datadog, or ELK.
  • Ability to distinguish meaningful alerts from noise and focus on actionable signals.
  • Experience with RabbitMQ or Redis in production is a plus.
  • Familiarity with Ansible or AWX is a plus.
  • Exposure to multi-cloud or hybrid environments is a plus.
  • AWS, GCP, or Linux certifications are a plus.
  • A background from ITI (Information Technology Institute) is a plus.

Interested in this position?

Apply directly on the company website

Similar Roles

Staff Operations Engineer

Mozilla 251-1K Internet Software & Services

Mozilla is hiring a Staff Operations Engineer to lead the design, reliability, and evolution of hybrid-cloud and workplace infrastructure across teams.

Ansible DNS Linux Puppet Python TCP/IP Unix

Principal Site Reliability Engineer (SRE)

Symmetrio Professional Services

Symmetrio is recruiting a Principal Site Reliability Engineer for a rapidly growing healthcare technology company to own the reliability, scalability, security, and performance of a mission-critical SaaS platform used by healthcare providers across the United States.

Active Directory AWS CI/CD Datadog Django Grafana Kubernetes Python Terraform Windows Server

Performance Test Engineer Lead

PartnerOne 51-250 Media

An enterprise performance engineering role at a cloud-focused organization, responsible for validating the scalability, stability, and production readiness of distributed systems across Azure and hybrid environments.

Azure CI/CD Kubernetes PowerShell

Site Reliability Engineer

MLabs 11-50 Internet Software & Services

Remote UK-hours Site Reliability Engineering role at a financial technology company, focused on automating and operating the infrastructure that supports global integration services for financial institutions.

Active Directory Ansible AWS CI/CD GCP OAuth PostgreSQL SAML