Data & AI Operations Specialist

2 months, 2 weeks ago
Full-time
Senior
DevOps and Infrastructure
ZainTech

ZainTech is a regional Digital & ICT powerhouse, offering comprehensive solutions in Cloud, IoT, Cybersecurity, Big Data & Analytics, Drones, and Robotics for enterprise and government customers in the MENA region.

Internet Software & Services
51-250 employees
Founded 2021

Description

  • Serve as the Level 3 technical lead for the AI and Data Platform estate, providing architecture, engineering, and advanced troubleshooting support across Azure and OCI.
  • Maintain and enhance the monitoring architecture for AI/ML platforms and configure advanced dashboards in Grafana and Azure Monitor for deep-tier diagnostics.
  • Manage Azure Machine Learning (AML) workspace configurations, compute targets, and Databricks cluster lifecycles, including runtime versioning and platform patching.
  • Oversee GPU resource allocation, reserved capacity planning, and cost-performance optimization to meet FinOps objectives.
  • Ensure security integration for AI services via private endpoints, VNET integration, and RBAC controls to protect sensitive citizen data.
  • Design, optimize, and remediate Azure Data Factory (ADF) and Synapse data pipelines and ETL processes.
  • Resolve complex operational and performance issues, including authentication failures, data format changes, and ETL performance bottlenecks.
  • Author and maintain step-by-step Standard Operating Procedures (SOPs) for the L1 NOC team and produce timely Root Cause Analysis (RCA) documentation after incidents.
  • Implement MLOps practices, including CI/CD pipelines for model training, testing, and deployment to AML endpoints; data drift detection thresholds; automated retraining triggers; and recovery runbooks and self-healing scripts.
  • Implement and maintain audit logging of AI decisions and model outputs to SIEM/vSOC, and lead quarterly AI governance and compliance reviews to align with regulatory standards.

Requirements

  • Deep expertise with Azure Machine Learning (AML) and Databricks.
  • Proficiency in designing and operating Azure Data Factory (ADF) and Synapse pipelines and ETL workflows.
  • Experience building reproducible infrastructure using Terraform or ARM Templates (Infrastructure-as-Code).
  • Hands-on experience with observability and diagnostics tools such as Dynatrace, Grafana, and Azure Monitor.
  • Familiarity with containerization and orchestration technologies including AKS, Istio Service Mesh, and KEDA.
  • Experience managing GPU resource allocation, reserved capacity, and cost-performance tradeoffs to meet FinOps goals.
  • Strong understanding of ITIL-aligned Incident, Change, and Problem management processes.
  • Security mindset, including familiarity with NESA standards and UAE data residency requirements.
  • Ability to author complex SOPs and deliver RCA documents within 48 hours of an incident.
  • Experience implementing CI/CD for ML workflows, data drift detection, automated retraining, and recovery automation.
  • Microsoft Azure Data Scientist Associate or Azure AI Engineer Associate certification is highly preferred.
