Data & AI Operations Specialist

1 month ago
Full-time
Senior
DevOps and Infrastructure
ZainTech

ZainTech is a regional Digital & ICT powerhouse, offering comprehensive solutions in Cloud, IoT, Cybersecurity, Big Data & Analytics, Drones, and Robotics for enterprise and government customers in the MENA region.

Internet Software & Services
51-250 employees
Founded 2021

Description

  • Serve as the Level 3 technical lead for the AI and Data Platform estate, providing architecture, engineering, and advanced troubleshooting support across Azure and OCI.
  • Maintain and enhance the monitoring architecture for AI/ML platforms and configure advanced dashboards in Grafana and Azure Monitor for deep-tier diagnostics.
  • Manage Azure Machine Learning (AML) workspace configurations, compute targets, and Databricks cluster lifecycles, including runtime versioning and platform patching.
  • Oversee GPU resource allocation, reserved capacity planning, and cost-performance optimization to meet FinOps objectives.
  • Ensure security integration for AI services via private endpoints, VNET integration, and RBAC controls to protect sensitive citizen data.
  • Design, optimize, and remediate Azure Data Factory (ADF) and Synapse data pipelines and ETL processes.
  • Resolve complex operational and performance issues, including authentication failures, data format changes, and ETL performance bottlenecks.
  • Author and maintain step-by-step Standard Operating Procedures (SOPs) for the L1 NOC team and produce timely Root Cause Analysis (RCA) documentation after incidents.
  • Implement MLOps practices, including CI/CD pipelines for model training, testing, and deployment to AML endpoints, as well as data drift detection thresholds, automated retraining triggers, and recovery runbooks and self-healing scripts.
  • Implement and maintain audit logging of AI decisions and model outputs to SIEM/vSOC, and lead quarterly AI governance and compliance reviews to align with regulatory standards.

Requirements

  • Deep expertise with Azure Machine Learning (AML) and Databricks.
  • Proficiency in designing and operating Azure Data Factory (ADF) and Synapse pipelines and ETL workflows.
  • Experience building reproducible infrastructure using Terraform or ARM Templates (Infrastructure-as-Code).
  • Hands-on experience with observability and diagnostics tools such as Dynatrace, Grafana, and Azure Monitor.
  • Familiarity with containerization and orchestration technologies including AKS, Istio Service Mesh, and KEDA.
  • Experience managing GPU resource allocation, reserved capacity, and cost-performance tradeoffs to meet FinOps goals.
  • Strong understanding of ITIL-aligned Incident, Change, and Problem management processes.
  • Security mindset and familiarity with NESA standards and UAE data residency requirements.
  • Ability to author complex SOPs and deliver RCA documents within 48 hours of an incident.
  • Experience implementing CI/CD for ML workflows, data drift detection, automated retraining, and recovery automation.
  • Microsoft Azure Data Scientist Associate or Azure AI Engineer Associate certification is highly preferred.

