Data & AI Operations Specialist

11 hours, 18 minutes ago
Full-time
Senior
DevOps and Infrastructure
ZainTech

ZainTech is a regional Digital & ICT powerhouse, offering comprehensive solutions in Cloud, IoT, Cybersecurity, Big Data & Analytics, Drones, and Robotics for enterprise and government customers in the MENA region.

Internet Software & Services
51-250 employees
Founded 2021

Description

  • Serve as the Level 3 technical lead for the AI and Data Platform estate, providing architecture, engineering, and advanced troubleshooting support across Azure and OCI.
  • Maintain and enhance the monitoring architecture for AI/ML platforms and configure advanced dashboards in Grafana and Azure Monitor for deep-tier diagnostics.
  • Manage Azure Machine Learning (AML) workspace configurations, compute targets, and Databricks cluster lifecycles, including runtime versioning and platform patching.
  • Oversee GPU resource allocation, reserved capacity planning, and cost-performance optimization to meet FinOps objectives.
  • Secure AI services with private endpoints, VNet integration, and RBAC to protect sensitive citizen data.
  • Design, optimize, and remediate Azure Data Factory (ADF) and Synapse data pipelines and ETL processes.
  • Resolve complex operational and performance issues, including authentication failures, data format changes, and ETL performance bottlenecks.
  • Author and maintain step-by-step Standard Operating Procedures (SOPs) for the L1 NOC team and produce timely Root Cause Analysis (RCA) documentation after incidents.
  • Implement MLOps practices, including CI/CD pipelines for model training, testing, and deployment to AML endpoints; data drift detection thresholds; automated retraining triggers; and recovery runbooks and self-healing scripts.
  • Implement and maintain audit logging of AI decisions and model outputs to the SIEM/vSOC, and lead quarterly AI governance and compliance reviews to align with regulatory standards.

Requirements

  • Deep expertise with Azure Machine Learning (AML) and Databricks.
  • Proficiency in designing and operating Azure Data Factory (ADF) and Synapse pipelines and ETL workflows.
  • Experience building reproducible infrastructure as code (IaC) using Terraform or ARM templates.
  • Hands-on experience with observability and diagnostics tools such as Dynatrace, Grafana, and Azure Monitor.
  • Familiarity with containerization and orchestration technologies including AKS, Istio Service Mesh, and KEDA.
  • Experience managing GPU resource allocation, reserved capacity, and cost-performance tradeoffs to meet FinOps goals.
  • Strong understanding of ITIL-aligned Incident, Change, and Problem management processes.
  • Security mindset and familiarity with NESA standards and UAE data residency requirements.
  • Ability to author complex SOPs and deliver RCA documents within 48 hours of an incident.
  • Experience implementing CI/CD for ML workflows, data drift detection, automated retraining, and recovery automation.
  • Microsoft Azure Data Scientist Associate or Azure AI Engineer Associate certification is highly preferred.
