AHEAD

AHEAD accelerates the impact of technology on clients by engineering customized data, developer, and infrastructure platforms that improve IT operations. By weaving together cloud infrastructure, intelligent operations, and modern applications, we help...

IT Services
1K-5K
$43M raised

Description

  • Architect and manage Kubernetes clusters tailored to AI/ML workloads.
  • Implement GPU resource orchestration and workload scheduling using Run:ai or similar operators.
  • Develop and maintain Python-based automation scripts and end-to-end ML pipelines.
  • Automate infrastructure provisioning with Terraform and manage configuration with Ansible.
  • Create and manage Jupyter Notebooks and environments to support experimentation and collaboration.
  • Integrate and optimize NVIDIA Enterprise Suite components (CUDA, NeMo, Triton, TensorRT, GPU drivers) for accelerated computing.
  • Establish and maintain MLOps best practices for model lifecycle management, CI/CD, and monitoring (e.g., MLflow, Kubeflow).
  • Collaborate with data scientists and platform engineers to drive efficient resource utilization and to deploy and scale AI workloads across cloud, hybrid, and HPC environments.

Requirements

  • 4+ years of experience in platform or solutions architecture, including 2+ years focused on AI/ML workloads.
  • Strong proficiency in Python and practical experience with ML frameworks such as TensorFlow and PyTorch.
  • Hands-on experience with Kubernetes and container orchestration.
  • Familiarity with Run:ai or similar platforms for GPU scheduling and workload management.
  • Expertise in infrastructure automation using Terraform and configuration management with Ansible.
  • Experience using Jupyter Notebooks for ML development and collaboration.
  • Knowledge of NVIDIA Enterprise Suite components (CUDA, NeMo Framework, Triton, TensorRT, GPU drivers) and GPU acceleration optimization.
  • Solid understanding of MLOps principles and tools (e.g., MLflow, Kubeflow) for CI/CD and model monitoring.
  • Experience with high-performance computing (HPC), distributed training, and model optimization techniques.
  • Certifications in Kubernetes or cloud platforms (AWS, Azure, GCP) are preferred.

Benefits

  • Remote role based in India.
  • Access to a multi-million-dollar lab and top-notch technology resources.
  • Sponsored certifications, plus support for continued learning and cross-department training and development.
  • Active diversity and inclusion programs (e.g., Moving Women AHEAD, RISE AHEAD).
  • Supportive culture that funds professional growth and skills development.
