AHEAD

AHEAD accelerates the impact of technology on clients by engineering customized data, developer, and infrastructure platforms that improve IT operations. By weaving together cloud infrastructure, intelligent operations, and modern applications, we help...

IT Services
1K-5K
$43M raised

Description

  • Own deployment, runtime management, and operational governance across all layers of the Agentic Platform.
  • Design and implement infrastructure-as-code using Terraform or AWS CDK.
  • Build and maintain CI/CD pipelines using AWS CodePipeline, GitHub Actions, or GitLab CI.
  • Configure observability and monitoring for LLMs using CloudWatch and OpenTelemetry.
  • Implement and manage containerization and orchestration using Docker, ECS Fargate, or EKS.
  • Manage environment isolation and prompt/model versioning to support safe, reproducible model deployments.
  • Track and manage platform costs and budgets using AWS Budgets, CloudWatch billing alarms, and cost governance practices.
  • Drive high reliability and cost-efficiency standards across the platform, including incident response and operational improvements.

Requirements

  • Deep AWS operational expertise (production experience operating AWS services).
  • Proven experience with container orchestration and containerization (Docker, ECS Fargate, EKS).
  • Strong observability skills and experience with CloudWatch and OpenTelemetry for monitoring LLMs or services.
  • Experience building IaC with Terraform or AWS CDK.
  • Experience implementing CI/CD pipelines with CodePipeline, GitHub Actions, or GitLab CI.
  • Demonstrated focus on reliability and cost-efficiency in platform operations.
  • Bachelor’s degree in Computer Science, Information Systems, or a related field.
  • Preferred: AWS Certified Solutions Architect (Associate or Professional) certification and Kubernetes/CNCF certifications.

Benefits

  • Salary range $170,000 - $200,000 per year (OTE includes base and target bonus).
  • Fully remote role within the United States.
  • Medical, dental, and vision insurance.
  • 401(k) retirement plan.
  • Paid company holidays, paid time off, and paid parental and caregiver leave.
  • Cross-department training, sponsored certifications, and professional development support.
  • Employee resource groups and diversity-focused programs (e.g., Moving Women AHEAD, RISE AHEAD) and access to a multi-million-dollar tech lab.


