Artera

Artera specializes in modernizing and enhancing critical infrastructure for energy utilities and municipalities, providing solutions that ensure the reliable distribution and transmission of natural gas and electric power across America.

Construction & Engineering
51-250

Description

  • Develop the long-term vision and roadmap for Artera’s AI platform to support scaling inference volume and development workloads.
  • Own ML compute infrastructure, including distributed training infrastructure and developer libraries for foundation model development.
  • Build and evolve core libraries used by AI scientists to develop, launch, and monitor AI products.
  • Collaborate with model developers to improve GPU and CPU efficiency and data throughput for large-scale training runs.
  • Optimize storage and serving of terabytes of digital pathology data for large-scale training workflows.
  • Maintain and improve observability infrastructure to identify opportunities to optimize model performance across the platform.
  • Work closely with AI model developers, machine learning engineers, and platform engineering to support production deployment of optimized models.

Requirements

  • 8+ years of industry software engineering experience.
  • 4+ years of experience using ML orchestration frameworks such as Flyte, Ray, Kubeflow, Metaflow, MLflow, Dagster, Argo Workflows, or Prefect.
  • 4+ years of experience using PyTorch, TensorFlow, or JAX in Python.
  • 3+ years of experience building with AWS, Docker, and Kubernetes.
  • 1+ years of experience optimizing large-scale, high-throughput distributed machine learning training pipelines.
  • Experience with Terraform and SQLAlchemy is preferred.
  • Experience with multi-node and multi-GPU training is preferred.
  • Experience deploying and maintaining infrastructure for machine learning training and production inference is preferred.
  • Familiarity with TorchScript, ONNX Runtime, DeepSpeed, AWS Neuron, or similar inference optimization approaches is preferred.
  • Must be currently authorized to work in the United States or Canada without visa sponsorship.

Benefits

  • Base salary of $180,000 to $220,000 per year.
  • Equity is a core component of the compensation package.
  • 401(k) matching.
  • Unlimited paid time off (PTO).
  • Remote role open to candidates authorized to work in the U.S. or Canada.
