Later

Later is a top social media management and influencer platform that simplifies visual content marketing for Instagram, Facebook, Twitter, and Pinterest. With over 2 million users globally, including renowned brands like Yelp and The Huffington Post, La...

Media
51-250 employees
Founded 2014

Description

  • Define and own the long-term ML infrastructure roadmap for current experimentation needs and future AI initiatives.
  • Establish best practices for model lifecycle management, deployment standards, monitoring, and governance.
  • Identify infrastructure gaps and design scalable solutions to support high-velocity ML development.
  • Contribute to cross-functional technical planning so ML systems align with product and platform strategy.
  • Design, build, and maintain production-grade model deployment and inference systems using CI/CD, Docker, and API frameworks such as Flask.
  • Automate end-to-end ML lifecycle workflows, including training pipelines, model validation, registry management, deployment, and rollback strategies.
  • Implement monitoring for model performance, latency, drift detection, and infrastructure health using tools such as CloudWatch, Prometheus, and Grafana.
  • Operate across AWS and GCP environments to manage training and inference workloads, including GPU-based infrastructure and BigQuery datasets.
  • Develop and maintain infrastructure-as-code using Terraform and CloudFormation for scalable, repeatable, secure cloud environments.
  • Partner with Data Scientists, Analysts, Platform Engineers, and Product Engineers to translate experimentation needs into production-ready infrastructure.

Requirements

  • 4+ years of experience in ML Ops, ML infrastructure, backend engineering, or related roles supporting production ML systems.
  • Experience working in cloud-native environments, especially AWS and/or GCP, with hands-on deployment of ML workloads.
  • Proven track record designing and implementing CI/CD pipelines for ML systems.
  • Strong experience with Amazon SageMaker, Docker, Flask-based APIs, and infrastructure automation tools.
  • Hands-on experience with ML lifecycle tooling such as MLflow, SageMaker Studio, or Weights & Biases.
  • Experience managing container orchestration platforms such as Kubernetes, EKS, or GKE.
  • Strong programming experience in Python; additional experience in Go, Java, or Scala is a plus.
  • Experience with infrastructure-as-code tools such as Terraform or CloudFormation.
  • Familiarity with observability tools such as CloudWatch, Prometheus, Grafana, Datadog, or centralized logging platforms.
  • Experience managing GPU-based workloads and scaling training/inference systems.
  • Familiarity with data infrastructure tools such as BigQuery and cloud-native data pipelines.
  • Bonus: Experience supporting LLMs or generative AI pipelines, distributed training systems, feature stores such as Feast, real-time inference systems, or ML governance frameworks.
  • A mindset focused on automation, reliability, performance, and continuous improvement in fast-scaling environments.

Benefits

  • Salary range of $145,000 to $165,000.
  • Market-based and data-driven compensation approach with biannual compensation reviews.
  • Permanent team members are eligible to participate in benefits plans as part of their overall compensation package.
  • Flexible location policy, with fully remote candidates considered for select positions.
  • Offices available in Boston, MA; Vancouver, BC; Chicago, IL; and Vancouver, WA.
  • Equal opportunity employer with an inclusion-first culture.
  • Accommodations and support available during the recruitment process.


