Senior Machine Learning Engineer, AI Platform

6 hours, 40 minutes ago
Full-time
Senior
Software Development
Mozilla

Mozilla, the maker of Firefox, is a non-profit organization ensuring an open, safe, and accessible internet for all users worldwide.

Internet Software & Services
251-1K employees
Founded 2005
$2M raised

Description

  • Design, build, and operate core AI platform components for training, deploying, and serving machine learning models in production.
  • Own model serving and inference workflows end to end, improving reliability, scalability, performance, and operational excellence.
  • Optimize inference systems for throughput, latency, and cost across CPU and GPU workloads.
  • Design and manage GPU-based training and inference workloads, including performance tuning, capacity planning, and resource optimization.
  • Own key parts of the model lifecycle, including packaging, versioning, testing, validation, and deployment automation.
  • Implement and improve observability practices such as metrics, logging, tracing, and alerting for ML services and pipelines.
  • Partner with product, infrastructure, security, and data teams to design scalable AI platform capabilities.
  • Contribute to technical design discussions, propose architectural improvements, and mentor junior engineers through code reviews and knowledge sharing.
  • Participate in operational processes such as incident response, on-call rotations, and post-incident reviews.

Requirements

  • Bachelor’s degree with 4–6 years of relevant industry experience, Master’s degree with significant hands-on experience, or equivalent work experience.
  • Strong experience developing in Python for machine learning systems, backend services, or distributed data processing.
  • Proven experience deploying and operating ML workloads in cloud environments with production-grade infrastructure.
  • Solid understanding of model serving architectures, inference pipelines, and performance tradeoffs including latency, throughput, cost, and scaling strategies.
  • Hands-on experience working with GPU-based workloads and accelerated computing in production settings.
  • Experience designing CI/CD pipelines and development workflows that support reliable ML system deployment.
  • Ability to independently scope and drive technical initiatives while balancing product and operational priorities.
  • Strong problem-solving skills and ability to debug performance and reliability issues in distributed systems.
  • Clear communication skills and experience collaborating across engineering, product, and infrastructure teams.
  • Preferred: experience with inference optimization techniques such as batching, quantization, compilation, model conversion, or hardware-specific tuning.
  • Preferred: familiarity with containerization and orchestration systems such as Docker and Kubernetes in production environments.
  • Preferred: experience designing observability systems for distributed services, including metrics strategy and performance profiling.
  • Preferred: exposure to privacy-preserving ML techniques, security best practices, or responsible AI system design.
  • Preferred: contributions to open-source ML infrastructure projects or leadership in building reusable internal ML tooling.

Benefits

  • $128,000–$171,000 CAD salary range for Canada Tier 1 locations, or $116,000–$155,000 CAD for Canada Tier 2 locations.
  • Generous performance-based bonus plans for eligible employees.
  • Rich medical, dental, and vision coverage.
  • Generous retirement contributions with 100% immediate vesting.
  • Quarterly all-company wellness days.
  • Country-specific holidays plus a day off for your birthday.
  • One-time home office stipend.
  • Annual professional development budget.
  • Quarterly well-being stipend.
  • Considerable paid parental leave.
  • Employee referral bonus program.
  • Additional benefits including life/AD&D, disability, and EAP coverage, varying by country.
