Senior Machine Learning Engineer

5 hours, 19 minutes ago
Full-time
Senior
Software Development
Rubrik

Rubrik provides cutting-edge data security and protection solutions, including Zero Trust Data Protection and ransomware recovery, to ensure data readiness and business resilience.

IT Services
1K-5K
Founded 2014
$553M raised

Description

  • Own the full lifecycle of production small language models and classifiers, from base-model selection and training through deployment and iteration.
  • Train, fine-tune, distill, and optimize models using supervised fine-tuning, preference optimization, adversarial training, and post-training techniques.
  • Build real-time and batch inference infrastructure for low-latency enforcement, offline scoring, back-testing, and corpus mining.
  • Optimize serving performance using GPU pooling, KV-cache-aware routing, continuous batching, quantization, and speculative decoding.
  • Design and maintain canary, shadow, and A/B traffic workflows to validate model changes on live customer traffic.
  • Build synthetic data pipelines, policy back-testing systems, and online/offline evaluation frameworks that detect regressions and quality issues.
  • Mine production signals, customer feedback, and agent-session insights to improve policies, reduce false positives, and surface security gaps.
  • Diagnose model failures across the data, training, architecture, and serving layers, and fix each one at the layer where it originates.
  • Partner with product, customer-facing, security, and platform teams to translate governance needs into modeling work and integrate models safely into production.
  • Provide technical leadership on a pillar of the model stack and mentor engineers working on applied ML systems.

Requirements

  • Bachelor's degree or higher in Computer Science, Machine Learning, Computer Engineering, Statistics, or a closely related technical field.
  • 2+ years of professional ML experience with end-to-end ownership of models in production.
  • Proficiency in Python and PyTorch, or equivalent tools, for training and evaluation.
  • Hands-on experience training, fine-tuning, or distilling language models or classifiers in production.
  • Experience with supervised fine-tuning and at least one preference-optimization method such as DPO, RLAIF, or RLHF.
  • Production experience with serving frameworks such as vLLM, SGLang, TensorRT-LLM, or equivalent.
  • Experience optimizing inference with continuous batching, KV-cache strategies, and inference-time quantization.
  • Experience designing closed-loop ML systems that connect evaluation, telemetry, data curation, and synthetic data back into training.
  • Comfort operating at production scale in high-QPS, safety-critical request paths with customer-visible consequences.
  • Preferred: deep background in AI safety and red-teaming, including adversarial ML and prompt-injection defense.
  • Preferred: expertise in evaluation methodology such as LLM-as-judge pipelines, calibration monitoring, and adversarial benchmarks.
  • Preferred: experience with context-fusion and retrieval systems that combine sensitivity, identity, and behavioral history.
  • Preferred: production experience with low-latency inference for streaming or safety-critical systems.
  • Preferred: knowledge of label-efficient training methods such as weak supervision, active learning, and embedding-based retrieval.
  • Preferred: hands-on experience with knowledge distillation, transferring frontier-model capabilities to smaller production models.
  • Preferred: familiarity with tool-use frameworks, model gateway architectures such as MCP or LiteLLM, and autonomous agent patterns.
  • Preferred: active open-source contributions to mainstream ML training, serving, or evaluation libraries.

Benefits

  • US base salary range of $188,500 to $282,700.
  • Role is eligible for a bonus.
  • Role is eligible for equity.
  • Role includes benefits.
  • Equal opportunity employer with accommodations available for qualified individuals with disabilities.
