Orion Innovation

Orion Innovation is a global technology services provider specializing in digital transformation, offering solutions in data, analytics, enterprise collaboration, risk & compliance, and cloud services to enhance productivity and decision-making.

IT Services
1K-5K
Founded 1993

Description

  • Own the end-to-end infrastructure layer for the document intelligence platform, from GPU cluster configuration to model serving.
  • Design and manage Kubernetes-based workloads on Azure Kubernetes Service, including multi-node-pool architecture and autoscaling policies.
  • Configure and maintain GPU node pools, device plugins, driver compatibility, and resource limits for ML workloads.
  • Orchestrate Kubernetes jobs and event-driven processing using KEDA, queue triggers, and scaled jobs.
  • Manage CUDA and cuDNN runtime behavior for GPU inference workloads, including debugging performance and memory issues.
  • Support model deployment and inference for BERT-class NLP models using PyTorch and Hugging Face Transformers.
  • Implement batching, FP16 optimization, profiling, and memory management for efficient inference.
  • Build and maintain queue consumers and async workers that integrate with Azure services such as Key Vault, private endpoints, and Azure Data Lake Storage Gen2.
  • Author and maintain infrastructure and deployment assets including Docker images, Helm charts, and infrastructure as code.
  • Collaborate across platform engineering and applied ML to deliver a low-latency analyst-facing query interface.

Requirements

  • Strong experience with Kubernetes and Azure Kubernetes Service (AKS).
  • Experience designing multi-node-pool clusters with taints/tolerations, autoscaler configuration, and GPU node pools such as NC/ND series.
  • Hands-on knowledge of GPU workload tooling including device plugins, driver compatibility, resource limits, KEDA, and CUDA/cuDNN.
  • Experience with PyTorch for GPU inference and runtime configuration; raw kernel development is not required.
  • Experience with batching, FP16, memory management, profiling, and Hugging Face Transformers.
  • Experience loading and serving BERT, DistilBERT, or BGE models, including pipeline APIs and tokenization.
  • Strong Python experience in production environments.
  • Experience building async workers and queue consumers with Azure SDKs and Azure infrastructure.
  • Experience with VNet networking, private endpoints, Key Vault, ADLS, Azure AD, Docker, and Helm.
  • Experience authoring multi-stage Docker builds, Helm charts, and infrastructure as code using Terraform or Bicep.
  • Preferred: willingness to learn and grow into adjacent technologies.
