Intetics

Intetics is a custom software development company with 28 years of experience, building software applications and integrating AI/ML. It works across multiple industries and provides global talent solutions.

Internet Software & Services
1K-5K
Founded 1995

Description

  • Build, operate, and improve the infrastructure powering the distributed inference platform.
  • Own reliability, scalability, and operational excellence across the AWS control plane and multi-provider GPU fleet.
  • Design and maintain the networking layer connecting control planes, Kubernetes clusters, and geographically distributed GPU hosts.
  • Operate and improve Kubernetes-based inference orchestration, primarily on EKS.
  • Manage deployments and infrastructure changes using Helm, FluxCD, and Terraform.
  • Improve observability with Prometheus, Grafana, Loki, Jaeger, and OpenTelemetry.
  • Tune alerts, improve runbooks, and strengthen operational readiness as the system scales.
  • Respond to production incidents, perform root cause analysis, and implement durable fixes.
  • Collaborate asynchronously with engineers across time zones and handle handoffs effectively.
  • Help expand Europe-based infrastructure coverage to support operations outside US business hours.

Requirements

  • 5+ years of experience in SRE, DevOps, platform engineering, or infrastructure engineering.
  • Strong production experience with networking and Kubernetes.
  • Experience operating AWS infrastructure in production, especially EKS.
  • Strong hands-on experience managing Linux hosts, clusters, and distributed systems outside fully abstracted cloud environments.
  • Experience with Prometheus, Grafana, Loki, Jaeger, and OpenTelemetry.
  • Experience with deployment and GitOps workflows using Helm and FluxCD.
  • Experience with infrastructure as code, ideally Terraform.
  • Familiarity with alert tuning, runbook development, and incident management in production systems.
  • Strong operational judgment and ability to troubleshoot independently during incidents.
  • Comfortable working in a fast-moving startup with changing infrastructure, product, and customer demands.
  • Clear communicator who works effectively in an async environment and handles shift handoffs cleanly.
  • Experience with AI inference, ML infrastructure, or adjacent high-performance distributed systems (nice to have).
  • Experience operating heterogeneous GPU fleets, bare-metal infrastructure, or multi-provider compute environments (nice to have).
  • Experience using AI tools productively in engineering workflows (nice to have).
