Nebius

Nebius enables B2B companies to build local hyperscaling cloud platforms with cost-effective GPUs, InfiniBand networking, and 50% lower compute cost. It offers managed Kubernetes and a launch-ready business model for innovative cloud solutions.

Industry: Internet Software & Services
Company size: 51-250

Description

  • Investigate and resolve complex technical issues in customer environments.
  • Troubleshoot Linux, Kubernetes, cloud infrastructure, networking, storage, and GPU-related workloads.
  • Support customers running containerized systems, inference workloads, training jobs, and other distributed platforms.
  • Act as a senior escalation point for production incidents.
  • Reproduce issues, identify root causes, and partner with engineering on long-term fixes.
  • Build or improve internal scripts, troubleshooting tools, and operational documentation.
  • Improve support scalability through automation, observability, and process enhancements.
  • Communicate clearly with customers during active investigations and incidents.
  • Participate in weekend coverage and urgent incident response.

Requirements

  • Strong Linux troubleshooting skills.
  • Strong Kubernetes and container experience.
  • Solid understanding of cloud infrastructure in AWS, GCP, Azure, OpenStack, or similar environments.
  • Good networking fundamentals.
  • Ability to write scripts or small tools in Python, Bash, Go, or similar.
  • Experience working on production issues that require structured debugging and cross-team collaboration.
  • Ability to work independently when the path to resolution is not obvious.
  • Clear written communication for explaining technical issues to customers and internal teams.
  • Experience with GPU-based infrastructure (especially valuable).
  • Familiarity with AI/ML or LLM-related workloads (especially valuable).
  • Understanding of inference and training pipelines (especially valuable).
  • Experience improving observability, tooling, or operational workflows (especially valuable).
  • History of building useful internal tools or automating repetitive work (especially valuable).
  • Personal or open-source projects that demonstrate technical depth (especially valuable).
  • Applicants must be authorized to work in the country where they apply.

Benefits

  • Competitive compensation.
  • Career growth and learning opportunities.
  • Flexibility and work-life balance.
  • Collaborative and innovative culture.
  • Opportunity to work on impactful AI projects.
  • International environment and talented teams.
