Bee Talents

Bee Talents is your dedicated partner in tech recruitment, saving you up to 75% compared with standard recruitment costs. We build efficient tech teams with tailored hires for niche positions in new technologies.

Industry: Professional Services
Company size: 51-250 employees
Founded: 2015

Description

  • Design end-to-end GPU cluster architectures for on-premises and cloud environments using Ansible, Terraform, Kubernetes, and Slurm.
  • Lead technical deep-dives, workshops, and solution presentations for stakeholders at different levels.
  • Build and maintain Infrastructure as Code modules to automate GPU resource provisioning, scaling, and monitoring.
  • Produce whitepapers, runbooks, and training materials to support customer enablement.
  • Host webinars and training sessions for customers and internal audiences.
  • Partner with engineering and product teams to share customer feedback and help drive product improvements.
  • Support deployment of AI infrastructure and workflows for production use cases.

Requirements

  • Proven track record deploying GPU clusters at scale, including multi-node and multi-GPU setups.
  • Hands-on experience with Ansible or similar configuration management tools.
  • Experience with Terraform and Infrastructure as Code practices.
  • Strong familiarity with Kubernetes and Slurm.
  • Proficiency in Python or Go.
  • Solid understanding of ML ecosystems, including models, tooling, and production deployment patterns.
  • Excellent verbal and written communication skills with the ability to explain complex technical concepts to diverse audiences.
  • Nice to have: experience deploying high-availability inference infrastructure for production AI workloads.
  • Nice to have: experience implementing and optimizing distributed training and inference pipelines with MLflow, REST APIs, and frameworks such as PyTorch, TensorFlow, or JAX.
  • Nice to have: experience transitioning ML pipelines from proof of concept to scalable production systems.
  • Nice to have: familiarity with GitOps workflows, Docker, Helm charts, and CI/CD for ML.
  • Nice to have: knowledge of Hugging Face transformers, Scikit-learn, and experiment tracking best practices.

Benefits

  • Competitive compensation.
  • Flexible working hours with hybrid or remote options, depending on the role.
  • Work from anywhere in the world for up to 45 days per year.
  • Private medical insurance for you and your family.
  • Extra paid vacation and sick leave days.
  • Support for important life moments and celebrations.
  • Language courses for professional growth.
  • Modern offices with snacks, drinks, and entertainment.
  • Team sports and social activities.
