Data/Infrastructure Advocate Engineer - US Remote

1 hour, 48 minutes ago
Full-time
Mid Level
DevOps and Infrastructure
Hugging Face

Hugging Face advances AI through open collaboration, providing a platform for ML model collaboration and tools for building AI projects.

IT Services
51-250 employees
Founded 2016
$395M raised

Description

  • Grow and nurture the open-source data and infrastructure community through initiatives, events, challenges, and collaborations.
  • Engage with communities such as Apache Parquet, Open Table Formats, and data engineering forums to promote best practices and Hugging Face tools.
  • Promote the Hugging Face Hub as a platform for data storage, versioning, and collaboration by showcasing datasets, benchmarks, and Xet use cases.
  • Create demos, benchmarks, and tools such as Colab notebooks to demonstrate best practices for data storage and versioning.
  • Experiment with Xet, Parquet, and other formats to illustrate efficient large-dataset updates, Parquet editing, and deduplication workflows.
  • Produce tutorials, blog posts, and videos that make complex technical topics accessible to developers.
  • Share insights on storage optimization, dataset versioning, and deduplication through content and community engagement.
  • Actively participate in Discord, GitHub, forums, and other online communities to answer questions and foster collaboration.
  • Ensure datasets and tools released on the Hub are well-documented with clear examples, benchmarks, and use cases.
  • Collaborate with Datasets, Hub, and Infrastructure teams to shape how developers interact with data on the platform.

Requirements

  • 3+ years of experience in developer relations or developer advocacy, ideally for data engineering, infrastructure, or ML tools and platforms.
  • An established public presence as a technical voice with a demonstrable, engaged audience on LinkedIn and X (Twitter).
  • A portfolio of developer-facing content such as tutorials, blog posts, videos, demos, benchmarks, or conference talks.
  • Hands-on experience building and engaging open-source or developer communities across Discord, GitHub, or forums.
  • Strong Python skills.
  • Hands-on experience with data libraries such as pandas, pyarrow, and huggingface/datasets.
  • Practical experience with storage systems and formats including Parquet, Open Table Formats, and S3.
  • Working knowledge of dataset versioning, deduplication, and compression.
  • Ability to explain complex technical topics clearly through writing, demos, or talks.
  • Fluent written and spoken English.
  • Experience with the Hugging Face Hub and datasets ecosystem, or with Xet, is preferred.
  • Open-source maintainer or contributor experience is preferred.
  • Familiarity with large-scale data pipelines and data engineering workflows is preferred.
  • Experience producing notebooks, such as Colab, for tutorials and benchmarks is preferred.
  • Applicants who do not meet every requirement are still encouraged to apply.

Benefits

  • Reimbursement for relevant conferences, training, and education.
  • Flexible working hours and remote work options.
  • Health, dental, and vision benefits for employees and their dependents.
  • Parental leave and flexible paid time off.
  • Company equity as part of the compensation package.
  • Opportunity for remote employees to visit Hugging Face offices in NYC and Paris.
  • Workstation outfitting support, if needed.
