Data/Infrastructure Advocate Engineer - US Remote

1 month, 3 weeks ago
Full-time
Mid Level
DevOps and Infrastructure
Hugging Face

Hugging Face advances AI through open collaboration, offering a platform for ML model collaboration and tools for building AI projects.

IT Services
51-250
Founded 2016
$395M raised

Description

  • Grow and nurture the open-source data and infrastructure community through initiatives, events, and collaborations with data-focused groups.
  • Engage with external standards and communities (e.g., Apache Parquet, Open Table Formats, data engineering forums) to promote best practices and Hugging Face tools.
  • Promote the Hugging Face Hub and Xet as the go-to platform for data storage, versioning, and collaboration by curating and showcasing datasets, benchmarks, and tools.
  • Create demos, benchmarks, notebooks, and tooling (e.g., Colab notebooks) that illustrate best practices for storage, versioning, and large-dataset workflows.
  • Experiment with Xet, Parquet, and other data formats to surface practical use cases (efficient large-dataset updates, Parquet editing, deduplication) for ML and data engineering.
  • Produce high-quality tutorials, blog posts, and videos to explain complex topics and make them accessible to developers and researchers.
  • Actively participate in online communities (Discord, GitHub, forums) to answer questions, highlight contributions, and foster collaboration.
  • Ensure datasets and tools released on the Hub are well-documented with clear examples, benchmarks, and real-world use cases.

Requirements

  • Strong technical proficiency with Python and data libraries such as pandas, pyarrow, and huggingface/datasets.
  • Experience with storage systems and data formats including Parquet, Open Table Formats, and object stores like S3.
  • Hands-on builder mentality with experience experimenting on storage optimization, dataset versioning, deduplication, and related tooling.
  • Ability to clearly explain complex technical topics (deduplication, compression, Parquet editing) through writing, demos, or talks.
  • Active participation in developer and open-source communities (GitHub, Discord, forums) and a passion for knowledge sharing.
  • Comfort working in fast-moving environments and building in public to inspire others.
  • Demonstrated ability to collaborate cross-functionally with teams such as Datasets, Hub, and Infrastructure.
  • Prior experience producing demos, benchmarks, tutorials, or community-facing materials is preferred.

Benefits

  • Flexible working hours with remote options and distributed work (offices in NYC and Paris available).
  • Health, dental, and vision benefits for employees and their dependents.
  • Parental leave and flexible paid time off.
  • Company equity included as part of compensation.
  • Reimbursement for relevant conferences, training, and education.
  • Workstation outfitting support for remote employees and opportunities to visit company offices.
  • Supportive, inclusive culture valuing diversity, learning, and continuous growth.
