Data/Infrastructure Advocate Engineer - US Remote

1 month ago
Full-time
Mid Level
DevOps and Infrastructure
Hugging Face

Hugging Face advances AI through open collaboration. It provides a platform for collaborating on ML models and tools for building AI projects.

IT Services
51-250
Founded 2016
$395M raised

Description

  • Grow and nurture the open-source data and infrastructure community through initiatives, events, and collaborations with data-focused groups.
  • Engage with external standards and communities (e.g., Apache Parquet, Open Table Formats, data engineering forums) to promote best practices and Hugging Face tools.
  • Promote the Hugging Face Hub and Xet as the go-to platform for data storage, versioning, and collaboration by curating and showcasing datasets, benchmarks, and tools.
  • Create demos, benchmarks, notebooks, and tooling (e.g., Colab notebooks) that illustrate best practices for storage, versioning, and large-dataset workflows.
  • Experiment with Xet, Parquet, and other data formats to surface practical use cases (efficient large-dataset updates, Parquet editing, deduplication) for ML and data engineering.
  • Produce high-quality tutorials, blog posts, and videos to explain complex topics and make them accessible to developers and researchers.
  • Actively participate in online communities (Discord, GitHub, forums) to answer questions, highlight contributions, and foster collaboration.
  • Ensure datasets and tools released on the Hub are well-documented with clear examples, benchmarks, and real-world use cases.

Requirements

  • Strong technical proficiency with Python and data libraries such as pandas, pyarrow, and huggingface/datasets.
  • Experience with storage systems and data formats including Parquet, Open Table Formats, and object stores like S3.
  • Hands-on builder mentality, with experience experimenting with storage optimization, dataset versioning, deduplication, and related tooling.
  • Ability to clearly explain complex technical topics (deduplication, compression, Parquet editing) through writing, demos, or talks.
  • Active participation in developer and open-source communities (GitHub, Discord, forums) and a passion for knowledge sharing.
  • Comfort working in fast-moving environments and building in public to inspire others.
  • Demonstrated ability to collaborate cross-functionally with teams such as Datasets, Hub, and Infrastructure.
  • Prior experience producing demos, benchmarks, tutorials, or community-facing materials is preferred.

Benefits

  • Flexible working hours with remote options and distributed work (offices in NYC and Paris available).
  • Health, dental, and vision benefits for employees and their dependents.
  • Parental leave and flexible paid time off.
  • Company equity included as part of compensation.
  • Reimbursement for relevant conferences, training, and education.
  • Workstation outfitting support for remote employees and opportunities to visit company offices.
  • Supportive, inclusive culture valuing diversity, learning, and continuous growth.
