Data/Infrastructure Advocate Engineer - US Remote

2 months, 1 week ago
Full-time
Mid Level
DevOps and Infrastructure
Hugging Face

Hugging Face advances AI through open collaboration, offering a platform for ML model collaboration and tools for building AI projects.

IT Services
51-250 employees
Founded 2016
$395M raised

Description

  • Grow and nurture the open-source data and infrastructure community through initiatives, events, and collaborations with data-focused groups.
  • Engage with external standards and communities (e.g., Apache Parquet, Open Table Formats, data engineering forums) to promote best practices and Hugging Face tools.
  • Promote the Hugging Face Hub and Xet as the go-to platform for data storage, versioning, and collaboration by curating and showcasing datasets, benchmarks, and tools.
  • Create demos, benchmarks, notebooks, and tooling (e.g., Colab notebooks) that illustrate best practices for storage, versioning, and large-dataset workflows.
  • Experiment with Xet, Parquet, and other data formats to surface practical use cases (efficient large-dataset updates, Parquet editing, deduplication) for ML and data engineering.
  • Produce high-quality tutorials, blog posts, and videos to explain complex topics and make them accessible to developers and researchers.
  • Actively participate in online communities (Discord, GitHub, forums) to answer questions, highlight contributions, and foster collaboration.
  • Ensure datasets and tools released on the Hub are well-documented with clear examples, benchmarks, and real-world use cases.

Requirements

  • Strong technical proficiency with Python and data libraries such as pandas, pyarrow, and huggingface/datasets.
  • Experience with storage systems and data formats including Parquet, Open Table Formats, and object stores like S3.
  • Hands-on builder mentality with experience experimenting on storage optimization, dataset versioning, deduplication, and related tooling.
  • Ability to clearly explain complex technical topics (deduplication, compression, Parquet editing) through writing, demos, or talks.
  • Active participation in developer and open-source communities (GitHub, Discord, forums) and a passion for knowledge sharing.
  • Comfort working in fast-moving environments and building in public to inspire others.
  • Demonstrated ability to collaborate cross-functionally with teams such as Datasets, Hub, and Infrastructure.
  • Prior experience producing demos, benchmarks, tutorials, or community-facing materials is preferred.

Benefits

  • Flexible working hours and remote work, with offices in NYC and Paris available.
  • Health, dental, and vision benefits for employees and their dependents.
  • Parental leave and flexible paid time off.
  • Company equity included as part of compensation.
  • Reimbursement for relevant conferences, training, and education.
  • Workstation outfitting support for remote employees and opportunities to visit company offices.
  • Supportive, inclusive culture valuing diversity, learning, and continuous growth.

Similar Roles

Data Engineer II

Samsara 1K-5K IT Services

Samsara is hiring a remote Data Engineer II to build and scale the Databricks-based data platforms that power its Revenue Operations AI and data infrastructure for GTM analytics and generative AI applications.

Apache Spark AWS Databricks dbt Generative AI Machine Learning Python Salesforce Snowflake SQL
1 hour, 42 minutes ago

Synthetic Data Engineer (AI Data/Training)

Hyphen Connect 1-10 staffing & recruiting

A Synthetic Data Engineer will design and manage domain-specific synthetic data pipelines that support data processing and model training workflows.

Apache Airflow Apache Spark
2 hours, 18 minutes ago

Senior Developer / Systems & ETL Engineer

Metova 51-250 Internet Software & Services

Metova is hiring a Senior Developer / Systems & ETL Engineer to build end-to-end information processing systems spanning ETL, APIs, cloud-native deployment, and client-facing technical delivery.

ActiveMQ AWS Azure C CI/CD Docker Hadoop Java Kubernetes Linux Microservices MySQL Oracle OWASP Perl PostgreSQL Python RabbitMQ REST API Snowflake Spring Boot SQL SQL Server Unix
2 hours, 43 minutes ago

Data Engineer

NEORIS 5K-10K Internet Software & Services

NEORIS is seeking a Data Engineer to design, develop, and deploy data solutions in a Big Data and Cloud environment, aligned with the data architecture and focused on efficiency and maintainability.

Agile Apache Spark AWS Azure Cassandra Elasticsearch GCP Hadoop HDFS MongoDB Neo4j Oracle PostgreSQL Python SQL Server
3 hours, 7 minutes ago