Hugging Face

Hugging Face: Advancing AI through open collaboration. Platform for ML model collaboration and tools for AI project creation.

IT Services

Information Technology

51-250 (170)

Founded 2016

$395M raised

7 open positions

Data/Infrastructure Advocate Engineer - US Remote

2 months, 1 week ago

United States

Full-time

Mid Level

Data Engineer

DevOps and Infrastructure

AWS GitHub Machine Learning Pandas Python

Description

Grow and nurture the open-source data and infrastructure community through initiatives, events, and collaborations with data-focused groups.
Engage with external standards and communities (e.g., Apache Parquet, Open Table Formats, data engineering forums) to promote best practices and Hugging Face tools.
Promote the Hugging Face Hub and Xet as the go-to platform for data storage, versioning, and collaboration by curating and showcasing datasets, benchmarks, and tools.
Create demos, benchmarks, notebooks, and tooling (e.g., Colab notebooks) that illustrate best practices for storage, versioning, and large-dataset workflows.
Experiment with Xet, Parquet, and other data formats to surface practical use cases (efficient large-dataset updates, Parquet editing, deduplication) for ML and data engineering.
Produce high-quality tutorials, blog posts, and videos to explain complex topics and make them accessible to developers and researchers.
Actively participate in online communities (Discord, GitHub, forums) to answer questions, highlight contributions, and foster collaboration.
Ensure datasets and tools released on the Hub are well-documented with clear examples, benchmarks, and real-world use cases.

Requirements

Strong technical proficiency with Python and data libraries such as pandas, pyarrow, and huggingface/datasets.
Experience with storage systems and data formats including Parquet, Open Table Formats, and object stores like S3.
Hands-on builder mentality with experience experimenting on storage optimization, dataset versioning, deduplication, and related tooling.
Ability to clearly explain complex technical topics (deduplication, compression, Parquet editing) through writing, demos, or talks.
Active participation in developer and open-source communities (GitHub, Discord, forums) and a passion for knowledge sharing.
Comfort working in fast-moving environments and building in public to inspire others.
Demonstrated ability to collaborate cross-functionally with teams such as Datasets, Hub, and Infrastructure.
Prior experience producing demos, benchmarks, tutorials, or community-facing materials is preferred.

Benefits

Flexible working hours with remote options and distributed work (offices in NYC and Paris available).
Health, dental, and vision benefits for employees and their dependents.
Parental leave and flexible paid time off.
Company equity included as part of compensation.
Reimbursement for relevant conferences, training, and education.
Workstation outfitting support for remote employees and opportunities to visit company offices.
Supportive, inclusive culture valuing diversity, learning, and continuous growth.

Similar Roles

Data Engineer II

Samsara

Samsara is hiring a remote Data Engineer II to build and scale the Databricks-based data platforms that power its Revenue Operations AI and data infrastructure for GTM analytics and generative AI applications.

United States Full-time Junior Data Engineer

$102k-$154k

Apache Spark AWS Databricks dbt Generative AI Machine Learning Python Salesforce Snowflake SQL

1 hour, 42 minutes ago

Synthetic Data Engineer (AI Data/Training)

Hyphen Connect 1-10 staffing & recruiting

A Synthetic Data Engineer at the organization will design and manage domain-specific synthetic data pipelines that support data processing and model training workflows.

China Mid Level AI (Artificial Intelligence) Data Engineer

Apache Airflow Apache Spark

2 hours, 18 minutes ago

Senior Developer / Systems & ETL Engineer

Metova 51-250 Internet Software & Services

The Senior Developer / Systems & ETL Engineer, working for an unnamed company, is responsible for building end-to-end information processing systems that span ETL, APIs, cloud-native deployment, and client-facing technical delivery.

Chile Ecuador Peru Mexico Argentina Contract Senior Data Engineer

ActiveMQ AWS Azure C CI/CD Docker Hadoop Java Kubernetes Linux Microservices MySQL Oracle OWASP Perl PostgreSQL Python RabbitMQ REST API Snowflake Spring Boot SQL SQL Server Unix

2 hours, 43 minutes ago

Data Engineer

NEORIS 5K-10K Internet Software & Services

NEORIS is looking for a Data Engineer to design, develop, and deploy data solutions in a Big Data and Cloud environment, aligned with the data architecture and focused on efficiency and maintainability.

Ecuador Full-time Mid Level Data Engineer

Agile Apache Spark AWS Azure Cassandra Elasticsearch GCP Hadoop HDFS MongoDB Neo4j Oracle PostgreSQL Python SQL Server

3 hours, 7 minutes ago
