Data/Infrastructure Advocate Engineer - EMEA Remote

2 months, 1 week ago
Full-time
Senior
DevOps and Infrastructure
Hugging Face

Hugging Face advances AI through open collaboration, offering a platform for ML model collaboration and tools for building AI projects.

IT Services
51-250
Founded 2016
$395M raised

Description

  • Grow and nurture the open-source data and infrastructure community by launching initiatives, collaborating with data-focused groups, and organizing events or challenges.
  • Engage with external communities (e.g., Apache Parquet, Open Table Formats, data engineering forums) to promote best practices and Hugging Face tools.
  • Promote the Hugging Face Hub as the go-to platform for data storage, versioning, and collaboration by curating and showcasing datasets, benchmarks, and tools like Xet.
  • Create demos, benchmarks, tools, and example notebooks (e.g., Colab) to illustrate best practices for data storage, versioning, and pipeline optimization.
  • Experiment with Xet, Parquet, and other data formats to demonstrate their potential for machine learning and data engineering workflows.
  • Produce high-quality technical content (tutorials, blog posts, videos) that makes complex topics accessible to developers and data engineers.
  • Share insights and guidance on storage optimization, dataset versioning, deduplication, and related workflows to empower users.
  • Actively participate in online communities (Discord, GitHub, forums) to highlight contributions, answer questions, and foster collaboration.
  • Collaborate cross-functionally with teams like Datasets, Hub, and Infrastructure to shape how developers interact with data on the platform, and ensure that released datasets and tools are well documented, with clear examples and benchmarks.

Requirements

  • Strong technical experience with Python and data libraries such as pandas, pyarrow, and huggingface/datasets.
  • Familiarity with storage systems and formats including Parquet, Open Table Formats, and object storage like S3.
  • Hands-on experience building and experimenting with data tools, storage optimization, and dataset versioning.
  • Ability to clearly explain complex topics (e.g., deduplication, compression, Parquet editing) through writing, demos, or talks.
  • Active participation in developer and open-source communities (GitHub, Discord, forums) and a passion for knowledge sharing.
  • Comfort working in fast-moving environments and building in public to inspire others.
  • Experience creating demos, benchmarks, tutorials, or example notebooks to illustrate technical workflows.
  • Interest in advocating for platform adoption and collaborating with product and infrastructure teams to shape developer workflows.

Benefits

  • Flexible working hours and remote work options, with office spaces available in NYC and Paris.
  • Health, dental, and vision benefits for employees and their dependents.
  • Parental leave and flexible paid time off.
  • Reimbursement for relevant conferences, training, and education.
  • Company equity included as part of the compensation package.
  • Support for remote employees to visit the offices, with workstation equipment provided if needed.
