MLabs

MLabs is a Haskell, Rust, blockchain, and AI consultancy specializing in mission-critical software development, cross-team collaboration, and cutting-edge value delivery for the fintech, blockchain, and information technology sectors.

Internet Software & Services
11-50 employees
Founded 2018

Description

  • Construct and maintain large-scale web crawlers across diverse domains.
  • Design high-throughput, fault-tolerant systems for collecting data from millions to billions of URLs per day.
  • Navigate anti-bot systems, rate limits, and dynamic JavaScript-heavy websites.
  • Develop pipelines for data cleaning, deduplication, filtering, and normalization.
  • Build and maintain datasets structured for research and machine learning model training.
  • Monitor and optimize crawl performance, coverage, and data quality through rapid iteration.
  • Collaborate with research teams to align data collection efforts with modeling requirements.
  • Optimize infrastructure for cost-efficiency, low latency, and reliability.

Requirements

  • Extensive programming experience in Go, Rust, Python, Java, or C++.
  • Proven experience building web crawlers or large-scale data pipelines.
  • Solid understanding of HTTP, networking protocols, and browser behavior.
  • Familiarity with distributed systems and parallel processing techniques.
  • Experience handling large datasets, ideally at the terabyte to petabyte scale.
  • Demonstrated ability to debug and maintain systems in unstable or adversarial environments.
  • Experience with NLP pipelines or dataset curation for machine learning (preferred).
  • Familiarity with LLM pre-training data or retrieval systems (preferred).
  • Practical experience with headless browsers such as Playwright, Puppeteer, or Chrome DevTools Protocol (preferred).
  • Knowledge of proxy systems, IP rotation, and large-scale request orchestration (preferred).
  • Background in data quality evaluation or benchmarking (preferred).
  • Experience running workloads on cloud or bare-metal infrastructure (preferred).
  • Must be available for a 6-hour overlap with EST working hours.

Benefits

  • Competitive compensation of $80K-$175K, commensurate with experience.
  • Comprehensive benefits package.
  • Equity included in the compensation package.
  • Fully remote work with flexibility and autonomy.
  • Opportunity to work on a web-scale crawler and knowledge graph at the forefront of AI data accessibility.
  • Lean, low-ego team environment focused on high output and professional growth.
  • Equal opportunity and accessibility commitments throughout the hiring process.
