MLabs

MLabs is a Haskell, Rust, Blockchain, and AI consultancy specializing in mission-critical software development, cross-team collaboration, and cutting-edge value delivery for fintech, blockchain, and information technology sectors.

Internet Software & Services
11-50 employees
Founded 2018

Description

  • Construct and maintain large-scale web crawlers across diverse domains.
  • Design high-throughput, fault-tolerant systems for collecting data from millions to billions of URLs per day.
  • Navigate anti-bot systems, rate limits, and dynamic JavaScript-heavy websites.
  • Develop pipelines for data cleaning, deduplication, filtering, and normalization.
  • Build and maintain datasets structured for research and machine learning model training.
  • Monitor and optimize crawl performance, coverage, and data quality through rapid iteration.
  • Collaborate with research teams to align data collection efforts with modeling requirements.
  • Optimize infrastructure for cost-efficiency, low latency, and reliability.

Requirements

  • Extensive programming experience in Go, Rust, Python, Java, or C++.
  • Proven experience building web crawlers or large-scale data pipelines.
  • Solid understanding of HTTP, networking protocols, and browser behavior.
  • Familiarity with distributed systems and parallel processing techniques.
  • Experience handling large datasets, ideally at the terabyte to petabyte scale.
  • Demonstrated ability to debug and maintain systems in unstable or adversarial environments.
  • Experience with NLP pipelines or dataset curation for machine learning (preferred).
  • Familiarity with LLM pre-training data or retrieval systems (preferred).
  • Practical experience with headless-browser automation tools such as Playwright, Puppeteer, or the Chrome DevTools Protocol (preferred).
  • Knowledge of proxy systems, IP rotation, and large-scale request orchestration (preferred).
  • Background in data quality evaluation or benchmarking (preferred).
  • Experience running workloads on cloud or bare-metal infrastructure (preferred).
  • Working hours must overlap with Eastern Standard Time (EST) by 6 hours.

Benefits

  • Competitive compensation of $80K-$175K, commensurate with experience.
  • Comprehensive benefits package.
  • Equity included in the compensation package.
  • Fully remote work with flexibility and autonomy.
  • Opportunity to work on a web-scale crawler and knowledge graph at the forefront of AI data accessibility.
  • Lean, low-ego team environment focused on high output and professional growth.
  • Equal opportunity and accessibility commitments throughout the hiring process.
