MLabs

MLabs is a Haskell, Rust, blockchain, and AI consultancy specializing in mission-critical software development, cross-team collaboration, and cutting-edge value delivery for the fintech, blockchain, and information technology sectors.

Internet Software & Services
11-50 employees
Founded 2018

Description

  • Construct and maintain large-scale web crawlers across diverse domains.
  • Design high-throughput, fault-tolerant systems for collecting data from millions to billions of URLs per day.
  • Navigate anti-bot systems, rate limits, and dynamic JavaScript-heavy websites.
  • Develop pipelines for data cleaning, deduplication, filtering, and normalization.
  • Build and maintain datasets structured for research and machine learning model training.
  • Monitor and optimize crawl performance, coverage, and data quality through rapid iteration.
  • Collaborate with research teams to align data collection efforts with modeling requirements.
  • Optimize infrastructure for cost-efficiency, low latency, and reliability.
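As a rough illustration of the deduplication and normalization work described above, the sketch below canonicalizes URLs (lowercased scheme and host, default ports dropped, query parameters sorted, fragment removed) and skips pages whose URL or whitespace-and-case-normalized text has already been seen. It is a minimal sketch assuming in-memory sets and exact matching only; the function names normalize_url, content_fingerprint, and dedupe are hypothetical and do not describe MLabs' actual pipeline.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different spellings share one key (illustrative only)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # .hostname drops any user:password@ prefix
    if ":" in host:
        host = f"[{host}]"  # re-add brackets for IPv6 literals
    # Keep an explicit port only when it differs from the scheme's default.
    port = parts.port
    default_port = {"http": 80, "https": 443}.get(scheme)
    netloc = host if port is None or port == default_port else f"{host}:{port}"
    path = parts.path or "/"
    # Sort query parameters so ?b=2&a=1 and ?a=1&b=2 compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Drop the fragment: it is never sent to the server.
    return urlunsplit((scheme, netloc, path, query, ""))


def content_fingerprint(text: str) -> str:
    """Hash page text after collapsing whitespace and lowercasing it."""
    canonical = " ".join(text.split()).lower()
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dedupe(records):
    """Yield (normalized_url, text) pairs, skipping repeated URLs or exact-duplicate content."""
    seen_urls = set()
    seen_fingerprints = set()
    for url, text in records:
        key = normalize_url(url)
        fingerprint = content_fingerprint(text)
        if key in seen_urls or fingerprint in seen_fingerprints:
            continue
        seen_urls.add(key)
        seen_fingerprints.add(fingerprint)
        yield key, text


if __name__ == "__main__":
    pages = [
        ("HTTPS://Example.com:443/a?b=2&a=1#top", "Hello   World"),
        ("https://example.com/a?a=1&b=2", "Different body"),  # same URL after normalization
        ("https://example.org/x", "hello world"),  # same text after normalization
        ("https://example.org/y", "New page"),
    ]
    for url, text in dedupe(pages):
        print(url, "->", text)
```

Running the example keeps only the first and last pages. At web scale the in-memory sets would typically give way to a persistent or probabilistic store (for example a Bloom filter), and exact hashing only catches identical text; near-duplicate detection usually calls for techniques such as MinHash or SimHash.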

Requirements

  • Extensive programming experience in Go, Rust, Python, Java, or C++.
  • Proven experience building web crawlers or large-scale data pipelines.
  • Solid understanding of HTTP, networking protocols, and browser behavior.
  • Familiarity with distributed systems and parallel processing techniques.
  • Experience handling large datasets, ideally at the terabyte to petabyte scale.
  • Demonstrated ability to debug and maintain systems in unstable or adversarial environments.
  • Experience with NLP pipelines or dataset curation for machine learning (preferred).
  • Familiarity with LLM pre-training data or retrieval systems (preferred).
  • Practical experience with headless browsers such as Playwright, Puppeteer, or Chrome DevTools Protocol (preferred).
  • Knowledge of proxy systems, IP rotation, and large-scale request orchestration (preferred).
  • Background in data quality evaluation or benchmarking (preferred).
  • Experience running workloads on cloud or bare-metal infrastructure (preferred).
  • Must be able to work hours that overlap with EST (US Eastern Time) by 6 hours.

Benefits

  • Competitive compensation of $80K-$175K, commensurate with experience.
  • Comprehensive benefits package.
  • Equity included in the compensation package.
  • Fully remote work with flexibility and autonomy.
  • Opportunity to work on a web-scale crawler and knowledge graph at the forefront of AI data accessibility.
  • Lean, low-ego team environment focused on high output and professional growth.
  • Equal opportunity and accessibility commitments throughout the hiring process.

Interested in this position?

Apply directly on the company website

Similar Roles

Mathematical Optimisation Engineer

NEORIS 5K-10K Internet Software & Services

NEORIS, now part of EPAM Systems, is hiring an Operations Research professional to develop optimization models and production-ready scheduling and planning solutions for operations and manufacturing environments.

CI/CD Git Python

Data Engineer, Analytics Data Engineering

Dropbox 1K-5K Internet Software & Services

Dropbox is hiring an engineer to build large-scale analytics and data platform pipelines from the ground up using modern big data technologies.

Apache Airflow Apache Spark C++ Databricks Java Python Scala SQL

Senior Research Engineer

STR 251-1K Aerospace & Defense

STR’s APEX Group is seeking a Senior Research Engineer to develop advanced radar sensing concepts and demonstrations for defense research programs.

Machine Learning MATLAB Python

PhD Algorithm Developer

GlobalDev Tech 51-250 Internet Software & Services

A PhD-level specialist joining a growing project will develop and validate advanced algorithms for electrical systems and power networks, turning research into practical solutions for digital substations, measurement technologies, and power system performance.

MATLAB