MLabs

MLabs is a Haskell, Rust, blockchain, and AI consultancy specializing in mission-critical software development and cross-team collaboration for the fintech, blockchain, and information technology sectors.

Internet Software & Services
11-50
Founded 2018

Description

  • Build and maintain infrastructure for concurrent AI trading agents, including cron schedules, state files, and trailing stop processes.
  • Deploy and manage agent environments with workspace persistence, isolated sessions, and MCP server connectivity.
  • Design and operate CI/CD pipelines to ship trading skills and plugins without interrupting live trading activity.
  • Implement zero-downtime deployment strategies such as blue/green and canary releases.
  • Build alerting and monitoring across metrics, logs, and traces to detect failures, corruption, and regressions before financial loss occurs.
  • Operate and scale core platform infrastructure across Kubernetes, Redis, Postgres, ClickHouse, and Kafka.
  • Maintain blockchain node infrastructure and stable connectivity to exchange APIs and on-chain transaction systems.
  • Lead incident response, on-call practices, debugging, mitigation, and post-mortems to improve reliability.

Requirements

  • Extensive experience in DevOps, SRE, or Infrastructure Engineering, preferably in a startup environment.
  • Proven experience deploying, scaling, and debugging production workloads on Kubernetes in AWS EKS.
  • Proficiency with infrastructure as code tools such as Terraform, Ansible, or equivalent frameworks.
  • Hands-on experience with Docker and Helm for packaging and deploying production services.
  • Experience operating production-grade systems such as Redis, Postgres/RDS, ClickHouse, and Kafka.
  • Strong experience with observability tools such as Prometheus, Grafana, Datadog, Loki, or OpenTelemetry.
  • Ability to debug across multiple languages, including Python, Node.js, and Go.
  • Understanding of real-time systems where latency and reliability have direct financial consequences.
  • Familiarity with blockchain node infrastructure, exchange APIs, wallet operations, and on-chain monitoring.
  • Experience managing secrets, access controls, and production hardening in sensitive environments.
  • Experience defining SLOs and building mature on-call practices.
  • Experience with OpenClaw agent deployments and workspace templates (preferred).
  • Familiarity with Model Context Protocol (MCP) server deployment and auth management (preferred).
  • Direct experience with Hyperliquid or other DEX protocols (preferred).
  • Background in fintech, market data infrastructure, or high-frequency trading systems (preferred).

Benefits

  • Competitive compensation of $120K to $150K.
  • Remote role with US-based coverage aligned to GMT timezones.
  • High-autonomy environment with significant technical ownership.
  • Opportunity to build infrastructure for autonomous AI agents.
  • Commitment to equality, accessibility, and reasonable accommodations during hiring.
  • Privacy and secure handling of applicant data during recruitment.
