Capital.com

Capital.com is a leading fintech company providing online trading services through a smart investment app, offering access to 3700+ global markets with AI-powered features for secure and efficient trading.

Capital Markets
251-1K employees
Founded 2016
$25M raised

Description

  • Own the full observability stack for metrics, logs, and traces, from pipeline design through day-2 operations.
  • Architect and operate the VictoriaMetrics cluster topology, including scraping, remote write, alerting rules, and cardinality control.
  • Operate OpenSearch clusters, including index lifecycle management, hot-warm-cold architecture, shard tuning, and ingest pipelines.
  • Build and maintain OpenTelemetry Collector pipelines and instrument services across Java, Python, and JavaScript/TypeScript stacks.
  • Run Kafka as the telemetry transport layer, including topic design, partition strategy, lag monitoring, and throughput tuning.
  • Manage log shipping infrastructure with Fluent Bit, Vector, or Fluentd and define structured logging standards across services.
  • Build Grafana dashboards and alerting that are clear, actionable, and useful for engineering teams.
  • Improve sampling, batching, and context propagation strategies across distributed services.
  • Participate in incident response, post-mortems, and reliability improvements driven by observability signals.
  • Mentor engineers on observability practices, tooling, and structured logging standards.

Requirements

  • 6+ years of experience in DevOps, SRE, or platform engineering roles.
  • At least 2 years of experience focused on observability tooling at production scale.
  • Deep hands-on experience with VictoriaMetrics or Prometheus, including MetricsQL/PromQL, exporters, service discovery, remote write, downsampling, and retention management.
  • Solid OpenSearch or Elasticsearch experience, including cluster operations, Query DSL, ISM policies, and ingest pipeline design.
  • Production experience with OpenTelemetry, including Collector configuration, OTLP, context propagation, and instrumentation across multiple languages.
  • Strong Kafka experience, including producer/consumer patterns, consumer group management, Kafka Connect, Schema Registry, and JMX-based monitoring.
  • Experience with Strimzi for running Kafka on Kubernetes is a plus.
  • Proficiency with log shippers such as Fluent Bit, Vector, or Fluentd and structured log parsing/normalization.
  • Working knowledge of Kubernetes, Helm, Argo CD/GitOps, Terraform, and Ansible.
  • Comfort in a hybrid AWS and on-prem environment, with solid networking knowledge as it applies to scraping and shipping pipelines.
  • Scripting ability in Bash or Python for automation and tooling.
  • Strong communication skills and English proficiency.

Benefits

  • Competitive salary.
  • Work-life balance supported by a hybrid work setup.
  • Generous annual leave.
  • Employee referral program.
  • Comprehensive health and pension benefits, including medical insurance.
  • 30 extra days to work remotely from anywhere in the world, with some restrictions.
  • Two additional paid volunteer days each year.
