Capital Markets Gateway

Capital Markets Gateway (cmgx.io) is a leading provider of efficient capital markets solutions, offering compliant workflow management and data analytics services. With a platform designed by industry experts with over 100 years of experience, we deliv...

Capital Markets
51-250
$39M raised

Description

  • Design, implement, and maintain monitoring and observability solutions for infrastructure and applications.
  • Define and manage SLOs, SLIs, and error budgets to measure system reliability.
  • Develop and optimize dashboards, alerts, and reports for system performance and business metrics.
  • Design actionable alerting strategies to reduce noise and improve incident response time.
  • Integrate alerting systems with Jira and maintain on-call runbooks.
  • Analyze system performance, identify bottlenecks, and implement optimizations for scalability and cost efficiency.
  • Support load testing and capacity planning to prepare systems for peak traffic.
  • Identify automation opportunities and build tools for failover, configuration management, and monitoring.
  • Collaborate with software, operations, and infrastructure teams to drive technical solutions and share SRE practices across the company.

Requirements

  • Must be based in Latin America.
  • English proficiency at C1 or C2 level.
  • Proven experience as a Site Reliability Engineer or in a similar role.
  • Experience with logging, metrics, and tracing tools such as Datadog, Loki, Prometheus, and OpenTelemetry.
  • Experience with cloud platforms, with Azure preferred.
  • Experience with infrastructure-as-code tools such as Terraform.
  • Strong programming and scripting skills in Python and Bash.
  • Experience with Docker and Kubernetes.
  • Understanding of Linux-based systems, networking, and security principles for containerized applications.
  • Strong problem-solving, troubleshooting, communication, and collaboration skills.
  • Ability to thrive in a fast-paced, constantly evolving environment.
  • Experience with PostgreSQL monitoring and optimization is preferred.

Benefits

  • 2-year+ contract.
  • 15 business days of vacation.
  • Tech courses and conferences.
  • Top-of-the-line MacBook.
  • Flexible working hours.

Interested in this position?

Apply directly on the company website


Similar Roles

Senior Site Reliability Engineer, Security & Compliance (L3)

CoinGecko 51-250 IT Services

CoinGecko is hiring a Senior Site Reliability Engineer to support its remote Malaysia-based engineering team in maintaining secure, reliable, and compliant operations for a high-scale cryptocurrency data platform.

AWS Blockchain Cloudflare CloudFormation GCP Go Python Ruby Terraform

E01-L03 Reliability Engineer IV

TalentWerx 11-50 Professional Services

EXPANSIA is hiring a Remote Reliability Engineer IV to support cloud platforms and services for U.S. Department of Defense and national security programs, with the main objective of improving availability, performance, monitoring, incident response, and production reliability.

Prototyping

Staff Site Reliability Engineer (Platform Reliability)

Qonto 1K-5K Banks

Qonto is hiring a Staff Site Reliability Engineer to lead platform reliability work, shape infrastructure decisions, and help scale its cloud platform for millions of customers across Europe.

Argo CD AWS Docker Elasticsearch GitLab CI GitOps Go Kafka Kubernetes Microservices OpenTelemetry OpsGenie PostgreSQL Prometheus Python Redis Terraform

Incident Engineer

Netomi 51-250 IT Services

Netomi is hiring a remote Incident Engineer in Gurugram to manage end-to-end incident response for its enterprise AI customer experience platform and keep customer- and internal-facing systems running reliably.

AWS Datadog LLM
