Capital Markets Gateway

Capital Markets Gateway (cmgx.io) is a leading provider of efficient capital markets solutions, offering compliant workflow management and data analytics services. With a platform designed by industry experts with over 100 years of experience, we deliv...

Industry: Capital Markets
Company size: 51-250 employees
Funding: $39M raised

Description

  • Design, implement, and maintain monitoring and observability solutions using Prometheus, Grafana, Datadog, and OpenTelemetry.
  • Define and implement SLOs, SLIs, and error budgets to measure and improve system reliability.
  • Develop and optimize dashboards, alerts, and reports for system performance and business metrics.
  • Design actionable alerting strategies to reduce noise and improve mean time to recovery.
  • Integrate alerting systems with Jira and establish runbooks for on-call incident response.
  • Analyze performance metrics, identify bottlenecks, and implement optimizations for scalability and cost efficiency.
  • Support load testing and capacity planning to prepare systems for peak traffic.
  • Identify opportunities for automation and build tools to streamline failover, configuration management, and monitoring.
  • Work closely with software, operations, and infrastructure teams to provide technical guidance and drive solutions.
  • Promote SRE principles and practices across the company.

Requirements

  • Must be based in Latin America.
  • English proficiency at C1 or C2 level.
  • Proven experience as a Site Reliability Engineer or in a similar role.
  • Experience with logging, metrics, and tracing frameworks such as Datadog, Loki, Prometheus, and OpenTelemetry.
  • Experience with cloud platforms, with Azure preferred.
  • Experience with infrastructure-as-code tools such as Terraform.
  • Strong programming and scripting skills in Python and Bash.
  • Proficiency with Docker and Kubernetes.
  • Understanding of Linux-based systems, networking, and security principles for containerized applications.
  • Experience with PostgreSQL monitoring and optimization is preferred.

Benefits

  • Equity.
  • Unlimited PTO: a baseline of 15 days plus bank holidays, with unlimited additional paid leave.
  • Comprehensive benefits program managed by Globalization Partners.
  • Premium life and income protection.
  • Top private medical and dental insurance.
  • Employee Assistance Program (EAP).
  • Pension contributions.
  • Remote work environment.
  • Education reimbursement and continuous learning opportunities.
  • Employee referral bonus.
  • Parental leave.
