Site Reliability Engineer II (Santo Domingo)

1 hour, 49 minutes ago
Full-time
Senior
DevOps and Infrastructure
InvestorFlow

InvestorFlow is a leading provider of front office software applications for private equity, real estate, and hedge fund investment firms. The company offers off-platform CRM and portal solutions for firms looking to quickly establish a professional pr...

Capital Markets
51-250
Founded 2015
$30M raised

Description

  • Participate in architectural design reviews and validate reliability standards for new and existing systems.
  • Audit production systems and verify that services meet SRE production-readiness requirements.
  • Design and implement monitoring strategies across shared observability platforms.
  • Define golden-signal dashboards, measure SLIs against SLOs, track error budgets, and help implement actionable alerting.
  • Drive structured logging, distributed tracing, and OpenTelemetry standards for engineering teams.
  • Monitor and audit production instrumentation to ensure observability coverage is complete.
  • Own production incident response, lead incident handling, and drive remediation efforts.
  • Conduct blameless post-incident reviews and ensure follow-through on corrective actions.
  • Monitor resource utilization, forecast capacity needs, and tune autoscaling in partnership with Engineering.
  • Validate disaster recovery environments, test failover processes, and lead regular DR drills with cross-functional teams.

Requirements

  • 5+ years of experience in Site Reliability Engineering, Production Engineering, or a related operations role.
  • Strong knowledge of cloud-native systems, preferably Microsoft Azure.
  • Experience with observability tooling such as Grafana, Prometheus/Loki, Azure Monitor, and Application Insights.
  • Understanding of disaster recovery concepts, failover validation, and operational readiness.
  • Strong grasp of SRE principles including SLOs/SLIs, error budgets, toil reduction, and postmortems.
  • Strong collaboration and communication skills.
  • Ability to read Terraform/HCL is a plus, but not required.
  • Familiarity with chaos engineering practices is a nice-to-have.
  • Experience supporting capacity planning, autoscaling, and cost optimization is preferred.
  • Ability to work effectively with Engineering, DevOps, Platform, and Non-Functional Testing teams.
