Site Reliability Engineer II (Santo Domingo)

3 weeks, 1 day ago
Full-time
Senior
DevOps and Infrastructure
InvestorFlow

InvestorFlow is a leading provider of front office software applications for private equity, real estate, and hedge fund investment firms. The company offers off-platform CRM and portal solutions for firms looking to quickly establish a professional pr...

Capital Markets
51-250
Founded 2015
$30M raised

Description

  • Participate in architectural design reviews and validate reliability standards for new and existing systems.
  • Audit production systems and verify that services meet SRE production-readiness requirements.
  • Design and implement monitoring strategies across shared observability platforms.
  • Define golden-signal dashboards, measure SLIs against SLOs, track error budgets, and help implement actionable alerting.
  • Drive structured logging, distributed tracing, and OpenTelemetry standards for engineering teams.
  • Monitor and audit production instrumentation to ensure observability coverage is complete.
  • Own production incident response, lead incident handling, and drive remediation efforts.
  • Conduct blameless post-incident reviews and ensure follow-through on corrective actions.
  • Monitor resource utilization, forecast capacity needs, and tune autoscaling in partnership with Engineering.
  • Validate disaster recovery environments, test failover processes, and lead regular DR drills with cross-functional teams.

Requirements

  • 5+ years of experience in Site Reliability Engineering, Production Engineering, or a related operations role.
  • Strong knowledge of cloud-native systems, preferably Microsoft Azure.
  • Experience with observability tooling such as Grafana, Prometheus/Loki, Azure Monitor, and Application Insights.
  • Understanding of disaster recovery concepts, failover validation, and operational readiness.
  • Strong grasp of SRE principles including SLOs/SLIs, error budgets, toil reduction, and postmortems.
  • Strong collaboration and communication skills.
  • Ability to read Terraform/HCL is a plus, but not required.
  • Familiarity with chaos engineering practices is a nice-to-have.
  • Experience supporting capacity planning, autoscaling, and cost optimization is preferred.
  • Ability to work effectively with Engineering, DevOps, Platform, and Non-Functional Testing teams.
