Site Reliability Engineer (Remote) - #35039

3 weeks, 1 day ago
Full-time
Mid Level
DevOps and Infrastructure
Recruitment & Search Agency - Headhunter in the Philippines

Manila Recruitment is a top recruitment agency in the Philippines, offering hiring solutions for executive search, IT, developers, managers, and specialized roles. With a database of over 250,000 candidates, we provide innovative headhunting services a...

Professional Services
11-50
Founded 2010

Description

  • Monitor the platform using Cloud Run logs, Temporal workflow UI, GKE pod status, and Pub/Sub queue states.
  • Triage issues to determine whether problems originate in the Python agent layer, Temporal workflows, Go APIs, or Vue frontend.
  • Investigate and resolve paralegal-facing operational issues such as stuck cases, failed faxes, and pending qualifications.
  • Use SQL against AlloyDB PostgreSQL to support troubleshooting and issue investigation.
  • Write and maintain runbooks and escalation procedures for recurring incidents and support workflows.
  • Support integrations across fax, email, SMS/voice, authentication, and external legal or healthcare systems.
  • Work across the platform's backend, workflow, infrastructure, and data service components to keep operations running smoothly.

Requirements

  • Experience troubleshooting production systems across logs, workflows, pods, queues, APIs, and UI layers.
  • Comfort working with SQL against PostgreSQL or similar databases.
  • Familiarity with cloud-based infrastructure and services such as GCP, Cloud Run, GKE, Pub/Sub, Redis, and Terraform.
  • Ability to diagnose issues in Python services, Go microservices, and web applications.
  • Experience writing runbooks, support documentation, or escalation procedures.
  • Legal operations, litigation support, or similar domain experience is a bonus.
  • Understanding of integration-based workflows with external systems such as fax, email, SMS/voice, or CRM/CMS tools is preferred.
