Site Reliability Engineer (Remote) - #35039

2 hours, 58 minutes ago
Full-time
Mid Level
DevOps and Infrastructure
Recruitment & Search Agency - Headhunter in the Philippines

Manila Recruitment is a top recruitment agency in the Philippines, offering hiring solutions for executive search, IT, developers, managers, and specialized roles. With a database of over 250,000 candidates, we provide innovative headhunting services a...

Professional Services
11-50 employees
Founded 2010

Description

  • Monitor the platform using Cloud Run logs, Temporal workflow UI, GKE pod status, and Pub/Sub queue states.
  • Triage issues to determine whether problems originate in the Python agent layer, Temporal workflows, Go APIs, or Vue frontend.
  • Investigate and resolve paralegal-facing operational issues such as stuck cases, failed faxes, and pending qualifications.
  • Use SQL against AlloyDB PostgreSQL to support troubleshooting and issue investigation.
  • Write and maintain runbooks and escalation procedures for recurring incidents and support workflows.
  • Support integrations across fax, email, SMS/voice, authentication, and external legal or healthcare systems.
  • Work across the platform's backend, workflow, infrastructure, and data services to keep operations running smoothly.

Requirements

  • Experience troubleshooting production systems across logs, workflows, pods, queues, APIs, and UI layers.
  • Comfort working with SQL against PostgreSQL or similar databases.
  • Familiarity with cloud-based infrastructure and services such as GCP, Cloud Run, GKE, Pub/Sub, Redis, and Terraform.
  • Ability to diagnose issues in Python services, Go microservices, and web applications.
  • Experience writing runbooks, support documentation, or escalation procedures.
  • Legal operations, litigation support, or similar domain experience is a bonus.
  • Understanding of integration-based workflows with external systems such as fax, email, SMS/voice, or CRM/CMS tools is preferred.
