Xsolla

Xsolla is an international payment solution provider for online games, offering tools to launch, monetize, and scale games worldwide with local payment methods and fraud prevention.

Internet Software & Services
251-1K employees
Founded 2005

Description

  • Serve as the primary dashboard monitor during shifts by watching the GTO Operational Dashboard in Datadog and detecting anomalies across APM, logs, metrics, synthetic tests, and RUM.
  • Triage and investigate production incidents by creating tickets in JIRA Service Management, analyzing traces, logs, infrastructure and application metrics, and routing issues to the appropriate team.
  • Own lower-severity incidents end-to-end from detection through resolution, executing runbook procedures and escalating when thresholds are exceeded or code changes are required.
  • Support the TSO Lead during major incidents by surfacing real-time technical data, maintaining incident timelines, linking evidence, and executing mitigation actions.
  • Draft internal and external incident communications, including Slack updates, stakeholder notifications, and customer-facing status page posts.
  • Analyze recurring incidents, production bugs, and trends using Datadog, JIRA, and Slack, and contribute findings to reports for product and engineering teams.
  • Publish periodic health reports for critical applications and prepare incident timelines and initial PIR drafts.
  • Track PIR action items after review sessions and flag overdue items to the TSO Lead.
  • Build and maintain operational automation such as alert enrichment scripts, incident templates, Slack workflows, and dashboard widgets.
  • Conduct structured shift handoffs and participate in knowledge transfer sessions to expand independent resolution capability.

Requirements

  • 4+ years of experience in SRE, DevOps, production operations, NOC, or technical operations in a high-availability environment.
  • Experience supporting payments, e-commerce, SaaS, or gaming workloads is preferred.
  • Strong troubleshooting and investigation skills with the ability to trace issues through logs, APM, infrastructure metrics, database queries, and network paths.
  • Hands-on experience with Datadog or a similar observability platform such as Grafana, Splunk, New Relic, or Elastic.
  • Proficiency in at least one scripting language: Python, Go, or Bash.
  • Clear written and verbal communication skills in English for incident tickets, shift handoffs, status updates, and PIR drafts.
  • Working knowledge of Kubernetes and cloud infrastructure, with GCP preferred and AWS/Azure acceptable.
  • Understanding of SLOs, error budgets, and multi-window burn-rate alerting.
  • Experience with incident management tools such as JIRA Service Management, PagerDuty or OpsGenie, Slack, and Confluence.
  • Comfort with 24x7 shift-based operations in a follow-the-sun model, including rotating weekend on-call.
  • Experience in fintech environments is also a plus.
  • Familiarity with Datadog Service Catalog, synthetic monitoring, and RUM is a plus.
  • Experience debugging distributed systems and tracing failures across microservices is a plus.
  • Exposure to MySQL, PostgreSQL, Redis, or Kafka is a plus.
  • Familiarity with CI/CD and deployment tools such as GitLab CI, ArgoCD, or Helm is a plus.
  • JIRA Service Management administration experience is a plus.
  • ITIL Foundation certification is a plus but not required.

Benefits

  • Salary range of RM144,000 to RM216,000 per year.
  • Latest Mac workstation and additional hardware provided for work.
  • Free training and participation in specialized conferences.
  • Rich internal knowledge sharing and collaboration opportunities.
  • Health insurance covering medical, dental, and optical care for employees and dependants.
  • Flexible hours to organize your day around your needs and team demands.
  • No dress code.
  • Comfortable, new office environment.
