Xsolla

Xsolla is an international payment solution provider for online games, offering tools to launch, monetize, and scale games worldwide with local payment methods and fraud prevention.

Internet Software & Services
251-1K employees
Founded 2005

Description

  • Serve as the primary dashboard monitor during shifts and continuously watch production health signals in Datadog.
  • Detect anomalies by correlating APM, logs, metrics, synthetic tests, and Real User Monitoring data.
  • Triage and investigate production incidents, create incident tickets in JIRA Service Management, and route issues to the correct team.
  • Own lower-severity incidents end-to-end from detection through resolution, including diagnosis and runbook execution.
  • Support the Technical Shift Operations (TSO) Lead during major incidents as a technical partner in the war room.
  • Draft internal and customer-facing incident communications, including Slack updates and status page posts.
  • Analyze incident trends, recurring issues, and production bugs and contribute findings to reports and post-incident reviews.
  • Compile incident timelines, draft initial PIR documents, and track action items after reviews.
  • Build and maintain operational automation, incident templates, Slack workflows, dashboard widgets, and runbooks.
  • Conduct structured shift handoffs and participate in knowledge transfer sessions to improve independent resolution capability.
  • Cover for the TSO Lead when needed, including severity classification, escalation decisions, and basic incident commander functions.
  • Publish periodic health reports for critical applications.

Requirements

  • 4+ years of experience in SRE, DevOps, production operations, NOC, or technical operations in a high-availability environment.
  • Experience supporting payments, e-commerce, SaaS, or gaming workloads is preferred.
  • Strong troubleshooting and investigation skills across logs, traces, metrics, databases, and network paths.
  • Hands-on experience with Datadog or a similar observability platform such as Grafana, Splunk, New Relic, or Elastic.
  • Proficiency in at least one scripting language: Python, Go, or Bash.
  • Clear written and verbal communication skills in English.
  • Working knowledge of Kubernetes and cloud infrastructure; GCP is preferred, but AWS or Azure is acceptable.
  • Understanding of SLOs, error budgets, and burn-rate alerting.
  • Experience with JIRA or JIRA Service Management, PagerDuty or OpsGenie, Slack, and Confluence.
  • Interest in or experience with AI/ML-assisted operations such as anomaly detection, alert correlation, predictive monitoring, or automated remediation.
  • Comfort with 24x7 shift-based operations in a follow-the-sun model, including weekend on-call rotation.
  • Experience in gaming, payments, or fintech environments is a plus.
  • Familiarity with Datadog Service Catalog, synthetic monitoring, and RUM is a plus.
  • Exposure to database and platform tools such as MySQL, PostgreSQL, Redis, Kafka, GitLab CI, ArgoCD, and Helm is a plus.
  • JIRA Service Management administration experience or ITIL Foundation certification is preferred.
