Xsolla

Xsolla is an international payment solution provider for online games, offering tools to launch, monetize, and scale games worldwide with local payment methods and fraud prevention.

Internet Software & Services
251-1K
Founded 2005

Description

  • Continuously monitor the GTO Operational Dashboard in Datadog to detect anomalies and determine whether they require incident creation or immediate investigation.
  • Triage and investigate production incidents using Datadog, JIRA Service Management, and related observability data to identify blast radius and likely root cause domains.
  • Route incidents to the appropriate team using the smart routing model and escalate unresolved or code-level issues within defined thresholds.
  • Own lower-severity incidents end-to-end from detection through resolution without escalation when possible.
  • Support the TSO Lead during major incidents by surfacing live technical data, maintaining the incident timeline, linking evidence, and executing mitigation actions.
  • Draft internal and external incident communications, including Slack updates, stakeholder notifications, and status page posts.
  • Analyze incident trends, recurring issues, and production bugs, and contribute findings to reports for product and engineering teams.
  • Compile incident timelines, draft initial post-incident review (PIR) documents, and track PIR action items through completion.
  • Build and maintain operational automation such as alert enrichment scripts, incident templates, Slack workflows, and dashboard widgets.
  • Create and maintain runbooks, conduct structured shift handoffs, participate in knowledge transfer, and cover for the TSO Lead when needed.
  • Publish periodic health reports for critical applications.

Requirements

  • 4+ years of experience in SRE, DevOps, production operations, NOC, or technical operations in a high-availability environment.
  • Experience supporting payments, e-commerce, SaaS, or gaming workloads is preferred.
  • Strong troubleshooting and investigation skills across logs, APM traces, infrastructure metrics, database queries, and network paths.
  • Hands-on experience with Datadog or a similar observability platform such as Grafana, Splunk, New Relic, or Elastic.
  • Proficiency in at least one scripting language: Python, Go, or Bash.
  • Clear written and verbal communication skills in English for incident tickets, updates, handoffs, status communications, and PIR drafts.
  • Working knowledge of Kubernetes and cloud infrastructure, with GCP preferred and AWS/Azure acceptable.
  • Understanding of SLOs, error budgets, and burn-rate alerting.
  • Experience with incident management tools such as JIRA/JIRA Service Management, PagerDuty/OpsGenie, Slack, and Confluence.
  • Experience with or strong interest in AI/ML-assisted operations such as anomaly detection, alert correlation, predictive monitoring, or automated remediation.
  • Comfort with 24x7 follow-the-sun shift work and rotating weekend on-call coverage.
  • Nice to have: experience in gaming, payments, or fintech environments.
  • Nice to have: familiarity with Datadog Service Catalog, synthetic monitoring, and RUM.
  • Nice to have: experience debugging distributed systems and cascading microservice failures.
  • Nice to have: exposure to MySQL, PostgreSQL, Redis, or Kafka for incident investigation.
  • Nice to have: familiarity with CI/CD and deployment tools such as GitLab CI, ArgoCD, or Helm.
  • Nice to have: JIRA Service Management administration experience.
  • Nice to have: ITIL Foundation certification.

Benefits

  • Annual salary of $90,000 to $115,000 for candidates in British Columbia, depending on location and experience.
  • Medical, dental, and vision coverage.
  • Paid time off (PTO).
  • A personalized career roadmap for each employee.
  • Training and educational opportunities for professional development.
  • A supportive environment focused on employees’ physical, mental, and emotional well-being.
  • Remote work arrangement.
