PlayON! Sports Network

PlayON! Sports Network provides a comprehensive platform for high school sports programs, offering digital ticketing, live streaming, statistics, coaching tools, and social content to enhance community engagement and support student athletes.

Media
51-250 employees
Founded 2006
$10M raised

Description

  • Assess and improve system visibility by reviewing dashboards, metrics, and logs, and by closing observability gaps.
  • Tighten monitoring and alerting for critical services to detect issues earlier and improve response times.
  • Build observability into build and deploy workflows by adding instrumentation and telemetry to release processes.
  • Help define SLIs and SLOs for core user flows and align the team on reliability expectations.
  • Improve incident response by partnering with the Event Commander and the on-call rotation to strengthen communication, coordination, and follow-up.
  • Automate routine checks and monitoring tasks to reduce manual effort and free up engineering time.
  • Develop automation, tooling, and monitoring solutions that support high service availability.
  • Partner with application and quality engineering teams on reliability practices, release automation, and testing.
  • Drive operational excellence through incident prevention, blameless postmortems, and capacity planning.
  • Participate in on-call rotations to support critical services and respond quickly to incidents.

Requirements

  • Solid experience in Python for automation, tooling, and data-driven operational work.
  • Proficiency in at least one of Java, C++, or Go.
  • Strong understanding of Linux systems, cloud infrastructure, and modern deployment practices.
  • Experience with AWS, GCP, or Azure.
  • Experience with Docker, Kubernetes, and Terraform.
  • Experience with CI/CD pipelines, version control, and automated testing frameworks.
  • Experience with observability tools such as Prometheus, Grafana, ELK, or Datadog.
  • Experience analyzing logs and metrics to diagnose issues.
  • Proven experience facilitating and documenting Critical User Journeys and translating them into actionable SLAs/SLOs that can be measured and enforced through automation.
  • Strong collaboration and communication skills in cross-functional, high-impact situations.
  • Familiarity with AI-augmented development tools such as Claude and Codex.
  • Nice to have: experience writing or maintaining end-to-end or integration tests for distributed systems.
  • Nice to have: background in performance testing, capacity planning, or chaos engineering.
  • Nice to have: contributions to internal developer tooling or reliability-focused frameworks.
  • Nice to have: exposure to security, compliance, or change management processes in production environments.
  • Nice to have: relevant certifications.

Benefits

  • Multiple medical insurance plans to choose from.
  • Dental, vision, life, and disability insurance.
  • Employee Emergency Fund.
  • Company equity in the form of stock options.
  • Open PTO policy.
  • 401(k) plan with company match.
  • Hybrid/flexible work environment.

Interested in this position?

Apply directly on the company website

Similar Roles

Staff Operations Engineer

Mozilla 251-1K Internet Software & Services

Mozilla is hiring a Staff Operations Engineer to lead the design, reliability, and evolution of hybrid-cloud and workplace infrastructure across teams.

Ansible DNS Linux Puppet Python TCP/IP Unix
14 hours, 59 minutes ago

Principal Site Reliability Engineer (SRE)

Symmetrio Professional Services

Symmetrio is recruiting a Principal Site Reliability Engineer for a rapidly growing healthcare technology company to own the reliability, scalability, security, and performance of a mission-critical SaaS platform used by healthcare providers across the United States.

Active Directory AWS CI/CD Datadog Django Grafana Kubernetes Python Terraform Windows Server
15 hours, 14 minutes ago

Performance Test Engineer Lead

PartnerOne 51-250 Media

An enterprise performance engineering role at a cloud-focused organization, responsible for validating the scalability, stability, and production readiness of distributed systems across Azure and hybrid environments.

Azure CI/CD Kubernetes PowerShell
15 hours, 30 minutes ago

Site Reliability Engineer

MLabs 11-50 Internet Software & Services

Remote Site Reliability Engineering role, working UK hours, at a financial technology company, focused on automating and operating the infrastructure that supports global integration services for financial institutions.

Active Directory Ansible AWS CI/CD GCP OAuth PostgreSQL SAML
15 hours, 44 minutes ago