PlayON! Sports Network

PlayON! Sports Network provides a comprehensive platform for high school sports programs, offering digital ticketing, live streaming, statistics, coaching tools, and social content to enhance community engagement and support student athletes.

Media
51-250 employees
Founded 2006
$10M raised

Description

  • Assess and improve system visibility by reviewing dashboards, metrics, and logs and closing observability gaps.
  • Tighten monitoring and alerting for critical services to detect issues earlier and improve response times.
  • Build observability into build and deploy workflows by adding instrumentation and telemetry to release processes.
  • Help define SLIs and SLOs for core user flows and align the team on reliability expectations.
  • Improve incident response by partnering with the Event Commander/on-call rotation and strengthening communication, coordination, and follow-up.
  • Automate routine checks and monitoring tasks to reduce manual effort and free up engineering time.
  • Develop automation, tooling, and monitoring solutions that support high service availability.
  • Partner with application and quality engineering teams on reliability practices, release automation, and testing.
  • Drive operational excellence through incident prevention, blameless postmortems, and capacity planning.
  • Participate in on-call rotations to support critical services and respond quickly to incidents.

Requirements

  • Solid experience in Python for automation, tooling, and data-driven operational work.
  • Proficiency in at least one of Java, C++, or Go.
  • Strong understanding of Linux systems, cloud infrastructure, and modern deployment practices.
  • Experience with AWS, GCP, or Azure.
  • Experience with Docker, Kubernetes, and Terraform.
  • Experience with CI/CD pipelines, version control, and automated testing frameworks.
  • Experience with observability tools such as Prometheus, Grafana, ELK, or Datadog.
  • Experience analyzing logs and metrics to diagnose issues.
  • Proven experience facilitating and documenting Critical User Journeys and translating them into actionable SLAs/SLOs for automation.
  • Strong collaboration and communication skills in cross-functional, high-impact situations.
  • Familiarity with AI-augmented development tools such as Claude and Codex.
  • Nice to have: experience writing or maintaining end-to-end or integration tests for distributed systems.
  • Nice to have: background in performance testing, capacity planning, or chaos engineering.
  • Nice to have: contributions to internal developer tooling or reliability-focused frameworks.
  • Nice to have: exposure to security, compliance, or change management processes in production environments.
  • Nice to have: relevant certifications.

Benefits

  • Multiple medical insurance plans to choose from.
  • Dental, vision, life, and disability insurance.
  • Employee Emergency Fund.
  • Company equity in the form of stock options.
  • Open PTO policy.
  • 401(k) plan with company match.
  • Hybrid/flexible work environment.
