Incident Commander

5 days, 2 hours ago
Full-time
Senior
DevOps and Infrastructure
Caseware

CaseWare International Inc. provides cutting-edge software solutions for accounting firms, corporations, and governments, enabling users worldwide to work smarter and transform insights into impact.

Internet Software & Services
251-1K
Founded 1988

Description

  • Initiate and oversee incident response efforts as the primary point of coordination when incidents are detected.
  • Lead cross-functional teams, including engineering, product management, and support, through rapid incident resolution.
  • Communicate timely updates, resolution plans, and status information to internal and external stakeholders.
  • Use and integrate tools such as JIRA, PagerDuty, New Relic, AWS, and Microsoft Teams to monitor and coordinate incident handling.
  • Understand the software and infrastructure environment to guide resolution strategies and recovery actions.
  • Ensure the right stakeholders are engaged during active incidents to support swift restoration of service.
  • Track and report uptime metrics to promote transparency around system reliability and performance.
  • Coordinate and lead post-mortem sessions, documenting root causes, lessons learned, and action items.
  • Create post-incident reports and RCA documents that include timeline, impact, remediation, root cause, and preventive steps.
  • Follow up on corrective actions and implement proactive strategies to reduce risk and improve system resilience.
  • Participate in an on-call rotation.

Requirements

  • 5+ years of experience managing critical incidents in SaaS environments.
  • Prior knowledge of cloud environments, AWS, DevOps practices, or related technical operations.
  • Experience in a similar role within a software or technology company is strongly preferred.
  • Strong technical background in incident management and response.
  • Proven ability to lead teams toward rapid incident resolution.
  • Solid understanding of modern software landscapes and familiarity with JIRA and PagerDuty integrations.
  • Excellent written and verbal communication skills.
  • Ability to perform well under pressure and manage competing priorities effectively.
  • Strong English communication and collaboration skills.
  • An AI-first mindset, with willingness to leverage AI tools to improve productivity and decision-making.
  • Ability to use AI-assisted tools for drafting, summarization, research, and data analysis.
  • Open to a fully remote role based in Colombia.
  • Successful candidates must complete a background check through Certn.co; executives and senior managers are also subject to a soft credit check.

Benefits

  • Permanent (indefinite-term) employment contract (contrato a término indefinido) with all statutory benefits.
  • Prepaid private health plan (medicina prepagada), life insurance, and funeral assistance.
  • Internet allowance and home office stipend.
  • Competitive compensation above the market average.
  • 100% remote work environment with excellent work-life balance.
  • Training budget and mentorship from a highly experienced professional.
  • Five personal PTO days per year, plus a sick-leave top-up from day 3 through day 90.
  • Recognition award with additional paid time off based on years of service.
  • Vacation upgrade starting at 5 years of service.
  • Employee Assistance Program (EAP) through TELUS Health.

Interested in this position?

Apply directly on the company website

Similar Roles

Senior Manager, Software Engineering

Anduril Industries 1K-5K Aerospace & Defense

Anduril Industries is seeking a Senior Manager to lead CorpTech Platform software teams that build and operate AI-enabled production systems and improve how internal engineering work is designed, shipped, and maintained.

CI/CD Computer Vision ERP LLM Microservices
19 minutes ago

Senior Site Reliability Engineer

Anduril Industries 1K-5K Aerospace & Defense

Anduril Industries is hiring a Site Reliability Engineer for its Mission Autonomy team to support the reliability and operational excellence of autonomous systems used across cloud, hardware-in-the-loop, and air-gapped environments.

Ansible AWS Azure DNS Docker GCP Go HTTP Kubernetes Linux Load Balancing Puppet Python Splunk TCP/IP Terraform
20 minutes ago

Staff Site Reliability Engineer

Veeam Software 1K-5K Internet Software & Services

Veeam is hiring a Staff Site Reliability Engineer to lead reliability and observability efforts across its global platform and help shape resilient architecture and SRE practices at scale.

Azure C# Go Grafana Java JavaScript Kubernetes OpenTelemetry Prometheus Pulumi Terraform TypeScript
34 minutes ago

Site Reliability Engineer

66degrees 251-1K IT Services

66degrees is hiring a Site Reliability Engineer to help enterprise cloud clients maintain, optimize, and scale Google Cloud environments through reliability engineering, automation, and incident response.

Agile Datadog GCP JIRA Kanban Kubernetes Linux Prometheus Python Scrum SQL Server Terraform
51 minutes ago
