Site Reliability Engineer I

1 month, 2 weeks ago
Full-time
Lead
DevOps and Infrastructure
Zafin

Zafin provides relationship banking software to the financial services industry. Its solutions range from core modernization to platforms in billing, analytics, and rates & fees to quote-to-cash, empowering...

Internet Software & Services
251-1K
Founded 2002
$47M raised

Description

  • Manage the resolution of complex technical issues involving Zafin’s products and Azure cloud environment.
  • Design and implement operational enhancements to improve resiliency and system reliability.
  • Conduct root cause analysis for high-severity incidents and reduce repeat failures.
  • Represent the organization on external client escalation calls and provide expert guidance and solutions.
  • Optimize cloud infrastructure for performance, scalability, and cost-effectiveness.
  • Provide leadership in managing and scaling container orchestration platforms such as AKS and OpenShift.
  • Implement advanced monitoring solutions and use predictive analytics for proactive issue resolution.
  • Develop and execute automation strategies for operational workflows and incident response.
  • Create and maintain documentation for cloud architectures, processes, and incident management strategies.
  • Mentor and coach junior engineers while collaborating with cross-functional teams on strategic initiatives.

Requirements

  • Bachelor’s degree in computer science, engineering, or a related field; master’s degree preferred.
  • 8+ years of experience in cloud support, operations, or a related role.
  • Advanced expertise in Microsoft Azure or an equivalent cloud platform.
  • Experience designing and scaling container orchestration systems such as AKS or OpenShift.
  • Proven leadership in managing automated deployment pipelines, including those built on Azure DevOps.
  • Experience with enterprise monitoring platforms such as Azure Insights and Grafana, plus predictive analytics tools.
  • Advanced scripting skills with PowerShell, Python, or similar languages.
  • Extensive experience in incident management and defining SLAs for global production environments.
  • In-depth knowledge of database management, particularly Postgres.
  • Preferred: advanced cloud certifications such as Azure Solutions Architect Expert.
  • Preferred: experience with ITSM tools and processes such as ServiceNow.
  • Preferred: strong understanding of security and compliance in cloud environments.
  • Strong analytical and problem-solving abilities.
  • Strong leadership, mentoring, communication, and collaboration skills.

Benefits

  • Competitive salaries.
  • Annual bonus potential.
  • Generous paid time off.
  • Paid volunteering days.
  • Wellness benefits.
  • Robust opportunities for professional growth and career advancement.
  • Accommodations available for candidates with disabilities during the selection process.
