Senior SRE Engineer

1 month, 3 weeks ago
Full-time
Senior
Software Development
Trustly

Trustly

Trustly specializes in developing and providing online payment solutions that leverage Open Banking technology to enhance payment processes, reduce costs, and streamline financial services for consumers, merchants, and banks.

Diversified Financial Services
251-1K employees
Founded 2008

Description

  • Architect, design, and implement strategies to ensure high availability, reliability, and fault tolerance of infrastructure and applications.
  • Lead incident response efforts, perform root cause analysis, implement preventative measures, and own post-incident follow-ups and remediation.
  • Monitor production systems with automated tooling to detect, triage, and resolve reliability issues.
  • Identify performance bottlenecks, conduct performance analysis, and optimize system and application performance.
  • Drive automation initiatives to remove manual toil by developing and maintaining tools, scripts, and frameworks for deployment, monitoring, and troubleshooting.
  • Generate regular reports on system reliability, uptime, and performance metrics and present findings, trends, and recommendations to management and stakeholders.
  • Collaborate with cross-functional teams to define SLIs, SLOs, error budgets, and KPIs, and to develop reporting frameworks that track system health and operational efficiency.
  • Support and maintain critical services running in AWS and on-premises, including system, security, and network monitoring and maintenance.

Requirements

  • Bachelor's degree in Computer Science or a related field.
  • Experience building SLIs, SLOs, and error budgets based on business rules.
  • IT project management experience.
  • Coding experience with Python, Java, shell scripting (such as Bash), or similar languages.
  • Experience supporting critical production services in the cloud (AWS) and on-premises environments.
  • Experience with network technologies and system, security, and network monitoring tools.
  • Detailed technical knowledge of databases and the Linux operating system, including standards and best practices for keeping services up and running.
  • Proactive approach to spotting problems, replacing manual processes and toil with code, and fixing performance issues programmatically.
  • Advanced English proficiency.
  • Ability to work remotely from Brazil (remote-first culture; position supports working from any city in Brazil).

Benefits

  • Bradesco health and dental plan for you and your dependents, with no co-payment.
  • Life insurance with differentiated coverage.
  • Meal voucher and supermarket voucher.
  • Home office allowance and remote-first flexible hours (work from any city in Brazil).
  • Gympass access to physical activity spaces and online classes.
  • English program with online group classes and a private teacher.
  • Welcome kit with Apple equipment (MacBook Pro, iPhone) and option to purchase equipment under internal criteria.
  • Annual discretionary bonus (annual premium) based on company KPIs, plus employee referral program rewards.
