Senior SRE Engineer

2 months, 2 weeks ago
Full-time
Senior
Software Development
Trustly

Trustly specializes in developing and providing online payment solutions that leverage Open Banking technology to enhance payment processes, reduce costs, and streamline financial services for consumers, merchants, and banks.

Diversified Financial Services
251-1K
Founded 2008

Description

  • Architect, design, and implement strategies to ensure high availability, reliability, and fault tolerance of infrastructure and applications.
  • Lead incident response efforts, perform root cause analysis, implement preventative measures, and own post-incident follow-ups and remediation.
  • Monitor production systems with automated tooling to detect, triage, and resolve reliability issues.
  • Identify performance bottlenecks, conduct performance analysis, and optimize system and application performance.
  • Drive automation initiatives to remove manual toil by developing and maintaining tools, scripts, and frameworks for deployment, monitoring, and troubleshooting.
  • Generate regular reports on system reliability, uptime, and performance metrics and present findings, trends, and recommendations to management and stakeholders.
  • Collaborate with cross-functional teams to define SLIs, SLOs, error budgets, and KPIs, and to develop reporting frameworks that track system health and operational efficiency.
  • Support and maintain critical services running in AWS and on-premises, including system, security, and network monitoring and maintenance.

Requirements

  • Bachelor's degree in Computer Science or a related field.
  • Experience building SLIs, SLOs, and error budgets based on business rules.
  • IT project management experience.
  • Coding experience with Python, Java, shell scripting (Bash), or similar languages.
  • Experience supporting critical production services in the cloud (AWS) and on-premises environments.
  • Experience with network technologies and system, security, and network monitoring tools.
  • Detailed technical knowledge of databases and the Linux operating system, including standards and best practices for keeping services up and running.
  • Proactive approach to spotting problems, removing manual processes/toil using code, and fixing performance concerns programmatically.
  • Advanced English.
  • Ability to work remotely from Brazil (remote-first culture; position supports working from any city in Brazil).

Benefits

  • Bradesco health and dental plan for you and your dependents with no co-payment cost.
  • Life insurance with differentiated coverage.
  • Meal voucher and supermarket voucher.
  • Home office allowance and remote-first flexible hours (work from any city in Brazil).
  • Gympass access to physical activity spaces and online classes.
  • English program with online group classes and a private teacher.
  • Welcome kit with Apple equipment (MacBook Pro, iPhone) and option to purchase equipment under internal criteria.
  • Annual discretionary bonus (annual premium) based on company KPIs, plus rewards through the employee referral program.
