Devsu

Master the Art of Digital Innovation with Devsu. Learn to create digital solutions that drive change and growth. Devsu provides the tools and resources you need to master the art of digital innovation. Devsu is a technology agency that provides software...

Internet Software & Services

Information Technology

51-250 (210)

Founded 2010

8 open positions

Senior Site Reliability Engineer (SRE) - (GCP)

2 months ago

Guatemala, Honduras, Colombia

Full-time

Senior

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Bash, GCP, Grafana, Kubernetes, Linux, PagerDuty, Prometheus, Python

Description

Own and operate the monitoring and observability stack across on-premises and GCP environments.
Design, build, and maintain Grafana dashboards for infrastructure, Kubernetes, and application visibility.
Define, tune, and maintain alerts to improve signal quality and reduce noise.
Establish observability standards and best practices across teams.
Improve system health, performance, and reliability through monitoring and operational improvements.
Apply SRE practices to improve availability, resilience, and performance.
Define and track SLIs, SLOs, and error budgets.
Participate in on-call rotations and SEV incident response.
Lead or contribute to incident investigations and root cause analysis, and drive preventative actions.
Support and monitor Kubernetes environments, including GKE and on-prem clusters, and troubleshoot platform issues affecting application reliability.
Provide L2/L3 application support coverage during resource shortages, major incidents, or escalations.
Triage application issues using runbooks and dashboards, and document actions and resolutions in ServiceNow.

Requirements

Strong experience as a Site Reliability Engineer or Reliability Engineer.
Deep hands-on expertise with Grafana, including dashboards, alerting, and troubleshooting.
Solid experience with monitoring and observability systems.
Production experience operating Kubernetes environments.
Experience supporting systems in both GCP and on-premises environments is mandatory.
Strong Linux systems and troubleshooting skills.
Fluent English, written and spoken.
Ability to work in the PST time zone.
Ability to participate in an on-call rotation that includes one weekend day.
Weekend on-call time is compensated with one day off during the week.
Experience supporting application teams during SEV incidents is preferred.
Knowledge of capacity planning and performance tuning is preferred.
Scripting skills such as Python or Bash are preferred.
Experience with hybrid infrastructure environments is preferred.
Experience with Prometheus, logging platforms, PagerDuty, ServiceNow, Slack, networking, and infrastructure monitoring tools is relevant to the role.

Benefits

Stable, long-term contract with opportunities for career growth.
Private health insurance.
Remote-friendly culture that supports work-life balance.
Continuous training, mentorship, and learning programs.
Free access to AI training resources and AI tools.
Flexible paid time off policy plus paid holiday days.
Challenging software projects for clients in the US and LatAm.
Collaboration with talented engineers across Latin America and the US in a diverse work environment.

Similar Roles

Site Reliability Engineer 2 (Azure)

PhonePe 5K-10K Capital Markets

PhonePe Limited is hiring a Site Reliability Engineer to manage and scale core cloud infrastructure for a high-volume digital payments environment in India.

India Full-time Senior Site Reliability Engineer (SRE)

Ansible, Azure, Bash, DNS, Docker, Go, Grafana, HAProxy, InfluxDB, Java, Linux, MySQL, Nginx, Prometheus, Python, RabbitMQ, SaltStack, Terraform, Ubuntu

18 hours, 10 minutes ago

Sr. Control System Engineer/Site Reliability Engineer (SRE)

QuEra Computing 11-50 Internet Software & Services

QuEra is seeking a Sr. Control System Engineer/Site Reliability Engineer to integrate and maintain the hardware and software systems that support its quantum control stack and keep development and production environments reliable.

Japan Full-time Senior Site Reliability Engineer (SRE)

Ansible, Bash, CI/CD, Debian, DHCP, DNS, Docker, ELK Stack, Embedded Systems, Git, GitLab CI, Go, Grafana, Jenkins, Kubernetes, Linux, Prometheus, Python, TCP/IP, Terraform, Ubuntu

18 hours, 40 minutes ago

Incident Commander

PENN Entertainment 10K-50K Hotels, Restaurants & Leisure

PENN Interactive is hiring an Incident Commander to join its site reliability team and lead cross-functional incident response for its online and physical platforms.

Canada Full-time Senior Site Reliability Engineer (SRE)

$67k-$102k

Ansible, AWS, Docker, Elasticsearch, GCP, Helm, JIRA, Kafka, Kubernetes, Linux, MySQL, PostgreSQL, Prometheus, Python, Redis, Terraform

18 hours, 55 minutes ago

Site Reliability Engineer

VantageScore 11-50 Banks

VantageScore is hiring a Site Reliability Engineer for a growing engineering team, with a DevSecOps focus on maintaining the reliability, security, and compliance of cloud infrastructure, APIs, and software supply chains.

United States Full-time Senior Site Reliability Engineer (SRE)

$150k

Agile, AWS, AWS CDK, Bash, CI/CD, CloudFormation, CodePipeline, Datadog, DevSecOps, Docker, EC2, GitHub Actions, Grafana, HashiCorp Vault, Kong, Kubernetes, Microservices, Python, REST API, Scrum, Terraform

1 day, 18 hours ago
