Staff Site Reliability Engineer, Database

2 hours, 13 minutes ago
Full-time
Senior
DevOps and Infrastructure
Alpaca

Alpaca provides a developer-first API for stock and crypto trading, designed for building apps and trading algorithms.

Capital Markets
51-250
Founded 2015
$87M raised

Description

  • Triage difficult technical problems and implement effective solutions.
  • Improve the observability stack, including monitoring, logging, and profiling.
  • Respond to incidents promptly and lead post-incident reviews to drive improvements.
  • Work with development teams to design new features and services for reliability and scalability.
  • Monitor system capacity and performance, then recommend and implement changes to support future growth.

Requirements

  • 5+ years of experience in Site Reliability Engineering, Performance Engineering, or a similar role.
  • 5+ years of experience with multi-terabyte PostgreSQL clusters.
  • Proven experience managing and maintaining large-scale, high-availability, high-performance PostgreSQL databases.
  • Experience designing and implementing SLIs, SLOs, and SLAs for internal systems and databases.
  • Experience troubleshooting PostgreSQL performance issues and slow queries.
  • Extensive experience designing efficient schemas and queries.
  • Experience migrating multi-terabyte tables into more efficient schemas.
  • Proficiency with Go.
  • Proficiency with Prometheus and Linux.
  • Knowledge of trading or fintech domains, low-latency systems, distributed tracing, and PostgreSQL tooling such as pgx, gorm, or sqlc.

Benefits

  • Competitive salary with stock options.
  • Health benefits.
  • One-time USD $500 new hire home-office setup stipend.
  • Monthly USD $150 stipend via a Brex Card.
