Staff Site Reliability Engineer, Database

4 weeks, 1 day ago
Full-time
Senior
DevOps and Infrastructure
Alpaca

Alpaca is a developer-first API for stock and crypto trading, offering easy-to-use APIs for building apps and trading algorithms.

Capital Markets
51-250
Founded 2015
$87M raised

Description

  • Triage difficult technical problems and implement effective solutions.
  • Improve the observability stack, including monitoring, logging, and profiling.
  • Respond to incidents promptly and lead post-incident reviews to drive improvements.
  • Work with development teams to design new features and services for reliability and scalability.
  • Monitor system capacity and performance, then recommend and implement changes to support future growth.

Requirements

  • 5+ years of experience in Site Reliability Engineering, Performance Engineering, or a similar role.
  • 5+ years of experience with multi-terabyte PostgreSQL clusters.
  • Proven experience managing and maintaining large-scale, high-availability, high-performance PostgreSQL databases.
  • Experience designing and implementing SLIs, SLOs, and SLAs for internal systems and databases.
  • Experience troubleshooting PostgreSQL performance issues and slow queries.
  • Extensive experience designing efficient schemas and queries.
  • Experience migrating multi-terabyte tables into more efficient schemas.
  • Proficiency with Go.
  • Proficiency with Prometheus and Linux.
  • Knowledge of trading or fintech domains, low-latency systems, distributed tracing, and PostgreSQL tooling such as pgx, gorm, or sqlc.

Benefits

  • Competitive salary with stock options.
  • Health benefits.
  • One-time USD $500 new hire home-office setup stipend.
  • Monthly USD $150 stipend via a Brex Card.


