Alpaca

Alpaca provides developer-first, easy-to-use APIs for stock and crypto trading that developers can use to build apps and trading algorithms.

Capital Markets
51-250 employees
Founded 2015
$87M raised

Description

  • Operate production systems day to day, including on-call support, incident response, postmortems, and follow-up remediation.
  • Define and refine reliability practices, including SLIs, SLOs, and error budgets.
  • Improve observability across metrics, logs, traces, and alerting.
  • Ship infrastructure as code for cloud resources and Kubernetes workloads through a GitOps workflow.
  • Support PostgreSQL in production: improve performance, review schemas and migrations, run online migrations, and support high availability, disaster recovery, and CDC (change data capture) pipelines.
  • Mentor engineers on reliability and database fundamentals through code review, design review, and pairing.
  • Collaborate with product teams to help ensure services operate within reliability objectives.

Requirements

  • 4+ years of experience in SRE, DevOps, Platform/Infrastructure, or backend engineering with significant production operations ownership.
  • Hands-on experience operating production services on Kubernetes.
  • Experience shipping infrastructure as code in a GitOps workflow.
  • Solid working knowledge of PostgreSQL in production, including query plans, pg_stat_* views, indexing, schema trade-offs, and safe online migrations on non-trivial tables.
  • Knowledge of cloud networking fundamentals, including VPCs, routing, L4/L7 load balancing, DNS, and TLS.
  • Comfort debugging cross-service connectivity issues.
  • Comfort with a modern observability stack and operator-level proficiency with Linux.
  • Experience with incident response, structured debugging, and postmortems that drive change.
  • Working proficiency in Go or Python.
  • Strong written and verbal communication skills.
  • Genuine interest in databases and willingness to grow PostgreSQL/DBA expertise.

Preferred

  • Deeper PostgreSQL experience with large clusters under OLTP load, connection pooling at scale, HA/DR ownership, or CDC pipelines.
  • Experience with typed SQL access layers in Go, such as pgx, gorm, or sqlc.
  • Production experience with messaging systems at scale, such as RabbitMQ, Kafka, or Redpanda.
  • Security and compliance experience in a regulated environment, including SOC 2, secrets management, and audit logging.
  • Familiarity with trading, brokerage, or other regulated fintech domains.

Benefits

  • Competitive salary with stock options.
  • Health benefits.
  • One-time $500 (USD) home-office setup stipend for new hires.
  • Monthly $150 (USD) stipend, paid via a Brex card.

Similar Roles

Senior Manager, Engineering

Sumo Logic 251-1K Internet Software & Services

Sumo Logic is hiring a Senior Manager, Engineering for Application Security to lead global programs that improve product security, reliability, and operational efficiency across its cloud platform.

Agile AWS C++ Docker GCP Java Kafka Kubernetes OWASP Ruby Scala SIEM

Staff Software Engineer - Databases SRE | Sweden | Remote

Grafana 1K-5K IT Services

Grafana Labs is hiring a Staff Software Engineer, SRE to improve the reliability and scalability of Grafana Cloud’s database products for high-value customers across AWS, GCP, and Azure.

AWS Azure GCP Go Helm Java Kubernetes Linux Microservices Python Terraform

Senior Site Reliability Engineer (SRE)

Oowlish 51-250 Internet Software & Services

Oowlish is hiring a Senior Site Reliability Engineer to own the reliability and operational excellence of business-critical production systems for international clients in a remote, collaborative environment.

AWS Datadog Go Heroku Kubernetes PostgreSQL Python SQL Server TypeScript

Staff Software Engineer - Databases SRE | Spain | Remote

Grafana 1K-5K IT Services

Grafana Labs is hiring a Staff Software Engineer - SRE to strengthen the reliability of its cloud database products for high-SLA customers across AWS, GCP, and Azure.

AWS Azure GCP Go Helm Java Kubernetes Linux Python Terraform
