Alpaca

Alpaca is a developer-first API for stock and crypto trading, offering easy-to-use APIs for building apps and trading algorithms.

Capital Markets
51-250 employees
Founded 2015
$87M raised

Description

  • Operate production systems day to day, including on-call support, incident response, postmortems, and follow-up remediation.
  • Define and refine reliability practices, including SLIs, SLOs, and error budgets.
  • Improve observability across metrics, logs, traces, and alerting.
  • Ship infrastructure as code through a GitOps workflow for cloud resources and Kubernetes workloads.
  • Support and improve PostgreSQL performance, schema and migration review, online migrations, high availability, disaster recovery, and CDC pipelines.
  • Mentor engineers on reliability and database fundamentals through code review, design review, and pairing.
  • Collaborate with product teams to help ensure services operate within reliability objectives.

Requirements

  • 4+ years of experience in SRE, DevOps, Platform/Infrastructure, or backend engineering with significant production operations ownership.
  • Hands-on experience operating production services on Kubernetes.
  • Experience shipping infrastructure as code in a GitOps workflow.
  • Solid working knowledge of PostgreSQL in production, including query plans, pg_stat_* views, indexing, schema trade-offs, and safe online migrations on non-trivial tables.
  • Cloud networking fundamentals, including VPCs, routing, L4/L7 load balancing, DNS, and TLS.
  • Comfort debugging cross-service connectivity issues.
  • Comfort with a modern observability stack and operator-level proficiency with Linux.
  • Experience with incident response, structured debugging, and postmortems that drive change.
  • Working proficiency in Go or Python, along with strong written and verbal communication skills.
  • Genuine interest in databases and willingness to grow PostgreSQL/DBA expertise.
  • Deeper PostgreSQL experience with large clusters at OLTP load, connection pooling at scale, HA/DR ownership, or CDC pipelines is preferred.
  • Experience with typed SQL access layers in Go, such as pgx, gorm, or sqlc, is preferred.
  • Production experience with messaging systems at scale, such as RabbitMQ, Kafka, or Redpanda, is preferred.
  • Security and compliance experience in a regulated environment, including SOC 2, secrets management, and audit logging, is preferred.
  • Familiarity with trading, brokerage, or other regulated fintech domains is preferred.

Benefits

  • Competitive salary with stock options.
  • Health benefits.
  • One-time USD $500 new hire home-office setup stipend.
  • Monthly USD $150 stipend via a Brex Card.
