AlphaSense

AlphaSense develops an artificial intelligence-based search platform that enables investment and corporate professionals to quickly access and analyze extensive financial data and market insights from over 500 million documents, enhancing decision-maki...

Internet Software & Services

Information Technology

251-1K employees

Founded 2011

$770M raised

13 open positions

Staff Site Reliability Engineer

2 months, 2 weeks ago

India

Full-time

Lead

Site Reliability Engineer (SRE)

DevOps and Infrastructure

AWS Azure Datadog DNS GCP Go Grafana Kubernetes Load Balancing OpenTelemetry Prometheus Python TCP/IP

Description

Architect reliability frameworks and self-service tooling that enable teams to own the reliability of their services.
Drive AIOps initiatives to automate diagnostics, remediation, and proactive failure prevention.
Embed SRE practices across engineering through design reviews, production readiness, and operational standards.
Serve as Incident Commander during critical incidents and ensure blameless postmortems lead to durable improvements.
Deliver end-to-end monitoring, tracing, and profiling to improve system performance proactively.
Mentor engineers across SRE and product teams through technical guidance and knowledge sharing.
Influence architectural decisions and set the technical bar for reliability across the organization.
Lead by example in incident response and help scale a “You Build It, You Run It” culture.

Requirements

8+ years of experience in Site Reliability Engineering, DevOps, or a similar role.
At least 3 years of experience in a Senior or higher SRE position.
Experience running production SaaS systems at scale.
Proficiency in at least one programming or scripting language such as Python or Go.
Hands-on experience with Kubernetes and with cloud platforms such as AWS, GCP, or Azure.
Deep understanding of networking fundamentals, including TCP/IP, DNS, HTTP/S, and load balancing.
Experience with monitoring and alerting tools such as Prometheus, Grafana, Datadog, or ELK.
Familiarity with advanced observability tooling such as OpenTelemetry (OTel) and continuous profiling.
Proven incident management experience, including leading high-severity incidents and postmortems.
Strong troubleshooting, communication, and collaboration skills.

Similar Roles

Platform Engineering Manager

Prolific 51-250 Professional Services

Prolific is hiring a Platform Engineering Manager to lead its Cloud Platform and SRE teams, owning the technical foundation, reliability, and scalability of the infrastructure that supports its AI-focused human data platform.

United Kingdom Full-time Lead Platform Engineer Site Reliability Engineer (SRE)

Argo CD AWS Celery CircleCI Datadog DynamoDB Elasticsearch GCP GitHub Actions GitOps JavaScript Kubernetes MongoDB PostgreSQL Python Serverless Terraform TypeScript

2 hours, 6 minutes ago

Senior Site Reliability Engineer

Develocity

Develocity is hiring founding Site Reliability Engineers to help build and operate its remote-first SaaS platform, ensuring reliability, performance, and availability for customer-facing and supporting services.

Europe Full-time Senior Site Reliability Engineer (SRE)

AWS Bash EC2 Grafana Java JUnit Kotlin Kubernetes Prometheus Python Spring Terraform

3 hours, 6 minutes ago