AlphaSense

AlphaSense develops an artificial intelligence-based search platform that enables investment and corporate professionals to quickly access and analyze extensive financial data and market insights from over 500 million documents, enhancing decision-maki...

Internet Software & Services
251-1K employees
Founded 2011
$770M raised

Description

  • Architect reliability frameworks and self-service tooling that enable teams to own the reliability of their services.
  • Drive AIOps initiatives to automate diagnostics, remediation, and proactive failure prevention.
  • Embed SRE practices across engineering through design reviews, production readiness, and operational standards.
  • Serve as Incident Commander during critical incidents and ensure blameless postmortems lead to durable improvements.
  • Deliver end-to-end monitoring, tracing, and profiling to improve system performance proactively.
  • Mentor engineers across SRE and product teams through technical guidance and knowledge sharing.
  • Influence architectural decisions and set the technical bar for reliability across the organization.
  • Lead by example in incident response and help scale a “You Build It, You Run It” culture.

Requirements

  • 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role.
  • At least 3 years of experience in a Senior+ SRE position.
  • Experience running production SaaS systems at scale.
  • Proficiency in at least one programming or scripting language such as Python or Go.
  • Hands-on experience with Kubernetes and with cloud platforms such as AWS, GCP, or Azure.
  • Deep understanding of networking fundamentals, including TCP/IP, DNS, HTTP/S, and load balancing.
  • Experience with monitoring and alerting tools such as Prometheus, Grafana, Datadog, or ELK.
  • Familiarity with advanced observability tooling such as OpenTelemetry (OTel) and continuous profiling.
  • Proven incident management experience, including leading high-severity incidents and postmortems.
  • Strong troubleshooting, communication, and collaboration skills.
