Civica

Civica is a global leader in public sector software, providing digital solutions and managed services to transform customer experience and operational efficiency for over 3,000 organizations worldwide.

Industry: Internet Software & Services
Company size: 1K-5K
Founded: 2002

Description

  • Monitor live environments using observability tools and respond to production alerts and incidents.
  • Triage issues quickly and coordinate with SRE and Platform teams to restore service.
  • Automate environment builds, deployments, and routine operational tasks to reduce manual work.
  • Support and maintain cloud and infrastructure environments across Azure, AWS, and VMware.
  • Work with containerised workloads and contribute to scaling and performance improvements.
  • Drive root cause analysis and preventative actions following incidents.
  • Refine alerting thresholds, deployment processes, and monitoring coverage.
  • Maintain runbooks and operational documentation to ensure knowledge is shared consistently.
  • Collaborate with engineers, support, and services teams on live issue resolution and operational improvements.

Requirements

  • Experience operating production systems in cloud or hybrid environments such as Azure, AWS, or similar.
  • Familiarity with Kubernetes, containerisation, and supporting tools such as Helm and ingress controllers.
  • Basic understanding of networking and infrastructure fundamentals, including DNS, load balancing, VPNs, and firewalls.
  • Ability to troubleshoot infrastructure issues, including using packet capture tools (pcap).
  • Hands-on experience with scripting or automation using PowerShell, Bash, Go, or Python.
  • Knowledge of version control and CI/CD pipeline tools such as GitHub Actions, Azure DevOps, or Jenkins.
  • Experience with monitoring and alerting tools such as Prometheus, Grafana, Datadog, Elastic, or Azure Monitor.
  • Strong analytical and problem-solving skills with the ability to stay calm during incidents.
  • Collaborative communicator who thrives in cross-functional, fast-paced environments.

Benefits

  • 25 days of annual leave plus bank holidays, with the option to buy up to 10 extra days.
  • Up to 3 additional days off for volunteering through the Days of Difference program.
  • 5% employer pension match.
  • Income protection covering up to 75% of salary for long-term illness.
  • Life assurance equal to 4x salary as a tax-free lump sum.
  • Critical illness cover of £25,000, extendable to dependents.
  • Private medical insurance, health cash plan, and dental insurance.
  • Employee affinity groups and a referral bonus for recommending a friend.
