Margo Bank

Unlock excellence with MARGO Consulting, where ambition, expertise, and innovation meet the most complex tech challenges.

Professional Services
$8M raised

Description

  • Build and support large-scale AI infrastructure with monitoring, diagnosis, and remediation of production incidents.
  • Troubleshoot high-impact production issues in collaboration with other engineering teams.
  • Participate in an on-call rotation to handle incidents and ensure service continuity.
  • Implement and maintain observability solutions to monitor AI infrastructure and application health.
  • Contribute to AI infrastructure lifecycle management across different environments and countries.
  • Promote and apply best practices for stability, resiliency, scalability, and security.
  • Maintain clear technical documentation for tools and procedures.
  • Contribute to the evolution of systems and tools based on production feedback.
  • Collaborate closely with development teams to ensure infrastructure readiness.
  • Participate in team rituals and knowledge-sharing initiatives.

Requirements

  • Experience with Go or Python.
  • Strong scripting skills in Bash and Python.
  • Hands-on experience with Linux systems, especially Ubuntu/Debian.
  • Hands-on experience with GPU and HPC infrastructure is preferred.
  • Knowledge of networking concepts such as VLAN/LAN, TCP/IP, DNS, BGP, load balancing, and IPv6.
  • Familiarity with monitoring and logging tools such as Prometheus, Grafana, and Elastic.
  • Comfort with Infrastructure-as-Code tools such as Ansible, Salt, and AWX.
  • Experience managing relational databases, especially MariaDB.
  • Understanding of CI/CD pipelines, especially GitLab.
  • Comfortable communicating in English, both written and spoken.
  • Proactive and solution-oriented mindset.
  • Passion for automation and continuous improvement.
  • Strong collaboration and communication skills.
  • Ability to work independently and as part of a team.
  • Willingness to mentor others and share knowledge.

Benefits

  • Remote work arrangement.
  • Permanent contract or B2B contract option.
  • Hourly rate of 200–250 zł.

Interested in this position?

Apply directly on the company website

Similar Roles

Senior Manager, Engineering

Sumo Logic 251-1K Internet Software & Services

Sumo Logic is hiring a Senior Manager, Engineering for Application Security to lead global programs that improve product security, reliability, and operational efficiency across its cloud platform.

Agile AWS C++ Docker GCP Java Kafka Kubernetes OWASP Ruby Scala SIEM
14 hours, 45 minutes ago

Staff Software Engineer - Databases SRE | Sweden | Remote

Grafana 1K-5K IT Services

Grafana Labs is hiring a Staff Software Engineer, SRE to improve the reliability and scalability of Grafana Cloud’s database products for high-value customers across AWS, GCP, and Azure.

AWS Azure GCP Go Helm Java Kubernetes Linux Microservices Python Terraform
1 day, 14 hours ago

Senior Site Reliability Engineer (SRE)

Oowlish 51-250 Internet Software & Services

Oowlish is hiring a Senior Site Reliability Engineer to own the reliability and operational excellence of business-critical production systems for international clients in a remote, collaborative environment.

AWS Datadog Go Heroku Kubernetes PostgreSQL Python SQL Server TypeScript
1 day, 14 hours ago

Staff Software Engineer - Databases SRE | Spain | Remote

Grafana 1K-5K IT Services

Grafana Labs is hiring a Staff Software Engineer - SRE to strengthen the reliability of its cloud database products for high-SLA customers across AWS, GCP, and Azure.

AWS Azure GCP Go Helm Java Kubernetes Linux Python Terraform
1 day, 14 hours ago
