Nova: Onshore and Nearshore Engineering Solutions

Nova: Onshore and Nearshore Engineering Solutions specializes in providing onshore and nearshore software development services, focusing on delivering secure, scalable, and intelligent engineering solutions in areas such as AWS, cloud engineering, and ...

Internet Software & Services

Description

  • Design, build, maintain, and scale production services and server farms across multiple data centers for complex cloud services.
  • Improve software architecture to increase scalability, service reliability, capacity, and performance.
  • Write automation code for provisioning and operating infrastructure at massive scale.
  • Collaborate with development teams to ensure applications are designed for infrastructure fit, scalability, and reliability from the ground up.
  • Work with QA to build pipelines and automation for deploying applications to production.
  • Troubleshoot incidents, test hypotheses, and identify root causes for system failures and service issues.
  • Write postmortem reviews and remediation recommendations after incidents.
  • Monitor system alerts, identify bad trends early, and respond to incidents to restore normal operations.
  • Author and maintain high-quality documentation for specifications, systems, and procedures.
  • Support and comply with the company’s Quality Management System policies and procedures.

Requirements

  • Bachelor’s degree, or equivalent, in computer science or a related discipline.
  • Knowledge of infrastructure-as-code tools such as Terraform, Ansible, Puppet, or Chef.
  • Experience with Kubernetes for cluster creation and management.
  • Knowledge of cloud platforms and services including Microsoft Azure, AWS, and Google Cloud.
  • Understanding of Azure services, virtual machines in Azure, and virtual network configuration.
  • Knowledge of cloud architecture and service models such as IaaS, PaaS, and SaaS.
  • Knowledge of CI/CD practices and scripting.
  • Scripting knowledge with PowerShell.
  • Ability to program in one or more high-level languages such as Python, Java, C/C++, Ruby, or JavaScript.
  • Experience with distributed storage technologies such as NFS, HDFS, Ceph, or Amazon S3, and dynamic resource management frameworks such as Apache Mesos, Kubernetes, or YARN.
  • Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.

Benefits

  • Base salary and permanent contract directly with the company.
  • Continuous training plan with paid certifications.
  • Career plan aligned with your development and knowledge.
  • Benefits above the legal minimum, including 12 days of paid time off.
  • Christmas bonus equal to 30 days of pay.
  • Medical insurance, life insurance, and savings fund.
  • Groceries bonus and quarterly performance bonus.
  • Computer equipment provided for work.
  • Option to work fully remotely (100% home office).
