Dev.Pro

Dev.Pro is a globally distributed software development partner that specializes in custom outsourced software development, helping innovative technology companies scale their businesses efficiently.

Internet Software & Services
251-1K
Founded 2011

Description

  • Provide first-line operational support for a cloud-based production environment and respond to incidents promptly.
  • Monitor systems, troubleshoot production issues, and apply corrective actions to restore service.
  • Work with engineering teams on bug fixes, hotfixes, and escalations.
  • Administer MDM solutions and support remote software deployments.
  • Implement automated monitoring and alerting to improve incident detection and response.
  • Document operational processes, maintain knowledge bases, and create incident runbooks.
  • Participate in an on-call rotation to provide 24/7 critical incident coverage.
  • Contribute to post-incident reviews and improvements to monitoring, response, and resolution processes.
  • Build Node.js/TypeScript utilities to automate workflows, parse logs and JSON, and validate API payloads.
  • Troubleshoot REST/GraphQL integrations, analyze request/response traces, and support third-party API integrations.
  • Analyze system and application logs and telemetry to resolve issues.
  • Manage and administer system access.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • 3+ years of experience supporting production systems, with a focus on incident response and resolution.
  • Strong experience in operational support or SRE roles in cloud environments.
  • Proficiency in Node.js, including debugging, error handling, and performance troubleshooting.
  • Experience with AWS, Azure, or GCP and monitoring/troubleshooting cloud-native applications.
  • Experience working with APIs and integrations.
  • Familiarity with logging and monitoring tools such as Winston, Bunyan, Datadog, ELK Stack, and CloudWatch.
  • Experience with CI/CD pipelines and automated deployments using Jenkins, GitLab CI, or AWS CodePipeline.
  • Strong problem-solving skills in high-pressure, time-sensitive situations.
  • Strong communication skills for structured incident reporting and documentation.
  • Effective cross-functional collaboration with development, DevOps, and product teams.
  • Upper-Intermediate+ English level.
  • Desirable: experience with containerization tools such as Docker and Kubernetes.
  • Desirable: knowledge of REST APIs, WebSockets, and microservices architecture.
  • Desirable: familiarity with incident management frameworks such as ITIL, and with SRE practices.
  • Desirable: understanding of cloud security best practices.
  • Desirable: experience with mobile POS platforms or mobile application environments.
  • Desirable: familiarity with mobile device management (MDM) solutions.

Benefits

  • 99.9% remote work with the ability to work from anywhere in the world.
  • 30 paid days off per year for vacations, holidays, or personal time.
  • 5 paid sick days, up to 60 days of medical leave, and up to 6 paid days off for major family events.
  • Partially covered health insurance after the probation period.
  • Wellness bonus for gym memberships, sports nutrition, and similar needs after 6 months.
  • Salary paid in U.S. dollars.
  • Approved overtime fully covered.
  • Access to English lessons, Dev.Pro University programs, and online team-building activities.
