DevOps & Site Reliability Engineer

Rayyan Systems

Software Engineering
Remote
Posted 6+ months ago

This is a remote position.

We are looking for a DevOps/Site Reliability Engineer (SRE) to join our team. As a DevOps/SRE, you will be responsible for the automation, deployment, maintenance, and monitoring of our web, mobile, and API applications. You will ensure the reliability, availability, and performance of our production systems while continuously improving the build/test/deploy process: identifying bottlenecks, speeding up the pipeline, upgrading components, resolving issues, and reducing cost.

Responsibilities:

  • Develop and maintain automation tools for building, testing, and deploying software applications and services.
  • Deploy all web, mobile, and API applications in production, plan their releases, ensure consistency, and follow up on testing.
  • Work closely with developers, QA, and product teams to ensure timely and high-quality releases.
  • Develop and maintain monitoring and alerting systems to ensure high availability and performance of applications and services.
  • Monitor metrics and logs from all infrastructure and app components, write integrations where necessary, and create dashboards to observe the production systems.
  • Create alert triggers and monitor performance for all components to identify bottlenecks and modify auto-scaling rules if necessary.
  • Upgrade infrastructure resources and act on cloud vendor recommendations, such as rotating secrets and upgrading databases and machine clusters.
  • Continuously evaluate the cost of cloud services and eliminate unnecessary spending.
  • Troubleshoot and resolve issues related to infrastructure, deployment, and application performance.
  • Work with third-party vendors to integrate with their services for observability, monitoring, and error reporting.
  • Develop disaster recovery plans and participate in their execution during disaster recovery events.


Requirements:

  • Bachelor's degree in Computer Science or a related field.
  • At least 3 years of experience in a DevOps/SRE or related role.
  • Strong experience in deploying web, mobile, and API applications in production.
  • Strong experience with monitoring and observability tools, such as New Relic, Datadog, or Prometheus/Grafana.
  • Strong experience with CI/CD pipelines and associated tools such as Azure Pipelines, Jenkins, or CircleCI.
  • Strong experience with container technologies such as Docker, Kubernetes, and Helm.
  • Experience with cloud infrastructure such as Azure, AWS, or GCP.
  • Experience with scripting languages such as Bash.
  • Experience with incident response and disaster recovery planning.
  • Excellent communication and collaboration skills.