
DevOps Developer



Software Engineering
United States
Posted on Monday, September 25, 2023
At IBM, work is more than a job – it’s a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you’ve never thought possible. Are you ready to lead in this new era of technology and solve some of the world’s most challenging problems? If so, let’s talk.

Your Role and Responsibilities

The developer and Site Reliability Engineer (SRE) teams both care about reliability, availability, performance, scalability, efficiency, and feature and launch velocity. However, SREs operate under different incentives, mainly favoring a service’s long-term viability over new feature launches. SREs are responsible for ensuring services are resilient, responsive and have an uptime appropriate to customers’ needs whilst controlling capacity and performance. They are also responsible for improving these services in a highly dynamic environment.
In summary, SRE is an engineering discipline that combines software, infrastructure and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. Day-to-day, SREs use automation to limit time spent on operational work, proactively identify potential risk factors and convert them into actionable improvements.
• Build automation to reduce toil and engineer solutions that improve reliability
• Take ownership of the monitoring of applications, services, and infrastructure
• Ensure consistent and thorough observability and monitoring across all environments (development, beta, production)
• Work closely with development teams to capture meaningful and detailed heuristics to measure the health of each application
• Design and implement monitoring checks for new services prior to launch
• Continuously improve alerting systems by removing noise
• Work with others across the team (Developers, DevOps Engineers, Sys Admins and the Release Manager) during software releases
• Champion the testability of the monitoring system

Required Technical and Professional Expertise

• Background in software engineering (projects and experience in JavaScript, C#, Java, Go)
• Experience automating tasks to reduce toil (PowerShell, shell, Python, etc.)
• Knowledge of building and using observability: defining metrics, measures and dashboards, and using observability tools (Sysdig, Kibana, Prometheus, Grafana, Zabbix)
• Experience with a logging and analytics framework (Splunk, LogDNA, or ELK stack)
• System design knowledge (cloud-native architectures, best practices for availability and resiliency, practices and methods for problem isolation)
• Experience with pipeline tools for deploying and managing applications (Travis, Jenkins)
• Confident with infrastructure-as-code tools (Ansible, Terraform, Blueprints)
• Confident with source control (GitHub, Perforce)
• Experience with cloud services and platforms (IBM Cloud, AWS, GCP, MS Azure)
• General Linux knowledge
• Network and security knowledge
• Happy working with Agile practices and Jira

Preferred Technical and Professional Expertise

• Familiarity with React or similar UI frameworks
• 1.5+ years hands-on industry experience building Kubernetes operators
• 2+ years hands-on industry experience working with Kubernetes
• 1.5+ years hands-on industry experience with Golang or similar
• 2+ years hands-on industry experience with Node.js or similar
• 2+ years hands-on industry experience with cloud platforms (e.g. IBM Cloud, AWS, GCP)
• 2+ years hands-on industry experience using CI/CD processes
• 2+ years hands-on industry experience using Git & GitHub, or similar
• OpenShift experience
• Experience standing up and monitoring cloud services