Service Availability DevOps Engineer

  • Full Time
  • Nairobi

Safaricom Kenya

Job Description

Reporting to the Engineering Lead – Service Availability, the position holder will be responsible for monitoring and observability, and for improving the operational aspects of all in-scope systems within DIT. The role also drives automation and DevOps across the different domains and fosters service monitoring through proactive initiatives such as AIOps and machine learning, among other available channels.

Responsibilities

  • Proactively building and implementing monitoring services, including end-to-end monitoring, scripting and automation, modern tooling and maintenance software.
  • Using AI and machine learning to perform log analysis and create predictive models that help identify potential failures.
  • Developing and executing automation scripts and maintenance jobs.
  • Developing automation around monitoring.
  • Onboarding DIT systems to the service monitoring tools (APMs like ELK).
  • Clearly documenting any monitoring gaps noted and collaborating with the relevant teams to ensure timely closure.
  • Performing application error analysis and follow-up to ensure an optimal customer experience.
  • Deploying planned and operational changes on systems in scope.
  • Supporting all Digital squads to ensure new products are monitored.
  • Supporting Zero Touch Operations initiatives.
  • Supporting the development of collectors and agents.

Qualifications

  • Bachelor’s degree in Computer Science, Information Technology, Electrical and Communication Engineering, Business Information Systems, or a relevant field in telecommunications.
  • Domain knowledge in at least two of the following areas: system administration (especially Linux), orchestration (Kubernetes), the Linux kernel, and OpenTelemetry.
  • Good understanding of back-end programming languages such as Python and Rust.
  • Technical understanding of SRE concepts and DevOps practices with respect to providing stable services to customers, adhering to availability KPIs, Service Level Objectives and Service Level Indicators, and conforming to the target monthly error budget.
  • Well versed in one or more modern monitoring tools such as ELK, Prometheus, Dynatrace, AppDynamics, New Relic or Splunk.
  • Good understanding of microservice architecture and an appreciation of traditional/classic SOA.
  • Ability to manage a team, with leadership skills, ownership of issues, and an analytical, problem-solving approach. Ability to implement a strict change management policy.
  • Conversant with agile ways of working.

To apply for this job, please visit egjd.fa.us6.oraclecloud.com.