Site Reliability Engineer

Description

Urgently seeking a Site Reliability Engineer.

This is an initial 3-month contract, and the position requires weekly on-site visits in London.

Site Reliability Engineers (SREs) are responsible for keeping all user-facing services and other DSX production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our environments.

As an SRE you will:

  • Be on a PagerDuty rotation to respond to availability incidents and support service engineers during customer incidents.
  • Use your on-call shift to prevent incidents from happening.
  • Run our infrastructure with Terraform and Kubernetes.
  • Use monitoring and alerting to alert on symptoms, not outages.
  • Document every action so that your findings turn into repeatable actions (playbooks) and then into automation.

You may be a fit for this role if you:

  • Think about systems, and particularly edge cases and failure modes.
  • Know your way around Linux and the Unix shell.
  • Have strong programming skills, preferably in Node.js, though Python, Go, .NET, or even Ruby would also work.
  • Have a drive to deliver quickly and iterate fast.
  • Have experience with Nginx, Docker, Kubernetes, and Terraform.
  • Have good experience with GitHub.

Michael Bailey International is acting as an Employment Business in relation to this vacancy.
