Ansible reboot Debian/Ubuntu systems in sequence (update 08/18/2018)

About 18 months ago I was working on a project that required reasonable uptime (100%, of course), but I still needed to install updates. That meant automating deployment, updates, and then reboots in a sequence that didn’t bring down the cluster I had built.

This Ansible playbook was the result.

Line 43 of the included example hosts file is what really concerns us for the sequence, as it defines a group (clstr-servers-all) that contains all 4 cluster groups.

The playbook is generic: any host that is not in that group is rebooted straight away, with no pausing or checking.

Hosts in that group are handled one at a time. Each one is rebooted only if /var/run/reboot-required exists, and the playbook waits until the SSH port is available before moving on to the next host.

By using the ‘-l’ flag with ansible-playbook, you can limit which hosts are affected. For example:

ansible-playbook -l clstr-servers-all reboot-servers.yml

Or, after you have run updates, you can reboot all the servers via

ansible-playbook -l all reboot-servers.yml

and rest assured that while you are rebooting your infrastructure, your sequenced servers are handled one at a time and each is back online before the next reboot starts.

---

# reboot things normally, no pausing, no checking
#
# add your exclusions here as well
#
- hosts: all:!clstr-servers-all
  gather_facts: false
  become: yes
  tasks:
    - name: reboot | any/all systems but not sequenced
      command: /sbin/reboot

# add the excluded items from above here to ease the
# sequenced rebooting. Wait for SSH availability, then
# pause for 15 seconds to make sure processes have settled.
#
- hosts: clstr-servers-all
  gather_facts: false
  become: yes
  serial: 1
  ignore_errors: True
  tasks:
   - name: reboot | {{ inventory_hostname }} check reboot need
     shell: "[ -f /var/run/reboot-required ]"
     failed_when: False
     register: reboot_required
     changed_when: reboot_required.rc == 0
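   - name: reboot | {{ inventory_hostname }} initiate
     ignore_errors: True
     # fire and forget: poll 0 keeps Ansible from waiting on a
     # connection that is about to drop
     shell: "sleep 5 && /sbin/reboot"
     async: 1
     poll: 0
     when: reboot_required.rc == 0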
   - name: reboot | {{ inventory_hostname }} sleeping
     pause:
       seconds: 10
     when: reboot_required.rc == 0
   - name: reboot | {{ inventory_hostname }} polling
     wait_for:
       host: "{{ inventory_hostname }}"
       state: started
       port: 22
       delay: 15
     delegate_to: localhost
     become: false
     when: reboot_required.rc == 0
   - name: reboot | {{ inventory_hostname }} settling
     pause:
       seconds: 15
     when: reboot_required.rc == 0

Example hosts file:

# cluster #1, local
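
On Ansible 2.7 or later, the sequenced play could probably be shortened with the built-in reboot module, which issues the reboot and then waits for the host to respond again. The following is a minimal sketch under that assumption, not the playbook described above. The registered variable name and the timeout and delay values are illustrative choices.

```yaml
---
# Sketch only: assumes Ansible >= 2.7, where the reboot module is available.
- hosts: clstr-servers-all
  gather_facts: false
  become: yes
  serial: 1   # one host at a time, as in the sequenced play
  tasks:
    - name: reboot | check whether a reboot is required
      stat:
        path: /var/run/reboot-required
      register: reboot_required_file   # illustrative variable name

    - name: reboot | restart and wait for the host to return
      reboot:
        reboot_timeout: 300     # illustrative: max seconds to wait for the host to respond
        post_reboot_delay: 15   # illustrative: seconds to wait after the reboot command before checking
      when: reboot_required_file.stat.exists
```

Because this sketch does not set ignore_errors, a host that fails to come back would stop the run for that host instead of letting the play carry on to the next one, which may or may not be what you want for your cluster.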
[clstr01-local]
clstr01-01.local
clstr01-02.local
clstr01-03.local

# cluster #2, local
[clstr02-local]
clstr02-01.local
clstr02-02.local
clstr02-03.local

# group: cluster, local
[clstr-local:children]
clstr01-local
clstr02-local

# group: cluster, local, variables
[clstr-local:vars]
ansible_ssh_user=ansible
ansible_ssh_private_key_file=~/.ssh/ansible

[clstr01-useast]
172.27.18.5
172.27.19.5
172.27.20.5

[clstr02-useast]
172.27.28.5
172.27.29.5
172.27.30.5

[clstr-useast:children]
clstr01-useast
clstr02-useast

[clstr-useast:vars]
ansible_ssh_user=ubuntu
ansible_ssh_private_key_file=~/.ssh/cluster-useast.pem

# ... you get the drift

[clstr-servers-all:children]
clstr-local
clstr-useast
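
To confirm that the nested group at the end of the hosts file resolves to the hosts you expect, a throwaway play along these lines can help. It is only a sketch: it prints each host's group membership using the built-in group_names variable and changes nothing on the hosts.

```yaml
---
# Sketch only: prints group membership, changes nothing.
- hosts: clstr-servers-all
  gather_facts: false
  tasks:
    - name: show the groups each sequenced host belongs to
      debug:
        msg: "{{ inventory_hostname }}: {{ group_names | join(', ') }}"
```

Every host listed under the four cluster groups should show up once, with clstr-servers-all among its groups.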