About 18 months ago I was working on a project that required reasonable uptime (100%, of course), but I still needed to install updates. That meant automating deployment, updates, and then reboots in a sequence that didn’t bring down the cluster I had built.
This Ansible playbook was the result.
Line 43 of the included example hosts file is what really matters for the sequence: it defines clstr-servers-all, a group made up of the 4 cluster groups.
The playbook is generic in that it will reboot any and all hosts that are not in that group, with no pausing and no checking.
Hosts that are in the group are handled one at a time: each is rebooted only if /var/run/reboot-required exists, and the playbook waits until its SSH port is available again before moving on to the next host.
By using the ‘-l’ flag with ansible-playbook, you can limit which hosts are affected. For example:
ansible-playbook -l clstr-servers-all reboot-servers.yml
Or, after you have run updates, you can reboot all the servers via
ansible-playbook -l all reboot-servers.yml
and rest assured that while you are rebooting your infrastructure, the sequenced servers are rebooted one at a time and each is back online before the next reboot starts.
---
# reboot things normally, no pausing, no checking
#
# add your exclusions here as well
#
- hosts: all:!clstr-servers-all
  gather_facts: false
  become: yes
  tasks:
    - name: reboot | any/all systems but not sequenced
      command: /sbin/reboot

# add the excluded items from above here to ease the
# sequenced rebooting. Wait for SSH availability then
# sleep for 30 seconds to make sure processes have settled.
#
- hosts: clstr-servers-all
  gather_facts: false
  become: yes
  serial: 1
  ignore_errors: True
  tasks:
    - name: reboot | {{ inventory_hostname }} check reboot need
      shell: "[ -f /var/run/reboot-required ]"
      failed_when: False
      register: reboot_required
      changed_when: reboot_required.rc == 0

    - name: reboot | {{ inventory_hostname }} initiate
      ignore_errors: True
      shell: "sleep 5 && /sbin/reboot"
      async: 1
      when: reboot_required.rc == 0

    - name: reboot | {{ inventory_hostname }} sleeping
      pause: >
        seconds=10
      when: reboot_required.rc == 0

    - name: reboot | {{ inventory_hostname }} polling
      local_action: >
        wait_for host={{ inventory_hostname }}
        state=started
        port=22
        delay=15
      become: false
      when: reboot_required.rc == 0

    - name: reboot | {{ inventory_hostname }} settling
      pause: >
        seconds=15
      when: reboot_required.rc == 0
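If you are on a newer Ansible release, the sequenced play could probably be written more compactly with the built-in reboot module. The following is only a sketch and not the playbook I actually ran. It assumes Ansible 2.7 or later (where the reboot module is available) and the same clstr-servers-all group. The reboot_required_file variable name and the 600 second timeout are my own illustrative choices.

```yaml
---
# Sketch only: assumes Ansible 2.7+ and the clstr-servers-all group
# from the example hosts file. Not the original playbook.
- hosts: clstr-servers-all
  gather_facts: false
  become: yes
  serial: 1            # one host at a time, as in the original play
  tasks:
    # Same check as before, using stat instead of a shell test.
    - name: reboot | check whether a reboot is required
      stat:
        path: /var/run/reboot-required
      register: reboot_required_file   # hypothetical variable name

    # The reboot module restarts the host and waits until it can
    # connect and run a command again, so no separate wait_for
    # polling task is needed.
    - name: reboot | reboot and wait for the host to return
      reboot:
        reboot_timeout: 600   # assumed upper bound, tune for your hosts
      when: reboot_required_file.stat.exists

    # Keep a short settling pause before moving on to the next host.
    - name: reboot | settling
      pause:
        seconds: 15
      when: reboot_required_file.stat.exists
```

One difference worth noting: the reboot module verifies that the host came back by reconnecting and running a command, not by probing port 22 from the control machine. This sketch also leaves out ignore_errors, on the assumption that a host that fails to come back should stop the run instead of letting it proceed to the next cluster member.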
# cluster #1, local
[clstr01-local]
clstr01-01.local
clstr01-02.local
clstr01-03.local

# cluster #2, local
[clstr02-local]
clstr02-01.local
clstr02-02.local
clstr02-03.local

# group: cluster, local
[clstr-local:children]
clstr01-local
clstr02-local

# group: cluster, local, variables
[clstr-local:vars]
ansible_ssh_user=ansible
ansible_ssh_private_key_file=~/.ssh/ansible

[clstr01-useast]
172.27.18.5
172.27.19.5
172.27.20.5

[clstr02-useast]
172.27.28.5
172.27.29.5
172.27.30.5

[clstr-useast:children]
clstr01-useast
clstr02-useast

[clstr-useast:vars]
ansible_ssh_user=ubuntu
ansible_ssh_private_key_file=~/.ssh/cluster-useast.pem

# ... you get the drift

[clstr-servers-all:children]
clstr-local
clstr-useast