Diskless management
===================

Image generation
----------------

Thanks to Ocean Stack's architecture, diskless images are simply virtual machine images that are exported through iSCSI. See :ref:`diskless blueprint` for details about the diskless architecture. Here, we only document the image generation procedure; the configuration management of the compute nodes is out of scope.

First, to generate a diskless image, use the :ref:`Add a new service VM` procedure to add a new VM that will hold our compute image. This reference VM will be designated as ``COMPUTE_VM``.

The procedure for generating a complete image is the following:

* If present, back up the old reference image:

  .. code-block:: shell

     mv /volspoms1/pcocc/persistent_drives/${COMPUTE_VM}.qcow2 /volspoms1/pcocc/persistent_drives/${COMPUTE_VM}.qcow2.$(date +%F)

* Create a new reference image file:

  .. code-block:: console

     # prepare-ocean-image.sh ${COMPUTE_VM}
     Formatting '/volspoms1/pcocc/persistent_drives/${COMPUTE_VM}.qcow2', fmt=qcow2 size=53687091200 backing_file='/volspoms1/pcocc/persistent_drives/rhel.latest.qcow2' encryption=off cluster_size=65536 lazy_refcounts=off

* On a working hypervisor, launch the reference VM:

  .. code-block:: console

     # . /etc/sysconfig/pcocc-vm-${COMPUTE_VM}
     # pcocc alloc ${COMPUTE_VM}

* Follow the bootstrap process using the **pcocc** CLI:

  .. code-block:: console

     (pcocc/XXXXX) # pcocc console
     [...]
     [ 48.578821] cloud-init[902]: + cloud-init-per instance distro_sync yum distribution-synchronization -y
     [ 48.822081] cloud-init[902]: Loaded plugins: priorities, search-disabled-repos
     [ 53.809157] cloud-init[902]: 437 packages excluded due to repository priority protections
     [ 58.596408] cloud-init[902]: Resolving Dependencies
     [ 58.598198] cloud-init[902]: --> Running transaction check
     [ 58.599632] cloud-init[902]: ---> Package bind-libs-lite.x86_64 32:9.9.4-74.el7_6.1 will be updated
     [ 58.854444] cloud-init[902]: ---> Package bind-libs-lite.x86_64 32:9.9.4-74.el7_6.2 will be an update
     [ 58.873073] cloud-init[902]: ---> Package bind-license.noarch 32:9.9.4-74.el7_6.1 will be updated
     [...]

  .. note::
     The VM might not be reachable immediately: there is a delay between the boot and the effective configuration of the SSH daemon. Be patient and check the console output for any error that could prevent SSH from listening correctly.

* Poll the VM for **cloud-init** completion. If the ``/run/cloud-init/result.json`` file is present, the **cloud-init** process is complete:

  .. code-block:: console

     (pcocc/XXXXX) # pcocc ssh -J ${COMPUTE_VM} -p 422 vm0 cat /run/cloud-init/result.json
     {
       "v1": {
         "datasource": "DataSourceNoCloud [seed=/dev/sr0][dsmode=net]",
         "errors": [
           "('users-groups', TypeError(\"Can not create sudoers rule addition with type u'bool'\",))",
           "('scripts-user', RuntimeError('Runparts: 1 failures in 1 attempted commands',))"
         ]
       }

* Do a first *sanity* reboot, to make sure that the correct kernel is booted:

  .. code-block:: console

     (pcocc/XXXXX) # pcocc ssh -J ${COMPUTE_VM} -p 422 vm0 reboot

* The VM may boot using the DisklessTrap_ initramfs image. To jump out of the trap, exit the shell present on the console:

  .. code-block:: console

     (pcocc/XXXXX) # pcocc console
     root@${COMPUTE_VM}_DisklessTrap:/root# exit
     [...]

* Apply, once again, a puppet run to make sure that kernel-related changes are correctly applied to the current kernel:

  .. code-block:: console

     (pcocc/XXXXX) # pcocc ssh -J ${COMPUTE_VM} -p 422 vm0 puppet-apply
     [...]
     Notice: Applied catalog in 121.63 seconds
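* Optionally, confirm which kernel the VM is currently running and which kernel packages are installed before rebuilding the initramfs. This is a quick sanity check, not part of the original procedure; it simply reuses the same jump-host access as the previous steps:

  .. code-block:: console

     (pcocc/XXXXX) # pcocc ssh -J ${COMPUTE_VM} -p 422 vm0 uname -r
     (pcocc/XXXXX) # pcocc ssh -J ${COMPUTE_VM} -p 422 vm0 rpm -q kernel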
* Rebuild the initramfs, then reboot:

  .. code-block:: console

     (pcocc/XXXXX) # pcocc ssh -J ${COMPUTE_VM} -p 422 vm0
     # dracut -fMv
     [...]
     *** Creating initramfs image file '/boot/initramfs-3.10.0-957.35.2.el7.x86_64.img' done ***
     # reboot

* Again, the VM will boot using the DisklessTrap_ initramfs image. To jump out of the trap, exit the shell present on the console:

  .. code-block:: console

     (pcocc/XXXXX) # pcocc console
     root@${COMPUTE_VM}_DisklessTrap:/root# exit
     [...]

* Extract the initramfs and vmlinuz files from the image:

  .. code-block:: console

     (pcocc/XXXXX) # pcocc ssh -p 422 vm0 "tar -C /boot -czO initramfs-$(uname -r).img vmlinuz-$(uname -r)" | tar -C /volspoms1/pub/boot/diskless/ --transform 's/$/.new/' -xzf -

* Shut down the VM:

  .. code-block:: console

     (pcocc/XXXXX) # ^D
     Terminating the cluster...

.. _image_variables:

* Define the destination image and key:

  .. code-block:: shell

     export KEY=/volspoms1/diskless/keys/stacker-image-$(date +%F).key
     export RAW_IMG=/volspoms1/diskless/images/raw/stacker-image-$(date +%F).raw
     export ENC_IMG=/volspoms1/diskless/images/encrypted/stacker-image-$(date +%F).img

* Copy the qcow2 image into a raw image using **qemu-img**:

  .. code-block:: console

     # qemu-img convert -f qcow2 -O raw gluster://top1/volspoms1/pcocc/persistent_drives/${COMPUTE_VM}.qcow2 gluster://top1${RAW_IMG}
     [2020-01-14 14:43:29.803547] E [MSGID: 108006] [afr-common.c:5214:__afr_handle_child_down_event] 0-volspoms1-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
     [2020-01-14 14:43:29.803936] E [MSGID: 108006] [afr-common.c:5214:__afr_handle_child_down_event] 0-volspoms1-replicate-2: All subvolumes are down. Going offline until atleast one of them comes back up.
     [2020-01-14 14:43:29.804297] E [MSGID: 108006] [afr-common.c:5214:__afr_handle_child_down_event] 0-volspoms1-replicate-3: All subvolumes are down. Going offline until atleast one of them comes back up.
     [2020-01-14 14:43:29.804641] E [MSGID: 108006] [afr-common.c:5214:__afr_handle_child_down_event] 0-volspoms1-replicate-4: All subvolumes are down. Going offline until atleast one of them comes back up.
     [2020-01-14 14:43:29.804978] E [MSGID: 108006] [afr-common.c:5214:__afr_handle_child_down_event] 0-volspoms1-replicate-5: All subvolumes are down. Going offline until atleast one of them comes back up.
     [...]

* Finally, encrypt (while copying) the image:

  .. code-block:: console

     # stacker lio encrypt -k ${KEY} -s ${RAW_IMG} -d ${ENC_IMG}

  .. note::
     *Stacker* will create ``${KEY}`` and encrypt ``${RAW_IMG}`` with it.

Exporting image to compute nodes
--------------------------------

Once the diskless image is generated, export it to the nodes.

* Define the nodes:

  .. code-block:: shell

     export COMPUTE_NODES=ocean[1-1000]

* Export the previously generated image:

  .. code-block:: shell

     IMG_NAME="compute_img-$(date +%F)"
     clush -S -bw iscsi_srv[1-2] stacker lio export -n ${IMG_NAME} -W ${IMG_NAME} -d ${ENC_IMG} -w ${COMPUTE_NODES}
     clush -S -bw iscsi_srv[1-2] stacker lio config --save

.. note::
   The keyfile and image are accessed by the VM through GlusterFS. They are defined in :ref:`image and key variables definition <image_variables>`.
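For illustration only, here is how the export commands expand with a hypothetical date (``2020-01-14``) and the node range used above:

.. code-block:: shell

   # Hypothetical expansion of IMG_NAME, ENC_IMG and COMPUTE_NODES defined above
   clush -S -bw iscsi_srv[1-2] stacker lio export -n compute_img-2020-01-14 -W compute_img-2020-01-14 \
       -d /volspoms1/diskless/images/encrypted/stacker-image-2020-01-14.img -w ocean[1-1000]
   clush -S -bw iscsi_srv[1-2] stacker lio config --save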
Accessing image on the compute node
-----------------------------------

As explained in the previous sections, compute nodes boot on the DisklessTrap_ initramfs. After the boot process, the nodes need to be configured to mount the exported compute image.

* Configure the *iSCSI* client:

  .. code-block:: shell

     cat << EOF | clush -bw ${COMPUTE_NODES}
     cat > /etc/iscsi/iscsid.conf << EO_ISCSI
     iscsid.startup = /bin/systemctl start iscsid.socket iscsiuio.socket
     node.startup = automatic
     node.leading_login = No
     node.session.timeo.replacement_timeout = 15
     node.conn[0].timeo.login_timeout = 15
     node.conn[0].timeo.logout_timeout = 15
     node.session.err_timeo.abort_timeout = 15
     node.session.err_timeo.lu_reset_timeout = 30
     node.session.err_timeo.tgt_reset_timeout = 30
     node.session.initial_login_retry_max = 8
     node.session.cmds_max = 128
     node.session.queue_depth = 32
     node.session.xmit_thread_priority = -20
     node.session.iscsi.InitialR2T = No
     node.session.iscsi.ImmediateData = Yes
     node.session.iscsi.FirstBurstLength = 262144
     node.session.iscsi.MaxBurstLength = 16776192
     node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
     node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
     discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
     node.conn[0].iscsi.HeaderDigest = None
     node.session.nr_sessions = 1
     node.session.iscsi.FastAbort = Yes
     node.session.scan = auto
     discovery.sendtargets.auth.authmethod = CHAP
     discovery.sendtargets.auth.username = disco_user
     discovery.sendtargets.auth.password = disco_pass
     discovery.sendtargets.auth.username_in = disco_mutual_user
     discovery.sendtargets.auth.password_in = disco_mutual_pass
     node.session.auth.authmethod = CHAP
     node.session.auth.username_in = node_mutual_user
     node.session.auth.password_in = node_mutual_pass
     node.session.auth.username = node_user
     node.session.auth.password = node_pass
     EO_ISCSI
     EOF

* Discover the iSCSI server targets:

  .. code-block:: shell

     for server in $(nodeset -e iscsi_srv[1-2])
     do
         iscsi_prefix=$(ssh ${server} "awk -F= '/^wwn_target_prefix/ {print \$2}' /etc/stacker/stacker.conf")
         clush -bw ${COMPUTE_NODES} iscsiadm -m discovery -t st -p ${server}
         clush -bw ${COMPUTE_NODES} iscsiadm -m node -T ${iscsi_prefix}${IMG_NAME} -p ${server}:3260 -l
     done

  .. note::
     The default port for an iSCSI server is 3260.

  .. note::
     ``iscsi_srv[1-2]`` are the iSCSI servers serving ``${COMPUTE_NODES}``. This list should be adapted to the cluster architecture.

* Configure and launch multipath:

  .. code-block:: shell

     cat << EOF | clush -bw ${COMPUTE_NODES}
     cat > /etc/multipath.conf << EO_MULTIPATH
     defaults {
         polling_interval 10
         failback immediate
         no_path_retry queue
         user_friendly_names yes
         find_multipaths yes
         prio random
         uid_attribute ID_FS_UUID
     }
     blacklist {
         devnode "^zram.*"
     }
     EO_MULTIPATH
     multipathd
     EOF

* Copy the image key to the nodes:

  .. code-block:: shell

     clush -bw ${COMPUTE_NODES} --copy ${KEY} --dest /dev/shm/luksKey

.. _open_luks_device:

* Open the LUKS device:

  .. code-block:: shell

     cat < /sysroot/etc/hostname
     EOF

* Launch the boot sequence:

  .. code-block:: shell

     clush -bw ${COMPUTE_NODES} systemctl stop dracut-emergency.service
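* Optionally, verify from the compute nodes that the iSCSI sessions and the multipath topology are in place. This is only a sketch; the reported device names depend on the images exported and on the node configuration:

  .. code-block:: shell

     # List the active iSCSI sessions on the compute nodes
     clush -bw ${COMPUTE_NODES} iscsiadm -m session
     # Show the multipath topology built on top of the iSCSI LUNs
     clush -bw ${COMPUTE_NODES} multipath -ll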
Deactivate multipath
********************

In the previous section, access to the iSCSI servers is done with multipath configured. This section describes how to deactivate this feature.

* iSCSI configuration:

  .. code-block:: shell

     cat << EOF | clush -bw ${COMPUTE_NODES}
     cat > /etc/iscsi/iscsid.conf << EO_ISCSI
     iscsid.startup = /bin/systemctl start iscsid.socket iscsiuio.socket
     node.startup = automatic
     node.leading_login = No
     node.session.timeo.replacement_timeout = 600
     node.conn[0].timeo.login_timeout = 15
     node.conn[0].timeo.logout_timeout = 15
     node.conn[0].timeo.noop_out_interval = 0
     node.conn[0].timeo.noop_out_timeout = 0
     node.session.err_timeo.abort_timeout = 15
     node.session.err_timeo.lu_reset_timeout = 30
     node.session.err_timeo.tgt_reset_timeout = 30
     node.session.initial_login_retry_max = 8
     node.session.cmds_max = 128
     node.session.queue_depth = 32
     node.session.xmit_thread_priority = -20
     node.session.iscsi.InitialR2T = No
     node.session.iscsi.ImmediateData = Yes
     node.session.iscsi.FirstBurstLength = 262144
     node.session.iscsi.MaxBurstLength = 16776192
     node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
     node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
     discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
     node.conn[0].iscsi.HeaderDigest = None
     node.session.nr_sessions = 1
     node.session.iscsi.FastAbort = Yes
     node.session.scan = auto
     EO_ISCSI
     EOF

* Remove the multipath configuration:

  .. code-block:: shell

     cat << EOF | clush -bw ${COMPUTE_NODES}
     rm /etc/multipath.conf
     pkill multipathd
     EOF

* Modify :ref:`this step <open_luks_device>` when decrypting the LUKS device with:

  .. code-block:: shell

     clush -bw ${COMPUTE_NODES} cryptsetup luksOpen -d /dev/shm/luksKey /dev/sda

  .. note::
     The device used here is ``/dev/sda`` and should be adapted to the images exported by the server and to the node configuration.

DisklessTrap initramfs
----------------------

.. _DisklessTrap:

We provide a dracut module to manage diskless boot. It generates an initramfs that *traps* the node boot process. Once the node is in this state, it can be accessed through *ssh* with the tools needed to boot with any diskless method supported by Ocean.

Installation is done automatically by puppet during the VM boot. It can be done manually by installing the ``dracut-ccc-modules`` package:

.. code-block:: shell

   dnf install -y dracut-ccc-modules

Puppet configures *DisklessTrap* in order to generate a full-featured diskless initramfs image. To update the initramfs image content and behaviour, check ``/etc/dracut-ccc-modules.conf``, then regenerate the image with ``dracut -fMv``. More information is available in ``man DisklessTrap``.
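For instance, a typical update cycle on the reference VM could look like the following sketch; the inspection step with ``lsinitrd`` is only an illustration, not a required part of the procedure:

.. code-block:: shell

   # Adjust the DisklessTrap module configuration
   vi /etc/dracut-ccc-modules.conf
   # Regenerate the initramfs for the running kernel
   dracut -fMv
   # Optionally inspect the content of the generated image
   lsinitrd /boot/initramfs-$(uname -r).img | less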