Installation methodology
========================

.. todo:: Offsite, Onsite and production phases (original message)

Offsite, Onsite and production phases
-------------------------------------

The **Offsite** phase covers:

* taking into account the machine-room layout of the cluster
* integrating the Manufacturer data corresponding to the hardware to be installed and configured
* setting up a platform and software solution in charge of recording and maintaining the above
  information over time:

  - a machine or VM hosting the Ocean software stack
  - innovative software packages to:

    + quickly integrate structural changes of the cluster
    + translate data from different manufacturers into a unified, standardized YAML format
    + ease day-to-day operation and the maintenance in operational condition

* creating a DVD or USB medium that bootstraps the deployment of the first node of the cluster
* configuring the network and the addressing plan
* defining virtualization as it will be used in this software stack
* configuring and organizing storage and the associated software layers

The **Onsite** phase is the initial installation of all the elements making up the cluster. It
describes all the steps needed to set up the node subsets and the network infrastructure required
for the whole system to operate properly. This phase also includes the integration of the platform
prepared above.

The **Production** phase is the operation of what was installed during the Onsite phase. Together
with the 'Administration', 'HandBooks' and 'CookBooks' sections, it covers keeping the cluster in
operational condition. It naturally includes the administration tools, the 'Monitoring' tools and
the self-healing automation needed to ensure high availability.

**Training** on the tools and on the operational procedures is an essential part of the
collaborative work between the teams.

Offsite preparation
===================

Manufacturer data
-----------------

This step collects and cross-checks the following information:

* the physical location (machine-room topology), provided by the Customer
* the physical composition of the cluster, provided by the Manufacturer:

  - the number of racks and their height
  - the sub-assemblies making up each rack
  - the connectivity (ports) of each sub-assembly
  - the links between the ports of the different sub-assemblies
  - the MAC addresses of part of the equipment (top and worker nodes); the rest relies on
    DHCP option 82 support

All these elements are consolidated into a database. This database, initially external to the
cluster, will be re-imported during the **Onsite** phase. It will be used throughout the life of
the cluster, among other things for hardware interventions.

The Manufacturer data is delivered as a 'netlist'. Help understanding it, or even a change in its
layout, may be requested from the Manufacturer.

Platform and software for the structural knowledge
--------------------------------------------------

Setting up the Ocean tools that manage the environment and its lifecycle requires a dedicated
system. Using a VM gives more flexibility, mobility and safety (a single qcow file to back up).

VM SitePrep
^^^^^^^^^^^

Install and configure pcocc for a siteprep VM.
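As a reference, launching the preparation VM with pcocc typically boils down to a couple of
commands. This is only a minimal sketch: the ``siteprep`` template name and the resource count are
examples, not values defined elsewhere in this guide.

.. code-block:: shell

   # Minimal sketch -- the 'siteprep' template is assumed to be already defined in pcocc
   pcocc alloc -c 4 siteprep   # allocate resources and boot one VM from the template
   pcocc ssh vm0               # open a shell inside the running VM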
.. todo:: Recover the notes taken while setting up the mtest VM

Install RackTables, hwdb and netcc:

* RackTables:

  - Installation

    .. code-block:: shell

       yum install -y RackTables mariadb-server mariadb

  - Apache integration

    .. code-block:: shell

       mkdir /var/www/html/racktables
       ln -s /usr/share/RackTables/wwwroot/index.php /var/www/html/racktables

  - Start the services and follow the instructions displayed at ``%HOST%/racktables``

    .. code-block:: shell

       systemctl start httpd mariadb
       # Step 1: login and password are described in secret.php
       touch '/etc/RackTables/secret.php'; chmod a=rw '/etc/RackTables/secret.php'
       # Step 3
       mysql << EOF
       CREATE DATABASE racktables_db CHARACTER SET utf8 COLLATE utf8_general_ci;
       CREATE USER racktables_user@localhost IDENTIFIED BY 'MY_SECRET_PASSWORD';
       GRANT ALL PRIVILEGES ON racktables_db.* TO racktables_user@localhost;
       EOF
       # Step 4
       chmod 440 /etc/RackTables/secret.php
       chown apache:apache /etc/RackTables/secret.php

* hwdb (part of the confiture package):

  * Row insertion:

    .. code-block:: shell

       hwdb obj add -t Row A
       hwdb obj list

  * Racks by row:

    .. code-block:: shell

       hwdb obj add -t Rack --container A A[3-7]
       hwdb obj list

  * Restore the types, models, ports and compatibilities

    .. todo:: Make the restore options consistent!

    .. code-block:: shell

       hwdb port type restore rt_dumps/ptypes.dump
       hwdb port compat restore --csv rt_dumps/pcompat.dump
       hwdb obj model restore rt_dumps/models.dump

  * Insert the Sequana2 cells

    .. code-block:: shell

       hwdb cell add --rack A6 --prefix s20 templates/sequana2.hw.yaml
       hwdb cell add --rack A7 --prefix s22 templates/sequana2.hw.yaml

  * Insert switches, disk arrays, top and worker nodes (example)

    .. code-block:: shell

       # inserts 2 servers (2U) at base level 15 and 17 for rack A4
       hwdb obj add -t server --container A4 --label top --slots 15,17 --size 2 top[1,2]
       hwdb obj update --model "SuperMicro 2U" x430 top[1-3]
       hwdb obj update --model "SuperMicro 2U" x430 worker[1-3]

       # inserts 2 x nexus 9364c in rack A3 at base level 6 and 9
       hwdb obj add -t 'network switch' --container A3 --label nexus-9364c \
            --slots 6,9 --size 2 esw[1-2]
       hwdb obj update --model "Nexus" 9364c esw[1-2]

       # inserts 4 x 3650 in rack A4 between level 6 and 9
       hwdb obj add -t 'network switch' --container A4 --label 3650 \
            --slots 6,7,8,9 --size 1 esw[3-6]
       hwdb obj update --model "Cisco" 3650 esw[3-6]

       # inserts jbod in rack A5 at base level 6 and 9
       hwdb obj add -t DiskArray --container A6 --label jbod-r6 \
            --slots 3 --size 2 --model "SuperMicro 2U" x430 yyy

       # Insert Colddoor
       hwdb obj add -t PDU --container A3 --label cooldoor --slots 1 --size 1 --subcontainer rear i0r0cooldoor0
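    When many similar servers have to be declared, the same ``hwdb`` invocations can be driven
    from a small CSV file. The sketch below is only an illustration: the ``servers.csv`` file,
    its ``name,rack,slot`` layout and the chosen model string are hypothetical, not part of the
    delivered tooling.

    .. code-block:: shell

       # servers.csv is assumed to contain lines such as: worker4,A5,15
       while IFS=, read -r name rack slot; do
           hwdb obj add -t server --container "${rack}" --label worker --slots "${slot}" --size 2 "${name}"
           hwdb obj update --model "SuperMicro 2U" x430 "${name}"
       done < servers.csv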
  * Insert links

    .. code-block:: shell

       hwdb port add --label master -t hardwired 1000Base-T node180 Ethernet1
       hwdb port add --label slave -t hardwired 1000Base-T node180 Ethernet2
       # And links
       hwdb port link i10esw1 Ethernet4 node180 Ethernet1
       hwdb port link i10esw2 Ethernet4 node180 Ethernet2
       hwdb port update --label 'master opt82 shared=BMC nolag' node180 Ethernet1
       hwdb port update --label 'slave opt82 shared=BMCslave nolag' node180 Ethernet2
       # hwdb port compat add --from 1000Base-T --to 'empty SFP+'

       # Uplinks
       hwdb port add -t QSFP+ 'empty QSFP' esw[1-2] Ethernet[1-48]
       hwdb link esw1 Ethernet[1-4] esw2 Ethernet[1-4]
       hwdb port link i10esw1 Ethernet53 esw1 Ethernet5
       hwdb port link i10esw2 Ethernet53 esw2 Ethernet5

       # lags and speed
       hwdb port update --label 'speed=40000 lag=vpc10' i10esw[1-2] Ethernet53
       hwdb port update --label 'speed=40000 lag=vpc10' esw[1-2] Ethernet5

  * A tool which reads the Manufacturer netlist and converts all its entries into hwdb commands

    .. note:: A structured CSV file containing only the descriptions usable in a specific sheet
              could be provided, to reduce the disparities between manufacturers and allow a
              simplified production of hwdb commands.

    .. todo:: Write a requirements specification for netcc

Install and configure confiture:

* Installation

  .. code-block:: shell

     yum install -y confiture git emacs-nox vim vim-enhanced

* Bootstrap confiture

  .. code-block:: shell

     git init cluster
     cp -aR /usr/share/doc/confiture*/examples/* cluster/

* Configure the URL to the DB in ``confiture.yaml``; paths are relative to the location of
  ``confiture.yaml``

  .. code-block:: yaml

     # Starting configuration (/path/confiture/confiture.yaml)
     common:
       hiera_conf: hiera.yaml
       template_dir: templates/
       output_dir: output/
     dhcp:
       conf_name: dhcpd.conf
     dns:
       conf_name: named.conf
     racktables:
       url: 'mysql://racktables_user:MY_SECRET_PASSWORD@localhost/racktables_db'

* Confiture network ranges

  In the ``network.yaml`` file, define the subnets associated with each kind of equipment:

  - one ``bbone`` network for backbone access from the top and worker nodes: A.B.C.0/24
  - one ``eq`` network for accessing and monitoring the equipment: E.4.0.0/23
  - one ``adm`` network for admin access to the management nodes: E.1.0.0/24
  - one ``data`` network for access to the data stored in GlusterFS: E.5.0.0/24
  - one ``ipmi`` network for BMC (IPMI) access: E.4.0.0/24

  .. todo:: Check the network definitions

  .. code-block:: yaml

     networks:
       # TOP Bbone network
       bbone:
         range: A.B.C.0/24
         interface: 'enp130s0f0'
         nameservers:
           - "${address('top1-bbone')}"
         tftpservers:
           - "${address('top1-bbone')}"
         bmgrservers:
           - "${address('top1-bbone')}"

       # Vlan 1 ?
       eq:
         range: X.0.0.0/23
         interface: 'eno2'
         nameservers:
           - "${address('top1-eq')}"
         ntpservers:
           - "${address('top1-eq')}"
         tftpservers:
           - "${address('top1-eq')}"
         bmgrservers:
           - "${address('top1-eq')}"

       # Administration network
       # Vlan 1 ?
       adm:
         range: X.1.0.0/24
         interface: 'ens1'
         bmgrservers:
           - "${address('top1-adm')}"
         nameservers:
           - "${address('top1-adm')}"
         ntpservers:
           - "${address('top1-adm')}"
         tftpservers:
           - "${address('top1-adm')}"

       # Vlan 1 ?
       data:
         range: X.5.0.0/24
         interface: 'enp130s0f0'
         bmgrservers:
           - "${address('top1-adm')}"
         nameservers:
           - "${address('top1-adm')}"
         ntpservers:
           - "${address('top1-adm')}"
         tftpservers:
           - "${address('top1-adm')}"

       # BMC: physical network
       # Vlan 104 ?
       ipmi:
         range: X.4.0.0/24
         interface: 'enp130s0f0'
         bmgrservers:
           - "${address('top1-adm')}"
         nameservers:
           - "${address('top1-adm')}"
         ntpservers:
           - "${address('top1-adm')}"
         tftpservers:
           - "${address('top1-adm')}"

     [...]
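  Whenever ``network.yaml`` (or ``addresses.yaml`` below) is edited by hand, a quick syntax check
  avoids chasing confusing errors later. This is only a convenience sketch and assumes PyYAML is
  installed on the siteprep VM:

  .. code-block:: shell

     # Fails with a parse error and a line number if the YAML is malformed
     python -c 'import sys, yaml; yaml.safe_load(open(sys.argv[1]))' network.yaml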
  In the ``addresses.yaml`` file, associate IP addresses with the networks defined above. Example:

  .. code-block:: yaml

     addresses:
       top[1-3]:
         default: [adm,eq,bbone,data,ipmi]
         bbone: A.B.C.[1-3]
         eq: X.0.0.[1-3]
         adm: X.1.0.[1-3]
         data: X.5.0.[1-3]
         ipmi: A.B.C.[128-130]
       worker[1-3]:
         default: [adm,eq,bbone,data,ipmi]
         bbone: A.B.C.[4-6]
         eq: X.0.0.[4-6]
         adm: X.1.0.[4-6]
         data: X.5.0.[4-6]
         ipmi: A.4.0.[128-130]
       esw[1-2]:
         default: [adm]
         adm: A.4.$(islet-id).[1-2]
       esw[4-6]:
         default: [adm]
         adm: A.4.$(islet-id).[4-6]

  .. todo:: To be completed with the switches and the islets?

Installation media preparation
------------------------------

The on-site installation will require a traditional installation from external installation media.
To guarantee that this media's content matches what we intend to install, we generate it ourselves.
To do this, we must have a node (virtual or not) running the OS we want to install. Here, we will
use the latest ``cloud-ocean`` pcocc image available. Other means can be used to launch the very
same image (VirtualBox, libvirt, ...).

Boot image
^^^^^^^^^^

The first step is to generate a boot image using `lorax`. This image includes a minimal OS and the
`anaconda` installer. No other content (RPMs for instance) is included.

Install lorax:

.. code-block:: shell

   yum install -y lorax

Generation currently requires the CentOS-os, CentOS-updates, CentOS-extras and Ocean repositories.
Collect the required repository URLs:

.. code-block:: shell

   yum repolist -v | grep baseurl

Launch the lorax generation:

.. code-block:: shell

   # Ocean major.minor version (2.x)
   oswanted=2.6
   # URL of the Ocean repos
   yumsrv="http://pkg/mirror/pub/linux/ocean/"
   lorax --isfinal -p Ocean -v ${oswanted} -r 1 \
       -s ${yumsrv}/${oswanted}/ocean/x86_64 \
       -s ${yumsrv}/${oswanted}/centos-os/x86_64 \
       -s ${yumsrv}/${oswanted}/centos-updates/x86_64 \
       -s ${yumsrv}/${oswanted}/centos-extras/x86_64 \
       -s ${yumsrv}/${oswanted}/epel/x86_64 \
       -s ${yumsrv}/${oswanted}/greyzone/x86_64 \
       /tmp/lorax_image

Installation repos
^^^^^^^^^^^^^^^^^^

Now we have to include some content into the generated image. First, gather all the packages that
might be required during the kickstart using ``yum``:

.. code-block:: shell

   mkdir -p /tmp/ocean_media/Packages/
   yum install -y --installroot=/tmp/ocean_media/Packages/ --downloadonly \
       --downloaddir=/tmp/ocean_media/Packages/ \
       @core @base @anaconda-tools anaconda puppet puppet4 bridge-utils lsof minicom strace \
       tcpdump vim emacs-nox bind-utils crash yum-utils
   rm -Rf /tmp/ocean_media/Packages/var

If any other package is required, it should be included here.

Recreate the yum groups using CentOS's `comps.xml`:

.. code-block:: shell

   createrepo -g /dev/shm/packages/ocean_centos/comps.xml /tmp/ocean_media/

.. note:: The CentOS `comps.xml` is available here:
          http://mirror.centos.org/centos/7/os/x86_64/repodata/aced7d22b338fdf7c0a71ffcf32614e058f4422c42476d1f4b9e9364d567702f-c7-x86_64-comps.xml

Media metadata
^^^^^^^^^^^^^^

Mount and copy the content of the generated boot image:

.. code-block:: shell

   mkdir /mnt/lorax_image /tmp/lorax_image_content
   mount -o loop /tmp/lorax_image/images/boot.iso /mnt/lorax_image
   rsync -avr /mnt/lorax_image/ /tmp/lorax_image_content
   rm /tmp/lorax_image_content/isolinux/boot.cat

And now that we have all the bits to make the media, assemble everything:
.. code-block:: shell

   mkisofs -o /tmp/ocean.iso -b isolinux/isolinux.bin -c isolinux/boot.cat \
       -boot-load-size 4 -boot-info-table -no-emul-boot \
       -eltorito-alt-boot -e images/efiboot.img -no-emul-boot \
       -R -V "Ocean ${oswanted} x86_64" -T -graft-points \
       isolinux=/tmp/lorax_image_content/isolinux \
       images/pxeboot=/tmp/lorax_image_content/images/pxeboot \
       LiveOS=/tmp/lorax_image_content/LiveOS \
       EFI/BOOT=/tmp/lorax_image_content/EFI/BOOT \
       images/efiboot.img=/tmp/lorax_image_content/images/efiboot.img \
       .discinfo=/tmp/lorax_image/.discinfo \
       .treeinfo=/tmp/lorax_image/.treeinfo \
       Packages=/tmp/ocean_media/Packages \
       repodata=/tmp/ocean_media/repodata
   isohybrid --uefi /tmp/ocean.iso
   implantisomd5 /tmp/ocean.iso
   checkisomd5 /tmp/ocean.iso

Finally, try it out on a machine with qemu installed and X11 access:

.. code-block:: shell

   qemu-system-x86_64 -m 1024 -smp 1 -cdrom ./ocean.iso

Once validated, burn it to a DVD or to USB storage.

RAID configuration on top and worker nodes
------------------------------------------

Using a console plugged into each node, create:

* 1 RAID1 named 'system' with the first two drives
* 1 RAID10 named 'data' with all the other drives

Initialize all RAID drives.

Network definition
------------------

.. todo:: Guide on how to do the RackTables insertion. VLAN & IP partitioning design. The result
          should be configuration files generated by confiture (DNS, DHCP, switches).

VLAN and IP design guide
^^^^^^^^^^^^^^^^^^^^^^^^

In an Ocean stack cluster, the first requirement is that each islet must live in an independent
set of VLANs. This is required for three reasons: scalability, reliability and ease of management.
A cluster evolves: adding or removing nodes must not affect the operational state of the cluster.
Consequently, the Ethernet fabric design should be able to route between those VLANs in an
effective way. The current best practice (documented in :ref:`N9K L3 Fabric architecture`) uses an
L3 fabric and the BGP protocol to dynamically route IP traffic between islets.

A second requirement is a clear separation of IP subnets depending on node or equipment types. For
example, in a compute islet, compute nodes and their related BMCs should be in separate IP
subnets. The same applies to administrative nodes versus service IPs.

A best practice is a hierarchical allocation of IP subnets that respects CIDR boundaries. This
makes the design of ACLs easier, for example by having all the administrative allocations in a
first "/13" subnet and all the service nodes in a second "/13" subnet.

An example of IP allocation could be:

| 10.0.0.0/8
| ├── 10.0.0.0/13   Management nodes and services
| │   ├── 10.1.0.0/24    Central mngt servers
| │   ├── 10.1.10.0/24   Islet10 mngt IPs
| │   ├── 10.1.20.0/24   Islet20 mngt IPs
| │   └── 10.3.0.0/24    Service IPs
| ├── 10.8.0.0/13   Service nodes and related equipment
| │   ├── 10.8.0.0/24    Central service nodes
| │   └── 10.8.20.0/24   Islet 20 service nodes
| ├── 10.16.0.0/13  User nodes and related equipment
| │   └── 10.16.20.0/24  Islet 20 compute nodes
| └── 10.32.0.0/11  Cluster equipment
|     └── 10.32.20.0/24  Islet 20 compute node BMCs

And with a VLAN mapping that ensures no equipment can spoof an equipment of another type:

.. list-table:: VLAN mapping
   :header-rows: 1

   * - VLAN
     - IP Subnet
   * - A
     - 10.1.0.0/24, 10.3.0.0/24
   * - B
     - 10.8.0.0/24
   * - C
     - 10.1.10.0/24
   * - D
     - 10.1.20.0/24
   * - E
     - 10.8.20.0/24
   * - F
     - 10.16.20.0/24, 10.32.20.0/24
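When adapting this plan to a real cluster, it is easy to break the aggregation property that the
ACLs rely on. A quick way to double-check an address against the plan is the ``ipcalc`` utility
shipped with initscripts; the addresses below are taken from the example above and are
illustrative only.

.. code-block:: shell

   # A management IP must aggregate into the management /13,
   # a compute-node BMC into the equipment /11.
   ipcalc -n 10.1.20.12/13    # expected: NETWORK=10.0.0.0
   ipcalc -n 10.32.20.12/11   # expected: NETWORK=10.32.0.0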
Virtual machine definition
--------------------------

.. todo:: Guide on how to do the VM definition (pcocc + Puppet) with ready-to-use examples for the
          mandatory services.

Storage definition
------------------

.. todo:: Guide on how to design the GlusterFS cluster. May be limited to our way of using gluster
          (blocks of 3 servers).

Onsite Installation
===================

Overview
--------

The installation process is roughly the following:

- Install the base system on the first management node
- Configure this node with all the components needed to deploy the other management nodes
- Deploy the other management nodes using the management network
- Deploy the Ethernet fabric (administration network)
- Install and configure the Ocean components on those nodes using the temporary infrastructure of
  the first node
- Validate the final infrastructure
- Redeploy and integrate the first node

Note that most configuration files will already have been generated using `confiture`, Ocean's
configuration generator.

When these steps are all done, diskless or diskful compute nodes can be deployed. Compute node
hardware specifics are out of scope of this document, although some advice may be given.

.. _requirements:

Requirements
------------

Management nodes should be configured with a ready-to-use storage system. The names of those disks
(as seen from the OS) will be required by BMGR for the kickstart process. We advise a minimum of
60 GB of RAID1 storage for the management node system. Data storage will depend on your hardware,
but hardware RAID controllers are preferred over software ones.

.. note:: The top management nodes of our test bed have 2 SATA-DOMs in RAID1 (Intel Rapid Storage)
          and 10 disks in RAID10 (+2 hot spares), respectively seen as the ``Volume0_0`` and
          ``sdc`` drives.

The default BIOS configuration will be fine in most cases; we just need the following features to
be activated (or deactivated):

- SR-IOV support activated
- AES-NI support activated (not mandatory but advised)
- Legacy boot only
- BMC configured with DHCP (if they are cabled inside the cluster; at your discretion if not)
- Energy saving features disabled (fan profile, CPU profile, energy-efficiency features, ...)
- Boot order: Network, CD/DVD, USB, system hard drives
- Network boot devices (this setting might be handled by an option ROM):

  - interface cabled onto the bbone network for the top and worker nodes
  - interface cabled onto the management network for the other management nodes
  - interface cabled onto the administration network

Moreover, network switches should be in factory configuration.

.. note:: To factory reset a Cisco switch, in the management shell: ``erase startup-config`` and
          ``reload``

.. note:: To factory reset an Arista switch, in the Aboot shell (at boot time):
          ``mv /mnt/flash/startup-config /mnt/flash/startup-config.old`` then ``reboot``,
          or in a privileged shell: ``erase startup-config`` and ``reload``

.. warning:: Some switches keep their ports blocked for a while when a port comes up
             (spanning-tree related). Moreover, DHCP snooping may be enabled by default. To
             mitigate both issues, set the DHCP server port as a trusted source
             (``ip dhcp snooping trust``) and set server-facing ports as edge ports (Cisco's
             ``spanning-tree portfast``).

This installation method also requires that Ocean's repositories are reachable.
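Once the BMCs are cabled and have picked up a DHCP lease, it is worth confirming that they answer
over IPMI before going any further. A minimal check could look like the following (credentials and
hostname are placeholders, as elsewhere in this guide):

.. code-block:: shell

   # Confirm the BMC got its address through DHCP and answers IPMI requests
   ipmitool -I lanplus -U %USER% -P %PASSWORD% -H %BMC% lan print 1
   ipmitool -I lanplus -U %USER% -P %PASSWORD% -H %BMC% chassis status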
First node deployment
---------------------

System installation
^^^^^^^^^^^^^^^^^^^

With the Ocean installation media burned to a USB key or DVD, boot the first node. The graphical
installer is not available here; the textual installer is also easier to document and to use in a
console installation context.

If you have never done it, we advise checking the media using the "Test this media & Install
Ocean ${oswanted}" boot option. It may take some time but gives confidence.

When the installer has started and prompts you with the main menu, proceed with the configuration:

1) Language settings: English (United States)
2) Timezone: Europe/Paris
3) Installation source: Local media (auto-detected)
4) Software selection: Minimal Install
5) Installation destination: use the whole system disk with LVM. The partitioning scheme doesn't
   really matter here as we'll reinstall this node soon.
6) KDump: Enabled
7) Network configuration: configure the backbone interface in order to get remote access. Also
   configure nameservers and hostnames.
8) Root password: configure a temporary root password
9) User creation: no system user should be created

System pre-configuration
""""""""""""""""""""""""

If you have anything to do after the installation but before rebooting, you can modify the
configuration from anaconda's shell (switch to it with Alt+Tab). The system is installed within
``/mnt/sysimage``. For instance, here we disable ``firewalld`` and ``SELinux`` and change the
default ssh port:

.. code-block:: shell

   systemctl --root /mnt/sysimage disable firewalld
   sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /mnt/sysimage/etc/selinux/config
   sed -i 's/^#Port 22/Port 422/' /mnt/sysimage/etc/ssh/sshd_config

After the installation is complete, make sure the node boots on the system disks and open a remote
shell onto it.

System configuration
^^^^^^^^^^^^^^^^^^^^

The Anaconda installation enables some unwanted features like SELinux and firewalld. Make them
inactive:

.. code-block:: shell

   systemctl disable --now firewalld
   setenforce Permissive
   sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

For security purposes, we strongly recommend using port 422 for SSH. To do so:

.. code-block:: shell

   sed -i 's/^#Port 22/Port 422/' /etc/ssh/sshd_config
   systemctl restart sshd

The installation process requires the complete Ocean repos; configure them manually on the first
node. You should have the following repos configured:

- Ocean
- Ocean-updates
- Greyzone
- Greyzone-updates
- CentOS
- CentOS-updates
- CentOS-extras
- EPEL
- Ocean missing
- Gluster

.. note:: An ``ocean.repo`` file may be available at the root of your package repositories.

You may have to disable the included repos that use the official CentOS mirrors to make ``yum``
work. Use the ``--disablerepo`` option to do so:

.. code-block:: shell

   yum --disablerepo base,extras,updates makecache

Install some packages, permanently disable the official CentOS repos and synchronize the system
with the available packages:

.. code-block:: shell

   yum -y --disablerepo base,extras,updates install yum-utils yum-plugin-priorities
   yum-config-manager --disable base,extras,updates
   yum distribution-synchronization
   systemctl disable --now NetworkManager
   yum remove -y NetworkManager\*
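For reference, the manual repository configuration mentioned above boils down to one stanza per
repo in a ``.repo`` file. The sketch below is illustrative only: the baseurl reuses the mirror URL
from the media-preparation step and must be adapted to your site (as must ``gpgcheck``).

.. code-block:: shell

   # Illustrative sketch -- add one stanza per repo listed above
   cat > /etc/yum.repos.d/ocean.repo << 'EOF'
   [ocean]
   name=Ocean
   baseurl=http://pkg/mirror/pub/linux/ocean/2.6/ocean/x86_64
   enabled=1
   gpgcheck=0
   EOF
   yum makecache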
.. _deployment_net_interface_cfg:

Network configuration
^^^^^^^^^^^^^^^^^^^^^

This node is connected to all the available networks (backbone, management and administration).
The backbone was configured by you in Anaconda's text UI. Now **configure all the internal
networks** using the following template and the addressing scheme designed during the off-site
step:

.. code-block:: bash

   # /etc/sysconfig/network-scripts/ifcfg-eno2
   # Here eno2 is the management network, and is 10.0.0.1
   DEVICE=eno2
   BOOTPROTO=static
   BROADCAST=10.0.0.255
   IPADDR=10.0.0.1
   NETMASK=255.255.255.0
   NETWORK=10.0.0.0
   ONBOOT=yes

You will most probably need some IP routes to be configured; if so, don't forget to set them in
``/etc/sysconfig/network-scripts/route-INTERFACE_NAME``.

.. _mellanox_ethernet_setup:

Mellanox cards
""""""""""""""

If you're using Mellanox VPI cards for 40G/50G/100G Ethernet links, install the Mellanox OFED and
load the drivers:

.. code-block:: shell

   yum install -y mlnx-ofa_kernel kmod-mlnx-ofa_kernel ocean-fw-mlnx-hca infiniband-diags \
       mstflint kmod-kernel-mft-mlnx unzip
   systemctl start openibd

If needed, use the firmwares present in ``/usr/share/ocean-fw-mlnx-hca/firmware/`` and the
``mstflint`` tool to burn your firmware:

.. code-block:: shell

   unzip %FIRMWARE%.bin.zip
   mstflint -d 81:00.0 -i %FIRMWARE%.bin burn

Methods to get the card PSID and OPN can be found in
``/usr/share/ocean-fw-mlnx-hca/release_notes/README.txt``.

If needed, using the ``mstconfig`` tool, verify and set the link type to Ethernet (a link type of
2 means Ethernet):

.. code-block:: shell

   mstconfig -d 81:00.0 query | grep LINK_TYPE
   mstconfig -y -d 81:00.0 set LINK_TYPE_P1=2

After configuring the Mellanox card for Ethernet, the FlexBoot mechanism is activated and may take
a long time to initialize the 40G links. To deactivate FlexBoot:

.. code-block:: shell

   mstconfig -d 81:00.0 q LEGACY_BOOT_PROTOCOL EXP_ROM_PXE_ENABLE
   mstconfig -y -d 81:00.0 set LEGACY_BOOT_PROTOCOL=NONE EXP_ROM_PXE_ENABLE=0

After a reboot, the card should appear as an ``ensX`` network device and can be configured like
the other interfaces.

MAC addresses gathering
"""""""""""""""""""""""

If this has not been done yet, here is a method to collect MAC addresses on the management
network. We assume here that the BMCs auto-configure using DHCP. Remember that some switches have
requirements (especially spanning-tree related) that have to be met; see :ref:`requirements` for
details.

Using SSH or a console cable, open a shell on the management network switch and display the MAC
address table. Here we're using a USB console cable on a Cisco Catalyst switch:

.. code-block:: console

   screen /dev/ttyACM1
   Switch> show mac address-table

Using the displayed MAC/port mapping, match the addresses with the expected cabling
(``hwdb port list --local esw2``), insert them into confiture's data files and re-generate the
DHCP configuration. The ``shut/no shut`` trick may be applied on a switch port to force the
equipment to restart its DHCP phase.

.. note:: The Catalyst's management interface doesn't use DHCP by default; to activate it, add
          ``ip address dhcp`` to the management interface configuration (``fastethernet0`` in our
          case). Get the interface's MAC with ``show interface fastethernet0``.

For the backbone network, you may not have access to the switch. As only 3 nodes boot over it, a
simple ``tcpdump`` while booting the node will do the job.
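A capture such as the following (interface name reused from the examples above) is enough to read
the client MAC from the incoming DHCP discover:

.. code-block:: shell

   # Show DHCP/BOOTP frames, including the source MAC, while the node PXE-boots
   tcpdump -i enp130s0f0 -en port 67 or port 68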
DHCP & Named installation
"""""""""""""""""""""""""

Using the node used for the off-site preparation phase, update confiture's data with the
discovered MACs, re-generate the DHCP configuration and import the dhcpd and named configuration
files. Put them in the right place and start bind and dhcpd.

.. note:: Some adjustments may have to be made to the generated configuration. As a general rule,
          don't modify generated files: modify the templates and import the generated files again.

.. code-block:: bash

   yum install -y dhcp bind bind-utils
   systemctl enable --now named dhcpd

Now configure ``resolv.conf`` with yourself as a nameserver and verify that all the BMCs are now
reachable.

LAN MACs gathering
""""""""""""""""""

Now gather the management nodes' LAN interface MACs. To do so, either:

* Make them boot on the network and collect the MACs:

  * Make sure that the interface is used by the BIOS for PXE (setting in the BIOS menu or the
    option ROM)
  * Using IPMI, set the next boot device to PXE:

    .. code-block:: bash

       ipmitool -U %USER% -P %PASSWORD% -H %HOSTNAME% chassis power off
       ipmitool -U %USER% -P %PASSWORD% -H %HOSTNAME% chassis bootdev pxe
       ipmitool -U %USER% -P %PASSWORD% -H %HOSTNAME% chassis power on

  * Collect the MACs on the switch, or use tcpdump to capture the DHCP requests:

    .. code-block:: console

       $ screen /dev/ttyACM1
       > show mac address-table

* Use the BMC web interface to get the system's LAN MAC address.
* Use the BIOS or option ROM information.
* On `SuperMicro` hardware, you can get the first LAN MAC by issuing the following IPMI raw
  command:

  .. code-block:: bash

     ipmitool -U %USER% -P %PASSWORD% -H %HOSTNAME% raw 0x30 0x21 | tail -c 18 | tr ' ' ':'

With those MACs gathered, update confiture's data and update the DHCP configuration with the
freshly generated one.

BMGR installation
^^^^^^^^^^^^^^^^^

Install the BMGR tool:

.. code-block:: shell

   yum install -y bmgr

Start and initialize the database:

.. code-block:: shell

   systemctl enable --now mariadb
   mysql << EOF
   grant all privileges on bmgr.* to bmgr_user@'localhost' identified by 'bmgr_pass';
   create database bmgr;
   EOF
   FLASK_APP=bmgr.app flask initdb

Add the WSGI entrypoint to Apache's configuration file:

.. code-block:: shell

   echo 'WSGIScriptAlias /bmgr "/var/www/bmgr/bmgr.wsgi"' >> /etc/httpd/conf/httpd.conf
   systemctl enable --now httpd

Test with the CLI:

.. code-block:: shell

   bmgr host list

Configuration
"""""""""""""

Create node profiles and assign weights to them:

.. code-block:: shell

   bmgr profile add -w 0 ocean_mngt
   bmgr profile add -w 5 ocean_mngt_top
   bmgr profile add -w 10 ocean_mngt_top_1
   bmgr profile add -w 5 ocean_mngt_worker
   bmgr profile add -w 5 ocean_mngt_islet_worker

Add the cluster nodes and their associated profiles into bmgr:

.. code-block:: shell

   bmgr host add --profiles ocean_mngt,ocean_mngt_top,ocean_mngt_top_1 top[1-3]
   bmgr host add --profiles ocean_mngt,ocean_mngt_worker worker[1-3]
   bmgr host add --profiles ocean_mngt,ocean_mngt_islet_worker islet[10-11,20-21,...]

Add the profile specifics:

.. code-block:: shell

   # The network interface names are given as configuration examples (see section 'Network configuration')
   bmgr profile update ocean_mngt_top_1 -a netdev enp130s0f0 -a ks_drive Volume0_0
   bmgr profile update ocean_mngt_worker -a netdev enp3s0f0 -a ks_drive Volume0_0
   bmgr profile update ocean_mngt_islet_worker -a netdev eno1 -a ks_drive Volume0_0
   bmgr profile update ocean_mngt -a console ttyS1,115200 -a ks_selinux_mode disabled \
       -a ks_firewall_mode disabled -a ks_rootpwd root \
       -a kickstart http://top1-mngt/bmgr/api/v1.0/resources/kickstart/

.. note:: This strongly depends on your hardware specificities; it may be convenient to create
          additional profiles. For example, Cisco Nexus 9K zero-touch provisioning can use bmgr
          features to autoconfigure itself. It is up to the administrators to design the profile
          hierarchy and attributes; this is only an example used on our test bed. Moreover, to
          help you, bmgr can assign weights to individual profiles, giving them a higher priority.
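As an illustration of how weights can be used, a hypothetical override profile could force a
different console attribute for a few nodes while inheriting everything else. The profile name,
attribute value and node set below are invented for the example:

.. code-block:: shell

   # Hypothetical higher-weight profile overriding the console setting for two nodes
   bmgr profile add -w 20 ocean_mngt_ttys0
   bmgr profile update ocean_mngt_ttys0 -a console ttyS0,115200
   bmgr host add --profiles ocean_mngt,ocean_mngt_worker,ocean_mngt_ttys0 worker[4-5]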
Deployment server
^^^^^^^^^^^^^^^^^

Lorax image
"""""""""""

The kickstart process uses a custom boot image, generated with the ``lorax`` tool.

Install ``lorax``:

.. code-block:: shell

   yum install -y lorax

Launch the build process, with the package repo URLs taken from the repo file:

.. code-block:: shell

   lorax -p Ocean -v ${oswanted} -r 1 \
       $(sed -ne 's/^baseurl=/-s /p' /etc/yum.repos.d/ocean.repo) \
       /var/www/html/boot

Configure ``bmgr`` accordingly:

.. code-block:: shell

   bmgr profile update ocean_mngt -a initrd http://top1-mngt/boot/images/pxeboot/initrd.img \
       -a kernel http://top1-mngt/boot/images/pxeboot/vmlinuz \
       -a install_tree http://top1-mngt/boot

.. note::
   As `top` nodes may be deployed over a different physical network (backbone instead of the
   internal network), `bmgr` and other configuration items may have to be duplicated between
   profiles. For example, for the `top` nodes:

   .. code-block:: shell

      bmgr profile update ocean_mngt_top -a initrd http://top1-bbone/boot/images/pxeboot/initrd.img \
          -a kernel http://top1-bbone/boot/images/pxeboot/vmlinuz \
          -a install_tree http://top1-bbone/boot

Repositories
""""""""""""

The kickstart process requires local repos. Using ``reposync`` and ``createrepo``, create a
temporary clone of the CentOS and Ocean repositories:

.. code-block:: shell

   yum install -y createrepo
   reposync -p /var/www/html/boot/packages -r centos-updates -r centos-os -r ocean \
       -r ocean-updates -r ocean-missing -n -m
   createrepo -g /var/www/html/boot/packages/centos-os/comps.xml /var/www/html/boot

.. warning:: Repository names (the ``-r`` arguments) may differ.

.. warning:: This will use roughly 12 GB in the ``/var`` filesystem.

Package repository proxy
""""""""""""""""""""""""

Using Apache, configure a proxy to your package repository:

.. code-block:: shell

   cat > /etc/httpd/conf.d/mirror.conf << EOF
   ProxyPass /mirror http://yumsrv.ccc.cea.fr/
   ProxyPassReverse /mirror http://yumsrv.ccc.cea.fr/
   EOF
   systemctl reload httpd

.. warning:: Adapt the content of ``mirror.conf`` to your repository URL. This should point to a
             URL where all the repos are available as subdirectories.

Configure ``bmgr`` accordingly:

.. code-block:: shell

   echo ${oswanted}
   bmgr profile update ocean_mngt -a ks_repos http://top1-mngt/mirror/ocean/${oswanted}/ocean/x86_64,http://top1-mngt/mirror/ocean/${oswanted}/ocean-updates/x86_64,http://top1-mngt/mirror/ocean/${oswanted}/centos-os/x86_64,http://top1-mngt/mirror/ocean/${oswanted}/centos-updates/x86_64,http://top1-mngt/mirror/ocean/${oswanted}/centos-extras/x86_64,http://top1-mngt/mirror/ocean/${oswanted}/epel/x86_64,http://top1-mngt/mirror/ocean/${oswanted}/greyzone/x86_64,http://top1-mngt/mirror/ocean/${oswanted}/greyzone-updates/x86_64

Admin SSH key
"""""""""""""

Generate an SSH key; it will be used once the kickstart process is finished (as no password will
be set):

.. code-block:: shell

   ssh-keygen -b 4096
   cp ~/.ssh/id_rsa.pub /var/www/html/authorized_keys
   cp ~/.ssh/id_rsa.pub /root/.ssh/authorized_keys

Configure ``bmgr`` accordingly:

.. code-block:: shell

   bmgr profile update ocean_mngt -a ks_authorized_keys_url http://top1-mngt/authorized_keys

TFTP server
"""""""""""

A TFTP server is required for PXE chainloading. Install a TFTP server:

.. code-block:: shell

   yum install -y xinetd tftp-server tftp
   systemctl enable --now xinetd tftp

And make the iPXE network boot loader images available through TFTP:

.. code-block:: shell

   yum install -y ipxe-bootimgs
   ln /usr/share/ipxe/{undionly.kpxe,ipxe.efi} /var/lib/tftpboot/

.. warning:: Symbolic links are not followed by the TFTP server. Only use hard links, or copy the
             files you want to serve.
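Before relying on it for PXE, it is worth checking that the ROM is actually served. A quick check
with the ``tftp`` client installed above (the hostname is an example):

.. code-block:: shell

   # Fetch the iPXE ROM from the freshly configured TFTP server
   tftp top1-mngt -c get undionly.kpxe /tmp/undionly.kpxe && ls -l /tmp/undionly.kpxe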
DHCP update
"""""""""""

Update the DHCP template and confiture's data with the deployment server specifics:

* BMGR server URL
* TFTP server IP
* iPXE ROM name
* DNS IPs

Apply the configuration and restart the DHCP server.

.. note:: Some equipment may only support an EFI ROM; modify the template to reflect this.

Worker nodes deployment
-----------------------

Now that we have everything required to kickstart a node, try to deploy the second node:

* Double-check that the iPXE script and the kickstart file are correct:

  .. code-block:: shell

     bmgr resource render ipxe_deploy_boot top2
     bmgr resource render kickstart top2

  .. note:: The kickstart template may be modified; it is located in
            ``/etc/bmgr/templates/ks_rhel7.jinja``

* Configure the BIOS with the settings mentioned above. Make sure that the RAID devices are
  present and correctly defined in the kickstart file.

* Set the next bootdev to PXE:

  .. code-block:: shell

     ipmitool -U %USER% -P %PASS% -H %BMC% chassis bootdev pxe

* Enable deploy mode in bmgr:

  .. code-block:: shell

     bmgr alias override -o ipxe_boot ipxe_deploy_boot top2

* Start it and monitor the process with a remote console (either SOL or console redirection):

  .. code-block:: shell

     ipmitool -U %USER% -P %PASS% -H %BMC% chassis power on

When the node is fully kickstarted, it will be in a state where:

* a minimal set of packages is installed
* the proxied repos are configured
* the interface used for deployment is configured (the other ones are not)
* an ssh daemon is running
* root's authorized_keys is deployed (from the given URL)

If you have Mellanox cards as multi-gigabit Ethernet cards, you may have to flash and configure
them the same way as on the first node; see :ref:`mellanox_ethernet_setup`.

Make sure the storage you intend to use as a `GlusterFS` brick is available and ready to use. We
strongly recommend setting a filesystem label on the gluster block device. Use ``xfs_admin -L`` to
set a label on an XFS filesystem.

Ethernet fabric configuration
-----------------------------

Switch configuration
^^^^^^^^^^^^^^^^^^^^

The Ethernet fabric may be configured in two different ways:

* manual initial configuration followed by generated-configuration deployment
* zero-touch provisioning (ZTP for Arista, POAP for Cisco Nexus)

Zero-touch provisioning is very specific to your hardware and may require third-party tools or
servers. We will only document the manual process in this general-purpose installation guide.

.. note:: Cisco POAP is documented in this annex: :ref:`Cisco PowerOn Auto Provisioning`

This process requires a manual step for the initial switch configuration. Connect to each switch
using a serial console and set up remote access. This usually includes:

* IP address assignment on the management interface
* administrative user creation
* privileged shell (aka `enable` mode) password setup
* testing from a remote host

Using the configuration file generated with confiture, test the configuration bits on the
real-world switch. If everything looks good, deploy it entirely using the already deployed TFTP
or HTTP server.

.. note:: This step might be iterative: test on the switch, fix the confiture template, redeploy,
          and so on.
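To make the generated configuration retrievable by a switch, it can simply be dropped into the
TFTP root set up earlier; the file name and the confiture output path below are examples:

.. code-block:: shell

   # Publish the generated switch configuration over TFTP (paths are examples)
   cp output/esw1.conf /var/lib/tftpboot/
   chmod 644 /var/lib/tftpboot/esw1.conf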
Node configuration
^^^^^^^^^^^^^^^^^^

.. todo:: Check on the various clusters whether any information would be useful here

Management stack deployment
---------------------------

Puppet server installation
^^^^^^^^^^^^^^^^^^^^^^^^^^

Now install the puppet server and all the required components on the first node:

.. code-block:: shell

   yum install -y puppet4 puppetserver puppet-global puppet-extras puppet-addons git \
       rubygem-r10k rubygem-hocon emacs-nox emacs-yaml-mode vim

Create puppet's required git repos:

.. code-block:: shell

   git clone --mirror /usr/share/puppet-global /var/lib/puppet-global
   git init --bare /var/lib/puppet-cccenv
   echo 'ref: refs/heads/production' > /var/lib/puppet-cccenv/HEAD
   git init --bare /var/lib/puppet-domain
   echo 'ref: refs/heads/production' > /var/lib/puppet-domain/HEAD

Clone them locally:

.. code-block:: shell

   mkdir /root/puppet
   cd /root/puppet
   git clone /var/lib/puppet-global global
   git clone /var/lib/puppet-cccenv cccenv
   git clone /var/lib/puppet-domain domain

And bootstrap the ``cccenv`` and ``domain`` repos:

.. code-block:: shell

   cd /root/puppet/cccenv
   mkdir -p modules/empty/manifests files hieradata
   touch modules/empty/manifests/empty.pp
   git add .
   git commit -m 'Initial commit'
   git branch -m master production
   git push -u origin HEAD:production

.. code-block:: bash

   cd /root/puppet/domain
   mkdir -p files/$(facter domain)/{all-nodes,nodes,hieradata}
   ln -sf ../files/$(facter domain)/hieradata hieradata/$(facter domain)
   git add .
   git commit -m 'Initial commit'
   git branch -m master production
   git push -u origin HEAD:production

Set the `upstream` remote in case of a `puppet-global` update:

.. code-block:: shell

   cd /root/puppet/global
   git remote add upstream /usr/share/puppet-global

Set the committer's name and email for each repo:

.. code-block:: shell

   git --git-dir /root/puppet/global/.git config --local user.name "Super Admin"
   git --git-dir /root/puppet/global/.git config --local user.email "super.admin@ocean"
   git --git-dir /root/puppet/cccenv/.git config --local user.name "Super Admin"
   git --git-dir /root/puppet/cccenv/.git config --local user.email "super.admin@ocean"
   git --git-dir /root/puppet/domain/.git config --local user.name "Super Admin"
   git --git-dir /root/puppet/domain/.git config --local user.email "super.admin@ocean"

Configure ``r10k`` manually by inserting the following in ``/etc/puppetlabs/r10k/r10k.yaml``:

.. code-block:: yaml

   ---
   :cachedir: /var/cache/r10k
   :sources:
     :global:
       remote: /var/lib/puppet-global
       basedir: /etc/puppetlabs/code/environments
   :deploy:
     purge_whitelist: [ ".resource_types/*", ".resource_types/**/*" ]

Deploy the repos with ``r10k``:

.. code-block:: shell

   r10k deploy environment -pv

Configure the master's ENC in ``/etc/puppetlabs/puppet/puppet.conf``:

.. code-block:: ini

   [master]
   node_terminus = exec
   external_nodes = /sbin/puppet-external

Start the ``puppetserver``:

.. code-block:: shell

   systemctl enable --now puppetserver

Set the current node's (the first node's) profile in ``/etc/puppet/puppet-groups.yaml``:

.. code-block:: yaml

   environments:
     production: 'top1'
   roles:
     puppetserver: 'top1'

Test and then apply this profile:

.. code-block:: shell

   puppet-check -v --server $(facter fqdn)
   puppet-apply -v --server $(facter fqdn)

.. note:: This manages all the files and components required to run a puppet server. The only
          unmanaged things are the 3 repos in ``/var/lib/``.

.. note::
   Some warnings about missing ``augeas`` lenses may appear in the ``puppet-check`` output; you
   can safely ignore them:

   .. code-block:: none

      [...] Augeas didn't load ... with Trapperkeep.lns [...]

You now have a working puppet server.
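Day-to-day changes then flow through the same repos: commit in the local clone, push to the bare
repo and re-deploy with ``r10k``. A sketch of that loop, in which the edited hiera file is only a
hypothetical example:

.. code-block:: shell

   cd /root/puppet/domain
   vim hieradata/$(facter domain)/common.yaml    # hypothetical hiera file for this domain
   git add -A && git commit -m 'Describe the change'
   git push origin HEAD:production
   r10k deploy environment -pv                   # refresh /etc/puppetlabs/code/environments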
Profile setup
^^^^^^^^^^^^^

Ocean includes a set of basic profiles that configure the management stack. Many of them require
configuration. The available profiles are in the ``hieradata/global`` folder of the ``global``
repo.

.. warning:: **The following configuration files are only examples, adapt them to your deployment
             specificities**

ClusterShell groups configuration
"""""""""""""""""""""""""""""""""

To have a convenient way to define node roles, define the ClusterShell groups configuration this
way:

.. code-block:: shell-session

   sed -i -e 's/^default:.*/default: cluster/' /etc/clustershell/groups.conf
   cat >/etc/clustershell/groups.d/cluster.yaml << EOF
   [...]
   EOF

NTP configuration
"""""""""""""""""

Configure ``chrony`` on the first node so that it serves time to the rest of the cluster
(``/etc/chrony.conf``):

.. code-block:: none

   server [...] iburst
   # Record the rate at which the system clock gains/losses time.
   driftfile /var/lib/chrony/drift
   # Enable kernel RTC synchronization.
   rtcsync
   # In first three updates step the system clock instead of slew
   # if the adjustment is larger than 10 seconds.
   makestep 10 3
   # Allow NTP client access from local network.
   allow
   logdir /var/log/chrony
   acquisitionport 123

Do the first sync with ``ntpdate``:

.. code-block:: shell

   clush -bw top[2-3],worker[1-3] yum install -y ntpdate chrony
   clush -bw top[2-3],worker[1-3] ntpdate top1-mngt.$(facter domain)

Configure ``chrony`` on the nodes:

.. code-block:: shell

   cat >/tmp/chrony.conf << EOF
   [...]
   EOF

[...]

.. todo::

   - Suggest an installation order for the VMs?
   - In the NTP configuration, do not forget to remove top1 as the reference server if it had
     been used as such

.. todo::

   Move the following into a more general section about Puppet.

   Main classes affecting the various VMs (admin1 admin2 batch1 batch2 db1 i0con1 i0conf2 i0log1
   infra1 infra2 lb1 lb2 monitor1 ns1 ns2 ns3 nsrelay1 webrelay1 webrelay2):

   ns::
      dns_client dns_server gluster_client ldap_server log_client log_client_islet0 mail_client
      monitored_server ntp_client

   nsrelay::
      dns_client gluster_client ldap_fuse log_client mail_client monitored_server ntp_client

   webrelay::
      dns_client gluster_client log_client log_client_islet0 mail_client monitored_server
      ntp_client webrelay

   ilog::
      conman_server conman_server_islet0 dns_client gluster_client log_client log_client_islet0
      mail_client monitored_server ntp_client

   infra::
      dhcp_server dns_client gluster_client log_client log_client_islet0 mail_client
      monitored_server ntp_client tftp_server

   db::
      auks_server clary_server dns_client gluster_client ldap_client log_client log_client_islet0
      mail_client monitored_server ntp_client slurm_db

   lb::
      dns_client dns_server gluster_client haproxy_server haproxy_server_http haproxy_server_ldap
      haproxy_server_puppet log_client log_client_islet0 mail_client monitored_server ntp_client

   i0conf::
      bmgr bmgr_server dns_cache dns_client dns_server gluster_client log_client
      log_client_islet0 mail_client monitored_server ntp_client puppetserver puppetserver_islet0
      racktables webserver webserver_islet0

Compute node installation
-------------------------

.. todo:: Flash the compute racks, configure the switches, generate the diskless images

Routers, logins & other service nodes
-------------------------------------

.. todo:: Flash the service nodes, configure the additional switches, kickstart the nodes

Interconnect fabric configuration
---------------------------------

.. todo:: OpenSM/BXI AFM installation and configuration