Virtual machine operations

Architecture

Ocean’s infrastructure services are all hosted inside virtual machines. The virtual machine drives are stored on a sharded GlusterFS volume, and the VMs are launched using a particular mode of the Pcocc tool. Pcocc uses QEMU to launch VMs and requires several components and services:

  • an etcd backend to store, among other things, the states of managed VMs

  • a shared access to VM images

  • a bridged access to the required networks.

When virtual machines are launched with pcocc using systemd, pcocc performs a heartbeat every 30 seconds over a qemu-agent connection (meaning that the agent must be running within the VM). The result of each heartbeat is fed to the systemd service watchdog, so the VM heartbeat can be disabled or tuned per VM through systemd configuration (WatchdogSec). This mechanism ensures VM availability.
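For example, the watchdog interval for a single VM could be tuned with a systemd drop-in; the path and value below are hypothetical, while WatchdogSec itself is the standard systemd option:

```ini
# /etc/systemd/system/pcocc-vm-infra1.service.d/watchdog.conf
# (hypothetical drop-in path; the pcocc-vm-<name> unit naming is
# described in the "Pcocc on SystemD" section below)
[Service]
# Fail the unit if no successful heartbeat arrives within 90 seconds;
# WatchdogSec=0 disables the watchdog entirely.
WatchdogSec=90
```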

Hypervisor-level availability is achieved using fleet, which can be seen as a cluster-wide systemd manager that reacts to hypervisor incidents:

+------------+------------+-------------+------------+
|    Agent   |    Agent   |    Agent    |    Agent   |
+------------+------------+-------------+------------+
|  Pcocc VM  |  Pcocc VM  |   Pcocc VM  |  Pcocc VM  |
+------------+------------+-------------+------------+
|        SystemD          |         SystemD          |
+-------------------------+--------------------------+
|                       Fleet                        |
+----------------------------------------------------+

No HA is provided within the virtual machines.

Incident response

The following table summarizes which component reacts to a given incident.

Incident            Responder   Response
-----------------   ---------   -----------------------------------------------
A service crashes   systemd     None (the service can be restarted
                                automatically if configured)
VM hang             Pcocc       Notifies the underlying systemd watchdog;
                                see "Missed heartbeat"
Missed heartbeat    systemd     Kills (SIGABRT) the pcocc process, which is
                                restarted automatically
QEMU/Pcocc crash    systemd     Restarts the process automatically after a
                                15-second pause
Fleet crash         systemd     Restarts automatically; if this takes less
                                than the agent TTL (30 s) there is no further
                                reaction, otherwise see "Hypervisor crash"
Fleet stopped       fleet       Fleet stops all locally launched services and
                                reschedules them elsewhere
Network partition   fleet       Fleet stops all locally launched services if
                                it cannot report to etcd; the other cluster
                                members treat this as a crash
Hypervisor crash    fleet       Fleet reschedules the launched services on
                                other available hypervisors

Monitoring

Pcocc VM

Using pcocc ps, you can list the launched VMs:

# pcocc ps
ID     NAME          USER    PARTITION    NODES       DURATION             TIMELIMIT
--     ----          ----    ---------    -----       --------             ---------
518    batch1        root    N/A          top1        4 days, 3:08:53      N/A
519    lb2           root    N/A          worker3     4 days, 3:08:52      N/A
520    i54dkless1    root    N/A          islet55     4 days, 3:08:52      N/A
521    ns2           root    N/A          worker1     4 days, 3:08:52      N/A
522    batch2        root    N/A          top3        4 days, 3:08:51      N/A
523    i0conf2       root    N/A          top2        4 days, 3:08:51      N/A
524    admin1        root    N/A          top1        4 days, 3:08:50      N/A
525    i0conf1       root    N/A          worker1     4 days, 3:08:50      N/A
526    i54log1       root    N/A          islet55     4 days, 3:08:49      N/A
527    nsrelay1      root    N/A          top3        4 days, 3:08:49      N/A
528    admin2        root    N/A          top2        4 days, 3:08:49      N/A
529    infra1        root    N/A          top1        4 days, 3:08:48      N/A
530    ns1           root    N/A          worker1     4 days, 3:08:47      N/A
531    i0log1        root    N/A          top2        4 days, 3:08:46      N/A
532    infra2        root    N/A          top1        4 days, 3:08:46      N/A
533    lb1           root    N/A          top3        4 days, 3:08:45      N/A
534    db1           root    N/A          top3        4 days, 3:08:41      N/A
535    ns3           root    N/A          top3        4 days, 3:08:39      N/A
536    webrelay1     root    N/A          top3        4 days, 3:08:36      N/A
545    irene271b     root    N/A          irene271    4 days, 2:31:23      N/A
546    irene271a     root    N/A          irene271    4 days, 2:31:19      N/A
547    i54conf2      root    N/A          islet55     4 days, 2:29:45      N/A
548    i54conf1      root    N/A          islet55     4 days, 2:29:43      N/A
549    i54dkless2    root    N/A          islet55     3 days, 20:47:32     N/A
294    siteprep2     root    N/A          top1        19 days, 23:26:38    N/A
551    i38dkless1    root    N/A          islet39     2 days, 16:15:06     N/A
550    i38dkless2    root    N/A          islet38     2 days, 16:15:07     N/A

To check whether a VM is still alive, you can either ping the agent within the VM using the job ID:

# pcocc agent ping -j 518
1 VMs answered in 0.30s

Or run a command using the pcocc agent:

# pcocc agent run -j 518 df
Filesystem              1K-blocks       Used  Available Use% Mounted on
/dev/sda1                20961280    3218320   17742960  16% /
devtmpfs                  1598056          0    1598056   0% /dev
tmpfs                     1621840          0    1621840   0% /dev/shm
tmpfs                     1621840      16892    1604948   2% /run
tmpfs                     1621840          0    1621840   0% /sys/fs/cgroup
/dev/mapper/system-var   25149444     827132   24322312   4% /var
top1-data:/volspoms2   5366088704   24639872 5341448832   1% /volspoms2
top1-data:/volspoms1   8049133056 2124307712 5924825344  27% /volspoms1
tmpfs                     1621840          0    1621840   0% /tmp
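Both checks can be scripted across all VMs by extracting the job IDs from the pcocc ps output. A minimal sketch, shown here on a captured sample rather than on live output:

```shell
# Sketch: extract the job IDs from `pcocc ps` output (first column,
# skipping the two header lines). Shown on a captured sample; in
# production, pipe the live `pcocc ps` output instead.
sample='ID     NAME      USER    PARTITION    NODES      DURATION           TIMELIMIT
--     ----      ----    ---------    -----      --------           ---------
518    batch1    root    N/A          top1       4 days, 3:08:53    N/A
519    lb2       root    N/A          worker3    4 days, 3:08:52    N/A'

printf '%s\n' "$sample" | awk 'NR > 2 { print $1 }'
# Each printed ID can then be fed to `pcocc agent ping -j <ID>`
# or `pcocc agent run -j <ID> <command>`.
```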

Use pcocc console to attach to the VM console; add the -l option to get the console history:

# pcocc console -J admin1 vm0


CentOS Linux 7 (Core)
Kernel 3.10.0-957.21.3.el7.x86_64 on an x86_64

admin1 login:
# pcocc console -J admin1 -l vm0
[...] LESS MODE [...]

Exit the pcocc console by typing Ctrl-C three times within 2 seconds.

Pcocc on SystemD

Ocean’s Pcocc deployment creates one systemd service per VM, named after the VM and prefixed with pcocc-vm-.

Even though the services are managed by fleet, the systemd services remain visible on the VM’s hypervisor.

# systemctl status pcocc-vm-infra1.service
  pcocc-vm-infra1.service - Fleet service for pcocc VM infra1
   Loaded: loaded (/run/fleet/units/pcocc-vm-infra1.service; linked-runtime; vendor preset: disabled)
   Active: active (running) since Thu 2019-10-03 07:34:54 CEST; 4 days ago
  Process: 44874 ExecStartPre=/usr/sbin/prepare-ocean-image.sh infra1 (code=exited, status=0/SUCCESS)
 Main PID: 44912 (pcocc)
   Status: "Watchdog successful at 2019-10-07 11:00:56.930856"
   CGroup: /system.slice/pcocc-vm-infra1.service
           ├─44912 /usr/bin/python /usr/bin/pcocc -vv alloc -E sleep infinity -c 4 -m 1000 -J infra1 infra1
           ├─45675 /usr/bin/python /usr/bin/pcocc -vv internal launcher -E sleep infinity infra1
           ├─45891 /usr/bin/python /usr/bin/pcocc -vv internal run
           ├─45901 sleep infinity
           └─46022 qemu-system-x86_64 -machine type=pc,accel=kvm -cpu host -S -rtc base=utc -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16 -device virtio-scsi-pci,id=scsi0 -object iothread,id=ioth-drive0 -device scsi-hd,id=scsi-hd-drive0,bus...

Oct 07 10:58:51 top1 pcocc[44912]: DEBUG:root:Sending agent sync {"execute":"guest-sync", "arguments": { "id": 430683190 }}
Oct 07 10:59:05 top1 pcocc[44912]: DEBUG:etcd.client:Writing  to key /pcocc/global/users/root/batch-local/heartbeat/529 ttl=60 dir=False append=False
Oct 07 10:59:22 top1 pcocc[44912]: DEBUG:root:Sending agent sync {"execute":"guest-sync", "arguments": { "id": 508227119 }}
Oct 07 10:59:35 top1 pcocc[44912]: DEBUG:etcd.client:Writing  to key /pcocc/global/users/root/batch-local/heartbeat/529 ttl=60 dir=False append=False
Oct 07 10:59:53 top1 pcocc[44912]: DEBUG:root:Sending agent sync {"execute":"guest-sync", "arguments": { "id": 143384388 }}
Oct 07 11:00:05 top1 pcocc[44912]: DEBUG:etcd.client:Writing  to key /pcocc/global/users/root/batch-local/heartbeat/529 ttl=60 dir=False append=False
Oct 07 11:00:24 top1 pcocc[44912]: DEBUG:root:Sending agent sync {"execute":"guest-sync", "arguments": { "id": 817870353 }}
Oct 07 11:00:35 top1 pcocc[44912]: DEBUG:etcd.client:Writing  to key /pcocc/global/users/root/batch-local/heartbeat/529 ttl=60 dir=False append=False
Oct 07 11:00:55 top1 pcocc[44912]: DEBUG:root:Sending agent sync {"execute":"guest-sync", "arguments": { "id": 236969055 }}
Oct 07 11:01:06 top1 pcocc[44912]: DEBUG:etcd.client:Writing  to key /pcocc/global/users/root/batch-local/heartbeat/529 ttl=60 dir=False append=False

systemctl status shows the VM state, the latest pcocc logs, and the time of the last heartbeat (the Status: line).
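For scripting, the heartbeat timestamp can be extracted from that status text. A sketch on a captured sample; on a live hypervisor one would feed it with systemctl show -p StatusText pcocc-vm-<name>.service instead:

```shell
# Sketch: pull the last-heartbeat timestamp out of the unit status
# text. Shown on a captured sample; on a live hypervisor, feed it with
#   systemctl show -p StatusText pcocc-vm-infra1.service
status='StatusText=Watchdog successful at 2019-10-07 11:00:56.930856'
printf '%s\n' "$status" | sed -n 's/.*Watchdog successful at //p'
```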

Simple monitoring can be done using systemctl is-failed or systemctl is-active commands:

# systemctl is-active pcocc-vm-admin1
active
# systemctl is-failed pcocc-vm-admin1
active

The global state of the systemd daemon is given by the systemctl is-system-running command:

# systemctl is-system-running
running

Fleet cluster

The fleet cluster can be monitored with the fleetctl list-* commands:

list-machines

Reports registered fleet members.

# fleetctl list-machines
MACHINE            HOSTNAME    IP         METADATA
21f839cc...        islet39     10.1.0.39  hostname=islet39,role=islet
24d45a17...        islet12     10.1.0.12  hostname=islet12,role=islet
29b0afbf...        top2        10.1.0.2   hostname=top2,role=top
3280f87e...        top3        10.1.0.3   hostname=top3,role=top
551bb3cf...        islet55     10.1.0.55  hostname=islet55,role=islet
5b285f6d...        islet13     10.1.0.13  hostname=islet13,role=islet
5e1a5a8f...        islet38     10.1.0.38  hostname=islet38,role=islet
70b8f750...        irene271    10.8.0.171 hostname=irene271,role=router
944cd548...        top1        10.1.0.1   hostname=top1,role=top
9dde1b72...        worker3     10.1.0.6   hostname=worker3,role=top
a1ff44e6...        worker1     10.1.0.4   hostname=worker1,role=top
a858697c...        islet54     10.1.0.54  hostname=islet54,role=islet
c8014c48...        worker2     10.1.0.5   hostname=worker2,role=top
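For scripting, the member count per role can be derived from this output. A sketch on a captured two-line sample; in production, pipe fleetctl list-machines --no-legend instead:

```shell
# Sketch: count fleet members per role. Shown on a captured two-line
# sample; in production, pipe `fleetctl list-machines --no-legend`.
sample='21f839cc...        islet39     10.1.0.39  hostname=islet39,role=islet
944cd548...        top1        10.1.0.1   hostname=top1,role=top'

printf '%s\n' "$sample" | sed -n 's/.*role=\([^,]*\).*/\1/p' | sort | uniq -c
```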

list-unit-files

Reports registered services and their state.

# fleetctl list-unit-files
UNIT                        HASH    DSTATE   STATE    TARGET
pcocc-vm-admin1.service     7b97047 launched launched 944cd548.../top1
pcocc-vm-admin2.service     4faba09 launched launched 29b0afbf.../top2
pcocc-vm-batch1.service     f31950f launched launched 944cd548.../top1
pcocc-vm-batch2.service     aa142d2 launched launched 3280f87e.../top3
pcocc-vm-db1.service        a0820a4 launched launched 3280f87e.../top3
pcocc-vm-i0conf1.service    8a1579d launched launched a1ff44e6.../worker1
pcocc-vm-i0conf2.service    285ac8a launched launched 29b0afbf.../top2
pcocc-vm-i42conf2.service   0dfffef loaded   inactive -
pcocc-vm-i42dkless1.service 4dce60d loaded   inactive -
pcocc-vm-i42dkless2.service da95002 loaded   inactive -
pcocc-vm-i42log1.service    79a68f4 loaded   inactive -
[...]

A fleet service state (STATE) or desired state (DSTATE) can be one of the following:

inactive

Registered service but not scheduled anywhere

loaded

Scheduled on a cluster member

launched

Launched on a cluster member

list-units

Reports scheduled services and their execution state.

# fleetctl list-units
UNIT                       MACHINE                ACTIVE   SUB
pcocc-vm-admin1.service    944cd548.../top1       active   running
pcocc-vm-admin2.service    29b0afbf.../top2       active   running
pcocc-vm-batch1.service    944cd548.../top1       active   running
pcocc-vm-batch2.service    3280f87e.../top3       active   running
pcocc-vm-db1.service       3280f87e.../top3       active   running
pcocc-vm-i0conf1.service   a1ff44e6.../worker1    active   running
pcocc-vm-i0conf2.service   29b0afbf.../top2       active   running
pcocc-vm-i0log1.service    29b0afbf.../top2       active   running
pcocc-vm-i12conf1.service  24d45a17.../islet12    inactive dead
pcocc-vm-i12conf2.service  5b285f6d.../islet13    inactive dead
[...]
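Units that are not active can be spotted by filtering the third column. A sketch on a captured sample; in production, pipe fleetctl list-units --no-legend instead:

```shell
# Sketch: report units whose ACTIVE column is not "active". Shown on a
# captured sample; in production, pipe `fleetctl list-units --no-legend`.
sample='pcocc-vm-admin1.service    944cd548.../top1       active   running
pcocc-vm-i12conf1.service  24d45a17.../islet12    inactive dead'

printf '%s\n' "$sample" | awk '$3 != "active" { print $1 }'
```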

Logs

Todo

Show how to interpret fleet and pcocc log files

Operations

Pcocc

Several operations are available on a Pcocc VM, such as snapshotting, resetting, or executing commands:

Reset a VM using pcocc reset:

# pcocc reset -J admin1 vm0
1 VMs reset in 0.1s

Execute an arbitrary command inside the VM:

# pcocc agent run -J admin1 hostname
admin1

Snapshot a VM (safe for the running VM):

# pcocc save -J admin1 --dest /tmp/admin1.qcow2
Copying drive data...   8%  00:05:29  (4609.88MB / 51200.00MB)
[...]

Fleet services

Fleet-managed services can be operated using the fleetctl command:

Submit a new unit:

# fleetctl submit /usr/share/doc/fleet-1.0.0_31_g58eadf1/examples/hello.service
Unit hello.service inactive

Schedule a unit:

# fleetctl load hello.service
Unit hello.service loaded on a858697c.../islet54

Start a unit:

# fleetctl start hello.service
Unit hello.service launched on a858697c.../islet54

Stop a unit:

# fleetctl stop pcocc-vm-monitor1.service
Unit pcocc-vm-monitor1.service loaded on c8014c48.../worker2
Successfully stopped units [pcocc-vm-monitor1.service].

Destroy a unit (stop, unschedule and unload):

# fleetctl destroy hello.service
Destroyed hello.service
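For reference, a fleet unit file looks like a regular systemd unit plus an optional [X-Fleet] section carrying scheduling constraints. The example below is a hypothetical sketch modeled on fleet’s documented hello.service example; the metadata value mirrors the role=top tags shown in the list-machines output:

```ini
# hello.service — hypothetical sketch modeled on fleet's documented
# example unit; adapt the command and metadata to your cluster.
[Unit]
Description=Hello World

[Service]
ExecStart=/bin/bash -c "while true; do echo Hello World; sleep 1; done"

[X-Fleet]
# Only schedule on machines whose fleet metadata matches
# (role=top appears in the list-machines output above).
MachineMetadata=role=top
```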

Fleet cluster

There is no way to directly change or mutate the fleet cluster state. If you need to evacuate a hypervisor, stop the fleetd daemon on it: this triggers a rescheduling of all its locally launched services on the other hypervisors.

Note that fleet never rebalances the cluster by itself: any hypervisor evacuation leaves the cluster unbalanced, and rebalancing is a manual process.

To rebalance, first gather the current load of all hypervisors (based on the Weight configuration of each service):

# clush -Bw $(fleetctl list-machines --no-legend --fields hostname | nodeset -f) -R exec "fleetctl list-units --no-legend --fields hostname,unit | awk '/%h/ {print \$2}' | xargs -r -n 1 fleetctl cat | sed -n 's/Weight=\(.*\)/\1/p' | paste -s -d+ | bc"
---------------
islet[12,38],top3,worker1 (4)
---------------
24000
---------------
islet[13,39] (2)
---------------
20000
---------------
worker[2-3] (2)
---------------
4000
---------------
irene271
---------------
120000
---------------
islet55
---------------
44000
---------------
top1
---------------
28000
---------------
top2
---------------
40000
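The per-hypervisor figure computed by the clush one-liner above boils down to summing the Weight= values of the locally launched units. A standalone sketch on a sample of fleetctl cat output:

```shell
# Sketch: sum the Weight= values found in fleet unit files. Shown on a
# sample; in production, replace the here-string with the concatenated
# `fleetctl cat <unit>` output of the units running on one hypervisor.
weights='Weight=12000
Weight=16000'

printf '%s\n' "$weights" | awk -F= '/^Weight=/ { s += $2 } END { print s }'
```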

To move a resource, use fleetctl unload followed by fleetctl start to unload it, then reschedule and start it elsewhere.

# fleetctl unload pcocc-vm-i0log1.service
Triggered unit pcocc-vm-i0log1.service unload
Successfully unloaded units [pcocc-vm-i0log1.service].
# fleetctl start pcocc-vm-i0log1.service
Triggered unit pcocc-vm-i0log1.service start