Prometheus

This simple pack (based on a community pack) does two things:

  • It provides actions to query Prometheus.

  • It provides a webhook receiver directly usable by Alertmanager.

Actions

prometheus.query

Executes an instant query using the Prometheus query API.

Requires a query parameter containing the query to execute.

Optionally, the url parameter changes the URL where the Prometheus API is reachable. It defaults to the url parameter of the pack’s configuration.

# st2 run prometheus.query query="ALERTS{}"
id: 5ea17fcd049f2e425784c8b6
status: succeeded
parameters:
  query: ALERTS{}
  url: http://monitor1.mg1.hpc.domain.fr/prometheus
result:
  exit_code: 0
  result:
    data:
      result:
      - metric:
          __name__: ALERTS
          alertname: Infiniband DF+ All-to-All missing link
          alertstate: firing
          cluster: ppi
          desc: i38r2isw3
          remote_desc: i46r2isw3
          service: Infiniband
          severity: warning
        value:
        - 1587642318.073
        - '1'
      resultType: vector
    status: success
  stderr: ''
  stdout: ''
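
As an illustration, the request behind such an instant query can be sketched in Python. This is a minimal sketch, not the pack's actual implementation: it assumes the action targets the /api/v1/query endpoint of the Prometheus HTTP API, and the helper name build_instant_query_url is hypothetical.

```python
from urllib.parse import urlencode


def build_instant_query_url(base_url: str, query: str) -> str:
    """Return the instant-query URL for a Prometheus server (hypothetical helper)."""
    # /api/v1/query is the instant query endpoint of the Prometheus HTTP API.
    endpoint = base_url.rstrip("/") + "/api/v1/query"
    # urlencode percent-escapes characters such as '{' and '}' in the expression.
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(build_instant_query_url(
        "http://monitor1.mg1.hpc.domain.fr/prometheus", "ALERTS{}"))
```

The base URL plays the role of the url parameter, so a trailing slash in the pack's configuration does not produce a doubled slash in the final URL.
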
prometheus.series

Lists available time series using the Prometheus series API.

Requires a queries parameter containing the URL parameters as specified in Prometheus’s documentation.

# st2 run prometheus.series queries="match[]=up"
id: 5ea18029049f2e425784c8b9
status: succeeded
parameters:
  queries:
    match[]: up
  url: http://monitor1.mg1.hpc.domain.fr/prometheus
result:
  exit_code: 0
  result:
    data:
    - __name__: up
      cluster: ppi
      fabric: hdr-compute
      instance: irene245:9199
      job: infiniband_compute
      service: infiniband
[...]
  stderr: ''
  stdout: ''
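
The way the queries parameter maps onto URL parameters can be sketched in Python. This is only a sketch under assumptions: it supposes the action targets the /api/v1/series endpoint of the Prometheus HTTP API, and build_series_url is a hypothetical helper, not a function of the pack.

```python
from urllib.parse import urlencode


def build_series_url(base_url: str, queries: dict) -> str:
    """Return the series URL for the given URL parameters (hypothetical helper)."""
    # /api/v1/series is the series endpoint of the Prometheus HTTP API.
    endpoint = base_url.rstrip("/") + "/api/v1/series"
    # doseq=True turns a list value into one repeated parameter per item,
    # which is how several match[] selectors are passed.
    return endpoint + "?" + urlencode(queries, doseq=True)


if __name__ == "__main__":
    print(build_series_url(
        "http://monitor1.mg1.hpc.domain.fr/prometheus", {"match[]": "up"}))
```

The square brackets of match[] are percent-encoded as %5B%5D in the resulting URL.
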

Rules

The pack ships with an example rule showing how to set up a webhook sensor in StackStorm and how to configure it in Prometheus.

This rule uses the core.st2.webhook trigger type, which creates a generic sensor on demand. This kind of rule allows an external program to post arbitrary data to a StackStorm webhook endpoint (https://HOST/api/v1/webhooks/NAME, where NAME is the url parameter of the core.st2.webhook trigger).
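
To make the endpoint concrete, here is a minimal Python sketch of an external program preparing such a POST. It rests on assumptions: the helper build_webhook_request is hypothetical, and it supposes the StackStorm API accepts an API key in the St2-Api-Key header; adapt or drop that header if your deployment authenticates differently.

```python
import json
import urllib.request


def build_webhook_request(host: str, name: str, payload: dict,
                          api_key: str) -> urllib.request.Request:
    """Prepare (without sending) a POST to a StackStorm webhook (hypothetical helper)."""
    # NAME in the endpoint is the url parameter of the webhook trigger.
    url = f"https://{host}/api/v1/webhooks/{name}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: authentication through a StackStorm API key.
            "St2-Api-Key": api_key,
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_webhook_request(
        "auto1.mg1.hpc.domain.fr", "prometheus_webhook",
        {"test": "value"}, "dummy-key")
    print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would actually send the request.
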

The following receiver can then be added to Alertmanager:

# /etc/alertmanager/alertmanager.yaml
[...]
receivers:
- name: admins
  email_configs:
  - to: admins@mg1.hpc.domain.fr
    send_resolved: true
    html: '{{ template "email.html" . }}'
    headers:
      Subject: '{{ template "ppi_subject" . }}'
  webhook_configs:
  - send_resolved: true
    url: https://auto1.mg1.hpc.domain.fr/api/v1/webhooks/prometheus_webhook
[...]

This will emit triggers like the following:

# st2 trigger-instance get 5ea167c8049f2e427f080b0c -y
id: 5ea167c8049f2e427f080b0c
occurrence_time: '2020-04-23T10:02:48.000000Z'
payload:
    body:
      [...]
    headers:
        Accept: '*/*'
        Content-Length: '14'
        Content-Type: application/json
        Host: auto1.mg1.hpc.domain.fr,auto1.mg1.hpc.domain.fr
        User-Agent: curl/7.29.0
        X-Forwarded-For: 127.0.0.1
        X-Real-Ip: 127.0.0.1
        X-Request-Id: 766098ce-f8e6-4a21-9204-df63bbf3c1bb
status: processed
trigger: core.6023c805-8b30-4bf6-9fbc-e4f34a850a47

The body of the request is a JSON document in the following format:

{
  "version": "4",
  "groupKey": <string>,    // key identifying the group of alerts (e.g. to deduplicate)
  "status": "<resolved|firing>",
  "receiver": <string>,
  "groupLabels": <object>,
  "commonLabels": <object>,
  "commonAnnotations": <object>,
  "externalURL": <string>,  // backlink to the Alertmanager.
  "alerts": [
    {
      "status": "<resolved|firing>",
      "labels": <object>,
      "annotations": <object>,
      "startsAt": "<rfc3339>",
      "endsAt": "<rfc3339>",
      "generatorURL": <string> // identifies the entity that caused the alert
    },
    ...
  ]
}
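
A consumer of this payload typically keeps only the alerts that are still firing. The following Python sketch shows one way to do that, using only the fields listed in the format above; the function name firing_alert_labels and the sample body are illustrative assumptions, not part of the pack.

```python
import json


def firing_alert_labels(body: str) -> list:
    """Return the labels of every alert whose status is 'firing'."""
    payload = json.loads(body)
    return [
        alert.get("labels", {})
        for alert in payload.get("alerts", [])
        if alert.get("status") == "firing"
    ]


if __name__ == "__main__":
    # Illustrative body with made-up label values.
    sample = json.dumps({
        "version": "4",
        "status": "firing",
        "alerts": [
            {"status": "firing", "labels": {"alertname": "A"}},
            {"status": "resolved", "labels": {"alertname": "B"}},
        ],
    })
    print(firing_alert_labels(sample))
```

Filtering per alert rather than on the top-level status matters when send_resolved is enabled, since a single notification can then mix firing and resolved alerts.
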