For a production-grade cluster, we usually want to set up some kind of monitoring. At the time of writing, there is no dedicated way to monitor Docker services and tasks in Swarm mode. For Swarm2k, we did it with Telegraf, InfluxDB, and Grafana.
InfluxDB is a time-series database that is easy to install because it has no dependencies. It is useful for storing metrics and information about events, and for analyzing them later. For Swarm2k, we used InfluxDB to store information about the cluster, nodes, events, and tasks, all collected with Telegraf.
Telegraf is pluggable and comes with a number of input plugins that are useful for observing the system environment.
We developed a new plugin for Telegraf to store stats into InfluxDB. This plugin can be found at http://github.com/chanwit/telegraf. Each data point contains values, tags, and a timestamp. Values are computed or aggregated based on the timestamp, and tags let you group those values together.
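The following minimal Python sketch formats one point in InfluxDB's line protocol (`measurement,tag=value field=value timestamp`). It shows how values, tags, and a timestamp fit together. This is an illustration and not code from the plugin. Escaping is simplified, and the node ID, hostname, and numbers in the usage call are hypothetical.

```python
# Minimal sketch of InfluxDB's line protocol (simplified escaping):
#   measurement[,tag_key=tag_value...] field_key=field_value[,...] [timestamp]


def escape_key(text: str) -> str:
    """Escape commas, equals signs and spaces in tag keys, tag values and field keys."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def format_field(value) -> str:
    """Integers get the 'i' suffix, floats are written as-is, strings are quoted."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Format one point; the measurement name is assumed to need no escaping."""
    # Tags are written sorted by key, as InfluxDB recommends for write performance.
    tag_part = "".join(
        f",{escape_key(k)}={escape_key(str(v))}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{escape_key(k)}={format_field(v)}" for k, v in fields.items())
    line = f"{measurement}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"  # nanoseconds since the Unix epoch
    return line


print(to_line_protocol(
    "swarm_node",
    {"node_id": "abc123", "node_hostname": "worker-1"},  # hypothetical node
    {"cpu_shares": 2048, "memory": 4096},                # hypothetical values
    1466000000000000000,
))
# swarm_node,node_hostname=worker-1,node_id=abc123 cpu_shares=2048i,memory=4096i 1466000000000000000
```

Points that share a measurement and a tag set form one series. This is what lets a query group values per `node_hostname`.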
The Telegraf Swarm plugin collects data and writes the following series into InfluxDB. Each series contains values, tags, and a timestamp; we identified these as the most interesting for Swarm2k:
- `swarm_node`: This series contains `cpu_shares` and `memory` as values, which can be grouped by the `node_id` and `node_hostname` tags.
- `swarm`: This series contains `n_nodes` for the number of nodes, `n_services` for the number of services, and `n_tasks` for the number of tasks. This series does not contain tags.
- `swarm_task_status`: This series contains the number of tasks grouped by status at a given time. The tags of this series are task status names, for example, `Started`, `Running`, and `Failed`.

To enable the Telegraf Swarm plugin, we need to tweak `telegraf.conf` by adding the following configuration:
```toml
# Read metrics about swarm tasks and services
[[inputs.swarm]]
  # Docker Endpoint
  #   To use TCP, set endpoint = "tcp://[ip]:[port]"
  #   To use environment variables (ie, docker-machine), set endpoint = "ENV"
  endpoint = "unix:///var/run/docker.sock"
  timeout = "10s"
```
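With the plugin enabled, each collection interval yields one point in `swarm` and a set of per-status counts in `swarm_task_status`. The Python sketch below shows that aggregation for a single snapshot. It is not the plugin's code, which is written in Go like the rest of Telegraf. It assumes node, service, and task objects shaped like those returned by the Docker Engine API, where a task's state lives under `Status.State`. The Docker Engine API reports these states in lowercase, such as `running` and `failed`. The snapshot in the usage lines is hypothetical.

```python
from collections import Counter


def summarize_swarm(nodes, services, tasks):
    """Aggregate one cluster snapshot into the 'swarm' and 'swarm_task_status' shapes.

    nodes, services and tasks are lists of dicts shaped like Docker Engine API
    objects (assumption); a task's state is read from task["Status"]["State"].
    """
    swarm_fields = {
        "n_nodes": len(nodes),
        "n_services": len(services),
        "n_tasks": len(tasks),
    }
    # Count tasks per state, e.g. {"running": 2, "failed": 1}.
    task_status = dict(Counter(task["Status"]["State"] for task in tasks))
    return swarm_fields, task_status


# Hypothetical snapshot: two nodes, one service, three tasks.
nodes = [{"ID": "n1"}, {"ID": "n2"}]
services = [{"ID": "s1"}]
tasks = [
    {"Status": {"State": "running"}},
    {"Status": {"State": "running"}},
    {"Status": {"State": "failed"}},
]
print(summarize_swarm(nodes, services, tasks))
# ({'n_nodes': 2, 'n_services': 1, 'n_tasks': 3}, {'running': 2, 'failed': 1})
```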
First, set up an instance of InfluxDB as follows:
```
$ docker run -d -p 8083:8083 -p 8086:8086 --expose 8090 --expose 8099 \
    -e PRE_CREATE_DB=telegraf \
    --name influxsrv tutum/influxdb
```
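Once the container is running, InfluxDB answers queries over its HTTP API on port 8086. `PRE_CREATE_DB=telegraf` has already created the `telegraf` database. The following Python sketch checks what Telegraf has written by building a request for the `/query` endpoint. It assumes the image runs InfluxDB 0.9 or later, where that endpoint exists. The host address is hypothetical, and `run_query` only works against a reachable InfluxDB, so the usage lines just build and print the URL.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def influx_query_url(host, query, db="telegraf", port=8086):
    """Build a GET URL for InfluxDB's /query HTTP endpoint."""
    return f"http://{host}:{port}/query?" + urlencode({"db": db, "q": query})


def run_query(url, timeout=5.0):
    """Send the query and decode the JSON response (needs a reachable InfluxDB)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# 192.168.99.100 is a hypothetical address for the InfluxDB host.
url = influx_query_url("192.168.99.100", 'SELECT last("n_tasks") FROM "swarm"')
print(url)
# http://192.168.99.100:8086/query?db=telegraf&q=SELECT+last%28%22n_tasks%22%29+FROM+%22swarm%22
```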
Then, set up an instance of Grafana, as follows:
```
$ docker run -d -p 80:3000 \
    -e HTTP_USER=admin -e HTTP_PASS=admin \
    -e INFLUXDB_HOST=$(belt ip influxdb) -e INFLUXDB_PORT=8086 \
    -e INFLUXDB_NAME=telegraf -e INFLUXDB_USER=root -e INFLUXDB_PASS=root \
    --name grafana grafana/grafana
```
After setting up the Grafana instance, we can create the dashboard from the following JSON configuration:
https://objects-us-west-1.dream.io/swarm2k/swarm2k_final_grafana_dashboard.json
To connect the dashboard to InfluxDB, we have to define the default data source and point it to port 8086 on the InfluxDB host. Here is the JSON configuration that defines the data source. Replace `$INFLUX_DB_IP` with the IP address of your InfluxDB instance.
```json
{
  "name": "telegraf",
  "type": "influxdb",
  "access": "proxy",
  "url": "http://$INFLUX_DB_IP:8086",
  "user": "root",
  "password": "root",
  "database": "telegraf",
  "basicAuth": true,
  "basicAuthUser": "admin",
  "basicAuthPassword": "admin",
  "withCredentials": false,
  "isDefault": true
}
```
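You can also define the data source without using the Grafana UI. Grafana's HTTP API accepts it at `POST /api/datasources`, and an exported dashboard can be posted to `POST /api/dashboards/db`. The Python sketch below only builds those requests with the standard library. Pass a request to `urllib.request.urlopen` to actually send it. The following are assumptions:

- Grafana is reachable on port 80, as mapped by the `docker run` command for Grafana.
- The admin/admin credentials are accepted for basic auth.
- The Grafana and InfluxDB addresses in the usage lines are hypothetical.

```python
import base64
import json
from urllib.request import Request


def grafana_post(base_url, path, payload, user="admin", password="admin"):
    """Build (but do not send) a POST request to Grafana's HTTP API with basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def datasource_payload(influx_ip):
    """The data source definition from the text, with $INFLUX_DB_IP filled in."""
    return {
        "name": "telegraf",
        "type": "influxdb",
        "access": "proxy",
        "url": f"http://{influx_ip}:8086",
        "user": "root",
        "password": "root",
        "database": "telegraf",
        "basicAuth": True,
        "basicAuthUser": "admin",
        "basicAuthPassword": "admin",
        "withCredentials": False,
        "isDefault": True,
    }


def dashboard_payload(dashboard):
    """Wrap dashboard JSON for POST /api/dashboards/db; id=None asks Grafana to create it."""
    return {"dashboard": dict(dashboard, id=None), "overwrite": True}


# Hypothetical addresses for the Grafana and InfluxDB hosts.
ds_req = grafana_post(
    "http://192.168.99.101", "/api/datasources", datasource_payload("192.168.99.100")
)
# urllib.request.urlopen(ds_req) would send it; omitted so the sketch runs offline.
```

Building the payloads as plain dictionaries keeps the JSON from the text in one place. Filling in `$INFLUX_DB_IP` then becomes an ordinary function argument.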
After linking everything together, we'll see a dashboard like this: