Monitoring Swarm2k

For a production-grade cluster, we usually want to set up some kind of monitoring. At the time of writing, there is no specific way to monitor Docker services and tasks in Swarm mode. For Swarm2k, we did this with Telegraf, InfluxDB, and Grafana.

InfluxDB Time-Series Database

InfluxDB is a time-series database that is easy to install because it has no external dependencies. InfluxDB is useful for storing metrics and information about events for later analysis. For Swarm2k, we used InfluxDB, fed by Telegraf, to store information about the cluster, nodes, events, and tasks.

Telegraf is pluggable and ships with a number of input plugins that are useful for observing the system environment.

Telegraf Swarm plugin

We developed a new Telegraf plugin to store Swarm stats in InfluxDB. The plugin can be found at http://github.com/chanwit/telegraf. Each data point may contain values, tags, and a timestamp. Values are computed or aggregated over time using the timestamp. Tags let you group those values at each timestamp, for example, per node.
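To make the values/tags/timestamp split concrete, here is a minimal sketch of how one data point can be written in InfluxDB's line protocol. The format is `measurement,tag=value field=value timestamp`. The measurement, tag and field names in the usage example are illustrative assumptions, not output captured from the Swarm plugin. The same goes for treating cpu_shares and memory as integers.

```python
"""Sketch: encode one data point (values, tags, timestamp) as an InfluxDB
line-protocol string. Names used in the example are hypothetical."""


def escape_measurement(text: str) -> str:
    # Measurement names must escape commas and spaces.
    return text.replace(",", r"\,").replace(" ", r"\ ")


def escape_key(text: str) -> str:
    # Tag keys, tag values and field keys must escape commas, '=' and spaces.
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def format_field(value) -> str:
    # Integers get an "i" suffix, floats are written as-is, strings are quoted.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    # Tags are emitted sorted by key; fields keep their insertion order.
    tag_part = "".join(
        f",{escape_key(k)}={escape_key(str(v))}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{escape_key(k)}={format_field(v)}" for k, v in fields.items())
    return f"{escape_measurement(measurement)}{tag_part} {field_part} {ts_ns}"


if __name__ == "__main__":
    # Hypothetical swarm_node sample: two values, two tags, nanosecond timestamp.
    print(to_line_protocol(
        "swarm_node",
        {"node_id": "abc123", "node_hostname": "worker-1"},
        {"cpu_shares": 1024, "memory": 2147483648},
        1469000000000000000,
    ))
```

Because tags are indexed while values are not, InfluxDB can cheaply group points sharing a tag value (for example, one node_hostname). The values are then aggregated over the timestamps.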

The Telegraf Swarm plugin collects data and writes the following series into InfluxDB. Each series has values (the ones we identified as the most interesting for Swarm2k), tags, and a timestamp:

  • Series swarm_node: This series contains cpu_shares and memory as values, which can be grouped by the node_id and node_hostname tags.
  • Series swarm: This series contains n_nodes for the number of nodes, n_services for the number of services, and n_tasks for the number of tasks. This series has no tags.
  • Series swarm_task_status: This series contains the number of tasks grouped by status at a given time. The tags of this series are task status names, for example, Started, Running, and Failed.
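The three series above correspond to three simple aggregations. The following sketch computes them in memory from hypothetical samples to show what each series summarizes. The sample shapes and helper names are assumptions for illustration. The plugin itself reads its data from the Docker endpoint set in telegraf.conf.

```python
"""Sketch of the aggregations behind swarm_node, swarm and swarm_task_status,
computed from hypothetical in-memory samples rather than the Docker API."""
from collections import Counter, defaultdict
from statistics import mean


def count_tasks_by_status(task_states):
    # swarm_task_status-style view: number of tasks per status at one instant.
    return dict(Counter(task_states))


def mean_memory_by_host(samples):
    # swarm_node-style view: group (hostname, memory) samples by the
    # node_hostname tag and average each group.
    grouped = defaultdict(list)
    for hostname, memory in samples:
        grouped[hostname].append(memory)
    return {host: mean(values) for host, values in grouped.items()}


def swarm_totals(n_nodes, services):
    # swarm-style view: cluster-wide counters with no tags.
    # `services` maps a service name to the list of its task IDs.
    return {
        "n_nodes": n_nodes,
        "n_services": len(services),
        "n_tasks": sum(len(tasks) for tasks in services.values()),
    }


if __name__ == "__main__":
    print(count_tasks_by_status(["Running", "Running", "Failed", "Started"]))
    print(mean_memory_by_host([("worker-1", 100), ("worker-1", 300), ("worker-2", 50)]))
    print(swarm_totals(3, {"web": ["t1", "t2"], "db": ["t3"]}))
```

Inside InfluxDB, the equivalent of mean_memory_by_host is an InfluxQL query such as SELECT mean("memory") FROM "swarm_node" GROUP BY "node_hostname".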

To enable the Telegraf Swarm plugin, we will need to tweak telegraf.conf by adding the following configuration:

# Read metrics about swarm tasks and services
[[inputs.swarm]]
  # Docker Endpoint
  #   To use TCP, set endpoint = "tcp://[ip]:[port]"
  #   To use environment variables (i.e., docker-machine), set endpoint = "ENV"
  endpoint = "unix:///var/run/docker.sock"
  timeout = "10s"

First, set up an instance of InfluxDB as follows:

 $ docker run -d \
   -p 8083:8083 \
   -p 8086:8086 \
   --expose 8090 \
   --expose 8099 \
   -e PRE_CREATE_DB=telegraf \
   --name influxsrv \
   tutum/influxdb

Then, set up an instance of Grafana, as follows:

docker run -d \
  -p 80:3000 \
  -e HTTP_USER=admin \
  -e HTTP_PASS=admin \
  -e INFLUXDB_HOST=$(belt ip influxdb) \
  -e INFLUXDB_PORT=8086 \
  -e INFLUXDB_NAME=telegraf \
  -e INFLUXDB_USER=root \
  -e INFLUXDB_PASS=root \
  --name grafana \
  grafana/grafana

After setting up the Grafana instance, we can create the dashboard from the following JSON configuration:

https://objects-us-west-1.dream.io/swarm2k/swarm2k_final_grafana_dashboard.json
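Instead of pasting the JSON into Grafana's UI, the dashboard can be imported through Grafana's HTTP API endpoint POST /api/dashboards/db. The sketch below is a minimal version of that import. Several things in it are assumptions:

- Grafana is reachable on port 80, as mapped in the docker run command above.
- The credentials are admin/admin.
- The published file holds either a bare dashboard object or one wrapped in a "dashboard" key.

```python
"""Sketch: import the Swarm2k dashboard via Grafana's HTTP API.
Grafana URL and credentials are assumptions; adjust them for your setup."""
import base64
import json
import urllib.request

DASHBOARD_URL = "https://objects-us-west-1.dream.io/swarm2k/swarm2k_final_grafana_dashboard.json"


def build_import_payload(definition: dict) -> dict:
    # Unwrap {"dashboard": {...}} if present; otherwise treat it as the dashboard.
    dashboard = dict(definition.get("dashboard", definition))
    # A null id tells Grafana to create a new dashboard; overwrite=True replaces
    # an existing one with the same title (in the folder) or the same uid.
    dashboard["id"] = None
    return {"dashboard": dashboard, "overwrite": True}


def basic_auth_header(user: str, password: str) -> str:
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def import_dashboard(grafana_url: str, user: str, password: str) -> dict:
    # Download the published dashboard definition.
    with urllib.request.urlopen(DASHBOARD_URL) as resp:
        definition = json.load(resp)
    request = urllib.request.Request(
        f"{grafana_url}/api/dashboards/db",
        data=json.dumps(build_import_payload(definition)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(user, password),
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(import_dashboard("http://localhost", "admin", "admin"))
```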

To connect the dashboard to InfluxDB, we have to define the default data source and point it to port 8086 on the InfluxDB host. Here's the JSON configuration that defines the data source. Replace $INFLUX_DB_IP with the IP address of your InfluxDB instance.

{
      "name":"telegraf",
      "type":"influxdb",
      "access":"proxy",
      "url":"http://$INFLUX_DB_IP:8086",
      "user":"root",
      "password":"root",
      "database":"telegraf",
      "basicAuth":true,
      "basicAuthUser":"admin",
      "basicAuthPassword":"admin",
      "withCredentials":false,
      "isDefault":true
}
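The data source JSON above can also be registered through Grafana's HTTP API with POST /api/datasources. Here is a minimal sketch that fills in $INFLUX_DB_IP using Python's string.Template and posts the result. The Grafana URL, the admin/admin credentials and the example IP are placeholders. Depending on the Grafana version, the password may need to go under secureJsonData instead of the plain "password" field.

```python
"""Sketch: render the data source JSON and register it with Grafana.
Grafana URL, credentials and the InfluxDB IP are placeholders."""
import base64
import json
import string
import urllib.request

DATASOURCE_TEMPLATE = string.Template("""{
  "name": "telegraf",
  "type": "influxdb",
  "access": "proxy",
  "url": "http://$INFLUX_DB_IP:8086",
  "user": "root",
  "password": "root",
  "database": "telegraf",
  "basicAuth": true,
  "basicAuthUser": "admin",
  "basicAuthPassword": "admin",
  "withCredentials": false,
  "isDefault": true
}""")


def render_datasource(influx_ip: str) -> dict:
    # Substitute $INFLUX_DB_IP, then parse to catch JSON mistakes early.
    return json.loads(DATASOURCE_TEMPLATE.substitute(INFLUX_DB_IP=influx_ip))


def create_datasource(grafana_url: str, user: str, password: str, influx_ip: str) -> dict:
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(render_datasource(influx_ip)).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(create_datasource("http://localhost", "admin", "admin", "10.0.0.5"))
```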

After linking everything together, we'll see a dashboard like this:
