Metric aggregations

As explained in the previous sections, metric aggregations compute statistical measures over your data. They cover the following:

  • Computing basic statistics
    • Computing in a combined way: stats aggregation
    • Computing separately: min, max, sum, avg, and value_count aggregations
  • Computing extended statistics: extended_stats aggregation
  • Computing distinct counts: cardinality aggregation

    Note

    Metric aggregations fall into two categories:

    • single-value metric: min, max, sum, value_count, avg, and cardinality aggregations
    • multi-value metric: stats and extended_stats aggregations

Computing basic stats

The basic statistics are min, max, sum, count, and avg. They can be computed in the following two ways, and only on numeric fields.

Combined stats

All the stats mentioned previously can be calculated with a single aggregation query.

Python example

query = {
 "aggs": {
   "follower_counts_stats": {
     "stats": {
       "field": "user.followers_count"
     }
   }
 }
}
res = es.search(index='twitter', doc_type='tweets', body=query)
print(res)

The response would be as follows:

"aggregations": {
      "follower_counts_stats": {
         "count": 124,
         "min": 2,
         "max": 38121,
         "avg": 2102.814516129032,
         "sum": 260749
      }
   }

In the preceding response, count is the number of values on which the aggregation was executed. The other fields are as follows:

  • min is the minimum follower count of a user
  • max is the maximum follower count of a user
  • avg is the average follower count
  • sum is the total of all the follower counts
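
To make these fields concrete, the following minimal sketch reads them from a Python dictionary shaped like the client's return value. The sample_response literal simply copies the aggregations part of the response shown above, and read_stats is a hypothetical helper name, not part of the client library.

```python
# A sketch, not library code: read_stats is a hypothetical helper.
# sample_response copies the "aggregations" part of the response shown above.
sample_response = {
    "aggregations": {
        "follower_counts_stats": {
            "count": 124,
            "min": 2,
            "max": 38121,
            "avg": 2102.814516129032,
            "sum": 260749,
        }
    }
}


def read_stats(response, agg_name):
    """Return the stats dictionary stored under agg_name in a search response."""
    return response["aggregations"][agg_name]


stats = read_stats(sample_response, "follower_counts_stats")
# avg is simply sum divided by count
recomputed_avg = stats["sum"] / float(stats["count"])
print("count=%d min=%d max=%d" % (stats["count"], stats["min"], stats["max"]))
print("avg from response: %.6f, sum/count: %.6f" % (stats["avg"], recomputed_avg))
```

The two averages printed on the last line agree, which confirms that avg is derived from sum and count.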

Java example

Note

In Java, all the metric aggregations can be created using the MetricsAggregationBuilder and AggregationBuilders classes. However, to parse the results, you need to import the response class that corresponds to each aggregation type.

To build and execute a stats aggregation in Java, first add the following import to the code:

import org.elasticsearch.search.aggregations.metrics.stats.Stats;

Then build the aggregation in the following way:

MetricsAggregationBuilder aggregation =
        AggregationBuilders
                .stats("follower_counts_stats")
                .field("user.followers_count");

This aggregation can be executed with the following code snippet:

SearchResponse response = client.prepareSearch(indexName).setTypes(docType).setQuery(QueryBuilders.matchAllQuery())
  .addAggregation(aggregation)
  .execute().actionGet();

The stats aggregation response can be parsed as follows:

Stats agg = response.getAggregations().get("follower_counts_stats");
double min = agg.getMin();
double max = agg.getMax();
double avg = agg.getAvg();
double sum = agg.getSum();
long count = agg.getCount();

Computing stats separately

In addition to computing these basic stats in a single query, Elasticsearch provides multiple aggregations to compute them one by one. The following are the aggregation types that fall into this category:

  • value_count: This counts the number of values that are extracted from the aggregated documents
  • min: This finds the minimum value among the numeric values extracted from the aggregated documents
  • max: This finds the maximum value among the numeric values extracted from the aggregated documents
  • avg: This finds the average value among the numeric values extracted from the aggregated documents
  • sum: This finds the sum of all the numeric values extracted from the aggregated documents
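
As a rough illustration of what each of these five aggregations computes, the following sketch applies the plain-Python equivalents to a small list. The follower counts are made-up sample values, and this is only a local analogy, not how Elasticsearch computes the results internally.

```python
# Plain-Python equivalents of the five single-value metrics (illustration only).
# The follower counts below are made-up sample values.
followers = [2, 10, 30, 42, 16]

metrics = {
    "value_count": len(followers),
    "min": min(followers),
    "max": max(followers),
    "sum": sum(followers),
    "avg": sum(followers) / float(len(followers)),
}

for name in ("value_count", "min", "max", "sum", "avg"):
    print("%s: %s" % (name, metrics[name]))
```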

To perform these aggregations, you just need to use the following syntax:

{
 "aggs": {
   "aggregation_name": {
     "aggregation_type": {
       "field": "name_of_the_field"
     }
   }
 }
}

Python example

query = {
 "aggs": {
   "follower_counts_stats": {
     "sum": {
       "field": "user.followers_count"
     }
   }
 },
 "size": 0
}
res = es.search(index='twitter', doc_type='tweets', body=query)

We used the sum aggregation type in the preceding query; for other aggregations such as min, max, avg, and value_count, just replace the type of aggregation in the query.
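
Since only the aggregation type changes between these queries, the request body can be generated. The following is a minimal sketch under that assumption; build_metric_query and the generated aggregation names are hypothetical and not part of the Elasticsearch client.

```python
# build_metric_query is a hypothetical helper, not part of the Elasticsearch client.
SINGLE_VALUE_METRICS = ("min", "max", "sum", "avg", "value_count")


def build_metric_query(agg_name, agg_type, field):
    """Build a request body with one single-value metric aggregation and no hits."""
    if agg_type not in SINGLE_VALUE_METRICS:
        raise ValueError("unsupported aggregation type: %s" % agg_type)
    return {
        "aggs": {agg_name: {agg_type: {"field": field}}},
        "size": 0,  # return only the aggregation, not the matching documents
    }


# One request body per metric, all on the same field
queries = dict(
    (agg_type, build_metric_query("follower_counts_" + agg_type, agg_type,
                                  "user.followers_count"))
    for agg_type in SINGLE_VALUE_METRICS
)
print(queries["avg"])
```

Each generated body can then be passed to es.search in the same way as the query in the preceding Python example.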

Java example

To perform these aggregations using the Java client, you need to follow this syntax:

MetricsAggregationBuilder aggregation =
        AggregationBuilders
                .sum("follower_counts_stats")
                .field("user.followers_count");

Note that to build other metric aggregations, such as min, max, count, and avg, you just call the corresponding method instead of sum in the preceding code. The rest of the syntax remains the same.

For parsing the responses, you need to import the correct package according to the aggregation type. The following are the imports that you will need:

  • For min aggregation:
    import org.elasticsearch.search.aggregations.metrics.min.Min;

    The response is parsed as follows:

    Min agg = response.getAggregations().get("follower_counts_stats");
    double value = agg.getValue();
  • For max aggregation:
    import org.elasticsearch.search.aggregations.metrics.max.Max;

    The response is parsed as follows:

    Max agg = response.getAggregations().get("follower_counts_stats");
    double value = agg.getValue();
  • For avg aggregation:
    import org.elasticsearch.search.aggregations.metrics.avg.Avg;

    The response is parsed as follows:

    Avg agg = response.getAggregations().get("follower_counts_stats");
    double value = agg.getValue();
  • For sum aggregation:
    import org.elasticsearch.search.aggregations.metrics.sum.Sum;

    The response is parsed as follows:

    Sum agg = response.getAggregations().get("follower_counts_stats");
    double value = agg.getValue();

    Note

    Stats aggregations cannot contain sub-aggregations. However, they can themselves be used as sub-aggregations of bucket aggregations.
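
The point in this note can be sketched as a request body in which a stats aggregation is nested under a terms bucket, so that the statistics are computed once per bucket. The field name user.lang and the aggregation name by_language are assumptions made for illustration; they do not come from the examples in this chapter.

```python
# A sketch of a stats aggregation nested under a terms bucket.
# The field name "user.lang" and the aggregation name "by_language" are assumptions.
query = {
    "size": 0,
    "aggs": {
        "by_language": {
            "terms": {"field": "user.lang"},
            "aggs": {
                "follower_counts_stats": {
                    "stats": {"field": "user.followers_count"}
                }
            },
        }
    },
}

bucket_agg = query["aggs"]["by_language"]
print(sorted(bucket_agg.keys()))
```

The bucket definition holds both its own type (terms) and a nested aggs section, whereas the stats definition holds only its field.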

Computing extended stats

The extended_stats aggregation is the extended version of the stats aggregation and provides advanced statistics of the data, which include the sum of squares, variance, standard deviation, and standard deviation bounds.

If we run a query with the extended_stats aggregation on the followers count field, we get the following data:

 "aggregations": {
      "follower_counts_stats": {
         "count": 124,
         "min": 2,
         "max": 38121,
         "avg": 2102.814516129032,
         "sum": 260749,
         "sum_of_squares": 3334927837,
         "variance": 22472750.441402186,
         "std_deviation": 4740.543264374051,
         "std_deviation_bounds": {
            "upper": 11583.901044877135,
            "lower": -7378.272012619071
         }
      }
   }
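
The extra fields can be derived from count, sum, and sum_of_squares. The following sketch does so under two assumptions inferred from the figures in the response: the variance is the population variance, and the bounds lie two standard deviations on either side of the average. The derive_extended helper and its sigma parameter are hypothetical names used only for this illustration.

```python
import math


def derive_extended(count, total, sum_of_squares, sigma=2.0):
    """Derive variance, standard deviation, and bounds from the raw sums.

    Uses the population variance; sigma is the number of standard
    deviations used for the bounds (2 matches the response above).
    """
    avg = total / float(count)
    variance = sum_of_squares / float(count) - avg ** 2
    std_deviation = math.sqrt(variance)
    return {
        "avg": avg,
        "variance": variance,
        "std_deviation": std_deviation,
        "upper": avg + sigma * std_deviation,
        "lower": avg - sigma * std_deviation,
    }


# count, sum, and sum_of_squares copied from the response above
derived = derive_extended(124, 260749, 3334927837)
for key in ("variance", "std_deviation", "upper", "lower"):
    print("%s: %.3f" % (key, derived[key]))
```

A negative lower bound, as in this response, indicates that the follower counts are heavily skewed by a few very large values.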

Python example

query = {
 "aggs": {
   "follower_counts_stats": {
     "extended_stats": {
       "field": "user.followers_count"
     }
   }
 },
 "size": 0
}
res = es.search(index='twitter', doc_type='tweets', body=query)

Java example

An extended_stats aggregation is built using the Java client in the following way:

MetricsAggregationBuilder aggregation =
        AggregationBuilders
                .extendedStats("agg_name")
                .field("user.followers_count");

To parse the response of the extended_stats aggregation in Java, you need to have the following import statement:

import org.elasticsearch.search.aggregations.metrics.stats.extended.ExtendedStats;

Then the response can be parsed in the following way:

ExtendedStats agg = response.getAggregations().get("agg_name");
double min = agg.getMin();
double max = agg.getMax();
double avg = agg.getAvg();
double sum = agg.getSum();
long count = agg.getCount();
double stdDeviation = agg.getStdDeviation();
double sumOfSquares = agg.getSumOfSquares();
double variance = agg.getVariance();

Finding distinct counts

The number of distinct values of a field can be calculated using the cardinality aggregation. For example, we can use this to count unique users:

{
  "aggs": {
    "unique_users": {
      "cardinality": {
        "field": "user.screen_name"
      }
    }
  }
}

The response will be as follows:

"aggregations": {
      "unique_users": {
         "value": 122
      }
   }
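
For comparison, an exact distinct count can be sketched in plain Python with a set. The cardinality aggregation itself returns an approximate count, so on large data sets its value can differ slightly from the exact figure. The screen names below are made-up sample values.

```python
# Exact distinct count in plain Python, for comparison with the cardinality result.
# The screen names are made-up sample values.
screen_names = ["alice", "bob", "alice", "carol", "bob"]

unique_users = len(set(screen_names))
print(unique_users)
```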

Java example

Cardinality aggregation is built using the Java client in the following way:

MetricsAggregationBuilder aggregation =
        AggregationBuilders
                .cardinality("unique_users")
                .field("user.screen_name");

To parse the response of the cardinality aggregation in Java, you need to have the following import statement:

import org.elasticsearch.search.aggregations.metrics.cardinality.Cardinality;

Then the response can be parsed in the following way:

Cardinality agg = response.getAggregations().get("unique_users");
long value = agg.getValue();