As explained in the previous sections, metric aggregations compute statistical measurements over the data. They include the following:
stats aggregation
min, max, sum, and value_count aggregations
extended_stats aggregation
cardinality aggregation
The basic statistics are min, max, sum, count, and avg. They can be computed only on numeric fields, in one of two ways: all together with a single stats aggregation, or one at a time with a separate aggregation for each statistic.
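To make these definitions concrete, here is a minimal pure-Python sketch that computes the same five values the stats aggregation reports, over a local list of numbers. It runs without a cluster and assumes a non-empty list. The function name basic_stats and the sample follower counts are hypothetical; Elasticsearch itself computes these values server-side, across shards.

```python
from typing import Dict, Iterable


def basic_stats(values: Iterable[float]) -> Dict[str, float]:
    """Return count, min, max, avg and sum, as reported by a stats aggregation.

    Hypothetical local helper for illustration; it requires a non-empty input.
    """
    vals = list(values)
    if not vals:
        raise ValueError("basic_stats needs at least one value")
    total = sum(vals)
    return {
        "count": len(vals),
        "min": min(vals),
        "max": max(vals),
        "avg": total / len(vals),
        "sum": total,
    }


if __name__ == "__main__":
    # Hypothetical follower counts of four users.
    print(basic_stats([2, 40, 120, 38]))
    # {'count': 4, 'min': 2, 'max': 120, 'avg': 50.0, 'sum': 200}
```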
All of these statistics can be calculated with a single stats aggregation query.
Python example
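from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes an Elasticsearch node on localhost:9200

query = {
    "aggs": {
        "follower_counts_stats": {
            "stats": {
                "field": "user.followers_count"
            }
        }
    }
}
res = es.search(index='twitter', doc_type='tweets', body=query)
print(res['aggregations'])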
The response would be as follows:
"aggregations": { "follower_counts_stats": { "count": 124, "min": 2, "max": 38121, "avg": 2102.814516129032, "sum": 260749 } }
In the preceding response:
count is the number of values on which the aggregation was executed
min is the minimum follower count of a user
max is the maximum follower count of a user
avg is the average follower count
sum is the total of all the follower counts
Java example
To build and execute a stats aggregation in Java, first add the following import to your code:
import org.elasticsearch.search.aggregations.metrics.stats.Stats;
Then build the aggregation in the following way:
MetricsAggregationBuilder aggregation =
AggregationBuilders
.stats("follower_counts_stats")
.field("user.followers_count");
This aggregation can be executed with the following code snippet:
SearchResponse response = client.prepareSearch(indexName).setTypes(docType).setQuery(QueryBuilders.matchAllQuery()) .addAggregation(aggregation) .execute().actionGet();
The stats aggregation response can be parsed as follows:
Stats agg = response.getAggregations().get("follower_counts_stats");
double min = agg.getMin();
double max = agg.getMax();
double avg = agg.getAvg();
double sum = agg.getSum();
long count = agg.getCount();
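On the Python side, the client returns the response as a dictionary-like object, so the same values are read with key lookups instead of typed getters. The sketch below uses a hypothetical read_stats helper and a sample dictionary copied from the stats response shown earlier; it also checks that avg is simply sum divided by count.

```python
from typing import Any, Dict


def read_stats(res: Dict[str, Any], name: str) -> Dict[str, float]:
    """Pull the five basic statistics of a named stats aggregation out of a response."""
    agg = res["aggregations"][name]
    return {key: agg[key] for key in ("count", "min", "max", "avg", "sum")}


if __name__ == "__main__":
    # Values copied from the stats response shown earlier in this section.
    res = {
        "aggregations": {
            "follower_counts_stats": {
                "count": 124, "min": 2, "max": 38121,
                "avg": 2102.814516129032, "sum": 260749,
            }
        }
    }
    stats = read_stats(res, "follower_counts_stats")
    # avg is sum / count; the difference should be negligible.
    print(abs(stats["sum"] / stats["count"] - stats["avg"]) < 1e-9)
```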
In addition to computing these basic stats in a single query, Elasticsearch provides a separate aggregation for each of them, so they can also be computed one at a time. The following aggregation types fall into this category:
value_count: This counts the number of values that are extracted from the aggregated documents
min: This finds the minimum value among the numeric values extracted from the aggregated documents
max: This finds the maximum value among the numeric values extracted from the aggregated documents
avg: This finds the average value among the numeric values extracted from the aggregated documents
sum: This finds the sum of all the numeric values extracted from the aggregated documents
To perform these aggregations, you just need to use the following syntax:
{ "aggs": { "aggregation_name": { "aggregation_type": { "field": "name_of_the_field" } } } }
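Because only the aggregation type changes between these queries, the body can be generated from a template. The following sketch assumes a hypothetical helper, metric_agg_query, that fills in the name, type, and field and sets "size": 0 so that no search hits are returned; it only builds the dictionary and does not contact a cluster.

```python
from typing import Any, Dict

# Single-value metric aggregation types covered in this section.
SINGLE_VALUE_TYPES = ("min", "max", "sum", "avg", "value_count")


def metric_agg_query(name: str, agg_type: str, field: str) -> Dict[str, Any]:
    """Build a search body holding one single-value metric aggregation.

    Hypothetical helper: it assembles the JSON body shown above and sets
    "size": 0 so that only the aggregation result comes back.
    """
    if agg_type not in SINGLE_VALUE_TYPES:
        raise ValueError("unsupported aggregation type: " + agg_type)
    return {"size": 0, "aggs": {name: {agg_type: {"field": field}}}}


if __name__ == "__main__":
    for agg_type in SINGLE_VALUE_TYPES:
        print(metric_agg_query("follower_counts_" + agg_type, agg_type,
                               "user.followers_count"))
```

The resulting dictionary is the kind of body passed to es.search(..., body=query) in the Python examples of this section.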
Python example
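query = {
    "aggs": {
        "follower_counts_stats": {
            "sum": {
                "field": "user.followers_count"
            }
        }
    },
    "size": 0  # return only the aggregation, no search hits
}
res = es.search(index='twitter', doc_type='tweets', body=query)
print(res['aggregations']['follower_counts_stats']['value'])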
We used the sum aggregation type in the preceding query; for the other aggregations, such as min, max, avg, and value_count, just replace the type of aggregation in the query.
Java example
To perform these aggregations using the Java client, you need to follow this syntax:
MetricsAggregationBuilder aggregation =
AggregationBuilders
.sum("follower_counts_stats")
.field("user.followers_count");
Note that to build the other metric aggregations, such as min, max, count (which builds a value_count aggregation), and avg, you just need to call the corresponding builder method instead of sum. The rest of the syntax remains the same.
For parsing the responses, you need to import the correct class for the aggregation type. The imports and parsing code for each type are as follows.
For min:
import org.elasticsearch.search.aggregations.metrics.min.Min;
The response is parsed as follows:
Min agg = response.getAggregations().get("follower_counts_stats");
double value = agg.getValue();
For max:
import org.elasticsearch.search.aggregations.metrics.max.Max;
The response is parsed as follows:
Max agg = response.getAggregations().get("follower_counts_stats");
double value = agg.getValue();
For avg:
import org.elasticsearch.search.aggregations.metrics.avg.Avg;
The response is parsed as follows:
Avg agg = response.getAggregations().get("follower_counts_stats");
double value = agg.getValue();
For sum:
import org.elasticsearch.search.aggregations.metrics.sum.Sum;
The response is parsed as follows:
Sum agg = response.getAggregations().get("follower_counts_stats");
double value = agg.getValue();
The extended_stats aggregation is an extended version of the stats aggregation. In addition to the basic statistics, it provides advanced statistics of the data: the sum of squares, variance, standard deviation, and standard deviation bounds.
For example, running an extended_stats aggregation on the followers count field returns the following data:
"aggregations": { "follower_counts_stats": { "count": 124, "min": 2, "max": 38121, "avg": 2102.814516129032, "sum": 260749, "sum_of_squares": 3334927837, "variance": 22472750.441402186, "std_deviation": 4740.543264374051, "std_deviation_bounds": { "upper": 11583.901044877135, "lower": -7378.272012619071 } } }
Python example
query = {
    "aggs": {
        "follower_counts_stats": {
            "extended_stats": {
                "field": "user.followers_count"
            }
        }
    },
    "size": 0
}
res = es.search(index='twitter', doc_type='tweets', body=query)
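The extra fields of the extended_stats response follow from count, sum, and sum_of_squares. The sketch below uses a hypothetical derive_extended helper and assumes the population variance (the mean of the squares minus the square of the mean) and bounds of avg ± sigma × std_deviation with sigma = 2, which is the Elasticsearch default. With the numbers from the extended_stats response shown earlier, it reproduces the reported variance, standard deviation, and bounds.

```python
import math
from typing import Dict


def derive_extended(count: int, total: float, sum_of_squares: float,
                    sigma: float = 2.0) -> Dict[str, float]:
    """Derive variance, std deviation and bounds from count, sum and sum of squares.

    Hypothetical helper using the population variance E[x^2] - (E[x])^2.
    """
    avg = total / count
    # Guard against a tiny negative result caused by floating-point rounding.
    variance = max(0.0, sum_of_squares / count - avg * avg)
    std_deviation = math.sqrt(variance)
    return {
        "variance": variance,
        "std_deviation": std_deviation,
        "upper": avg + sigma * std_deviation,
        "lower": avg - sigma * std_deviation,
    }


if __name__ == "__main__":
    # count, sum and sum_of_squares copied from the extended_stats response.
    for key, value in derive_extended(124, 260749, 3334927837).items():
        print(key, round(value, 4))
```

The negative lower bound is expected: the bounds are a purely arithmetic interval around the mean, not a range of observed follower counts.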
Java example
An extended_stats aggregation is built using the Java client in the following way:
MetricsAggregationBuilder aggregation =
AggregationBuilders
.extendedStats("agg_name")
.field("user.followers_count");
To parse the response of the extended_stats aggregation in Java, you need the following import statement:
import org.elasticsearch.search.aggregations.metrics.stats.extended.ExtendedStats;
Then the response can be parsed in the following way:
ExtendedStats agg = response.getAggregations().get("agg_name");
double min = agg.getMin();
double max = agg.getMax();
double avg = agg.getAvg();
double sum = agg.getSum();
long count = agg.getCount();
double stdDeviation = agg.getStdDeviation();
double sumOfSquares = agg.getSumOfSquares();
double variance = agg.getVariance();
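The number of distinct values of a field can be calculated using the cardinality aggregation. The result is an approximation based on the HyperLogLog++ algorithm, although it is close to exact when the number of distinct values is small. For example, we can use it to count unique users: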
{
"aggs": {
"unique_users": {
"cardinality": {
"field": "user.screen_name"
}
}
}
}
The response will be as follows:
"aggregations": { "unique_users": { "value": 122 } }
Java example
A cardinality aggregation is built using the Java client in the following way:
MetricsAggregationBuilder aggregation =
AggregationBuilders
.cardinality("unique_users")
.field("user.screen_name");
To parse the response of the cardinality aggregation in Java, you need the following import statement:
import org.elasticsearch.search.aggregations.metrics.cardinality.Cardinality;
Then the response can be parsed in the following way:
Cardinality agg = response.getAggregations().get("unique_users");
long value = agg.getValue();
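As a Python counterpart to the cardinality example, the following minimal sketch builds the same query body (with "size": 0 added so no hits are returned) and reads the value from a response dictionary shaped like the one shown earlier. It also includes an exact distinct count with a Python set, which is what the cardinality aggregation approximates. The helper names and the sample screen names are hypothetical.

```python
from typing import Any, Dict, Iterable


def cardinality_query(name: str, field: str) -> Dict[str, Any]:
    """Build a search body with one cardinality aggregation and no hits."""
    return {"size": 0, "aggs": {name: {"cardinality": {"field": field}}}}


def read_value(res: Dict[str, Any], name: str) -> Any:
    """Read the single "value" of a named metric aggregation from a response."""
    return res["aggregations"][name]["value"]


def exact_distinct(values: Iterable[Any]) -> int:
    """Exact distinct count; the cardinality aggregation approximates this."""
    return len(set(values))


if __name__ == "__main__":
    print(cardinality_query("unique_users", "user.screen_name"))
    # Response shaped like the cardinality response shown earlier.
    res = {"aggregations": {"unique_users": {"value": 122}}}
    print(read_value(res, "unique_users"))  # 122
    # Hypothetical screen names: two distinct users.
    print(exact_distinct(["alice", "bob", "alice"]))  # 2
```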