In the previous chapter, we got familiar with each component of the ELK Stack—Elasticsearch, Logstash, and Kibana—and installed and configured them. In this chapter, we will build our first basic data pipeline using the ELK Stack. This will help us understand how easily the components of the ELK Stack can be brought together to build an end-to-end analytics pipeline.
While running the examples in this chapter, we assume that you have already installed Elasticsearch, Logstash, and Kibana as described in Chapter 1, Introduction to ELK Stack.
For our example, we will use the daily Google (GOOG) stock quote dataset covering the six-month period from July 1, 2014 to December 31, 2014. This is a good dataset for understanding how quickly we can analyze simple datasets such as this one with ELK.
The most significant fields of this dataset are Date, Open Price, High Price, Low Price, Close Price, Volume, and Adjusted Close Price.
The following table shows some sample data from the dataset. The actual dataset is in CSV format.
| Date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|
| Dec 31, 2014 | 531.25 | 532.60 | 525.80 | 526.40 | 1,368,200 | 526.40 |
| Dec 30, 2014 | 528.09 | 531.15 | 527.13 | 530.42 | 876,300 | 530.42 |
| Dec 29, 2014 | 532.19 | 535.48 | 530.01 | 530.33 | 2,278,500 | 530.33 |
| Dec 26, 2014 | 528.77 | 534.25 | 527.31 | 534.03 | 1,036,000 | 534.03 |
| Dec 24, 2014 | 530.51 | 531.76 | 527.02 | 528.77 | 705,900 | 528.77 |
| Dec 23, 2014 | 527.00 | 534.56 | 526.29 | 530.59 | 2,197,600 | 530.59 |
| Dec 22, 2014 | 516.08 | 526.46 | 516.08 | 524.87 | 2,723,800 | 524.87 |
| Dec 19, 2014 | 511.51 | 517.72 | 506.91 | 516.35 | 3,690,200 | 516.35 |
| Dec 18, 2014 | 512.95 | 513.87 | 504.70 | 511.10 | 2,926,700 | 511.10 |
| Dec 17, 2014 | 497.00 | 507.00 | 496.81 | 504.89 | 2,883,200 | 504.89 |
| Dec 16, 2014 | 511.56 | 513.05 | 489.00 | 495.39 | 3,964,300 | 495.39 |
| Dec 15, 2014 | 522.74 | 523.10 | 513.27 | 513.80 | 2,813,400 | 513.80 |
| Dec 12, 2014 | 523.51 | 528.50 | 518.66 | 518.66 | 1,994,600 | 518.66 |
| Dec 11, 2014 | 527.80 | 533.92 | 527.10 | 528.34 | 1,610,800 | 528.34 |
| Dec 10, 2014 | 533.08 | 536.33 | 525.56 | 526.06 | 1,712,300 | 526.06 |
We need to put this data in a location from which the ELK Stack can access it for further analysis.
Let's look at some of the top entries of the CSV file using the Unix head command:
```
$ head GOOG.csv
2014-12-31,531.25244,532.60236,525.80237,526.4024,1368200,526.4024
2014-12-30,528.09241,531.1524,527.13239,530.42242,876300,530.42242
2014-12-29,532.19244,535.48242,530.01337,530.3324,2278500,530.3324
2014-12-26,528.7724,534.25244,527.31238,534.03247,1036000,534.03247
2014-12-24,530.51245,531.76141,527.0224,528.7724,705900,528.7724
2014-12-23,527.00238,534.56244,526.29236,530.59241,2197600,530.59241
2014-12-22,516.08234,526.4624,516.08234,524.87238,2723800,524.87238
2014-12-19,511.51233,517.72235,506.9133,516.35229,3690200,516.35229
2014-12-18,512.95233,513.87231,504.7023,511.10233,2926700,511.10233
```
Each row represents the quote data for a particular date, with the fields separated by commas.
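Before feeding the file into the pipeline, it can be worth a quick sanity check that every row really has the seven comma-separated fields we expect; one way is a short awk one-liner. (The two rows written below are stand-ins for illustration; when following along, run the check against the real downloaded GOOG.csv.)

```shell
# Create a small stand-in GOOG.csv with two sample rows for illustration.
printf '%s\n' \
  '2014-12-31,531.25244,532.60236,525.80237,526.4024,1368200,526.4024' \
  '2014-12-30,528.09241,531.1524,527.13239,530.42242,876300,530.42242' > GOOG.csv

# Every valid row should have exactly 7 comma-separated fields;
# this prints nothing when the file is well-formed.
awk -F',' 'NF != 7 { print "bad row " NR ": " $0 }' GOOG.csv
```

If the command prints any "bad row" lines, the file was likely truncated or corrupted during download and should be fetched again.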
Now that we are familiar with the data, we will set up the ELK Stack so that we can parse and process the data using Logstash, index it in Elasticsearch, and then build beautiful visualizations in Kibana.
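Looking ahead, the whole pipeline will be driven by a single Logstash configuration file with input, filter, and output sections. The following is only a sketch of its overall shape; the file path, column names, and index name are illustrative assumptions, and exact option names can vary across Logstash versions:

```conf
# Sketch of a Logstash pipeline for the GOOG.csv dataset (illustrative only).
input {
  file {
    path => "/path/to/GOOG.csv"      # adjust to where you placed the file
    start_position => "beginning"    # read the file from the start
  }
}

filter {
  # Split each comma-separated row into named fields.
  csv {
    separator => ","
    columns => ["date_of_record", "open", "high", "low", "close", "volume", "adj_close"]
  }
  # Use the quote date as the event timestamp.
  date {
    match => ["date_of_record", "yyyy-MM-dd"]
    target => "@timestamp"
  }
  # Convert numeric fields from strings so Kibana can aggregate them.
  mutate {
    convert => { "open" => "float" "high" => "float" "low" => "float"
                 "close" => "float" "volume" => "integer" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "stock-data"            # illustrative index name
  }
}
```

We will build up and explain each of these sections in detail as we go.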