Let's look at some of the basic concepts of Elasticsearch, which explain how it stores the indexed data.
An index in Elasticsearch is a collection of documents that share some common characteristics.
Each index contains multiple types, which in turn contain multiple documents, and each document contains multiple fields. The documents in an index are stored as JSON. A cluster can hold any number of indices.
In ELK, when Logstash sends JSON documents to Elasticsearch, they are written by default to an index named by the pattern "logstash-%{+YYYY.MM.dd}". This partitions the indices by day, so that each day's data can easily be searched and deleted if required. The pattern can be changed in the Logstash output plugin configuration.
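To make the daily naming concrete, here is a minimal sketch of how the default pattern expands for a given date. The function name logstash_index_name and its prefix parameter are hypothetical helpers for illustration, not part of Logstash.

```python
from datetime import date


def logstash_index_name(day: date, prefix: str = "logstash-") -> str:
    """Return the daily index name for `day`, mirroring logstash-%{+YYYY.MM.dd}."""
    # YYYY.MM.dd in the Logstash pattern corresponds to %Y.%m.%d in strftime.
    return prefix + day.strftime("%Y.%m.%d")


if __name__ == "__main__":
    print(logstash_index_name(date(2015, 11, 3)))
```

Because each day maps to its own index name, dropping old data amounts to deleting the indices whose names fall outside the retention window.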
The URL to search and query the indices looks like this:
http://localhost:9200/[index]/[type]/[operation]
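The URL layout can be sketched as a small helper that joins the three path segments. The name build_url is hypothetical, the host default is taken from the URL above, and the index and type values in the usage line are only examples.

```python
def build_url(index: str, doc_type: str, operation: str,
              host: str = "http://localhost:9200") -> str:
    """Assemble a URL of the form http://host/[index]/[type]/[operation]."""
    # Strip a trailing slash from the host so the path has no double slash.
    parts = [host.rstrip("/"), index, doc_type, operation]
    return "/".join(parts)


if __name__ == "__main__":
    # _search is the search operation; "packtpub" and "elk" are example names.
    print(build_url("packtpub", "elk", "_search"))
```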
A document in Elasticsearch is a JSON document stored in an index. Each document has a type and an ID, which together identify it uniquely within its index.
For example, a document stored in Elasticsearch would look similar to this:
{ "_index" : "packtpub", "_type" : "elk", "_id" : "1", "_version" : 1, "found" : true, "_source" : { "book_name" : "learning elk" } }
A field is the basic unit inside a document. As in the preceding example, a basic field is a key-value pair, as follows:
"book_name" : "learning elk"
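As a rough illustration of how the metadata and the fields are separated, the sketch below parses the example document with Python's standard json module. The variable names are arbitrary, and the field name is quoted here because strict JSON requires it.

```python
import json

# The example document, written as strict JSON (keys in double quotes).
response = (
    '{ "_index": "packtpub", "_type": "elk", "_id": "1", "_version": 1, '
    '"found": true, "_source": { "book_name": "learning elk" } }'
)

doc = json.loads(response)

# Keys starting with an underscore are metadata about the document.
location = (doc["_index"], doc["_type"], doc["_id"])

# The fields of the document itself live under _source.
fields = doc["_source"]

if __name__ == "__main__":
    print("stored at:", "/".join(location))
    for name, value in fields.items():
        print(name, "=", value)
```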
A type provides a logical partition inside an index. It represents a class of similar documents. An index can have multiple types, and we can define them as per the context.
For example, the index for Facebook can have post as one of the index types and comments as another.
Mapping is used to map each field of the document to its corresponding data type, such as string, integer, float, double, date, and so on. Elasticsearch creates a mapping for the fields automatically when documents are indexed, and those mappings can be queried or customized to suit specific needs.
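The idea of deriving a data type for each field can be sketched with a toy function. This is only an illustration of the concept, not Elasticsearch's actual detection logic: the function names, the chosen type labels, the single date format, and the sample document are all assumptions made for the example.

```python
from datetime import datetime


def guess_field_type(value) -> str:
    """Guess a type label for one JSON value (toy rules, for illustration)."""
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            # Assumption: only plain YYYY-MM-DD strings count as dates here.
            datetime.strptime(value, "%Y-%m-%d")
            return "date"
        except ValueError:
            return "string"
    if isinstance(value, dict):
        return "object"
    return "unknown"


def guess_mapping(document: dict) -> dict:
    """Return a field-name to type-label mapping for a flat document."""
    return {field: guess_field_type(value) for field, value in document.items()}


if __name__ == "__main__":
    # Made-up sample document; only book_name comes from the earlier example.
    sample = {"book_name": "learning elk", "pages": 100,
              "rating": 4.5, "published": "2015-01-31"}
    print(guess_mapping(sample))
```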
A shard is the actual physical entity where the data for each index is stored. Each index can have a number of primary and replica shards where it stores the data. Shards are distributed among all the nodes in the cluster and can be moved from one node to another in case of node failures or the addition of new nodes.
Each document in an Elasticsearch index is stored on one primary shard and a number of replica shards. While indexing, the document is first stored on a primary shard and then on the corresponding replica shard. By default, the number of primary shards for each index is five and can be configured as per our needs.
Replica shards will typically reside on a different node than the primary shard and help in case of failover and load balancing to cater to multiple requests.
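How a document ends up on exactly one primary shard can be sketched as a hash of a routing value (the document ID by default) taken modulo the number of primary shards. In the sketch below, zlib.crc32 is a stand-in chosen only because it is deterministic and in the standard library; Elasticsearch uses its own hash function, so the shard numbers will not match a real cluster. The function name pick_primary_shard is hypothetical.

```python
import zlib


def pick_primary_shard(doc_id: str, number_of_primary_shards: int = 5) -> int:
    """Map a document ID to a primary shard number in [0, number_of_primary_shards)."""
    # Stand-in hash: crc32 is NOT what Elasticsearch uses internally.
    hashed = zlib.crc32(doc_id.encode("utf-8"))
    return hashed % number_of_primary_shards


if __name__ == "__main__":
    for doc_id in ("1", "2", "3"):
        print(doc_id, "-> shard", pick_primary_shard(doc_id))
```

The same ID always lands on the same shard as long as the number of primary shards stays the same, which is what lets a later lookup by ID find the document.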
A cluster is a collection of nodes that together store the indexed data. Elasticsearch scales horizontally by spreading data across the nodes of the cluster. Each cluster is identified by a cluster name, which nodes use to join it. The cluster name is set by the cluster.name property in the Elasticsearch configuration file, elasticsearch.yml, and defaults to "elasticsearch":
cluster.name: elasticsearch
A node is a single running instance of Elasticsearch that belongs to a cluster. By default, every node in Elasticsearch joins the cluster named "elasticsearch". Each node can have its own configuration defined in elasticsearch.yml, so nodes can have different settings for memory and resource allocation.
In Elasticsearch, nodes can play three types of roles:
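Data node: stores data and handles indexing and search. Set these properties in the elasticsearch.yml configuration for the node:
node.master: false
node.data: true
Dedicated master node: manages the cluster and holds no data. Set these properties in elasticsearch.yml:
node.master: true
node.data: false
Routing (load balancer) node: neither a master nor a data node; it only routes requests to the appropriate nodes. Set these properties in elasticsearch.yml:
node.master: false
node.data: false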
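The combinations of node.master and node.data can be summarized in a small lookup. The function name node_role and the role labels are descriptive names chosen for this sketch, not identifiers from Elasticsearch. The fourth combination, with both settings true, is included for completeness as the out-of-the-box behavior.

```python
def node_role(master: bool, data: bool) -> str:
    """Describe the role implied by a node's node.master and node.data settings."""
    if master and data:
        # Both true: the node can be elected master and also holds data.
        return "master-eligible data node"
    if master:
        return "dedicated master node"
    if data:
        return "data node"
    # Neither master nor data: the node only routes requests.
    return "routing node"


if __name__ == "__main__":
    for master, data in [(False, True), (True, False), (False, False)]:
        print(f"node.master: {master}, node.data: {data} -> {node_role(master, data)}")
```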