Elasticsearch basic concepts

Let's look at some of the basic concepts of Elasticsearch, which explain how it stores the indexed data.

Index

An index in Elasticsearch is a collection of documents that share some common characteristics.

Each index contains multiple types, which in turn contain multiple documents, and each document contains multiple fields. The documents in an index are stored as JSON. A cluster can hold any number of indices.

In ELK, when Logstash sends JSON documents to Elasticsearch, they are written by default to an index named with the pattern "logstash-%{+YYYY.MM.dd}". This partitions the data into one index per day, so that a given day's data can easily be searched or deleted if required. The pattern can be changed in the Logstash output plugin configuration.
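To make the daily naming concrete, here is a minimal Python sketch of how such an index name could be derived from an event timestamp. The helper name daily_index_name is hypothetical, and the sketch assumes the date part is formatted from the event timestamp in UTC; it only mimics the pattern and is not Logstash's own code.

```python
from datetime import datetime, timezone


def daily_index_name(timestamp: datetime, prefix: str = "logstash-") -> str:
    """Return the per-day index name for an event timestamp (hypothetical helper)."""
    # Assumption: the date is taken in UTC. The pattern YYYY.MM.dd
    # corresponds to %Y.%m.%d in Python's strftime notation.
    utc = timestamp.astimezone(timezone.utc)
    return prefix + utc.strftime("%Y.%m.%d")


event_time = datetime(2015, 11, 26, 14, 30, tzinfo=timezone.utc)
print(daily_index_name(event_time))  # logstash-2015.11.26
```

Because every day gets its own index, removing old data amounts to deleting whole indices rather than individual documents.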

The URL to search and query the indices looks like this:

http://localhost:9200/[index]/[type]/[operation]
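As an illustration of this URL layout, the following Python sketch assembles a URL from its parts. The function build_url is a hypothetical helper, and the index and type names are borrowed from the document example in this section; _search is used as a sample operation.

```python
def build_url(index: str, doc_type: str, operation: str,
              host: str = "http://localhost:9200") -> str:
    """Join host, index, type, and operation into one URL (hypothetical helper)."""
    parts = [host.rstrip("/"), index, doc_type, operation]
    return "/".join(parts)


# Search within one type of one index.
print(build_url("packtpub", "elk", "_search"))
# Address a single document by using its ID as the last path segment.
print(build_url("packtpub", "elk", "1"))
```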

Document

A document in Elasticsearch is a JSON document stored in an index. Each document has a type and an ID, which together identify it uniquely within the index.

For example, a document stored in Elasticsearch would look similar to this:

{
  "_index" : "packtpub",
  "_type" : "elk",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "book_name" : "learning elk"
  }
}
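A client receiving such a response could separate the metadata from the stored content roughly as follows. This is a sketch using only Python's standard json module; the variable names are illustrative, and the response text is the example document written as valid JSON.

```python
import json

# The example document from above, written as valid JSON.
response_text = """
{
  "_index": "packtpub",
  "_type": "elk",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {"book_name": "learning elk"}
}
"""

doc = json.loads(response_text)

# Index, type, and ID together locate the document.
location = (doc["_index"], doc["_type"], doc["_id"])
# The original JSON that was indexed lives under _source.
source = doc["_source"]

print(location)
print(source["book_name"])
```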

Field

A field is the basic unit inside a document. As in the preceding example, a basic field is a key-value pair, as follows:

"book_name" : "learning elk"

Type

A type provides a logical partition inside an index. It represents a class of similar documents. An index can have multiple types, and we can define them as the context requires.

For example, an index for Facebook could have post as one of its types and comments as another.

Mapping

A mapping associates each field of a document with its data type, such as string, integer, float, double, date, and so on. Elasticsearch creates a mapping for the fields automatically when documents are indexed, and mappings can also be queried or defined explicitly when specific needs arise.
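The shape of a mapping can be sketched as a nested Python dictionary. The layout below (type name, then properties, then one entry per field) follows the typed-index model described in this section; the fields pages and published are hypothetical additions for illustration, and field_types is a hypothetical helper.

```python
import json

# A mapping for the "elk" type. Only book_name appears in the example
# document; pages and published are made-up fields for illustration.
mapping = {
    "elk": {
        "properties": {
            "book_name": {"type": "string"},
            "pages": {"type": "integer"},
            "published": {"type": "date"},
        }
    }
}


def field_types(mapping: dict, doc_type: str) -> dict:
    """Flatten a mapping into a simple field name -> data type lookup."""
    props = mapping[doc_type]["properties"]
    return {name: spec["type"] for name, spec in props.items()}


print(json.dumps(field_types(mapping, "elk"), indent=2))
```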

Shard

A shard is the actual physical entity where the data for each index is stored. Each index can have a number of primary and replica shards where it stores the data. Shards are distributed among all the nodes in the cluster and can be moved from one node to another in case of node failures or the addition of new nodes.

Primary shard and replica shard

Each document in an Elasticsearch index is stored on one primary shard and on a number of replica shards. While indexing, the document is first stored on a primary shard and then on the corresponding replica shards. By default, each index has five primary shards; this number can be configured as per our needs.
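How a document ends up on exactly one primary shard can be sketched as a hash of a routing value (by default the document ID) taken modulo the number of primary shards. The Python sketch below illustrates that idea only: zlib.crc32 is a stand-in for Elasticsearch's internal hash function, and pick_primary_shard is a hypothetical name, so the shard numbers it prints will not match a real cluster.

```python
import zlib


def pick_primary_shard(doc_id: str, number_of_primary_shards: int = 5) -> int:
    """Map a document ID to a primary shard number (illustrative only)."""
    # Assumption: crc32 stands in for the real hash function.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


for doc_id in ["1", "2", "3"]:
    print(doc_id, "->", pick_primary_shard(doc_id))
```

The same ID always maps to the same shard as long as the number of primary shards stays the same.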

Replica shards will typically reside on a different node than the primary shard and help in case of failover and load balancing to cater to multiple requests.
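The relationship between primary and replica shards can be worked through with a small calculation. The sketch below writes index settings as a Python dictionary using the setting names number_of_shards and number_of_replicas; the value of one replica per primary is an assumption for the example, and total_shard_copies is a hypothetical helper.

```python
# Five primary shards, each with one replica (the replica count is assumed).
settings = {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}


def total_shard_copies(index_settings: dict) -> int:
    """Count all shard copies: each primary plus its replicas."""
    s = index_settings["settings"]
    return s["number_of_shards"] * (1 + s["number_of_replicas"])


print(total_shard_copies(settings))  # 10
```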

Cluster

A cluster is a collection of nodes that together store the indexed data. Elasticsearch scales horizontally by distributing that data across the nodes of the cluster. Each cluster is identified by a cluster name, which nodes use to join it. The cluster name is set by the cluster.name property in the Elasticsearch configuration file elasticsearch.yml, and defaults to "elasticsearch":

cluster.name: elasticsearch 

Node

A node is a single running instance of Elasticsearch that belongs to a cluster. By default, every node joins the cluster named "elasticsearch". Each node has its own configuration defined in elasticsearch.yml, so nodes can have different settings for memory and resource allocation.

In Elasticsearch, nodes can play three types of roles:

  • Data node: Data nodes index documents and perform searches on indexed documents. It is always recommended to add more data nodes in order to increase performance or scale the cluster. A node can be made a data node by setting these properties in the elasticsearch.yml configuration for the node:
    node.master: false
    node.data: true
  • Master node: Master nodes are responsible for management of a cluster. For large clusters, it is recommended to have three dedicated master nodes (one primary and two backup), which only act as master nodes and do not store indices or perform searches. A node can be configured to be a dedicated master node with this configuration in elasticsearch.yml:
    node.master: true
    node.data: false
  • Routing node or load balancer node: These nodes do not play the role of either a master or data node, but just perform load balancing, or routing of requests for searches, or indexing the document to appropriate nodes. This is useful for high volume searches or index operations. A node can be configured to be a routing node with this configuration in elasticsearch.yml:
    node.master: false
    node.data: false
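
The three combinations of node.master and node.data listed above can be summarized in a small lookup. This Python sketch is only a reading aid: node_role is a hypothetical function, and the label for the case where both settings are true is not covered in the list above and is my own wording.

```python
def node_role(master: bool, data: bool) -> str:
    """Name the role implied by the node.master and node.data settings."""
    if master and not data:
        return "dedicated master node"
    if data and not master:
        return "data node"
    if not master and not data:
        return "routing node"
    # Both true: not one of the three roles listed above (label assumed).
    return "master-eligible data node"


for master, data in [(False, True), (True, False), (False, False)]:
    print(f"node.master: {master}, node.data: {data} -> {node_role(master, data)}")
```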