Book Description

End-to-end Search and Analytics

About This Book

  • Solve your data analytics problems with the Elastic Stack
  • Improve your user search experience with Elasticsearch and develop your own Elasticsearch plugins
  • Design, configure, and distribute your index, and learn how it works

Who This Book Is For

This course is for anyone who wants to build efficient search and analytics applications. Some development experience is expected.

What You Will Learn

  • Install and configure Elasticsearch, Logstash, and Kibana
  • Write CRUD operations and other search functionality using the Elasticsearch Python and Java clients
  • Build analytics using aggregations
  • Set up and scale Elasticsearch clusters using best practices
  • Master document relationships and geospatial data
  • Build your own data pipeline using the Elastic Stack
  • Choose the appropriate number of shards and replicas for your deployment
  • Become familiar with the Elasticsearch APIs

In Detail

Elasticsearch is a modern, fast, distributed, scalable, fault-tolerant, open source search and analytics engine. It provides a new level of control over how you index and search even very large data sets. This course will take you from the basics of Elasticsearch to using Elasticsearch in the Elastic Stack and in production.

You'll start with the very basics: Elasticsearch terminology, installation, and configuring Elasticsearch. After this, you'll take a look at analytics and indexing, search, and querying. You'll learn how to create maps and visualizations. You'll also be briefed on cluster scaling, search and bulk operations, backups, and security.

Then you'll be ready to get into Elasticsearch's internals, including its caches, the Apache Lucene library, and its monitoring capabilities. You'll learn about the practical usage of Elasticsearch configuration parameters and how to use the monitoring API. You'll discover how to improve the user search experience, and you'll explore index distribution, segment statistics, merging, and more.

Once you have mastered this, you'll dive into end-to-end visualize-analyze-log techniques with the Elastic Stack (also known as the ELK stack). You'll explore Elasticsearch, Logstash, and Kibana and see how to make them work together to build fresh insights and business metrics out of data. You'll be able to use Elasticsearch with other de facto components to get the most out of it. By the end of this course, you'll have developed a full-fledged data pipeline.

This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products:

  • Elasticsearch Essentials
  • Mastering Elasticsearch, Second Edition
  • Learning ELK Stack

Style and approach

This course aims to create a smooth learning path that will teach you how to effectively use Elasticsearch with other de facto components and get the most out of it. Through this comprehensive course, you'll learn the basics of Elasticsearch and progress to using Elasticsearch in the Elastic Stack and in production.

Table of Contents

  1. Elasticsearch: A Complete Guide
    1. Table of Contents
    2. Elasticsearch: A Complete Guide
    3. Elasticsearch: A Complete Guide
    4. Credits
    5. Preface
      1. What this learning path covers
      2. What you need for this learning path
      3. Who this learning path is for
      4. Reader feedback
      5. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    6. I. Module 1
      1. 1. Getting Started with Elasticsearch
        1. Introducing Elasticsearch
          1. The primary features of Elasticsearch
          2. Understanding REST and JSON
            1. What is REST?
            2. What is JSON?
          3. Elasticsearch common terms
          4. Understanding Elasticsearch structure with respect to relational databases
        2. Installing and configuring Elasticsearch
          1. Installing Elasticsearch on Ubuntu through the Debian package
          2. Installing Elasticsearch on CentOS through the RPM package
          3. Understanding the Elasticsearch installation directory layout
          4. Configuring basic parameters
          5. Adding another node to the cluster
          6. Installing Elasticsearch plugins
            1. Checking for installed plugins
            2. Installing the Head plugin for Elasticsearch
            3. Installing Sense for Elasticsearch
        3. Basic operations with Elasticsearch
          1. Creating an Index
          2. Indexing a document in Elasticsearch
          3. Fetching documents
            1. Get a complete document
            2. Getting part of a document
          4. Updating documents
            1. Updating a whole document
            2. Updating documents partially
          5. Deleting documents
          6. Checking documents' existence
        4. Summary
      2. 2. Understanding Document Analysis and Creating Mappings
        1. Text search
          1. TF-IDF
          2. Inverted indexes
        2. Document analysis
          1. Introducing Lucene analyzers
          2. Creating custom analyzers
          3. Changing a default analyzer
          4. Putting custom analyzers into action
        3. Elasticsearch mapping
          1. Document metadata fields
          2. Data types and index analysis options
            1. Configuring data types
              1. String
              2. Number
              3. Date
              4. Boolean
              5. Arrays
              6. Objects
            2. Indexing the same field in different ways
          3. Putting mappings in an index
          4. Viewing mappings
          5. Updating mappings
        4. Summary
      3. 3. Putting Elasticsearch into Action
        1. CRUD operations using elasticsearch-py
          1. Setting up the environment
            1. Installing Pip
            2. Installing virtualenv
            3. Installing elasticsearch-py
          2. Performing CRUD operations
            1. Request timeouts
            2. Creating indexes with settings and mappings
            3. Indexing documents
            4. Retrieving documents
            5. Updating documents
              1. Replacing the value of a field completely
            6. Appending a value in an array
            7. Updates using doc
            8. Checking document existence
            9. Deleting a document
        2. CRUD operations using Java
          1. Connecting with Elasticsearch
          2. Indexing a document
          3. Fetching a document
          4. Updating a document
            1. Updating a document using doc
            2. Updating a document using script
          5. Deleting documents
        3. Creating a search database
        4. Elasticsearch Query-DSL
        5. Understanding Query-DSL parameters
          1. Query types
          2. Full-text search queries
            1. match_all
            2. match query
              1. Phrase search
            3. multi match
            4. query_string
          3. Term-based search queries
            1. Term query
            2. Terms query
            3. Range queries
            4. Exists queries
            5. Missing queries
          4. Compound queries
            1. Bool queries
            2. Not queries
        6. Search requests using Python
        7. Search requests using Java
          1. Parsing search responses
        8. Sorting your data
          1. Sorting documents by field values
          2. Sorting on more than one field
          3. Sorting multivalued fields
          4. Sorting on string fields
        9. Document routing
        10. Summary
      4. 4. Aggregations for Analytics
        1. Introducing the aggregation framework
          1. Aggregation syntax
          2. Extracting values
          3. Returning only aggregation results
        2. Metric aggregations
          1. Computing basic stats
            1. Combined stats
            2. Computing stats separately
          2. Computing extended stats
          3. Finding distinct counts
        3. Bucket aggregations
          1. Terms aggregation
          2. Range aggregation
          3. Date range aggregation
          4. Histogram aggregation
          5. Date histogram aggregation
          6. Filter-based aggregation
        4. Combining search, buckets, and metrics
        5. Memory pressure and implications
        6. Summary
      5. 5. Data Looks Better on Maps: Master Geo-Spatiality
        1. Introducing geo-spatial data
        2. Working with geo-point data
          1. Mapping geo-point fields
          2. Indexing geo-point data
          3. Querying geo-point data
            1. Geo distance query
            2. Geo distance range query
            3. Geo bounding box query
              1. Understanding bounding boxes
          4. Sorting by distance
        3. Geo-aggregations
          1. Geo distance aggregation
          2. Using bounding boxes with geo distance aggregation
        4. Geo-shapes
          1. Point
          2. Linestring
          3. Circles
          4. Polygons
          5. Envelopes
          6. Mapping geo-shape fields
          7. Indexing geo-shape data
          8. Querying geo-shape data
        5. Summary
      6. 6. Document Relationships in NoSQL World
        1. Relational data in the document-oriented NoSQL world
          1. Managing relational data in Elasticsearch
        2. Working with nested objects
          1. Creating nested mappings
          2. Indexing nested data
          3. Querying nested type data
            1. Nested aggregations
            2. Nested aggregation
              1. Understanding nested aggregation syntax
            3. Reverse nested aggregation
        3. Parent-child relationships
          1. Creating parent-child mappings
          2. Indexing parent-child documents
          3. Querying parent-child documents
            1. has_child query
            2. has_parent query
        4. Considerations for using document relationships
        5. Summary
      7. 7. Different Methods of Search and Bulk Operations
        1. Introducing search types in Elasticsearch
        2. Cheaper bulk operations
          1. Bulk create
          2. Bulk indexing
          3. Bulk updating
          4. Bulk deleting
        3. Multi get and multi search APIs
          1. Multi get
          2. Multi searches
        4. Data pagination
          1. Pagination with scoring
          2. Pagination without scoring
            1. Scrolling and re-indexing documents using scan-scroll
        5. Practical considerations for bulk processing
        6. Summary
      8. 8. Controlling Relevancy
        1. Introducing relevant searches
        2. The Elasticsearch out-of-the-box tools
          1. An example: why defaults are not enough
        3. Controlling relevancy with custom scoring
          1. The function_score query
            1. weight
            2. field_value_factor
            3. script_score
            4. Decay functions - linear, exp, and gauss
        4. Summary
      9. 9. Cluster Scaling in Production Deployments
        1. Node types in Elasticsearch
          1. Client node
          2. Data node
          3. Master node
        2. Introducing Zen-Discovery
          1. Multicasting discovery
          2. Unicasting discovery
            1. Configuring unicasting discovery
              1. Minimum number of master nodes: preventing split-brain
                1. An initial list of hosts to ping
                2. Ping timeout
                3. Node upgrades without downtime
                4. Upgrading Elasticsearch version
                5. Best Elasticsearch practices in production
                6. Creating a cluster
                7. Scaling your clusters
                  1. When to scale
                    1. Metrics to watch
                      1. CPU utilization
                      2. Memory utilization
                      3. Disk I/O utilization
                      4. Disk low watermark
                  2. How to scale
                8. Summary
              2. 10. Backups and Security
                1. Introducing backup and restore mechanisms
                  1. Backup using snapshot API
                    1. Creating an NFS drive
                      1. Configuring the NFS host server
                      2. Configuring client machines
                    2. Creating a snapshot
                      1. Registering the repository path
                      2. Registering the shared file system repository in Elasticsearch
                      3. Create your first snapshot
                      4. Getting snapshot information
                      5. Deleting snapshots
                  2. Restoring snapshots
                    1. Restoring multiple indices
                    2. Renaming indices
                    3. Partial restore
                    4. Changing index settings during restore
                    5. Restoring to a different cluster
                  3. Manual backups
                  4. Manual restoration
                2. Securing Elasticsearch
                  1. Setting up basic HTTP authentication
                  2. Setting up Nginx
                  3. Securing critical access
                    1. Restricting DELETE requests
                    2. Restricting endpoints
                  4. Load balancing using Nginx
                3. Summary
            2. II. Module 2
              1. 1. Introduction to Elasticsearch
                1. Introducing Apache Lucene
                  1. Getting familiar with Lucene
                  2. Overall architecture
                    1. Getting deeper into Lucene index
                      1. Norms
                      2. Term vectors
                      3. Posting formats
                      4. Doc values
                  3. Analyzing your data
                    1. Indexing and querying
                  4. Lucene query language
                    1. Understanding the basics
                    2. Querying fields
                    3. Term modifiers
                    4. Handling special characters
                2. Introducing Elasticsearch
                  1. Basic concepts
                    1. Index
                    2. Document
                    3. Type
                    4. Mapping
                    5. Node
                    6. Cluster
                    7. Shard
                    8. Replica
                  2. Key concepts behind Elasticsearch architecture
                  3. Workings of Elasticsearch
                    1. The startup process
                    2. Failure detection
                  4. Communicating with Elasticsearch
                    1. Indexing data
                    2. Querying data
                3. The story
                4. Summary
              2. 2. Power User Query DSL
                1. Default Apache Lucene scoring explained
                  1. When a document is matched
                  2. TF/IDF scoring formula
                    1. Lucene conceptual scoring formula
                    2. Lucene practical scoring formula
                  3. Elasticsearch point of view
                  4. An example
                2. Query rewrite explained
                  1. Prefix query as an example
                  2. Getting back to Apache Lucene
                  3. Query rewrite properties
                3. Query templates
                  1. Introducing query templates
                    1. Templates as strings
                  2. The Mustache template engine
                    1. Conditional expressions
                    2. Loops
                    3. Default values
                  3. Storing templates in files
                4. Handling filters and why it matters
                  1. Filters and query relevance
                  2. How filters work
                    1. Bool or and/or/not filters
                  3. Performance considerations
                  4. Post filtering and filtered query
                  5. Choosing the right filtering method
                5. Choosing the right query for the job
                  1. Query categorization
                    1. Basic queries
                    2. Compound queries
                    3. Not analyzed queries
                    4. Full text search queries
                    5. Pattern queries
                    6. Similarity supporting queries
                    7. Score altering queries
                    8. Position aware queries
                    9. Structure aware queries
                  2. The use cases
                    1. Example data
                    2. Basic queries use cases
                      1. Searching for values in range
                      2. Simplified query for multiple terms
                    3. Compound queries use cases
                      1. Boosting some of the matched documents
                      2. Ignoring lower scoring partial queries
                    4. Not analyzed queries use cases
                      1. Limiting results to given tags
                      2. Efficient query time stopwords handling
                    5. Full text search queries use cases
                      1. Using Lucene query syntax in queries
                      2. Handling user queries without errors
                    6. Pattern queries use cases
                      1. Autocomplete using prefixes
                      2. Pattern matching
                    7. Similarity supporting queries use cases
                      1. Finding terms similar to a given one
                      2. Finding documents with similar field values
                    8. Score altering queries use cases
                      1. Favoring newer books
                      2. Decreasing importance of books with certain value
                    9. Position aware queries use cases
                      1. Matching phrases
                      2. Spans, spans everywhere
                    10. Structure aware queries use cases
                      1. Returning parent documents having a certain nested document
                      2. Affecting parent document score with the score of nested documents
                6. Summary
              3. 3. Not Only Full Text Search
                1. Query rescoring
                  1. What is query rescoring?
                  2. An example query
                  3. Structure of the rescore query
                  4. Rescore parameters
                    1. Choosing the scoring mode
                  5. To sum up
                2. Controlling multimatching
                  1. Multimatch types
                    1. Best fields matching
                    2. Cross fields matching
                    3. Most fields matching
                    4. Phrase matching
                    5. Phrase with prefixes matching
                3. Significant terms aggregation
                  1. An example
                  2. Choosing significant terms
                  3. Multiple values analysis
                    1. Significant terms aggregation and full text search fields
                  4. Additional configuration options
                    1. Controlling the number of returned buckets
                    2. Background set filtering
                    3. Minimum document count
                    4. Execution hint
                    5. More options
                  5. There are limits
                    1. Memory consumption
                    2. Shouldn't be used as top-level aggregation
                    3. Counts are approximated
                    4. Floating point fields are not allowed
                4. Documents grouping
                  1. Top hits aggregation
                  2. An example
                    1. Additional parameters
                5. Relations between documents
                  1. The object type
                  2. The nested documents
                  3. Parent–child relationship
                    1. Parent–child relationship in the cluster
                  4. A few words about alternatives
                6. Scripting changes between Elasticsearch versions
                  1. Scripting changes
                    1. Security issues
                    2. Groovy – the new default scripting language
                    3. Removal of MVEL language
                  2. Short Groovy introduction
                    1. Using Groovy as your scripting language
                    2. Variable definition in scripts
                    3. Conditionals
                    4. Loops
                    5. An example
                    6. There is more
                  3. Scripting in full text context
                    1. Field-related information
                    2. Shard level information
                    3. Term level information
                      1. More advanced term information
                  4. Lucene expressions explained
                    1. The basics
                    2. An example
                    3. There is more
                7. Summary
              4. 4. Improving the User Search Experience
                1. Correcting user spelling mistakes
                  1. Testing data
                  2. Getting into technical details
                  3. Suggesters
                    1. Using the _suggest REST endpoint
                    2. Understanding the REST endpoint suggester response
                    3. Including suggestion requests in query
                    4. The term suggester
                      1. Configuration
                      2. Common term suggester options
                      3. Additional term suggester options
                    5. The phrase suggester
                      1. Usage example
                      2. Configuration
                      3. Basic configuration
                      4. Configuring smoothing models
                      5. Configuring candidate generators
                      6. Configuring direct generators
                    6. The completion suggester
                      1. The logic behind the completion suggester
                      2. Using the completion suggester
                      3. Indexing data
                      4. Querying data
                      5. Custom weights
                      6. Additional parameters
                2. Improving the query relevance
                  1. Data
                  2. The quest for relevance improvement
                    1. The standard query
                    2. The multi match query
                    3. Phrases come into play
                    4. Let's throw the garbage away
                    5. Now, we boost
                    6. Performing a misspelling-proof search
                    7. Drill downs with faceting
                3. Summary
              5. 5. The Index Distribution Architecture
                1. Choosing the right amount of shards and replicas
                  1. Sharding and overallocation
                  2. A positive example of overallocation
                  3. Multiple shards versus multiple indices
                  4. Replicas
                2. Routing explained
                  1. Shards and data
                  2. Let's test routing
                    1. Indexing with routing
                  3. Routing in practice
                    1. Querying
                  4. Aliases
                  5. Multiple routing values
                3. Altering the default shard allocation behavior
                  1. Allocation awareness
                    1. Forcing allocation awareness
                  2. Filtering
                    1. What include, exclude, and require mean
                  3. Runtime allocation updating
                    1. Index level updates
                    2. Cluster level updates
                  4. Defining total shards allowed per node
                  5. Defining total shards allowed per physical server
                    1. Inclusion
                    2. Requirement
                    3. Exclusion
                    4. Disk-based allocation
                4. Query execution preference
                  1. Introducing the preference parameter
                5. Summary
              6. 6. Low-level Index Control
                1. Altering Apache Lucene scoring
                  1. Available similarity models
                  2. Setting a per-field similarity
                  3. Similarity model configuration
                  4. Choosing the default similarity model
                    1. Configuring the chosen similarity model
                      1. Configuring the TF/IDF similarity
                      2. Configuring the Okapi BM25 similarity
                      3. Configuring the DFR similarity
                      4. Configuring the IB similarity
                      5. Configuring the LM Dirichlet similarity
                      6. Configuring the LM Jelinek Mercer similarity
                2. Choosing the right directory implementation – the store module
                  1. The store type
                    1. The simple filesystem store
                    2. The new I/O filesystem store
                    3. The MMap filesystem store
                    4. The hybrid filesystem store
                    5. The memory store
                      1. Additional properties
                    6. The default store type
                    7. The default store type for Elasticsearch 1.3.0 and higher
                    8. The default store type for Elasticsearch versions older than 1.3.0
                3. NRT, flush, refresh, and transaction log
                  1. Updating the index and committing changes
                    1. Changing the default refresh time
                  2. The transaction log
                    1. The transaction log configuration
                  3. Near real-time GET
                4. Segment merging under control
                  1. Choosing the right merge policy
                    1. The tiered merge policy
                    2. The log byte size merge policy
                    3. The log doc merge policy
                  2. Merge policies' configuration
                    1. The tiered merge policy
                    2. The log byte size merge policy
                    3. The log doc merge policy
                  3. Scheduling
                    1. The concurrent merge scheduler
                    2. The serial merge scheduler
                    3. Setting the desired merge scheduler
                5. When it is too much for I/O – throttling explained
                  1. Controlling I/O throttling
                  2. Configuration
                    1. The throttling type
                    2. Maximum throughput per second
                    3. Node throttling defaults
                    4. Performance considerations
                    5. The configuration example
                6. Understanding Elasticsearch caching
                  1. The filter cache
                    1. Filter cache types
                    2. Node-level filter cache configuration
                    3. Index-level filter cache configuration
                  2. The field data cache
                    1. Field data or doc values
                    2. Node-level field data cache configuration
                    3. Index-level field data cache configuration
                    4. The field data cache filtering
                      1. Adding field data filtering information
                      2. Filtering by term frequency
                      3. Filtering by regex
                      4. Filtering by regex and term frequency
                      5. The filtering example
                    5. Field data formats
                      1. String-based fields
                      2. Numeric fields
                      3. Geographical-based fields
                    6. Field data loading
                  3. The shard query cache
                    1. Setting up the shard query cache
                  4. Using circuit breakers
                    1. The field data circuit breaker
                    2. The request circuit breaker
                    3. The total circuit breaker
                  5. Clearing the caches
                  6. Index, indices, and all caches clearing
                    1. Clearing specific caches
                7. Summary
              7. 7. Elasticsearch Administration
                1. Discovery and recovery modules
                  1. Discovery configuration
                    1. Zen discovery
                      1. Multicast Zen discovery configuration
                      2. The unicast Zen discovery configuration
                  2. Master node
                    1. Configuring master and data nodes
                      1. Configuring data-only nodes
                      2. Configuring master-only nodes
                      3. Configuring the query processing-only nodes
                    2. The master election configuration
                      1. Zen discovery fault detection and configuration
                    3. The Amazon EC2 discovery
                      1. The EC2 plugin installation
                      2. The EC2 plugin's generic configuration
                      3. Optional EC2 discovery configuration options
                      4. The EC2 nodes scanning configuration
                    4. Other discovery implementations
                  3. The gateway and recovery configuration
                    1. The gateway recovery process
                    2. Configuration properties
                    3. Expectations on nodes
                    4. The local gateway
                    5. Low-level recovery configuration
                      1. Cluster-level recovery configuration
                      2. Index-level recovery settings
                  4. The indices recovery API
                2. The human-friendly status API – using the Cat API
                  1. The basics
                  2. Using the Cat API
                    1. Common arguments
                  3. The examples
                    1. Getting information about the master node
                    2. Getting information about the nodes
                3. Backing up
                  1. Saving backups in the cloud
                    1. The S3 repository
                    2. The HDFS repository
                    3. The Azure repository
                4. Federated search
                  1. The test clusters
                  2. Creating the tribe node
                    1. Using the unicast discovery for tribes
                  3. Reading data with the tribe node
                    1. Master-level read operations
                  4. Writing data with the tribe node
                    1. Master-level write operations
                  5. Handling indices conflicts
                  6. Blocking write operations
                5. Summary
              8. 8. Improving Performance
                1. Using doc values to optimize your queries
                  1. The problem with field data cache
                  2. The example of doc values usage
                2. Knowing about garbage collector
                  1. Java memory
                    1. The life cycle of Java objects and garbage collections
                  2. Dealing with garbage collection problems
                    1. Turning on logging of garbage collection work
                    2. Using JStat
                    3. Creating memory dumps
                    4. More information on the garbage collector work
                    5. Adjusting the garbage collector work in Elasticsearch
                      1. Using a standard start up script
                      2. Service wrapper
                  3. Avoid swapping on Unix-like systems
                3. Benchmarking queries
                  1. Preparing your cluster configuration for benchmarking
                  2. Running benchmarks
                  3. Controlling currently run benchmarks
                4. Very hot threads
                  1. Usage clarification for the Hot Threads API
                  2. The Hot Threads API response
                5. Scaling Elasticsearch
                  1. Vertical scaling
                  2. Horizontal scaling
                    1. Automatically creating replicas
                    2. Redundancy and high availability
                    3. Cost and performance flexibility
                    4. Continuous upgrades
                    5. Multiple Elasticsearch instances on a single physical machine
                      1. Preventing the shard and its replicas from being on the same node
                    6. Designated nodes' roles for larger clusters
                      1. Query aggregator nodes
                      2. Data nodes
                      3. Master eligible nodes
                  3. Using Elasticsearch for high load scenarios
                    1. General Elasticsearch-tuning advice
                      1. Choosing the right store
                      2. The index refresh rate
                      3. Thread pools tuning
                      4. Adjusting the merge process
                      5. Data distribution
                    2. Advice for high query rate scenarios
                      1. Filter caches and shard query caches
                      2. Think about the queries
                      3. Using routing
                      4. Parallelize your queries
                      5. Field data cache and breaking the circuit
                      6. Keeping size and shard_size under control
                    3. High indexing throughput scenarios and Elasticsearch
                      1. Bulk indexing
                      2. Doc values versus indexing speed
                      3. Keep your document fields under control
                      4. The index architecture and replication
                      5. Tuning write-ahead log
                      6. Think about storage
                      7. RAM buffer for indexing
                6. Summary
              9. 9. Developing Elasticsearch Plugins
                1. Creating the Apache Maven project structure
                2. Understanding the basics
                  1. The structure of the Maven Java project
                  2. The idea of POM
                  3. Running the build process
                  4. Introducing the assembly Maven plugin
                3. Creating custom REST action
                  1. The assumptions
                  2. Implementation details
                    1. Using the REST action class
                      1. The constructor
                      2. Handling requests
                      3. Writing response
                    2. The plugin class
                    3. Informing Elasticsearch about our REST action
                    4. Time for testing
                    5. Building the REST action plugin
                    6. Installing the REST action plugin
                    7. Checking whether the REST action plugin works
                4. Creating the custom analysis plugin
                  1. Implementation details
                    1. Implementing TokenFilter
                    2. Implementing the TokenFilter factory
                    3. Implementing the class custom analyzer
                    4. Implementing the analyzer provider
                    5. Implementing the analysis binder
                    6. Implementing the analyzer indices component
                    7. Implementing the analyzer module
                    8. Implementing the analyzer plugin
                    9. Informing Elasticsearch about our custom analyzer
                  2. Testing our custom analysis plugin
                    1. Building our custom analysis plugin
                    2. Installing the custom analysis plugin
                    3. Checking whether our analysis plugin works
                5. Summary
            3. III. Module 3
              1. 1. Introduction to ELK Stack
                1. The need for log analysis
                  1. Issue debugging
                  2. Performance analysis
                  3. Security analysis
                  4. Predictive analysis
                  5. Internet of things and logging
                2. Challenges in log analysis
                  1. Non-consistent log format
                    1. Tomcat logs
                    2. Apache access logs – combined log format
                    3. IIS logs
                  2. Variety of time formats
                    1. Decentralized logs
                  3. Expert knowledge requirement
                3. The ELK Stack
                  1. Elasticsearch
                  2. Logstash
                  3. Kibana
                4. ELK data pipeline
                5. ELK Stack installation
                  1. Installing Elasticsearch
                  2. Running Elasticsearch
                  3. Elasticsearch configuration
                    1. Network Address
                    2. Paths
                    3. The cluster name
                    4. The node name
                  4. Elasticsearch plugins
                  5. Installing Logstash
                  6. Running Logstash
                  7. Logstash with file input
                  8. Logstash with Elasticsearch output
                  9. Configuring Logstash
                  10. Installing Logstash forwarder
                  11. Logstash plugins
                    1. Input plugin
                    2. Filters plugin
                    3. Output plugin
                  12. Installing Kibana
                  13. Configuring Kibana
                  14. Running Kibana
                  15. Kibana interface
                    1. Discover
                    2. Visualize
                    3. Dashboard
                    4. Settings
                6. Summary
              2. 2. Building Your First Data Pipeline with ELK
                1. Input dataset
                  1. Data format for input dataset
                2. Configuring Logstash input
                3. Filtering and processing input
                4. Putting data to Elasticsearch
                5. Visualizing with Kibana
                  1. Running Kibana
                  2. Kibana visualizations
                  3. Building a line chart
                  4. Building a bar chart
                  5. Building a Metric
                  6. Building a data table
                6. Summary
              3. 3. Collect, Parse and Transform Data with Logstash
                1. Configuring Logstash
                2. Logstash plugins
                  1. Listing all plugins in Logstash
                  2. Data types for plugin properties
                    1. Array
                    2. Boolean
                    3. Codec
                    4. Hash
                    5. String
                    6. Comments
                    7. Field references
                  3. Logstash conditionals
                  4. Types of Logstash plugins
                    1. Input plugins
                      1. file
                        1. Configuration options
                          1. add_field
                          2. codec
                          3. delimiter
                          4. exclude
                          5. path
                          6. sincedb_path
                          7. sincedb_write_interval
                          8. start_position
                          9. tags
                          10. type
                      2. stdin
                        1. Configuration options
                          1. add_field
                          2. codec
                          3. tags
                          4. type
                      3. twitter
                        1. Configuration options
                          1. add_field
                          2. codec
                          3. consumer_key
                          4. consumer_secret
                          5. full_tweet
                          6. keywords
                          7. oauth_token
                          8. oauth_token_secret
                          9. tags
                          10. type
                      4. lumberjack
                        1. Configuration options
                          1. add_field
                          2. codec
                          3. host
                          4. port
                          5. ssl_certificate
                          6. ssl_key
                          7. ssl_key_passphrase
                          8. tags
                          9. type
                      5. redis
                        1. Configuration options
                          1. add_field
                          2. codec
                          3. data_type
                          4. host
                          5. key
                          6. password
                          7. port
                    2. Output plugins
                      1. csv
                        1. Configuration options
                          1. codec
                          2. csv_options
                          3. fields
                          4. gzip
                          5. path
                      2. file
                        1. Configuration options
                      3. email
                        1. Configuration options
                          1. attachments
                          2. body
                          3. cc
                          4. from
                          5. to
                          6. htmlbody
                          7. replyto
                          8. subject
                      4. elasticsearch
                        1. Configuration options
                      5. ganglia
                        1. Configuration options
                          1. metric
                          2. unit
                          3. value
                      6. jira
                        1. Configuration options
                      7. kafka
                        1. Configuration options
                          1. topic_id
                      8. lumberjack
                        1. Configuration options
                          1. hosts
                          2. port
                          3. ssl_certificate
                      9. redis
                        1. Configuration options
                      10. rabbitmq
                      11. stdout
                      12. mongodb
                        1. Configuration options
                          1. collection
                          2. database
                          3. uri
                    3. Filter plugins
                      1. csv
                        1. Configuration options
                      2. date
                        1. Configuration options
                      3. drop
                        1. Configuration options
                      4. geoip
                        1. Configuration options
                          1. source
                      5. grok
                        1. Custom grok patterns
                      6. mutate
                        1. Configuration options
                      7. sleep
                    4. Codec plugins
                      1. json
                      2. line
                      3. multiline
                      4. plain
                      5. rubydebug
                3. Summary
              4. 4. Creating Custom Logstash Plugins
                1. Logstash plugin management
                2. Plugin lifecycle management
                  1. Installing a plugin
                  2. Updating a plugin
                  3. Uninstalling a plugin
                3. Structure of a Logstash plugin
                  1. Required dependencies
                  2. Class declaration
                  3. Configuration name
                  4. Configuration options setting
                  5. Plugin methods
                    1. Input plugin
                    2. Filter plugin
                    3. Output plugin
                    4. Codec plugin
                  6. Writing a Logstash filter plugin
                  7. Building the plugin
                4. Summary
              5. 5. Why Do We Need Elasticsearch in ELK?
                1. Why Elasticsearch?
                2. Elasticsearch basic concepts
                  1. Index
                  2. Document
                  3. Field
                  4. Type
                  5. Mapping
                  6. Shard
                  7. Primary shard and replica shard
                  8. Cluster
                  9. Node
                3. Exploring the Elasticsearch API
                  1. Listing all available indices
                  2. Listing all nodes in a cluster
                  3. Checking the health of the cluster
                    1. Health status of the cluster
                  4. Creating an index
                  5. Retrieving the document
                  6. Deleting documents
                  7. Deleting an index
                4. Elasticsearch Query DSL
                5. Elasticsearch plugins
                  1. Bigdesk plugin
                  2. Elastic-Hammer plugin
                  3. Head plugin
                6. Summary
              6. 6. Finding Insights with Kibana
                1. Kibana 4 features
                  1. Search highlights
                  2. Elasticsearch aggregations
                  3. Scripted fields
                  4. Dynamic dashboards
                2. Kibana interface
                  1. Discover page
                    1. Time filter
                      1. Quick time filter
                      2. Relative time filter
                      3. Absolute time filter
                      4. Kibana Auto-refresh setting
                  2. Querying and searching data
                    1. Freetext search
                      1. AND
                      2. OR
                      3. NOT
                      4. Groupings
                      5. Wildcard searches
                    2. Field searches
                    3. Range searches
                    4. Special characters escaping
                    5. New search
                    6. Saving the search
                    7. Loading a search
                    8. Field searches using field list
                3. Summary
              7. 7. Kibana – Visualization and Dashboard
                1. Visualize page
                  1. Creating a visualization
                  2. Visualization types
                  3. Metrics and buckets aggregations
                    1. Buckets
                      1. Date Histogram
                      2. Histogram
                      3. Range
                      4. Date Range
                      5. Terms
                    2. Metrics
                      1. Count
                      2. Average, Sum, Min, and Max
                      3. Unique Count
                    3. Advanced options
                  4. Visualizations
                    1. Area chart
                    2. Data table
                    3. Line chart
                    4. Markdown widget
                    5. Metric
                    6. Pie chart
                    7. Tile map
                    8. Vertical bar chart
                2. Dashboard page
                  1. Building a new dashboard
                  2. Saving and loading a dashboard
                  3. Sharing a dashboard
                3. Summary
              8. 8. Putting It All Together
                1. Input dataset
                2. Configuring Logstash input
                  1. Grok pattern for access logs
                3. Visualizing with Kibana
                  1. Running Kibana
                  2. Searching on the Discover page
                  3. Visualizations – charts
                  4. Building a Line chart
                  5. Building an Area chart
                  6. Building a Bar chart
                  7. Building a Markdown
                  8. Dashboard page
                4. Summary
              9. 9. ELK Stack in Production
                1. Prevention of data loss
                2. Data protection
                3. System scalability
                4. Data retention
                5. ELK Stack implementations
                  1. ELK Stack at LinkedIn
                    1. Problem statement
                    2. Criteria for solution
                    3. Solution
                    4. Kafka at LinkedIn
                    5. Operational challenges
                    6. Logging using Kafka at LinkedIn
                6. ELK at SCA
                  1. How is ELK used in SCA?
                  2. How is it helping in analytics?
                  3. ELK for monitoring at SCA
                7. ELK at Cliffhanger Solutions
                8. Kibana demo – Packetbeat dashboard
                9. Summary
              10. 10. Expanding Horizons with ELK
                1. Elasticsearch plugins and utilities
                  1. Curator for index management
                    1. Curator commands
                    2. Curator installation
                  2. Shield for security
                    1. Shield installation
                    2. Adding users and roles
                    3. Using Kibana 4 on Shield-protected Elasticsearch
                  3. Marvel to monitor
                    1. Marvel installation
                    2. Marvel dashboards
                2. ELK roadmap
                  1. Elasticsearch roadmap
                  2. Logstash roadmap
                    1. Event persistence capability
                    2. End-to-end message acknowledgement
                    3. Logstash monitoring and management API
                  3. Kibana roadmap
                3. Summary
              11. A. Bibliography
            4. Index