Index
Symbols
- $DRILL_SITE variable, Creating a Site Directory
- 7-zip, Special Configuration Instructions for Windows Installations
- ? (question mark), parameter substitution with, Using drillpy to Query Drill
- @JsonProperty annotations, Creating the Regex Plug-in Configuration Class
- @Output annotations, Writing Aggregate User-Defined Functions
- @Param annotations, Defining input parameters
- @Workspace annotations, Writing Aggregate User-Defined Functions
- ` ` (backticks)
- { } (curly braces), map notation in JSON, Accessing Maps (Key–Value Pairs) in Drill
- [ ] (square brackets), referencing individual array items, Arrays in Drill
A
- absolute references in Drill UDFs, Accessing data in holder objects
- access credentials for Amazon S3, Getting access credentials for S3
- access keys for Amazon S3, in Hadoop, Working with Amazon S3
- admission control, Admission Control
- aggregate functions, Summarizing Data with Aggregate Functions-Summarizing Data with Aggregate Functions
- all-text mode, Data types in JSON files, JSON Lists in Drill, JSON Summary, Capturing Schema Mapping in Views
- ALTER SESSION statement, Drill’s REST Interface
- Amazon EC2, Drill running in, Elements of a Drill System
- Amazon S3, Connecting Drill to Cloud Storage, Data Engineering with Drill
- Amazon Simple Storage Service (see Amazon S3)
- Amazon Web Services (see AWS)
- analyzing complex and nested data (see querying complex and nested data)
- ANSI standards
- Apache Calcite (see Calcite)
- Apache Drill (see Drill)
- Apache Hadoop (see Hadoop)
- Apache Hive (see Hive)
- Apache Impala (see Impala)
- Apache Pig (see Pig)
- Apache RAT (see RAT, configuring)
- Apache Spark (see Spark)
- Apache Superset (see Superset)
- Apache Zeppelin (see Zeppelin)
- arrays, Arrays and Maps-Arrays in Drill
- Arrow, Data representation
- AS clause, Accessing Columns in a Query
- attacks against web servers, identifying, Analyzing URLs and query strings
- authentication
- autorestart, setting Drillbit to, Starting Drill in Distributed Mode
- AWS (Amazon Web Services), Hadoop
- Azure (see Microsoft Azure)
C
- Calcite, Statement Preparation, Logical and physical plans
- cardinality of datasets, TypeOf functions, JSON Lists in Drill, Drill’s Columnar Structure, Data Conversion Functions
- case sensitivity in Drill, Specifying a Default Data Source
- CASE statement, Data types in JSON files
- CAST function, Understanding Drill Data Types, Complex Data Conversion Functions, Data Conversion Functions
- cgroups, support in Drill 1.14, Controlling CPU Usage
- characters, extraneous, removing from data, Understanding Drill Data Types
- client applications for Drill, Elements of a Drill System, Drill Components
- cloud providers
- cloud storage, Connecting Drill to Data Sources
- clush (CLUster SHell), Distributing Drill Binaries and Configuration
- cluster coordinators, The Apache Hadoop Ecosystem
- cluster-id, Preparing Your Cluster for Drill
- clusters
- code format (regex format plug-in example), Copyright Headers and Code Format
- code generation, Code generation
- columns, Data representation
- accessing in a Drill SQL query, Accessing Columns in a Query
- column aliases in Drill vs. relational databases, Reserved Words in Column Names
- column aliases not supported by GROUP BY in Drill, Summarizing Data with Aggregate Functions
- column names in JSON, JSON column names
- column projection accounting, Column Projection Accounting
- delimited data with column headers, Delimited Data with Column Headers
- Drill's columnar structure, Drill’s Columnar Structure
- illegal characters in column headers, Illegal Characters in Column Headers
- implicit columns in log format plug-in, Other Log Analysis with Drill
- nested and flat in Drill datasets, Accessing Columns in a Query
- reserved words in column names, Reserved Words in Column Names
- spaces in column names, Spaces in Column Names
- warning about incorrect names in Drill, Summarizing Data with Aggregate Functions
- ComplexWriter object, The ComplexWriter-Writing Aggregate User-Defined Functions
- compute engines, The Apache Hadoop Ecosystem
- configuration
- connecting Drill to data sources, Connecting Drill to Data Sources
- (see also querying multiple data sources)
- connecting to Drill, Connecting to Drill-Conclusion
- connection object, creating in Python, Using drillpy to Query Drill, Connecting to Drill Using pydrill
- connection strings
- context switches, Monitoring the Drill Process
- CONVERT_FROM function, Querying data from HBase, Data Conversion Functions
- CONVERT_TO function, Data Conversion Functions
- coordination tools in Hadoop ecosystem, The Apache Hadoop Ecosystem
- core-site.xml file, Querying Minio datastores from drill, Standalone Drill
- cp (classpath) storage plug-in, Choosing a Data Source, Configuring a New Storage Plug-in, Data Life Cycle: Data Exploration to Production
- CPU usage, controlling in Drill production deployment, Controlling CPU Usage-Controlling CPU Usage
- crash error in Drill 1.13, Full HDFS integration
- CREATE TABLE AS (CTAS) statement, PARTITION BY clause, Partitioning Data Directories
- CREATE TEMPORARY TABLE AS (CTTAS) command, SQL Session State
- CREATE VIEW statement, Creating Views
- Create, Read, Update and Delete (see CRUD operations)
- credentials, shared, caution with, Connecting Drill to a Relational Database
- credit card numbers (valid), finding and filtering, writing Drill UDF for, Use Case: Finding and Filtering Valid Credit Card Numbers
- CRUD operations, Drill Is a Query Engine, Not a Database
- cryptological and hashing functions, Cryptological and Hashing Functions
- CSV (comma-separated values) files, Querying Delimited Data
- arrays in, Arrays in Drill
- Drill configured with .csvh file type to accept files with headers, Delimited Data with Column Headers
- format variations and many different standards, File Format Variations
- querying a CSV file in Drill, Choosing a Data Source
- querying in a directory, Querying Directories
- schema, The SQL Relational Model
- splitting and reading with record reader, File Chunks
- summary of schema inference, CSV Summary-Explicit projection
- with header, schema inference for, CSV with header
- .csvh file type, Delimited Data with Column Headers, CSV with header
- cursor object, creating in Python, Using drillpy to Query Drill
D
- data affinity, Distribution
- data analysis using Drill, Data Analysis Using Drill-Common Problems in Querying Delimited Data
- data conversion functions, reference, Data Conversion Functions
- data definition language (DDL), Drill Is a Query Engine, Not a Database
- data engineering with Drill, Data Engineering with Drill-Conclusion
- aligning schemata across files, Aligning Schemas Across Files-Aligning Schemas Across Files
- data source inference, Data Source Inference-Default Schema
- distributed file scans, Distributed File Scans-Null versus missing values in JSON output
- file type inference, File Type Inference-File Format Variations
- JSON objects, JSON Objects-JSON Summary
- partitioning data directories, Partitioning Data Directories-Defining a Table Workspace
- schema inference overview, Schema Inference Overview-Schema Inference Overview
- schema-on-read, Schema-on-Read-Schema Inference
- using Drill with Parquet file format, Using Drill with the Parquet File Format
- working with queries in production, Working with Queries in Production-Running Challenging Queries in Scripts
- data formats, Introduction to Apache Drill
- data lakes, Introduction to Apache Drill
- data locality, Installing Drill
- data manipulation language (DML), Drill Is a Query Engine, Not a Database
- data partitioning (see partitioning)
- data representation by Drill, Data representation
- data shuffles (see shuffles)
- data source inference, Data Source Inference-Default Schema
- data sources, Connecting Drill to Data Sources
- data stores, Introduction to Apache Drill, What Is Apache Drill?
- data types
- data types (Drill), Understanding Drill Data Types-Cleaning and Preparing Data Using String Manipulation Functions
- data types (in JSON files), Data types in JSON files
- data.world dataset, querying, Other uses of the drill JDBC storage plug-in
- database-like storage engines, The Apache Hadoop Ecosystem
- databases, Introduction to Apache Drill
- dates and times
- Dean, Jeffrey, A Very Brief History of Big Data
- debugging
- default schema, Default Schema
- delimited data, Querying Delimited Data
- dependencies
- deploying Drill in production, Deploying Drill in Production-Conclusion
- DESCRIBE SCHEMA query, Defining a Workspace
- design-time configuration, Drill Module Configuration
- developer guidelines for Drill, Installing the IDE
- development environment, setting up, Setting Up Your Development Environment-Conclusion
- dfs storage plug-in, Choosing a Data Source, Other Log Analysis with Drill, Configuring a New Storage Plug-in, Storage Plug-ins
- dir variables, Querying Directories
- direct memory, Configuring Memory, Monitoring the Drill Process
- directories
- directory functions, Directory functions
- disk spilling (see spill-to-disk capabilities for operators)
- distance between strings, functions for, String Distance Functions
- distributed file scans, Distributed File Scans-Null versus missing values in JSON output
- distributed filesystems, Installing and Running Drill, Preparing Your Cluster for Drill, Drill Operation: The 30,000-Foot View, Querying Data in Hadoop from Drill, Writing a Format Plug-in
- distributed mode, Installing and Running Drill
- distributing Drill files, Distributing Drill files
- distribution of physical plan, Distribution
- Docker, The Apache Hadoop Ecosystem
- DOUBLE type, Data types in JSON files
- downstream, Logical and physical plans
- Drill
- additional resources, Comparing Drill with Similar Tools
- Apache Drill source code, Creating the Drill Build Environment
- as query engine, not a database, Drill Is a Query Engine, Not a Database
- comparing with similar tools, Comparing Drill with Similar Tools
- in the big data ecosystem, Drill in the Big Data Ecosystem
- operation, high-level view of, Drill Operation: The 30,000-Foot View
- overview of Drill in Hadoop ecosystem, Elements of a Drill System
- source style templates or IDE formatters, Installing the IDE
- testing Drill build in Maven, Building Drill from Source
- Drill Explorer, Other Interfaces
- Drill shell
- drill-client module (Node.js), Querying Drill Using Node.js
- drill-env.sh file, Creating a Site Directory
- drill-localhost script, Connecting to the Cluster
- drill-module.conf file, Create a Plug-In Project
- Drill-on-YARN, Drill-on-YARN
- drill-override.conf file, Creating a Site Directory
- Drillbits, Installing and Running Drill, Elements of a Drill System
- drillpy, Using drillpy to Query Drill
- DrillSimpleFunc interface, The Simple Function API
- dw storage plug-in, configuring, Other uses of the drill JDBC storage plug-in
- Dynamic UDF, User-Defined Functions and Custom Plug-ins
- dynamically installing Drill UDFs, Dynamically Installing a UDF
E
- ease of use (Drill), Drill Is Easy to Use
- Eclipse, Installing Maven
- embedded mode, Installing and Running Drill, Querying Delimited Data
- environment variables, Configuring ODBC on Linux or macOS, Configuring Memory
- DRILL_HOME, Installing Drill in Embedded Mode on macOS or Linux, Specifying a Default Data Source, Connecting Drill to a Relational Database, Default Schema, Creating a Site Directory, Configuring Logging, Testing the Configuration
- EXTN_CLASSPATH, Troubleshooting
- for distribution directory, Testing the Configuration
- for Drill installation on Windows, Special Configuration Instructions for Windows Installations
- setting on macOS and Linux for ODBC config files, Configuring ODBC on Linux or macOS
- error handling, Starting the Drill Cluster
- crash error in Drill 1.13, Full HDFS integration
- data source errors, Logging Levels
- Error tab in query profile, Troubleshooting
- errors from lists of different types, JSON Lists in Drill
- errors from non-null values in arrays or maps, JSON Lists in Drill
- errors from spaces or reserved words in column names, Spaces in Column Names, Reserved Words in Column Names
- errors from Windows newlines, Illegal Characters in Column Headers
- errors with storage plug-ins, Choosing a Data Source
- floating-point numbers including decimal point, JSON Summary
- for record reader in regex format plug-in, Error Handling
- format inference and, Format Inference
- schema changes, Schema Inference Overview
- setting logging level to error, Configuring Logging
- SQL syntax errors, Parsing and semantic analysis
- tool or Drill operator strict about data types, Missing values
- Excel files, drilling, Drilling Excel Files-Using the Excel Format Plug-in
- exception handling
- Exchangeable Image File (EXIF) metadata, analysis of, Finding Photos Taken Within a Geographic Region
- exec.enable_union_type option, setting to true, Data types in JSON files
- explicit projection, Explicit projection, Explicit projection
- extension field in storage plug-in, Other Log Analysis with Drill
- external systems, Drill query capabilities, Drill Is Versatile
- EXTN_CLASSPATH environment variable, Troubleshooting
- EXTRACT function, Date Arithmetic and Manipulation
- extract, transform, and load (ETL) process, Introduction to Apache Drill, Data Life Cycle: Data Exploration to Production
- extraneous characters, removing from data, Understanding Drill Data Types
F
- fields field in storage plug-in, Other Log Analysis with Drill
- file formats
- file storage plug-in, Storage Configurations
- file type inference, Data Source Inference, File Type Inference-File Format Variations
- filesystems
- filling empty values, Loading Data into Vectors
- filters, Accessing Columns in a Query, Complex Data Conversion Functions
- finding and filtering valid credit card numbers, writing Drill UDF for, Use Case: Finding and Filtering Valid Credit Card Numbers
- FLATTEN function, Using the FLATTEN() function to query split JSON files
- FoodMart sample dataset, Data Life Cycle: Data Exploration to Production
- Foreman (Drill server), Drill Components
- form submissions, analyzing for malicious activity, Analyzing URLs and query strings
- format configurations, Format Plug-ins and Format Configuration
- format field in storage plug-in, Other Log Analysis with Drill
- format inference, Format Inference
- format plug-ins, Format Plug-ins and Format Configuration, Writing a Format Plug-in-Conclusion
- additional details for advanced cases, Additional Details-Create a Plug-In Project
- creating Easy format plug-in, Creating the “Easy” Format Plug-in-Cautions Before Getting Started
- creating regex plug-in configuration class, Creating the Regex Plug-in Configuration Class-Creating the Format Plug-in Class
- creating regex plug-in format class, Creating the Format Plug-in Class-How Drill Finds Your Plug-in
- example regex plug-in, The Example Regex Format Plug-in
- Excel plug-in, The pom.xml File-Using the Excel Format Plug-in
- record reader for regex format plug-in, The Record Reader-Testing the Reader
- column projection accounting, Column Projection Accounting
- columnar structure in Drill, Drill’s Columnar Structure
- defining column names, Defining Column Names
- defining vectors, Defining Vectors
- error handling, Error Handling
- loading data into vectors, Loading Data into Vectors
- opening a file as an input stream, Opening the File
- project all, Project All
- project none, Project None
- project some, Project Some
- projection, Projection
- reading data, Reading Data
- record batches, Record Batches
- regex parsing, Regex Parsing
- releasing resources, Releasing Resources
- setup, Setup
- testing the reader shell, Testing the Reader Shell
- testing the record reader, Testing the Reader-Scaling Up
- format strings
- FormatCreator class, How Drill Finds Your Plug-in
- FormatPlugin class, How Drill Finds Your Plug-in
- fragments, Controlling CPU Usage
- FROM clause, Choosing a Data Source
- fs.default.name setting, Connecting Drill to Hive
- fs.defaultFS property, Full HDFS integration
- functions
H
- H2O platform, Making Predictions Within Drill
- Hadoop, A Very Brief History of Big Data
- Amazon S3 access keys, Access keys with Hadoop
- configuration directory, adding to Drill classpath, Full HDFS integration
- core-site.xml configuration file, Standalone Drill
- ecosystem, The Apache Hadoop Ecosystem-Drill Is a Query Engine, Not a Database
- file chunks, File Chunks
- joining data store to MySQL in Drill query, Querying Multiple Data Sources
- querying data from Drill, Querying Data in Hadoop from Drill
- tools attempting to provide SQL layer on, Comparing Drill with Similar Tools
- Hadoop Distributed File System (see HDFS)
- hash aggregator, disabling, Writing Aggregate User-Defined Functions
- hashing and cryptological functions, Cryptological and Hashing Functions
- HBase
- hbase storage plug-in, Connecting to and Querying HBase from Drill
- HBaseStorageHandler, Connecting to Hive with a remote metastore
- HDFS (Hadoop Distributed File System), Preparing Your Cluster for Drill, The Apache Hadoop Ecosystem, Data Engineering with Drill, Working with Apache Hadoop HDFS-Full HDFS integration
- hdfs storage plug-in, Other Log Analysis with Drill
- headers (column) in delimited data, Delimited Data with Column Headers
- heap memory, Configuring Memory, Monitoring the Drill Process
- Hive, Hadoop, Drill Is a Low-Latency Query Engine
- hive storage plug-in
- HiveQL, Querying Hive Data from Drill
- HOCON configuration system, Drill Module Configuration
- Holder objects
- HTTPD web server logs, configuring Drill to read, Configuring Drill to Read HTTPD Web Server Logs
I
- I/O, monitoring in Drill production deployment, Monitoring the Drill Process
- IDEs (integrated development environments), Installing Maven
- illegal characters in column headers, Illegal Characters in Column Headers
- IMAXDIR function, Directory functions
- IMINDIR function, Directory functions
- Impala, Comparing Drill with Similar Tools, Drill Is a Low-Latency Query Engine
- impersonation (of Drill users), Security
- implicit projection, Explicit projection
- indexes, Drill Is a Query Engine, Not a Database
- info logging level, Configuring Logging, Logging Levels
- INNER JOIN, Specifying a Default Data Source
- input parameters, defining for Drill simple UDF, Defining input parameters
- installation
- installing Drill, Installing Drill-Starting the Drill Cluster
- configuring logging, Configuring Logging
- configuring memory, Configuring Memory
- configuring ZooKeeper, Configuring ZooKeeper
- creating a site directory, Creating a Site Directory
- distributing Drill binaries and configuration, Distributing Drill Binaries and Configuration
- in distributed mode on macOS or Linux, Installing Drill in Distributed Mode on macOS or Linux-Starting Drill in Distributed Mode
- in embedded mode on macOS or Linux, Installing Drill in Embedded Mode on macOS or Linux-Starting Drill on macOS or Linux in Embedded Mode
- in embedded mode on Windows, Installing Drill on Windows-Starting Drill on a Windows Machine
- preparing your machine for Drill, Preparing Your Machine for Drill-Installing Drill on Windows
- prerequisites, Prerequisites
- production installation, Production Installation
- starting the Drill cluster, Starting the Drill Cluster
- testing the installation, Testing the Installation
- IntelliJ, Installing Maven
- interactivity, adding in Zeppelin, Adding interactivity in Zeppelin
- interfaces (Drill), Other Interfaces, Understanding Drill’s Interfaces-Connecting to Drill with Python
- Interval data type, Date Arithmetic and Manipulation
- IP addresses, analyzing, Networking Functions
- ISO 8601 format for dates, Converting Strings to Dates
J
- Jackson serialized class, Creating the Regex Plug-in Configuration Class
- JAR (Java Archive) files
- Java
- classpath, Choosing a Data Source
- code generation by Drill on each Drillbit for each query, Code generation
- connecting to Drill with, Connecting to Drill Using Java-Querying Drill with PHP
- integrated development environments (IDEs), Creating the Drill Build Environment
- Java 8 Date Time format, Working with Dates and Times in Drill
- Java 8 prerequisite for Drill installation, Prerequisites
- JavaBean naming conventions, Creating the Regex Plug-in Configuration Class
- memory management in, Monitoring the Drill Process
- Office Open XML format, libraries to parse, Drilling Excel Files
- plug-ins, Storage Configurations
- java -version command, Special Configuration Instructions for Windows Installations
- java-exec package, Creating the Plug-in Package
- javax.jdo.option.ConnectionURL option, Connecting Drill to Hive
- JAVA_HOME environment variable, Special Configuration Instructions for Windows Installations
- JDBC (Java Database Connectivity), Data representation, Connecting Drill to Data Sources
- JDK (Java Development Kit), Prerequisites
- JDK 8, Preparing Your Machine for Drill, Preparing Your Cluster for Drill
- JMX (Java Management eXtensions), using JMX monitoring, Monitoring JMX Metrics
- Joda date/time formatting characters, Working with Dates and Times in Drill
- Joda date/time formatting strings, Drill Formatting Strings
- joins
- JSON (JavaScript Object Notation), Analyzing Complex and Nested Data
L
- launch command, custom, creating for Drill, Creating a Site Directory
- Lightbend HOCON configuration system, Drill Module Configuration
- Linux
- cgroups, support in Drill 1.14, Controlling CPU Usage
- configuring ODBC on, Configuring ODBC on Linux or macOS
- installing clush (CLUster SHell), Distributing Drill Binaries and Configuration
- installing Drill in distributed mode, Installing Drill in Distributed Mode on macOS or Linux-Starting Drill in Distributed Mode
- installing Drill in embedded mode, Installing Drill in Embedded Mode on macOS or Linux
- installing Maven, Installing Maven
- installing Superset on, Exploring Data with Apache Superset
- newline character (\n) for line endings, Illegal Characters in Column Headers
- starting Drill on, Starting Drill on macOS or Linux in Embedded Mode
- lists (JSON), in Drill, JSON Lists in Drill
- little endian and big endian data, Querying data from HBase
- local storage configuration, Storage Configurations
- log files, analyzing with Drill, Analyzing Log Files with Drill-Other Log Analysis with Drill
- Logback
- LogFormat configuration option, Configuring Drill to Read HTTPD Web Server Logs
- logging
- logical plan for queries, Logical and physical plans
- low-latency features, Low-Latency Features-Network exchanges
- Luhn algorithm, Use Case: Finding and Filtering Valid Credit Card Numbers
M
- machine learning pipeline, using Drill in, Using Drill in a Machine Learning Pipeline-Conclusion
- macOS
- Macs
- major fragments, Logical and physical plans, Schema Inference Overview, Distributed File Scans, Controlling CPU Usage
- MapR
- MapR File System (MFS), Data Engineering with Drill
- MapR ODBC Drivers for Drill, Other Interfaces, Configuring ODBC on Linux or macOS
- MapR XD, Data Engineering with Drill
- MapReduce, The Apache Hadoop Ecosystem, Distributed Processing with HDFS
- MapReduce: Simplified Data Processing on Large Clusters (Dean and Ghemawat), A Very Brief History of Big Data
- maps
- Matcher class, Regex Parsing
- mathematical functions, Data Analysis Using Drill-Summarizing Data with Aggregate Functions, Math and Trigonometric Functions
- Maven
- Maven Old Java Objects (MOJOs), Making Predictions Within Drill
- MAXDIR function, Directory functions
- maxErrors field in storage plug-in, Other Log Analysis with Drill
- MBeans, Drill metrics available as, Monitoring JMX Metrics
- memory
- messages (Kafka), Connecting to and Querying Streaming Data with Drill and Kafka
- metadata
- metastore (Hive), Querying Hive Data from Drill
- Microsoft Azure, Connecting Drill to Cloud Storage
- Microsoft SQL Server, Connecting Drill to a Relational Database
- MINDIR function, Directory functions
- Minio data stores, querying from Drill, Querying Minio datastores from drill
- minor fragments, Distribution, Schema Inference Overview, Distributed File Scans, Controlling CPU Usage
- missing vs. null values in JSON output, Null versus missing values in JSON output
- MOJOs (Maven Old Java Objects), Making Predictions Within Drill
- MongoDB, Introduction to Apache Drill, Drill Is Versatile, Analyzing Complex and Nested Data
- monitoring, Monitoring-Monitoring Queries
- Mutator class, Column Projection Accounting
- MySQL, Connecting Drill to a Relational Database
N
- name node URL for HDFS, Full HDFS integration
- namespaces
- nested data structures, Drill support for, Accessing Columns in a Query
- nested data, querying, Querying Nested Data-Querying column-oriented JSON files with KVGEN()
- network exchanges, Network exchanges
- network packet analysis (PCAP) with Drill, Network Packet Analysis (PCAP) with Drill-Automating the process using an aggregate function
- networking functions in Drill, Networking Functions-Null Handling Functions
- newline encodings on different operating systems, Illegal Characters in Column Headers
- Node.js, querying Drill with, Querying Drill Using Node.js
- non-splittable files, Distributed File Scans, Distributed File Scans
- NoSQL data stores, Connecting to and Querying MongoDB from Drill
- NOT NULL cardinality, Casts to specify types
- null data
- aggregate functions handling nulls, Writing Aggregate User-Defined Functions
- dealing with NULL values in UDF, The Function File
- functions handling, Null Handling Functions
- JSON files containing, JSON scalar types
- leading null values in JSON files, Missing values, JSON Summary
- mapping of JSON scalar types to Drill types, JSON scalar types
- non-nullable VARCHAR, CSV with header, CSV Summary
- NOT NULL cardinality, Casts to specify types
- null strings in query of URL metadata, Analyzing URLs and query strings
- Null type in JSON, Schema Inference for JSON
- NULL values, Data representation, TypeOf functions, Explicit projection, Explicit projection
- NULL values for missing JSON data, Missing values
- null values in JSON lists, JSON Lists in Drill
- null vs. missing values in JSON output, Null versus missing values in JSON output
- nullable INT, Summarizing Data with Aggregate Functions, Explicit projection, Missing values
- NULLABLE or NOT NULL cardinality, TypeOf functions
- nullable VARCHAR, Explicit projection, Project Some
- NullableVarChar, JSON reading all scalar fields as, Mixed string and number types
- NULLIF function, Null Handling Functions
- numbers
- numeric data type, converting strings to, Understanding Drill Data Types
O
- ODBC (Open Database Connectivity), Data representation, ODBC and Drill
- ODBC/JDBC interface, Installing and Running Drill, Elements of a Drill System, Other Interfaces, Connecting to Drill
- Office Open XML format, Java libraries for, Drilling Excel Files
- Open Database Connectivity (see ODBC; ODBC/JDBC interface)
- OpenTSDB, Querying Time Series Data from Drill and OpenTSDB-Conclusion
- operation overview (Drill), Drill Operation Overview-Network exchanges
- operators, Logical and physical plans
- OPTIONAL cardinality, Drill’s Columnar Structure
- Oracle, Connecting Drill to a Relational Database
- configuring Drill to query a database, Oracle
- Oracle Java SE Development Kit 8 (JDK 8), Preparing Your Machine for Drill, Preparing Your Cluster for Drill
- output value, defining for Drill UDFs, Setting the output value
- OutputMutator class, Defining Vectors
P
- Pandas DataFrame, Connecting to Drill with Python
- parallelizing of internal operators, Distribution
- parameter substitution in queries, Using drillpy to Query Drill
- Parquet format, Analyzing Complex and Nested Data
- parse_query function, Analyzing URLs and query strings
- parse_url function, Analyzing URLs and query strings
- parse_user_agent function, Accessing Maps (Key–Value Pairs) in Drill, Analyzing user agent strings
- parsing SQL statements, Parsing and semantic analysis
- PARTITION BY clause, Other analytic functions: Window functions, Partitioning Data Directories
- partitioning, Querying Directories, Data Life Cycle: Data Exploration to Production
- PATH environment variable, Special Configuration Instructions for Windows Installations
- Pattern class, Regex Parsing
- PCAP (Packet Capture), Network Packet Analysis (PCAP) with Drill
- (see also network packet analysis (PCAP) with Drill)
- PCAP Next Generation (PCAP-NG), Network Packet Analysis (PCAP) with Drill
- (see also network packet analysis (PCAP) with Drill)
- performance
- phonetic functions, Phonetic Functions
- PHP
- physical plan for queries, Logical and physical plans
- Pig, Hadoop
- Plain Old Java Objects (POJOs), Making Predictions Within Drill
- planner.cpu_load_average session option, Controlling CPU Usage
- POINT_AGGREGATE scope, Writing Aggregate User-Defined Functions
- pom.xml file (see Maven)
- PostgreSQL, Connecting Drill to a Relational Database
- preparation phase (SQL statements), Statement Preparation
- (see also statement preparation by Drill)
- Presto, Comparing Drill with Similar Tools, Drill Is a Low-Latency Query Engine
- production, Deploying Drill in Production
- Project operator, Code generation
- projection, Projection
- projection push-down, Creating the Format Plug-in Class
- pull request, contributing code to Drill via, Contributing to Drill: The Pull Request
- Python, Drill Is Easy to Use
Q
- queries, working with in production, Working with Queries in Production-Running Challenging Queries in Scripts
- query engines, The Apache Hadoop Ecosystem
- query strings, analyzing with Drill, Analyzing URLs and query strings
- querying complex and nested data, Analyzing Complex and Nested Data-Conclusion
- querying delimited data, Querying Delimited Data-Conclusion
- querying multiple data sources, Querying Multiple Data Sources-Conclusion
- configuring new storage plug-in, Configuring a New Storage Plug-in
- connecting and querying Kudu from Drill, Connecting to and Querying Kudu
- connecting Drill to cloud storage, Connecting Drill to Cloud Storage-Querying Time Series Data from Drill and OpenTSDB
- connecting Drill to relational databases, Connecting Drill to a Relational Database-Querying Data in Hadoop from Drill
- connecting to and querying HBase from Drill, Connecting to and Querying HBase from Drill-Querying Hive Data from Drill
- connecting to and querying MongoDB from Drill, Connecting to and Querying MongoDB from Drill
- connecting to and querying streaming data from Drill and Kafka, Connecting to and Querying Streaming Data with Drill and Kafka-Improving the performance of Kafka queries
- querying Hadoop data from Drill, Querying Data in Hadoop from Drill
- querying Hive data from Drill, Querying Hive Data from Drill-Connecting to and Querying Streaming Data with Drill and Kafka
- querying time series data from Drill and OpenTSDB, Querying Time Series Data from Drill and OpenTSDB-Conclusion
R
- R language, Drill Is Easy to Use
- RAT, configuring, Configuring RAT
- RDBMS (relational database management system) (see relational databases)
- record batches, Schema Inference Overview, The Record Reader, Record Batches
- record reader, The Record Reader-Testing the Reader
- column projection accounting, Column Projection Accounting
- creating RegexRecordReader class, The Record Reader
- defining column names, Defining Column Names
- defining vectors, Defining Vectors
- Drill's columnar structure, Drill’s Columnar Structure
- error handling, Error Handling
- for Excel plug-in, The Excel Custom Record Reader
- loading data into vectors, Loading Data into Vectors
- logging for, Logging
- opening a file as an input stream, Opening the File
- project all, Project All
- project none, Project None
- project some, Project Some
- projection, Projection
- reading data, Reading Data
- record batches, Record Batches
- regex parsing, Regex Parsing
- releasing resources, Releasing Resources
- setup, Setup
- testing, Testing the Reader-Scaling Up
- testing the reader shell, Testing the Reader Shell
- record-oriented files in JSON, querying, Querying record-oriented files
- regex field in storage plug-in, Other Log Analysis with Drill
- REGION (in core-site.xml), replacing with Amazon S3 endpoint, Standalone Drill
- regular expressions (regex)
- relational calculus format (SQL), Statement Preparation
- relational database management system (RDBMS) (see relational databases)
- relational databases, Drill Is a Query Engine, Not a Database
- REPEATED cardinality, Drill’s Columnar Structure
- repeated LIST, JSON Lists in Drill
- repeated MAP, JSON Lists in Drill
- REQUIRED cardinality, Drill’s Columnar Structure
- reserved words in column names, Spaces in Column Names, Reserved Words in Column Names
- RESTful interface, Other Interfaces, Connecting to Drill
- root workspace, Defining a Workspace
- row-oriented and columnar data formats, Data representation
- RowSetUtilities.verify function, Testing the Wildcard Case
- ROW_NUMBER function, Querying column-oriented JSON files with KVGEN()
S
- s3a
- sample data for Drill, Data Life Cycle: Data Exploration to Production
- scan operators, Logical and physical plans, The Record Reader
- schema ambiguity, JSON scalar types
- schema change exception, Schema Inference Overview
- schema evolution, Schema Inference Overview
- schema inference, The SQL Relational Model, Schema Inference
- schema-free or schema-on-read model, Drill Is a Query Engine, Not a Database
- schema-on-read, Accessing Columns in a Query, Schema-on-Read-Schema Inference
- schema-on-write, Drill does not require you to define a schema, Accessing Columns in a Query, The SQL Relational Model
- SchemaBuilder, Testing the Wildcard Case
- schemas
- screen operator, Logical and physical plans
- scripting languages, Drill support for, Drill Is Easy to Use
- secret keys for Amazon S3, Getting access credentials for S3
- security, configuring for Drill in production deployment, Security
- SELECT statements, Drill Is a Query Engine, Not a Database
- semantic analysis of SQL statements, Parsing and semantic analysis
- separation of compute and storage, Installing Drill
- sergeant module (R), Connecting to Drill Using R-Connecting to Drill Using Java
- sessions
- SHOW DATABASES query, Defining a Workspace
- shuffles, Monitoring the Drill Process
- SIMD (Single Instruction, Multiple Data), Data representation
- single-node deployment, Installing Drill
- site directory, creating for Drill, Creating a Site Directory
- SORT function, Data Analysis Using Drill
- spaces in column names (see whitespace)
- Spark, Hadoop
- spill-to-disk capabilities for operators, Configuring Memory
- SPLIT function, Cleaning and Preparing Data Using String Manipulation Functions
- split JSON files, querying, Using the FLATTEN() function to query split JSON files
- splittable and non-splittable files, Distributed File Scans
- spreadsheets, Querying Delimited Data
- SQL
- Apache Impala and Presto query engines, Drill Is a Low-Latency Query Engine
- Drill query format, Drill SQL Query Format-Understanding Drill Data Types
- entering queries at Drill prompt on macOS or Linux, Starting Drill on macOS or Linux in Embedded Mode
- entering queries at Drill prompt on Windows, Starting Drill on a Windows Machine
- inferring schema and mapping to SQL relational model, The SQL Relational Model
- learning more about, Querying Delimited Data
- operators in all caps, Specifying a Default Data Source
- queries converted into directed acyclic graph, Schema Inference Overview
- session state, SQL Session State
- statement preparation by Drill, Statement Preparation-Statement Execution
- tools attempting to provide SQL layer on Hadoop, Comparing Drill with Similar Tools
- typing queries into Zeppelin notebook, Querying Drill from a Zeppelin notebook
- use by Drill as query engine, Drill Is a Query Engine, Not a Database
- views, Data Life Cycle: Data Exploration to Production
- SQL databases having JDBC driver, Connecting Drill to a Relational Database
- SQL Lab, Configuring Superset to work with Drill
- SQLAlchemy, Exploring Data with Apache Superset
- SQLite, Connecting Drill to a Relational Database
- SQLLine, using to test error messages, Error Handling
- sqlTypeOf function, TypeOf functions
- SQRT function, Data Analysis Using Drill
- statement execution by Drill, Statement Execution
- statement preparation by Drill, Statement Preparation-Statement Execution
- statically installing a UDF, Statically Installing a UDF
- storage
- Storage Configuration panel, Choosing a Data Source
- storage configurations, Storage Configurations, Format Plug-ins and Format Configuration
- storage engines, The Apache Hadoop Ecosystem
- storage plug-ins, Connecting Drill to Data Sources, Storage Plug-ins, Format Plug-ins and Format Configuration, Writing a Format Plug-in
- configuring, Defining a Workspace
- configuring new plug-in, Configuring a New Storage Plug-in
- configuring to read HTTPD web server logs, Configuring Drill to Read HTTPD Web Server Logs
- configuring to read log files natively, Other Log Analysis with Drill
- Drill accessing data sources with, Choosing a Data Source
- getting information on, using PHP connector, Interacting with Drill from PHP
- hive, Querying Multiple Data Sources
- jdbc, other uses of, Other uses of the drill JDBC storage plug-in
- JSON configuration for kafka plug-in, Connecting to and Querying Streaming Data with Drill and Kafka
- kafka plug-in, enabling, Connecting to and Querying Streaming Data with Drill and Kafka
- kudu plug-in, creating, Connecting to and Querying Kudu
- mongo plug-in, Connecting to and Querying MongoDB from Drill
- openTSDB, Querying Time Series Data from Drill and OpenTSDB
- storage configuration for, Storage Configurations
- viewing configuration, Defining a Workspace
- store.json.read_numbers_as_double, Data types in JSON files
- streaming data, connecting to and querying with Drill and Kafka, Connecting to and Querying Streaming Data with Drill and Kafka-Improving the performance of Kafka queries
- string manipulation functions, Cleaning and Preparing Data Using String Manipulation Functions-Working with Dates and Times in Drill
- strings
- ST_AsText function, Geospatial Functions
- ST_GeoFromText function, Geospatial Functions
- subdirectories, accessing in queries, Querying Directories
- subqueries, Understanding Drill Data Types
- SUM function, Writing Drill User-Defined Functions
- Superset, Exploring Data with Apache Superset-Building a demonstration visualization using Drill and Superset
- SYN scans, Examples of Queries Using PCAP Data Files-Automating the process using an aggregate function
T
- TABLE function, Table Functions
- table.map.field technique, Accessing Maps (Key–Value Pairs) in Drill
- table.map.key technique, Accessing Maps (Key–Value Pairs) in Drill
- Tableau, Other Interfaces
- tables
- TAR files
- templates
- text format plug-in, Format Plug-ins and Format Configuration
- threads
- time series data, querying from Drill and OpenTSDB, Querying Time Series Data from Drill and OpenTSDB-Conclusion
- Time to Insight (TTI), Introduction to Apache Drill
- time zones, Converting Strings to Dates, Date and Time Functions in Drill
- tmp workspace, Defining a Workspace
- topics (Kafka), mapping to SQL tables, Connecting to and Querying Streaming Data with Drill and Kafka
- TO_CHAR function, Reformatting numbers
- TO_DATE function, Working with Dates and Times in Drill, Converting Strings to Dates
- TO_NUMBER function, Understanding Drill Data Types, Complex Data Conversion Functions, Summarizing Data with Aggregate Functions
- TO_TIMESTAMP function, Converting Strings to Dates
- trigonometric functions, Math and Trigonometric Functions
- true and false, Data types in JSON files
- Twitter data, analyzing with Drill, Analyzing Twitter Data with Drill
- type function, Data types in JSON files
- typeof function, TypeOf functions
U
- UDFs (see user-defined functions)
- UDFTemplate.java file, The Function File
- UNION type, Data types in JSON files, JSON Summary
- unique uses of Drill, Unique Uses of Drill-Conclusion
- UPDATE SESSION command, SQL Session State
- upstream, Logical and physical plans
- URLs
- URLs in web server logs, analyzing with Drill, Analyzing URLs and query strings
- USE command, Specifying a Default Data Source, Drill’s REST Interface, Running Challenging Queries in Scripts
- user agent strings from web server log, querying, Accessing Maps (Key–Value Pairs) in Drill, Analyzing user agent strings
- user, defining specifically for Drill in production deployment, Security
- user-defined functions (UDFs), Writing Drill User-Defined Functions-Conclusion
- aggregate function API, The Aggregate Function API
- aggregate function checking for SYN flood attempts, Automating the process using an aggregate function
- analyzing Twitter data, Analyzing Twitter Data with Drill
- complex functions returning maps or arrays, Complex Functions: UDFs That Return Maps or Arrays-Writing Aggregate User-Defined Functions
- example aggregate UDF, Kendall's rank correlation coefficient, Example Aggregate UDF: Kendall’s Rank Correlation Coefficient-Conclusion
- how UDFs work in Drill, How User-Defined Functions Work in Drill
- structure of simple Drill UDF, Structure of a Simple Drill UDF
- use case, finding and filtering valid credit card numbers, Use Case: Finding and Filtering Valid Credit Card Numbers
- using in Drill deployment, User-Defined Functions and Custom Plug-ins
- wrapper for machine learning model functionality, Writing the UDF Wrapper-Writing the UDF Wrapper
- writing aggregate UDFs, Writing Aggregate User-Defined Functions
- UserException class, Error Handling
- @JsonTypeName annotations, Creating the Regex Plug-in Configuration Class
- UTC, setting up Drill to use, Date and Time Functions in Drill
V
- value vectors, Data representation, Drill’s Columnar Structure
- VARCHAR type, Understanding Drill Data Types, Data types in JSON files, CSV with header
- vectors, Creating the “Easy” Format Plug-in, The Record Reader
- versioning, Drill plug-in configurations not versioned, Fixing Configuration Problems
- views, Data Life Cycle: Data Exploration to Production
- visualizations
Z
- Zeppelin
- ZooKeeper, Installing and Running Drill
- clearing Drill's state in, Fixing Configuration Problems
- configuration variables in JDBC URL for Drill connection, JDBC and Drill
- configurations stored in, Cautions Before Getting Started
- configuring, Configuring ZooKeeper
- connecting to Drill via single ZooKeeper instance or multinode cluster, JDBC and Drill
- coordination of Drill cluster, Elements of a Drill System
- Drill-on-YARN clusters, configuration, Drill-on-YARN
- installation, Prerequisites
- installing and configuring ZooKeeper cluster, Preparing Your Cluster for Drill
- IP addresses of quorum as comma-separated list, Connecting to and Querying HBase from Drill
- quorum hosts and ports, Connecting to Hive with a remote metastore
- server, coordinating Drillbits in a cluster, Drill Components
- storage configurations stored in, Storage Configurations