Index
Symbols
- $DRILL_SITE variable, Creating a Site Directory
- 7-zip, Special Configuration Instructions for Windows Installations
- ? (question mark), parameter substitution with, Using drillpy to Query Drill
- @JsonProperty annotations, Creating the Regex Plug-in Configuration Class
- @Output annotations, Writing Aggregate User-Defined Functions
- @Param annotations, Defining input parameters
- @Workspace annotations, Writing Aggregate User-Defined Functions
- ` ` (backticks)
- { } (curly braces), map notation in JSON, Accessing Maps (Key–Value Pairs) in Drill
- [ ] (square brackets), referencing individual array items, Arrays in Drill
A
- absolute references in Drill UDFs, Accessing data in holder objects
- access credentials for Amazon S3, Getting access credentials for S3
- access keys for Amazon S3, in Hadoop, Working with Amazon S3
- admission control, Admission Control
- aggregate functions, Summarizing Data with Aggregate Functions-Summarizing Data with Aggregate Functions
- all-text mode, Data types in JSON files, JSON Lists in Drill, JSON Summary, Capturing Schema Mapping in Views
- ALTER SESSION statement, Drill’s REST Interface
- Amazon EC2, Drill running in, Elements of a Drill System
- Amazon S3, Connecting Drill to Cloud Storage, Data Engineering with Drill
- Amazon Simple Storage Service (see Amazon S3)
- Amazon Web Services (see AWS)
- analyzing complex and nested data (see querying complex and nested data)
- ANSI standards
- Apache Calcite (see Calcite)
- Apache Drill (see Drill)
- Apache Hadoop (see Hadoop)
- Apache Hive (see Hive)
- Apache Impala (see Impala)
- Apache Pig (see Pig)
- Apache RAT (see RAT, configuring)
- Apache Spark (see Spark)
- Apache Superset (see Superset)
- Apache Zeppelin (see Zeppelin)
- arrays, Arrays and Maps-Arrays in Drill
- Arrow, Data representation
- AS clause, Accessing Columns in a Query
- attacks against web servers, identifying, Analyzing URLs and query strings
- authentication
- autorestart, setting Drillbit to, Starting Drill in Distributed Mode
- AWS (Amazon Web Services), Hadoop
- Azure (see Microsoft Azure)
C
- Calcite, Statement Preparation, Logical and physical plans
- cardinality of datasets, TypeOf functions, JSON Lists in Drill, Drill’s Columnar Structure, Data Conversion Functions
- case sensitivity in Drill, Specifying a Default Data Source
- CASE statement, Data types in JSON files
- CAST function, Understanding Drill Data Types, Complex Data Conversion Functions, Data Conversion Functions
- cgroups, support in Drill 1.14, Controlling CPU Usage
- characters, extraneous, removing from data, Understanding Drill Data Types
- client applications for Drill, Elements of a Drill System, Drill Components
- cloud providers
- cloud storage, Connecting Drill to Data Sources
- clush (CLUster SHell), Distributing Drill Binaries and Configuration
- cluster coordinators, The Apache Hadoop Ecosystem
- cluster-id, Preparing Your Cluster for Drill
- clusters
- code format (regex format plug-in example), Copyright Headers and Code Format
- code generation, Code generation
- columns, Data representation
- accessing in a Drill SQL query, Accessing Columns in a Query
- column aliases in Drill vs. relational databases, Reserved Words in Column Names
- column aliases not supported by GROUP BY in Drill, Summarizing Data with Aggregate Functions
- column names in JSON, JSON column names
- column projection accounting, Column Projection Accounting
- delimited data with column headers, Delimited Data with Column Headers
- Drill's columnar structure, Drill’s Columnar Structure
- illegal characters in column headers, Illegal Characters in Column Headers
- implicit columns in log format plug-in, Other Log Analysis with Drill
- nested and flat in Drill datasets, Accessing Columns in a Query
- reserved words in column names, Reserved Words in Column Names
- spaces in column names, Spaces in Column Names
- warning about incorrect names in Drill, Summarizing Data with Aggregate Functions
- ComplexWriter object, The ComplexWriter-Writing Aggregate User-Defined Functions
- compute engines, The Apache Hadoop Ecosystem
- configuration
- connecting Drill to data sources, Connecting Drill to Data Sources
- (see also querying multiple data sources)
- connecting to Drill, Connecting to Drill-Conclusion
- connection object, creating in Python, Using drillpy to Query Drill, Connecting to Drill Using pydrill
- connection strings
- context switches, Monitoring the Drill Process
- CONVERT_FROM function, Querying data from HBase, Data Conversion Functions
- CONVERT_TO function, Data Conversion Functions
- coordination tools in Hadoop ecosystem, The Apache Hadoop Ecosystem
- core-site.xml file, Querying Minio datastores from drill, Standalone Drill
- cp (classpath) storage plug-in, Choosing a Data Source, Configuring a New Storage Plug-in, Data Life Cycle: Data Exploration to Production
- CPU usage, controlling in Drill production deployment, Controlling CPU Usage-Controlling CPU Usage
- crash error in Drill 1.13, Full HDFS integration
- CREATE TABLE AS (CTAS) statement, PARTITION BY clause, Partitioning Data Directories
- CREATE TEMPORARY TABLE AS (CTTAS) command, SQL Session State
- CREATE VIEW statement, Creating Views
- Create, Read, Update and Delete (see CRUD operations)
- credentials, shared, caution with, Connecting Drill to a Relational Database
- credit card numbers (valid), finding and filtering, writing Drill UDF for, Use Case: Finding and Filtering Valid Credit Card Numbers
- CRUD operations, Drill Is a Query Engine, Not a Database
- cryptological and hashing functions, Cryptological and Hashing Functions
- CSV (comma-separated values) files, Querying Delimited Data
- arrays in, Arrays in Drill
- Drill configured with .csvh file type to accept files with headers, Delimited Data with Column Headers
- format variations and many different standards, File Format Variations
- querying a CSV file in Drill, Choosing a Data Source
- querying in a directory, Querying Directories
- schema, The SQL Relational Model
- splitting and reading with record reader, File Chunks
- summary of schema inference, CSV Summary-Explicit projection
- with header, schema inference for, CSV with header
- .csvh file type, Delimited Data with Column Headers, CSV with header
- cursor object, creating in Python, Using drillpy to Query Drill
D
- data affinity, Distribution
- data analysis using Drill, Data Analysis Using Drill-Common Problems in Querying Delimited Data
- data conversion functions, reference, Data Conversion Functions
- data definition language (DDL), Drill Is a Query Engine, Not a Database
- data engineering with Drill, Data Engineering with Drill-Conclusion
- aligning schemata across files, Aligning Schemas Across Files-Aligning Schemas Across Files
- data source inference, Data Source Inference-Default Schema
- distributed file scans, Distributed File Scans-Null versus missing values in JSON output
- file type inference, File Type Inference-File Format Variations
- JSON objects, JSON Objects-JSON Summary
- partitioning data directories, Partitioning Data Directories-Defining a Table Workspace
- schema inference overview, Schema Inference Overview-Schema Inference Overview
- schema-on-read, Schema-on-Read-Schema Inference
- using Drill with Parquet file format, Using Drill with the Parquet File Format
- working with queries in production, Working with Queries in Production-Running Challenging Queries in Scripts
- data formats, Introduction to Apache Drill
- data lakes, Introduction to Apache Drill
- data locality, Installing Drill
- data manipulation language (DML), Drill Is a Query Engine, Not a Database
- data partitioning (see partitioning)
- data representation by Drill, Data representation
- data shuffles (see shuffles)
- data source inference, Data Source Inference-Default Schema
- data sources, Connecting Drill to Data Sources
- data stores, Introduction to Apache Drill, What Is Apache Drill?
- data types
- data types (Drill), Understanding Drill Data Types-Cleaning and Preparing Data Using String Manipulation Functions
- data types (in JSON files), Data types in JSON files
- data.world dataset, querying, Other uses of the drill JDBC storage plug-in
- database-like storage engines, The Apache Hadoop Ecosystem
- databases, Introduction to Apache Drill
- dates and times
- Dean, Jeffrey, A Very Brief History of Big Data
- debugging
- default schema, Default Schema
- delimited data, Querying Delimited Data
- dependencies
- deploying Drill in production, Deploying Drill in Production-Conclusion
- DESCRIBE SCHEMA query, Defining a Workspace
- design-time configuration, Drill Module Configuration
- developer guidelines for Drill, Installing the IDE
- development environment, setting up, Setting Up Your Development Environment-Conclusion
- dfs storage plug-in, Choosing a Data Source, Other Log Analysis with Drill, Configuring a New Storage Plug-in, Storage Plug-ins
- dir variables, Querying Directories
- direct memory, Configuring Memory, Monitoring the Drill Process
- directories
- directory functions, Directory functions
- disk spilling (see spill-to-disk capabilities for operators)
- distance between strings, functions for, String Distance Functions
- distributed file scans, Distributed File Scans-Null versus missing values in JSON output
- distributed filesystems, Installing and Running Drill, Preparing Your Cluster for Drill, Drill Operation: The 30,000-Foot View, Querying Data in Hadoop from Drill, Writing a Format Plug-in
- distributed mode, Installing and Running Drill
- distributing Drill files, Distributing Drill files
- distribution of physical plan, Distribution
- Docker, The Apache Hadoop Ecosystem
- DOUBLE type, Data types in JSON files
- downstream, Logical and physical plans
- Drill
- additional resources, Comparing Drill with Similar Tools
- Apache Drill source code, Creating the Drill Build Environment
- as query engine, not a database, Drill Is a Query Engine, Not a Database
- comparing with similar tools, Comparing Drill with Similar Tools
- in the big data ecosystem, Drill in the Big Data Ecosystem
- operation, high-level view of, Drill Operation: The 30,000-Foot View
- overview of Drill in Hadoop ecosystem, Elements of a Drill System
- source style templates or IDE formatters, Installing the IDE
- testing Drill build in Maven, Building Drill from Source
- Drill Explorer, Other Interfaces
- Drill shell
- drill-client module (Node.js), Querying Drill Using Node.js
- drill-env.sh file, Creating a Site Directory
- drill-localhost script, Connecting to the Cluster
- drill-module.conf file, Create a Plug-In Project
- Drill-on-YARN, Drill-on-YARN
- drill-override.conf file, Creating a Site Directory
- Drillbits, Installing and Running Drill, Elements of a Drill System
- drillpy, Using drillpy to Query Drill
- DrillSimpleFunc interface, The Simple Function API
- dw storage plug-in, configuring, Other uses of the drill JDBC storage plug-in
- Dynamic UDF, User-Defined Functions and Custom Plug-ins
- dynamically installing Drill UDFs, Dynamically Installing a UDF
E
- ease of use (Drill), Drill Is Easy to Use
- Eclipse, Installing Maven
- embedded mode, Installing and Running Drill, Querying Delimited Data
- environment variables, Configuring ODBC on Linux or macOS, Configuring Memory
- DRILL_HOME, Installing Drill in Embedded Mode on macOS or Linux, Specifying a Default Data Source, Connecting Drill to a Relational Database, Default Schema, Creating a Site Directory, Configuring Logging, Testing the Configuration
- EXTN_CLASSPATH, Troubleshooting
- for distribution directory, Testing the Configuration
- for Drill installation on Windows, Special Configuration Instructions for Windows Installations
- setting on macOS and Linux for ODBC config files, Configuring ODBC on Linux or macOS
- error handling, Starting the Drill Cluster
- crash error in Drill 1.13, Full HDFS integration
- data source errors, Logging Levels
- Error tab in query profile, Troubleshooting
- errors from lists of different types, JSON Lists in Drill
- errors from non-null values in arrays or maps, JSON Lists in Drill
- errors from spaces or reserved words in column names, Spaces in Column Names, Reserved Words in Column Names
- errors from Windows newlines, Illegal Characters in Column Headers
- errors with storage plug-ins, Choosing a Data Source
- floating-point numbers including decimal point, JSON Summary
- for record reader in regex format plug-in, Error Handling
- format inference and, Format Inference
- schema changes, Schema Inference Overview
- setting logging level to error, Configuring Logging
- SQL syntax errors, Parsing and semantic analysis
- tool or Drill operator strict about data types, Missing values
- Excel files, drilling, Drilling Excel Files-Using the Excel Format Plug-in
- exception handling
- Exchangeable Image File (EXIF) metadata, analysis of, Finding Photos Taken Within a Geographic Region
- exec.enable_union_type option, setting to true, Data types in JSON files
- explicit projection, Explicit projection, Explicit projection
- extension field in storage plug-in, Other Log Analysis with Drill
- external systems, Drill query capabilities, Drill Is Versatile
- EXTN_CLASSPATH environment variable, Troubleshooting
- EXTRACT function, Date Arithmetic and Manipulation
- extract, transform, and load (ETL) process, Introduction to Apache Drill, Data Life Cycle: Data Exploration to Production
- extraneous characters, removing from data, Understanding Drill Data Types
F
- fields field in storage plug-in, Other Log Analysis with Drill
- file formats
- file storage plug-in, Storage Configurations
- file type inference, Data Source Inference, File Type Inference-File Format Variations
- filesystems
- filling empty values, Loading Data into Vectors
- filters, Accessing Columns in a Query, Complex Data Conversion Functions
- finding and filtering valid credit card numbers, writing Drill UDF for, Use Case: Finding and Filtering Valid Credit Card Numbers
- FLATTEN function, Using the FLATTEN() function to query split JSON files
- FoodMart sample dataset, Data Life Cycle: Data Exploration to Production
- Foreman (Drill server), Drill Components
- form submissions, analyzing for malicious activity, Analyzing URLs and query strings
- format configurations, Format Plug-ins and Format Configuration
- format field in storage plug-in, Other Log Analysis with Drill
- format inference, Format Inference
- format plug-ins, Format Plug-ins and Format Configuration, Writing a Format Plug-in-Conclusion
- additional details for advanced cases, Additional Details-Create a Plug-In Project
- creating Easy format plug-in, Creating the “Easy” Format Plug-in-Cautions Before Getting Started
- creating regex plug-in configuration class, Creating the Regex Plug-in Configuration Class-Creating the Format Plug-in Class
- creating regex plug-in format class, Creating the Format Plug-in Class-How Drill Finds Your Plug-in
- example regex plug-in, The Example Regex Format Plug-in
- Excel plug-in, The pom.xml File-Using the Excel Format Plug-in
- record reader for regex format plug-in, The Record Reader-Testing the Reader
- column projection accounting, Column Projection Accounting
- columnar structure in Drill, Drill’s Columnar Structure
- defining column names, Defining Column Names
- defining vectors, Defining Vectors
- error handling, Error Handling
- loading data into vectors, Loading Data into Vectors
- opening a file as an input stream, Opening the File
- project all, Project All
- project none, Project None
- project some, Project Some
- projection, Projection
- reading data, Reading Data
- record batches, Record Batches
- regex parsing, Regex Parsing
- releasing resources, Releasing Resources
- setup, Setup
- testing the reader shell, Testing the Reader Shell
- testing the record reader, Testing the Reader-Scaling Up
- format strings
- FormatCreator class, How Drill Finds Your Plug-in
- FormatPlugin class, How Drill Finds Your Plug-in
- fragments, Controlling CPU Usage
- FROM clause, Choosing a Data Source
- fs.default.name setting, Connecting Drill to Hive
- fs.defaultFS property, Full HDFS integration
- functions
H
- H2O platform, Making Predictions Within Drill
- Hadoop, A Very Brief History of Big Data
- Amazon S3 access keys, Access keys with Hadoop
- configuration directory, adding to Drill classpath, Full HDFS integration
- core-site.xml configuration file, Standalone Drill
- ecosystem, The Apache Hadoop Ecosystem-Drill Is a Query Engine, Not a Database
- file chunks, File Chunks
- joining data store to MySQL in Drill query, Querying Multiple Data Sources
- querying data from Drill, Querying Data in Hadoop from Drill
- tools attempting to provide SQL layer on, Comparing Drill with Similar Tools
- Hadoop Distributed File System (see HDFS)
- hash aggregator, disabling, Writing Aggregate User-Defined Functions
- hashing and cryptological functions, Cryptological and Hashing Functions
- HBase
- hbase storage plug-in, Connecting to and Querying HBase from Drill
- HBaseStorageHandler, Connecting to Hive with a remote metastore
- HDFS (Hadoop Distributed File System), Preparing Your Cluster for Drill, The Apache Hadoop Ecosystem, Data Engineering with Drill, Working with Apache Hadoop HDFS-Full HDFS integration
- hdfs storage plug-in, Other Log Analysis with Drill
- headers (column) in delimited data, Delimited Data with Column Headers
- heap memory, Configuring Memory, Monitoring the Drill Process
- Hive, Hadoop, Drill Is a Low-Latency Query Engine
- hive storage plug-in
- HiveQL, Querying Hive Data from Drill
- HOCON configuration system, Drill Module Configuration
- Holder objects
- HTTPD web server logs, configuring Drill to read, Configuring Drill to Read HTTPD Web Server Logs
I
- I/O, monitoring in Drill production deployment, Monitoring the Drill Process
- IDEs (integrated development environments), Installing Maven
- illegal characters in column headers, Illegal Characters in Column Headers
- IMAXDIR function, Directory functions
- IMINDIR function, Directory functions
- Impala, Comparing Drill with Similar Tools, Drill Is a Low-Latency Query Engine
- impersonation (of Drill users), Security
- implicit projection, Explicit projection
- indexes, Drill Is a Query Engine, Not a Database
- info logging level, Configuring Logging, Logging Levels
- INNER JOIN, Specifying a Default Data Source
- input parameters, defining for Drill simple UDF, Defining input parameters
- installation
- installing Drill, Installing Drill-Starting the Drill Cluster
- configuring logging, Configuring Logging
- configuring memory, Configuring Memory
- configuring ZooKeeper, Configuring ZooKeeper
- creating a site directory, Creating a Site Directory
- distributing Drill binaries and configuration, Distributing Drill Binaries and Configuration
- in distributed mode on macOS or Linux, Installing Drill in Distributed Mode on macOS or Linux-Starting Drill in Distributed Mode
- in embedded mode on macOS or Linux, Installing Drill in Embedded Mode on macOS or Linux-Starting Drill on macOS or Linux in Embedded Mode
- in embedded mode on Windows, Installing Drill on Windows-Starting Drill on a Windows Machine
- preparing your machine for Drill, Preparing Your Machine for Drill-Installing Drill on Windows
- prerequisites, Prerequisites
- production installation, Production Installation
- starting the Drill cluster, Starting the Drill Cluster
- testing the installation, Testing the Installation
- IntelliJ, Installing Maven
- interactivity, adding in Zeppelin, Adding interactivity in Zeppelin
- interfaces (Drill), Other Interfaces, Understanding Drill’s Interfaces-Connecting to Drill with Python
- Interval data type, Date Arithmetic and Manipulation
- IP addresses, analyzing, Networking Functions
- ISO 8601 format for dates, Converting Strings to Dates
J
- Jackson serialized class, Creating the Regex Plug-in Configuration Class
- JAR (Java Archive) files
- Java
- classpath, Choosing a Data Source
- code generation by Drill on each Drillbit for each query, Code generation
- connecting to Drill with, Connecting to Drill Using Java-Querying Drill with PHP
- integrated development environments (IDEs), Creating the Drill Build Environment
- Java 8 Date Time format, Working with Dates and Times in Drill
- Java 8 prerequisite for Drill installation, Prerequisites
- JavaBean naming conventions, Creating the Regex Plug-in Configuration Class
- memory management in, Monitoring the Drill Process
- Office Open XML format, libraries to parse, Drilling Excel Files
- plug-ins, Storage Configurations
- java -version command, Special Configuration Instructions for Windows Installations
- java-exec package, Creating the Plug-in Package
- javax.jdo.option.ConnectionURL option, Connecting Drill to Hive
- JAVA_HOME environment variable, Special Configuration Instructions for Windows Installations
- JDBC (Java Database Connectivity), Data representation, Connecting Drill to Data Sources
- JDK (Java Development Kit), Prerequisites
- JDK 8, Preparing Your Machine for Drill, Preparing Your Cluster for Drill
- JMX (Java Management eXtensions), using JMX monitoring, Monitoring JMX Metrics
- Joda date/time formatting characters, Working with Dates and Times in Drill
- Joda date/time formatting strings, Drill Formatting Strings
- joins
- JSON (JavaScript Object Notation), Analyzing Complex and Nested Data
L
- launch command, custom, creating for Drill, Creating a Site Directory
- Lightbend HOCON configuration system, Drill Module Configuration
- Linux
- cgroups, support in Drill 1.14, Controlling CPU Usage
- configuring ODBC on, Configuring ODBC on Linux or macOS
- installing clush (CLUster SHell), Distributing Drill Binaries and Configuration
- installing Drill in distributed mode, Installing Drill in Distributed Mode on macOS or Linux-Starting Drill in Distributed Mode
- installing Drill in embedded mode, Installing Drill in Embedded Mode on macOS or Linux
- installing Maven, Installing Maven
- installing Superset on, Exploring Data with Apache Superset
- newline character (\n) for line endings, Illegal Characters in Column Headers
- starting Drill on, Starting Drill on macOS or Linux in Embedded Mode
- lists (JSON), in Drill, JSON Lists in Drill
- little endian and big endian data, Querying data from HBase
- local storage configuration, Storage Configurations
- log files, analyzing with Drill, Analyzing Log Files with Drill-Other Log Analysis with Drill
- Logback
- LogFormat configuration option, Configuring Drill to Read HTTPD Web Server Logs
- logging
- logical plan for queries, Logical and physical plans
- low-latency features, Low-Latency Features-Network exchanges
- Luhn algorithm, Use Case: Finding and Filtering Valid Credit Card Numbers
M
- machine learning pipeline, using Drill in, Using Drill in a Machine Learning Pipeline-Conclusion
- macOS
- Macs
- major fragments, Logical and physical plans, Schema Inference Overview, Distributed File Scans, Controlling CPU Usage
- MapR
- MapR File System (MFS), Data Engineering with Drill
- MapR ODBC Drivers for Drill, Other Interfaces, Configuring ODBC on Linux or macOS
- MapR XD, Data Engineering with Drill
- MapReduce, The Apache Hadoop Ecosystem, Distributed Processing with HDFS
- MapReduce: Simplified Data Processing on Large Clusters (Dean and Ghemawat), A Very Brief History of Big Data
- maps
- Matcher class, Regex Parsing
- mathematical functions, Data Analysis Using Drill-Summarizing Data with Aggregate Functions, Math and Trigonometric Functions
- Maven
- Maven Old Java Objects (MOJOs), Making Predictions Within Drill
- MAXDIR function, Directory functions
- maxErrors field in storage plug-in, Other Log Analysis with Drill
- MBeans, Drill metrics available as, Monitoring JMX Metrics
- memory
- messages (Kafka), Connecting to and Querying Streaming Data with Drill and Kafka
- metadata
- metastore (Hive), Querying Hive Data from Drill
- Microsoft Azure, Connecting Drill to Cloud Storage
- Microsoft SQL Server, Connecting Drill to a Relational Database
- MINDIR function, Directory functions
- Minio data stores, querying from Drill, Querying Minio datastores from drill
- minor fragments, Distribution, Schema Inference Overview, Distributed File Scans, Controlling CPU Usage
- missing vs. null values in JSON output, Null versus missing values in JSON output
- MOJOs (Maven Old Java Objects), Making Predictions Within Drill
- MongoDB, Introduction to Apache Drill, Drill Is Versatile, Analyzing Complex and Nested Data
- monitoring, Monitoring-Monitoring Queries
- Mutator class, Column Projection Accounting
- MySQL, Connecting Drill to a Relational Database
N
- name node URL for HDFS, Full HDFS integration
- namespaces
- nested data structures, Drill support for, Accessing Columns in a Query
- nested data, querying, Querying Nested Data-Querying column-oriented JSON files with KVGEN()
- network exchanges, Network exchanges
- network packet analysis (PCAP) with Drill, Network Packet Analysis (PCAP) with Drill-Automating the process using an aggregate function
- networking functions in Drill, Networking Functions-Null Handling Functions
- newline encodings on different operating systems, Illegal Characters in Column Headers
- Node.js, querying Drill with, Querying Drill Using Node.js
- non-splittable files, Distributed File Scans, Distributed File Scans
- NoSQL data stores, Connecting to and Querying MongoDB from Drill
- NOT NULL cardinality, Casts to specify types
- null data
- aggregate functions handling nulls, Writing Aggregate User-Defined Functions
- dealing with NULL values in UDF, The Function File
- functions handling, Null Handling Functions
- JSON files containing, JSON scalar types
- leading null values in JSON files, Missing values, JSON Summary
- mapping of JSON scalar types to Drill types, JSON scalar types
- non-nullable VARCHAR, CSV with header, CSV Summary
- NOT NULL cardinality, Casts to specify types
- null strings in query of URL metadata, Analyzing URLs and query strings
- Null type in JSON, Schema Inference for JSON
- NULL values, Data representation, TypeOf functions, Explicit projection, Explicit projection
- NULL values for missing JSON data, Missing values
- null values in JSON lists, JSON Lists in Drill
- null vs. missing values in JSON output, Null versus missing values in JSON output
- nullable INT, Summarizing Data with Aggregate Functions, Explicit projection, Missing values
- NULLABLE or NOT NULL cardinality, TypeOf functions
- nullable VARCHAR, Explicit projection, Project Some
- NullableVarChar, JSON reading all scalar fields as, Mixed string and number types
- NULLIF function, Null Handling Functions
- numbers
- numeric data type, converting strings to, Understanding Drill Data Types
O
- ODBC (Open Database Connectivity), Data representation, ODBC and Drill
- ODBC/JDBC interface, Installing and Running Drill, Elements of a Drill System, Other Interfaces, Connecting to Drill
- Office Open XML format, Java libraries for, Drilling Excel Files
- Open Database Connectivity (see ODBC; ODBC/JDBC interface)
- OpenTSDB, Querying Time Series Data from Drill and OpenTSDB-Conclusion
- operation overview (Drill), Drill Operation Overview-Network exchanges
- operators, Logical and physical plans
- OPTIONAL cardinality, Drill’s Columnar Structure
- Oracle, Connecting Drill to a Relational Database
- configuring Drill to query a database, Oracle
- Oracle Java SE Development Kit 8 (JDK 8), Preparing Your Machine for Drill, Preparing Your Cluster for Drill
- output value, defining for Drill UDFs, Setting the output value
- OutputMutator class, Defining Vectors
P
- Pandas DataFrame, Connecting to Drill with Python
- parallelizing of internal operators, Distribution
- parameter substitution in queries, Using drillpy to Query Drill
- Parquet format, Analyzing Complex and Nested Data
- parse_query function, Analyzing URLs and query strings
- parse_url function, Analyzing URLs and query strings
- parse_user_agent function, Accessing Maps (Key–Value Pairs) in Drill, Analyzing user agent strings
- parsing SQL statements, Parsing and semantic analysis
- PARTITION BY clause, Other analytic functions: Window functions, Partitioning Data Directories
- partitioning, Querying Directories, Data Life Cycle: Data Exploration to Production
- PATH environment variable, Special Configuration Instructions for Windows Installations
- Pattern class, Regex Parsing
- PCAP (Packet Capture), Network Packet Analysis (PCAP) with Drill
- (see also network packet analysis (PCAP) with Drill)
- PCAP Next Generation (PCAP-NG), Network Packet Analysis (PCAP) with Drill
- (see also network packet analysis (PCAP) with Drill)
- performance
- phonetic functions, Phonetic Functions
- PHP
- physical plan for queries, Logical and physical plans
- Pig, Hadoop
- Plain Old Java Objects (POJOs), Making Predictions Within Drill
- planner.cpu_load_average session option, Controlling CPU Usage
- POINT_AGGREGATE scope, Writing Aggregate User-Defined Functions
- pom.xml file (see Maven)
- PostgreSQL, Connecting Drill to a Relational Database
- preparation phase (SQL statements), Statement Preparation
- (see also statement preparation by Drill)
- Presto, Comparing Drill with Similar Tools, Drill Is a Low-Latency Query Engine
- production, Deploying Drill in Production
- Project operator, Code generation
- projection, Projection
- projection push-down, Creating the Format Plug-in Class
- pull request, contributing code to Drill via, Contributing to Drill: The Pull Request
- Python, Drill Is Easy to Use
Q
- queries, working with in production, Working with Queries in Production-Running Challenging Queries in Scripts
- query engines, The Apache Hadoop Ecosystem
- query strings, analyzing with Drill, Analyzing URLs and query strings
- querying complex and nested data, Analyzing Complex and Nested Data-Conclusion
- querying delimited data, Querying Delimited Data-Conclusion
- querying multiple data sources, Querying Multiple Data Sources-Conclusion
- configuring new storage plug-in, Configuring a New Storage Plug-in
- connecting and querying Kudu from Drill, Connecting to and Querying Kudu
- connecting Drill to cloud storage, Connecting Drill to Cloud Storage-Querying Time Series Data from Drill and OpenTSDB
- connecting Drill to relational databases, Connecting Drill to a Relational Database-Querying Data in Hadoop from Drill
- connecting to and querying HBase from Drill, Connecting to and Querying HBase from Drill-Querying Hive Data from Drill
- connecting to and querying MongoDB from Drill, Connecting to and Querying MongoDB from Drill
- connecting to and querying streaming data from Drill and Kafka, Connecting to and Querying Streaming Data with Drill and Kafka-Improving the performance of Kafka queries
- querying Hadoop data from Drill, Querying Data in Hadoop from Drill
- querying Hive data from Drill, Querying Hive Data from Drill-Connecting to and Querying Streaming Data with Drill and Kafka
- querying time series data from Drill and OpenTSDB, Querying Time Series Data from Drill and OpenTSDB-Conclusion
R
- R language, Drill Is Easy to Use
- RAT, configuring, Configuring RAT
- RDBMS (relational database management system) (see relational databases)
- record batches, Schema Inference Overview, The Record Reader, Record Batches
- record reader, The Record Reader-Testing the Reader
- column projection accounting, Column Projection Accounting
- creating RegexRecordReader class, The Record Reader
- defining column names, Defining Column Names
- defining vectors, Defining Vectors
- Drill's columnar structure, Drill’s Columnar Structure
- error handling, Error Handling
- for Excel plug-in, The Excel Custom Record Reader
- loading data into vectors, Loading Data into Vectors
- logging for, Logging
- opening a file as an input stream, Opening the File
- project all, Project All
- project none, Project None
- project some, Project Some
- projection, Projection
- reading data, Reading Data
- record batches, Record Batches
- regex parsing, Regex Parsing
- releasing resources, Releasing Resources
- setup, Setup
- testing, Testing the Reader-Scaling Up
- testing the reader shell, Testing the Reader Shell
- record-oriented files in JSON, querying, Querying record-oriented files
- regex field in storage plug-in, Other Log Analysis with Drill
- REGION (in core-site.xml), replacing with Amazon S3 endpoint, Standalone Drill
- regular expressions (regex)
- relational calculus format (SQL), Statement Preparation
- relational database management system (RDBMS) (see relational databases)
- relational databases, Drill Is a Query Engine, Not a Database
- REPEATED cardinality, Drill’s Columnar Structure
- repeated LIST, JSON Lists in Drill
- repeated MAP, JSON Lists in Drill
- REQUIRED cardinality, Drill’s Columnar Structure
- reserved words in column names, Spaces in Column Names, Reserved Words in Column Names
- RESTful interface, Other Interfaces, Connecting to Drill
- root workspace, Defining a Workspace
- row-oriented and columnar data formats, Data representation
- RowSetUtilities.verify function, Testing the Wildcard Case
- ROW_NUMBER function, Querying column-oriented JSON files with KVGEN()
S
- s3a
- sample data for Drill, Data Life Cycle: Data Exploration to Production
- scan operators, Logical and physical plans, The Record Reader
- schema ambiguity, JSON scalar types
- schema change exception, Schema Inference Overview
- schema evolution, Schema Inference Overview
- schema inference, The SQL Relational Model, Schema Inference
- schema-free or schema-on-read model, Drill Is a Query Engine, Not a Database
- schema-on-read, Accessing Columns in a Query, Schema-on-Read-Schema Inference
- schema-on-write, Drill does not require you to define a schema, Accessing Columns in a Query, The SQL Relational Model
- SchemaBuilder, Testing the Wildcard Case
- schemas
- screen operator, Logical and physical plans
- scripting languages, Drill support for, Drill Is Easy to Use
- secret keys for Amazon S3, Getting access credentials for S3
- security, configuring for Drill in production deployment, Security
- SELECT statements, Drill Is a Query Engine, Not a Database
- semantic analysis of SQL statements, Parsing and semantic analysis
- separation of compute and storage, Installing Drill
- sergeant module (R), Connecting to Drill Using R-Connecting to Drill Using Java
- sessions
- SHOW DATABASES query, Defining a Workspace
- shuffles, Monitoring the Drill Process
- SIMD (Single Instruction, Multiple Data), Data representation
- single-node deployment, Installing Drill
- site directory, creating for Drill, Creating a Site Directory
- SORT function, Data Analysis Using Drill
- spaces in column names (see whitespace)
- Spark, Hadoop
- spill-to-disk capabilities for operators, Configuring Memory
- SPLIT function, Cleaning and Preparing Data Using String Manipulation Functions
- split JSON files, querying, Using the FLATTEN() function to query split JSON files
- splittable and non-splittable files, Distributed File Scans
- spreadsheets, Querying Delimited Data
- SQL
- Apache Impala and Presto query engines, Drill Is a Low-Latency Query Engine
- Drill query format, Drill SQL Query Format-Understanding Drill Data Types
- entering queries at Drill prompt on macOS or Linux, Starting Drill on macOS or Linux in Embedded Mode
- entering queries at Drill prompt on Windows, Starting Drill on a Windows Machine
- inferring schema and mapping to SQL relational model, The SQL Relational Model
- learning more about, Querying Delimited Data
- operators in all caps, Specifying a Default Data Source
- queries converted into directed acyclic graph, Schema Inference Overview
- session state, SQL Session State
- statement preparation by Drill, Statement Preparation-Statement Execution
- tools attempting to provide SQL layer on Hadoop, Comparing Drill with Similar Tools
- typing queries into Zeppelin notebook, Querying Drill from a Zeppelin notebook
- use by Drill as query engine, Drill Is a Query Engine, Not a Database
- views, Data Life Cycle: Data Exploration to Production
- SQL databases having JDBC driver, Connecting Drill to a Relational Database
- SQL Lab, Configuring Superset to work with Drill
- SQLAlchemy, Exploring Data with Apache Superset
- SQLite, Connecting Drill to a Relational Database
- SQLLine, using to test error messages, Error Handling
- sqlTypeOf function, TypeOf functions
- SQRT function, Data Analysis Using Drill
- statement execution by Drill, Statement Execution
- statement preparation by Drill, Statement Preparation-Statement Execution
- statically installing a UDF, Statically Installing a UDF
- storage
- Storage Configuration panel, Choosing a Data Source
- storage configurations, Storage Configurations, Format Plug-ins and Format Configuration
- storage engines, The Apache Hadoop Ecosystem
- storage plug-ins, Connecting Drill to Data Sources, Storage Plug-ins, Format Plug-ins and Format Configuration, Writing a Format Plug-in
- configuring, Defining a Workspace
- configuring new plug-in, Configuring a New Storage Plug-in
- configuring to read HTTPD web server logs, Configuring Drill to Read HTTPD Web Server Logs
- configuring to read log files natively, Other Log Analysis with Drill
- Drill accessing data sources with, Choosing a Data Source
- getting information on, using PHP connector, Interacting with Drill from PHP
- hive, Querying Multiple Data Sources
- jdbc, other uses of, Other uses of the drill JDBC storage plug-in
- JSON configuration for kafka plug-in, Connecting to and Querying Streaming Data with Drill and Kafka
- kafka plug-in, enabling, Connecting to and Querying Streaming Data with Drill and Kafka
- kudu plug-in, creating, Connecting to and Querying Kudu
- mongo plug-in, Connecting to and Querying MongoDB from Drill
- openTSDB, Querying Time Series Data from Drill and OpenTSDB
- storage configuration for, Storage Configurations
- viewing configuration, Defining a Workspace
- store.json.read_numbers_as_double, Data types in JSON files
- streaming data, connecting to and querying with Drill and Kafka, Connecting to and Querying Streaming Data with Drill and Kafka-Improving the performance of Kafka queries
- string manipulation functions, Cleaning and Preparing Data Using String Manipulation Functions-Working with Dates and Times in Drill
- strings
- ST_AsText function, Geospatial Functions
- ST_GeoFromText function, Geospatial Functions
- subdirectories, accessing in queries, Querying Directories
- subqueries, Understanding Drill Data Types
- SUM function, Writing Drill User-Defined Functions
- Superset, Exploring Data with Apache Superset-Building a demonstration visualization using Drill and Superset
- SYN scans, Examples of Queries Using PCAP Data Files-Automating the process using an aggregate function
T
- TABLE function, Table Functions
- table.map.field technique, Accessing Maps (Key–Value Pairs) in Drill
- table.map.key technique, Accessing Maps (Key–Value Pairs) in Drill
- Tableau, Other Interfaces
- tables
- TAR files
- templates
- text format plug-in, Format Plug-ins and Format Configuration
- threads
- time series data, querying from Drill and OpenTSDB, Querying Time Series Data from Drill and OpenTSDB-Conclusion
- Time to Insight (TTI), Introduction to Apache Drill
- time zones, Converting Strings to Dates, Date and Time Functions in Drill
- tmp workspace, Defining a Workspace
- topics (Kafka), mapping to SQL tables, Connecting to and Querying Streaming Data with Drill and Kafka
- TO_CHAR function, Reformatting numbers
- TO_DATE function, Working with Dates and Times in Drill, Converting Strings to Dates
- TO_NUMBER function, Understanding Drill Data Types, Complex Data Conversion Functions, Summarizing Data with Aggregate Functions
- TO_TIMESTAMP function, Converting Strings to Dates
- trigonometric functions, Math and Trigonometric Functions
- true and false, Data types in JSON files
- Twitter data, analyzing with Drill, Analyzing Twitter Data with Drill
- type function, Data types in JSON files
- typeof function, TypeOf functions
U
- UDFs (see user-defined functions)
- UDFTemplate.java file, The Function File
- UNION type, Data types in JSON files, JSON Summary
- unique uses of Drill, Unique Uses of Drill-Conclusion
- UPDATE SESSION command, SQL Session State
- upstream, Logical and physical plans
- URLs
- URLs in web server logs, analyzing with Drill, Analyzing URLs and query strings
- USE command, Specifying a Default Data Source, Drill’s REST Interface, Running Challenging Queries in Scripts
- user agent strings from web server log, querying, Accessing Maps (Key–Value Pairs) in Drill, Analyzing user agent strings
- user, defining specifically for Drill in production deployment, Security
- user-defined functions (UDFs), Writing Drill User-Defined Functions-Conclusion
- aggregate function API, The Aggregate Function API
- aggregate function checking for SYN flood attempts, Automating the process using an aggregate function
- analyzing Twitter data, Analyzing Twitter Data with Drill
- complex functions returning maps or arrays, Complex Functions: UDFs That Return Maps or Arrays-Writing Aggregate User-Defined Functions
- example aggregate UDF, Kendall's rank correlation coefficient, Example Aggregate UDF: Kendall’s Rank Correlation Coefficient-Conclusion
- how UDFs work in Drill, How User-Defined Functions Work in Drill
- structure of simple Drill UDF, Structure of a Simple Drill UDF
- use case, finding and filtering valid credit card numbers, Use Case: Finding and Filtering Valid Credit Card Numbers
- using in Drill deployment, User-Defined Functions and Custom Plug-ins
- wrapper for machine learning model functionality, Writing the UDF Wrapper-Writing the UDF Wrapper
- writing aggregate UDFs, Writing Aggregate User-Defined Functions
- UserException class, Error Handling
- @JsonTypeName annotations, Creating the Regex Plug-in Configuration Class
- UTC, setting up Drill to use, Date and Time Functions in Drill
V
- value vectors, Data representation, Drill’s Columnar Structure
- VARCHAR type, Understanding Drill Data Types, Data types in JSON files, CSV with header
- vectors, Creating the “Easy” Format Plug-in, The Record Reader
- versioning, Drill plug-in configurations not versioned, Fixing Configuration Problems
- views, Data Life Cycle: Data Exploration to Production
- visualizations
Z
- Zeppelin
- ZooKeeper, Installing and Running Drill
- clearing Drill's state in, Fixing Configuration Problems
- configuration variables in JDBC URL for Drill connection, JDBC and Drill
- configurations stored in, Cautions Before Getting Started
- configuring, Configuring ZooKeeper
- connecting to Drill via single ZooKeeper instance or multinode cluster, JDBC and Drill
- coordination of Drill cluster, Elements of a Drill System
- Drill-on-YARN clusters, configuration, Drill-on-YARN
- installation, Prerequisites
- installing and configuring ZooKeeper cluster, Preparing Your Cluster for Drill
- IP addresses of quorum as comma-separated list, Connecting to and Querying HBase from Drill
- quorum hosts and ports, Connecting to Hive with a remote metastore
- server, coordinating Drillbits in a cluster, Drill Components
- storage configurations stored in, Storage Configurations