Preface

The ability to analyze massive amounts of data is perhaps one of the most important developments of the 21st century. However, until recently, the tooling to analyze large datasets was exceedingly complex or expensive (or both). Apache Drill has the potential to change all that.

Apache Drill opens up new possibilities for analyzing data because it lets you query a wide variety of data sources using a single standard language: SQL.

Who Should Read This Book

We envisioned this book for three groups of people: analysts or others who will be using Drill to query data, systems administrators who will be deploying and maintaining Drill in production environments, and developers who will be writing code to extend the functionality of Drill.

Why We Wrote This Book

Three years ago, Charles was introduced to Drill at the Strata Conference in San Jose, CA, and it sparked a realization that Drill could fundamentally change the way data is analyzed. After a few conversations with MapR chief scientist Ted Dunning, Charles realized that Drill had enormous unrealized potential for use with security-related datasets. At the time, however, many of Drill’s capabilities were undocumented, and information about how to develop for Drill ranged from limited to nonexistent. Charles wanted to extend the capabilities of Drill, but he had no idea where to begin. This book is everything that Charles would have wanted if he were starting his journey with Drill today.1

Paul has worked at a number of business intelligence (BI) companies on a range of query and database tools. When he came across Drill, it seemed like the best of many different tools combined into one, while also being both open source and extensible. Paul joined the Drill team and has worked to get the word out about Drill’s capabilities.

This book shows you how to use Drill to analyze data effectively. The book is not intended to replace Drill’s documentation, but rather to serve as a guide to getting you on the right path with Drill. It represents the compilation of several years of lessons learned and should go a long way toward explaining what Drill is and how it solves user problems.

We also wrote this book for people who are interested in extending the capabilities of Drill. After you begin experimenting with Drill, you will likely develop ideas about missing functionality. When Charles started with Drill, the lack of documentation in this area was one of his biggest frustrations, and this book aims to remedy that situation. Chapters 8 and 10 through 12 cover in depth, and in easy-to-understand language, how to extend Drill’s functionality.

Navigating This Book

This book is intended for three rather distinct audiences, each with different skill sets. Here’s how we address these audiences:

  • Chapters 1 through 3 are a general introduction to Drill. They will give you a good idea of how to get up and running.
  • Chapters 4 through 7 are intended for analysts, data scientists, and anyone who will be using Drill to analyze data. With the exception of Chapter 7, all of the chapters in this section require an understanding of SQL.
  • Chapters 8, 10, 11, and 12 discuss how to extend the functionality of Drill. To get the most out of these chapters, you will need an understanding of Java development.
  • Chapter 9 discusses the intricacies of installing and configuring Drill in a production environment. If you are a system administrator, you will want to read this chapter.
  • Chapter 13 covers a wide range of use cases for Drill. Regardless of your role, you will want to read this chapter to understand the full power of Drill.

Online Resources

All the code and data files referenced in the book are available in the GitHub repository at https://github.com/cgivre/drillbook. Please use the Issues tab in GitHub to report any errata in the code.

Drill has comprehensive documentation as well.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.

Tip

This element signifies a tip or suggestion.

Note

This element signifies a general note.

Warning

This element indicates a warning or caution.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/cgivre/drillbook.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Learning Apache Drill by Charles Givre and Paul Rogers (O’Reilly). Copyright 2019 Charles Givre and Paul Rogers, 978-1-492-03279-3.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at .

O’Reilly Safari

Note

Safari (formerly Safari Books Online) is a membership-based training and reference platform for enterprise, government, educators, and individuals.

Members have access to thousands of books, training videos, Learning Paths, interactive tutorials, and curated playlists from over 250 publishers, including O’Reilly Media, Harvard Business Review, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Adobe, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, and Course Technology, among others.

For more information, please visit http://oreilly.com/safari.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

  • O’Reilly Media, Inc.
  • 1005 Gravenstein Highway North
  • Sebastopol, CA 95472
  • 800-998-9938 (in the United States or Canada)
  • 707-829-0515 (international or local)
  • 707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://bit.ly/learning-apache-drill.

To comment or ask technical questions about this book, send email to .

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments

The authors would like to thank Arina Ielchiieva, John Omernik, Aman Sinha, and Parth Chandra for taking time from their busy schedules to provide us with thorough technical reviews. We would also like to thank Jeff Bleiel and the entire O’Reilly editorial team for working with us to see this book to completion. Finally, we would like to thank the dedicated contributors to the Drill project, without whom this book would not be possible.

Special Thanks from Charles

I would especially like to thank my wife, Alisheva, and children, Mel, Dovie, Rozie, and Goldie, for putting up with my absences and late nights while writing this book and for supporting me as I pursued my interest in Drill and geekery/nerdcraft in general. I definitely couldn’t do it without you.

I would like to thank all the members of the Drill development committee who worked with me and who have taught me so much about Java development, GitHub, and how to write production-quality code. I would also like to thank my coauthor, Paul, who has put up with countless questions and who has also taught me a lot about the internals of Drill.

Finally, thanks to Ted Dunning and Ellen Friedman for inviting me to contribute to this project, and to the Drill Project Management Committee (PMC) for their lapses in judgment in making me a committer to the Drill project and, most recently, a PMC member.

Special Thanks from Paul

I wish to thank my wife, Anne, and children, Delaine, Forrest, and Pauline, for their patience as I disappeared on nights and weekends to peck away at this book.

I wish to also thank the Drill development team for their generous help during the two years I spent working on Drill and for the continued assistance in answering questions while writing the book. This book is a way to pass their knowledge along to others. Thanks to the original Drill developers, who built the product in record time, and to the later developers who continue to improve Drill. I am honored to serve on the Drill PMC with a wonderful group of contributors. Thanks also to Charles for driving the book to completion and for sharing his user perspective with all of us.

I would also like to thank MapR for funding the creation of Drill and for contributing it to the Apache Software Foundation (ASF) so that everyone can use it. Finally, thanks to the ASF for providing ongoing support for the Apache Drill project.

1 It feels very strange for Charles to write about himself in the third person.
