Index

A note on the digital index

A link in an index entry is displayed as the section title in which that entry appears. Because some sections have multiple index markers, it is not unusual for an entry to have several links to the same section. Clicking on any link will take you directly to the place in the text in which the marker appears.

A

access dates, partitioning users by, Partitioning users by last access date–Reducer code, Querying for user reputation by last access date–Driver code
anonymizing data, Motivation–Reducer code, Anonymous comments and distinct users–Driver code
antijoin operations, A Refresher on Joins
Apache Hadoop (see Hadoop)
audio, trends in nature of data, Images, Audio, and Video
averages, calculating, Average example–Data flow diagram

B

BigTable design (Google), Google BigTable
binning pattern
description, Pattern Description–Performance analysis
examples, Binning by Hadoop-related tags–Mapper code
Bloom filtering pattern
description, Pattern Description–Performance analysis
examples, Hot list–Mapper Code
reduce side joins with, Reputable user and comment join–Comment mapper code
Bloom filters
about, Overview
downsides, Downsides
tweaking, Tweaking Your Bloom Filter
use cases, Use Cases–Google BigTable
Bloom, Burton Howard, Overview
BloomFilter class, Bloom filter training

C

Cartesian product pattern
description, Pattern Description–Performance Analysis
examples, Comment Comparison–Mapper code
Cartesian products, A Refresher on Joins
chain folding
about, Chain Folding–Chain Folding
ChainMapper class and, The ChainMapper and ChainReducer Approach, Driver code
ChainReducer class and, The ChainMapper and ChainReducer Approach, Driver code
examples, Bin users by reputation–Driver code
ChainMapper class, The ChainMapper and ChainReducer Approach, Driver code
ChainReducer class
about, Patterns as a Library or Component
chain folding example, The ChainMapper and ChainReducer Approach, Driver code
CombineFileInputFormat class, Job Chaining
combiner phase (Hadoop), MapReduce and Hadoop Refresher
comments
about, The Examples in This Book
anonymizing, Anonymizing StackOverflow comments, Anonymous comments and distinct users–Driver code
building on StackOverflow, Post/comment building on StackOverflow–Reducer code
generating random, Generating random StackOverflow comments–RecordReader code
reduce side join example, User and comment join–Combiner optimization
self-joining, Comment Comparison–Mapper code
Comparator interface, MapReduce and Hadoop Refresher
composite join pattern
description, Pattern Description–Performance analysis
examples, Composite user comment join–Reducer and combiner
CompositeInputFormat class
Cartesian product examples, Input format code
composite join examples, Motivation, Composite user comment join
CompositeInputSplit class, Input format code
Configurable interface, Partitioner code
Configuration class, Main method, Helper methods
Context interface, Mapper Code
ControlledJob class, With JobControl–Helper methods
count of a field, Minimum, maximum, and count example–Data flow diagram
Counting Bloom Filter, Downsides
counting with counters pattern
description, Pattern Description–Performance analysis
examples, Number of users per state–Driver code
CreationDate XML attribute, Mapper code
CROSS statement (Pig), Resemblances
Cutting, Doug, MapReduce History

D

data cleansing, Known uses
data organization patterns
binning pattern, Pattern Description–Mapper code
generating data pattern, Data Organization Patterns, Pattern Description–RecordReader code
partitioning pattern, Pattern Description–Reducer code
shuffling pattern, Pattern Description–Reducer code
structured to hierarchical pattern, Pattern Description–Reducer code
total order sorting pattern, Pattern Description–Order reducer code
Date class, Mapper code
Dean, Jeffrey, MapReduce History
deduplication, Motivation
design patterns
about, Design Patterns
data organization patterns, Data Organization Patterns–Reducer code
effects of YARN, The Effects of YARN
filtering patterns, Filtering Patterns–Combiner optimization
importance of, Pig and Hive
input and output patterns, Input and Output Patterns–Driver code
join patterns, Join Patterns–Mapper code
as libraries or components, Patterns as a Library or Component
MapReduce and, Design Patterns and MapReduce–Design Patterns
metapatterns, Metapatterns–Driver code
sharing, How You Can Help
summarization patterns, Summarization Patterns–Driver code
trends in nature of data, Trends in the Nature of Data–Streaming Data
DISTINCT operation (Pig), Resemblances
distinct pattern
description, Pattern Description–Performance analysis
examples, Distinct user IDs–Combiner optimization
distributed grep, Known uses, Distributed grep
DistributedCache class
Bloom filtering examples, Mapper code, Mapper Code, Reputable user and comment join
chain folding example, Bin users by reputation, Driver code
generating data examples, RecordReader code
job chaining examples, Basic job chaining, Driver code
reduce side join examples, Reputable user and comment join
replicated join examples, Replicated user comment example
DocumentBuilder class, Reducer code

E

Element class, Reducer code
external source input pattern
description, Pattern Description–Performance analysis
examples, Reading from Redis Instances–Driver code
external source output pattern
description, Pattern Description–Performance analysis
examples, Writing to Redis instances–Driver Code

F

FileInputFormat class
customizing input and output, InputFormat, OutputFormat
“Word Count” program example, Hadoop Example: Word Count
FileOutputCommitter class, OutputFormat
FileOutputFormat class
customizing input and output, OutputFormat
external source output examples, Writing to Redis instances
“Word Count” program example, Hadoop Example: Word Count
FileSystem class, Bloom filter training, OutputFormat
FILTER keyword (Pig), Resemblances
filtering pattern
description, Pattern Description–Performance analysis
examples, Distributed grep–Mapper Code
filtering patterns
Bloom filtering pattern, Pattern Description–Mapper Code
distinct pattern, Pattern Description–Combiner optimization
filtering pattern, Pattern Description–Mapper Code
top ten pattern, Pattern Description–Reducer code
FOREACH … GENERATE expression (Pig), Resemblances
FSDataInputStream class, InputFormat
full outer joins, A Refresher on Joins, A Refresher on Joins

G

“The Gang of Four” book, Preface, Design Patterns
generating data pattern
about, Data Organization Patterns
description, Pattern Description–Performance analysis
examples, Generating random StackOverflow comments–RecordReader code
Ghemawat, Sanjay, MapReduce History
Google BigTable design, Google BigTable
grep tool, Known uses, Distributed grep
GROUP BY clause (SQL), Resemblances
GROUP … BY expression (Pig), Resemblances

H

Hadoop
about, The Examples in This Book
design patterns and, Design Patterns
historical overview, MapReduce History
map tasks, MapReduce and Hadoop Refresher–MapReduce and Hadoop Refresher
reduce tasks, MapReduce and Hadoop Refresher–MapReduce and Hadoop Refresher
“Word Count” program example, Hadoop Example: Word Count–Hadoop Example: Word Count
Hadoop Distributed File System (HDFS), MapReduce and Hadoop Refresher, Structure
HashMap class
about, The Examples in This Book
numerical summarizations example, Combiner optimization
Redis hash and, Writing to Redis instances
replicated join examples, Mapper code
HBase database
Bloom filter example, HBase Query using a Bloom filter–Mapper Code
updating data and, Motivation
HDFS (Hadoop Distributed File System), MapReduce and Hadoop Refresher, Structure
Hive data warehouse, Pig and Hive
hot list of keywords example, Hot list–Mapper code
HStreaming product, Streaming Data

I

identity reducers, Structure
IdentityMapper class, Structure
images, trends in nature of data, Images, Audio, and Video
inner joins
about, A Refresher on Joins
protecting against explosions, Known uses
input and output patterns
about, Input and Output Patterns
customizing input and output, Customizing Input and Output in Hadoop–RecordWriter
external source input pattern, Pattern Description–Driver code
external source output pattern, Pattern Description–Driver Code
generating data pattern, Pattern Description–RecordReader code
partition pruning pattern, Pattern Description–Driver code
input format, MapReduce and Hadoop Refresher, InputFormat
input splits, MapReduce and Hadoop Refresher, InputFormat
InputFormat class
about, Customizing Input and Output in Hadoop–InputFormat
createRecordReader method, InputFormat
external source input examples, Structure, InputFormat code
generating data examples, Structure, InputFormat code
getSplits method, InputFormat, Structure
partition pruning examples, InputFormat code
InputSampler class, Driver code
InputSplit class
about, InputFormat
external source input examples, Structure, InputSplit code
partition pruning examples, InputSplit code
IntWritable class, Hadoop Example: Word Count
inverted index pattern
description, Pattern Description–Performance analysis
examples, Wikipedia reference inverted index–Combiner optimization

J

job chaining
about, Job Chaining
examples, Basic job chaining–Driver code
with job control, With JobControl–Helper methods
with master drivers, With the Driver
parallel, Parallel job chaining–Driver code
with shell scripting, With Shell Scripting–Sample run
Job class
about, Hadoop Example: Word Count
isComplete method, With the Driver
setCombinerClass method, Hadoop Example: Word Count
setNumReduceTasks method, Reducer code
submit method, With the Driver, Driver code
waitForCompletion method, With the Driver, Driver code
job merging
about, Metapatterns, Job Merging–Job Merging
examples, Anonymous comments and distinct users–Driver code
JobConf class, Driver code
JobControl class, With the Driver, With JobControl–Helper methods
join operations
about, A Refresher on Joins
antijoins, A Refresher on Joins
Cartesian products, A Refresher on Joins
inner joins, A Refresher on Joins
outer joins, A Refresher on Joins–A Refresher on Joins
join patterns
about, Join Patterns
Cartesian product pattern, Pattern Description–Mapper code
composite join pattern, Pattern Description–Reducer and combiner
reduce side join pattern, Pattern Description–Comment mapper code
replicated join pattern, Pattern Description–Mapper code

K

KeyValueTextOutputFormat class, Composite user comment join
keywords hot list example, Hot list–Mapper code

L

left outer joins, A Refresher on Joins
LineRecordReader class
about, InputFormat
partition pruning examples, Structure
LineRecordWriter class, OutputFormat
LongSumReducer class, Bin users by reputation
LongWritable class, Hadoop Example: Word Count

M

Map class, Mapper code
map function, Mapper Code
map phase (Hadoop), MapReduce and Hadoop Refresher, Chain Folding
map tasks (Hadoop)
about, MapReduce and Hadoop Refresher
combiner phase, MapReduce and Hadoop Refresher
map phase, MapReduce and Hadoop Refresher, Chain Folding
partitioner phase, MapReduce and Hadoop Refresher
record reader phase, MapReduce and Hadoop Refresher
reduce tasks and, MapReduce and Hadoop Refresher
mapred API, The Examples in This Book, Driver code
MapReduce
about, Design Patterns and MapReduce
design patterns and, Design Patterns and MapReduceDesign Patterns
historical overview, MapReduce History
Pig and Hive considerations, Pig and Hive
mapreduce API, The Examples in This Book, Driver code
maximum value of a field, Minimum, maximum, and count example–Data flow diagram
median, calculating, Median and standard deviation–Data flow diagram
metapatterns
about, Metapatterns
chain folding, Chain Folding–Driver code
job chaining, Job Chaining–Helper methods
job merging, Job Merging–Driver code
minimum value of a field, Minimum, maximum, and count example–Data flow diagram
modulus operation, MapReduce and Hadoop Refresher
MongoDB database, Known uses
MRDPUtils.transformXmlToMap helper function, Hadoop Example: Word Count
multidimensional data, Images, Audio, and Video
MultipleInputs class, Structure, Driver code, Driver code
MultipleOutputs class
about, Patterns as a Library or Component
binning pattern and, Structure, Driver code
chain folding example, Binning mapper code, Driver code
job chaining examples, Job two mapper, Driver code
job merging examples, Job Merging, Merged reducer code

N

NullOutputFormat class
binning examples, Mapper code
chain folding examples, Driver code
partition pruning examples, OutputFormat code
NullWritable class
job chaining examples, Mapper code
job merging examples, Merged reducer code
top ten examples, Reducer code
total order sorting examples, Order reducer code
Numerical Aggregation pattern, Resemblances
numerical summarizations pattern
description, Pattern Description–Performance analysis
examples, Minimum, maximum, and count example–Data flow diagram

O

Oozie project, Job Chaining
outer joins, A Refresher on Joins–A Refresher on Joins
outlier analysis, Known uses
output committers, OutputFormat, Consequences
output format phase (Hadoop), MapReduce and Hadoop Refresher
output patterns (see input and output patterns)
OutputFormat class
about, Customizing Input and Output in Hadoop, OutputFormat
checkOutputSpecs method, OutputFormat
external source output examples, Structure, OutputFormat code
getOutputCommitter method, OutputFormat
getRecordWriter method, OutputFormat, RecordWriter
partition pruning examples, OutputFormat code, OutputFormat code

P

parallel job chaining, Parallel job chaining–Driver code
partition pruning pattern
description, Pattern Description
examples, Partitioning by last access date to Redis instances–Driver code
partitioner phase (Hadoop), MapReduce and Hadoop Refresher
partitioning pattern
description, Pattern Description–Performance analysis
examples, Partitioning users by last access date–Reducer code
Path interface, Driver code
patterns (see design patterns)
Pig language
about, Pig and Hive
COGROUP method, Resemblances
CROSS statement, Resemblances
DISTINCT operation, Resemblances
FILTER keyword, Resemblances
FOREACH … GENERATE expression, Resemblances
GROUP … BY expression, Resemblances
hierarchical data structures and, Resemblances
join operations, Resemblances, Resemblances
ordering in, Resemblances
shuffling data in, Resemblances
SPLIT operation, Resemblances
top ten pattern considerations, Resemblances
posts
about, The Examples in This Book
building on StackOverflow, Post/comment building on StackOverflow–Reducer code
pruning partitions, Known uses, Pattern Description–Driver code

R

random sampling of data, Known uses, Simple Random Sampling
RandomSampler class, Driver code
record counts
counting with counters example, Motivation, Known uses–Driver code
numerical summarizations example, Known uses
record reader phase (Hadoop), MapReduce and Hadoop Refresher
RecordReader class
about, Customizing Input and Output in Hadoop–RecordReader
close method, RecordReader
external source input examples, Structure, RecordReader code
generating data examples, Structure, RecordReader code
getCurrentKey method, RecordReader
getCurrentValue method, RecordReader
getProgress method, RecordReader
initialize method, RecordReader
nextKeyValue method, RecordReader
partition pruning examples, Structure, RecordReader code
records, filtering out, Known uses
RecordWriter class
about, Customizing Input and Output in Hadoop, RecordWriter
close method, RecordWriter
external source output examples, Structure, RecordWriter code
partition pruning examples, RecordWriter code
write method, RecordWriter
Redis key-value store
external source input examples, Reading from Redis Instances–Driver code
external source output examples, Writing to Redis instances–Driver Code
partition pruning examples, Partitioning by last access date to Redis instances
reduce function, MapReduce and Hadoop Refresher, Hadoop Example: Word Count
reduce phase (Hadoop), MapReduce and Hadoop Refresher
reduce side join pattern
with Bloom filter, Reputable user and comment join–Comment mapper code
description, Pattern Description–Performance analysis
examples, User and comment join–Combiner optimization
reduce tasks (Hadoop)
about, MapReduce and Hadoop Refresher
map tasks and, MapReduce and Hadoop Refresher
output format phase, MapReduce and Hadoop Refresher
reduce phase, MapReduce and Hadoop Refresher
shuffle phase, MapReduce and Hadoop Refresher
sort phase, MapReduce and Hadoop Refresher
replicated join pattern
description, Pattern Description–Performance analysis
examples, Replicated user comment example–Mapper code
right outer joins, A Refresher on Joins, A Refresher on Joins

S

sampling data, Filtering Patterns, Known uses, Simple Random Sampling
SciDB database, Images, Audio, and Video
SELECT DISTINCT statement (SQL), Resemblances
self-joining comments, Comment Comparison–Mapper code
SequenceFile class, Consequences, Analyze mapper code
SequenceFileOutputFormat class, Driver code
setup function, Mapper code, Mapper Code
sharding data, Known uses
shell scripts, job chaining in, With Shell Scripting–Sample run
shuffle phase (Hadoop), MapReduce and Hadoop Refresher
shuffling pattern
description, Pattern Description–Performance analysis
examples, Anonymizing StackOverflow comments–Reducer code
simple random sampling (SRS), Known uses, Simple Random Sampling
sort phase (Hadoop), MapReduce and Hadoop Refresher
SortedMap interface, Reducer code
SortedMapWritable class, Mapper code–Data flow diagram
sorting pattern
description, Pattern Description–Performance analysis
examples, Sort users by last visit–Order reducer code
SPLIT operation (Pig), Resemblances
SQL
GROUP BY clause, Resemblances
hierarchical data structures and, Resemblances
join operations, Resemblances
ordering data by random value, Resemblances
ordering in, Resemblances
partition pruning and, Resemblances
SELECT DISTINCT statement, Resemblances
top ten pattern considerations, Resemblances
WHERE clause, Resemblances, Resemblances
SRS (simple random sampling), Known uses, Simple Random Sampling
StackOverflow
about, The Examples in This Book
anonymizing comments, Anonymizing StackOverflow comments, Anonymous comments and distinct users
comments table, The Examples in This Book
generating random comments, Generating random StackOverflow comments–RecordReader code
post/comment building on, Post/comment building on StackOverflow–Reducer code
posts table, The Examples in This Book
question/answer building on, Question/answer building on StackOverflow–Reducer code
self-joining comments, Comment Comparison–Mapper code
updating data and, Motivation
user and comment joins, User and comment join–Combiner optimization
users table, The Examples in This Book
standard deviation, calculating, Median and standard deviation–Data flow diagram
streaming data, Streaming Data
String class
composite join example, Driver code
inverted index example, Wikipedia reference inverted index
job merging examples, TaggedText WritableComparable
StringTokenizer class, Hadoop Example: Word Count
structured to hierarchical pattern
description, Pattern Description–Performance analysis
examples, Post/comment building on StackOverflow–Reducer code
summarization patterns
counting with counters pattern, Pattern Description–Driver code
inverted index pattern, Pattern Description–Combiner optimization
numerical summarizations pattern, Pattern Description–Data flow diagram

T

temporary files, Job Chaining
Text class
composite join examples, Composite user comment join, Mapper code
job merging examples, TaggedText WritableComparable, TaggedText WritableComparable
“Word Count” program example, Hadoop Example: Word Count
TextInputFormat class
customizing input and output, InputFormat, RecordReader
“Word Count” program example, Hadoop Example: Word Count
TextOutputFormat class
composite join examples, Composite user comment join
customizing input and output, OutputFormat
“Word Count” program example, Hadoop Example: Word Count
top ten pattern
description, Pattern Description–Performance analysis
examples, Top ten users by reputation–Reducer code
total order sorting pattern
description, Pattern Description–Performance analysis
examples, Sort users by last visit–Order reducer code
TotalOrderPartitioner class
about, Patterns as a Library or Component
total order sorting pattern and, Structure, Driver code, Analyze mapper code
tracking threads of events, Known uses
TreeMap class
numerical summarizations example, Reducer code
top ten example, Mapper code
TupleWritable class, Mapper code

V

video, trends in nature of data, Images, Audio, and Video
viewing data, Known uses

W

WHERE clause (SQL), Resemblances, Resemblances
White, Tom, MapReduce and Hadoop Refresher
Wikipedia reference inverted index example, Wikipedia reference inverted index–Combiner optimization
“Word Count” program example (Hadoop), Hadoop Example: Word Count–Hadoop Example: Word Count
word counts
numerical summarizations example, Known uses
“Word Count” program example, Hadoop Example: Word Count–Hadoop Example: Word Count
WordCountMapper class, Hadoop Example: Word Count
Writable interface, InputSplit code
WritableComparable interface
about, RecordReader
job merging examples, TaggedText WritableComparable
partition pruning examples, Custom WritableComparable code
Writable interface
numerical summarization example, MinMaxCountTuple code
“Word Count” program example, Hadoop Example: Word Count

Y

YARN (Yet Another Resource Negotiator), The Effects of YARN