A note on the digital index

A link in an index entry is displayed as the title of the section in which that entry appears. Because some sections have multiple index markers, it is not unusual for an entry to have several links to the same section. Clicking any link takes you directly to the place in the text where the marker appears.

A
access dates, partitioning users by, Partitioning users by last access date – Reducer code, Querying for user reputation by last access date – Driver code
anonymizing data, Motivation – Reducer code, Anonymous comments and distinct users – Driver code
antijoin operations, A Refresher on Joins
Apache Hadoop (see Hadoop)
audio, trends in nature of data, Images, Audio, and Video
averages, calculating, Average example – Data flow diagram

B
BigTable design (Google), Google BigTable
binning pattern
    description, Pattern Description – Performance analysis
    examples, Binning by Hadoop-related tags – Mapper code
Bloom filtering pattern
    description, Pattern Description – Performance analysis
    examples, Hot list – Mapper Code
    reduce side joins with, Reputable user and comment join – Comment mapper code
Bloom filters
    about, Overview
    downsides, Downsides
    tweaking, Tweaking Your Bloom Filter
    use cases, Use Cases – Google BigTable
Bloom, Burton Howard, Overview
BloomFilter class, Bloom filter training

C
Cartesian product pattern
    description, Pattern Description – Performance Analysis
    examples, Comment Comparison – Mapper code
Cartesian products, A Refresher on Joins
chain folding
    about, Chain Folding – Chain Folding
    ChainMapper class and, The ChainMapper and ChainReducer Approach, Driver code
    ChainReducer class and, The ChainMapper and ChainReducer Approach, Driver code
    examples, Bin users by reputation – Driver code
ChainMapper class, The ChainMapper and ChainReducer Approach, Driver code
ChainReducer class
    about, Patterns as a Library or Component
    chain folding example, The ChainMapper and ChainReducer Approach, Driver code
CombineFileInputFormat class, Job Chaining
combiner phase (Hadoop), MapReduce and Hadoop Refresher
comments
    about, The Examples in This Book
    anonymizing, Anonymizing StackOverflow comments, Anonymous comments and distinct users – Driver code
    building on StackOverflow, Post/comment building on StackOverflow – Reducer code
    generating random, Generating random StackOverflow comments – RecordReader code
    reduce side join example, User and comment join – Combiner optimization
    self-joining, Comment Comparison – Mapper code
Comparator interface, MapReduce and Hadoop Refresher
composite join pattern
    description, Pattern Description – Performance analysis
    examples, Composite user comment join – Reducer and combiner
CompositeInputFormat class
    Cartesian product examples, Input format code
    composite join examples, Motivation, Composite user comment join
CompositeInputSplit class, Input format code
Configurable interface, Partitioner code
Configuration class, Main method, Helper methods
Context interface, Mapper Code
ControlledJob class, With JobControl – Helper methods
count of a field, Minimum, maximum, and count example – Data flow diagram
Counting Bloom Filter, Downsides
counting with counters pattern
    description, Pattern Description – Performance analysis
    examples, Number of users per state – Driver code
CreationDate XML attribute, Mapper code
CROSS statement (Pig), Resemblances
Cutting, Doug, MapReduce History

D
data cleansing, Known uses
data organization patterns
    binning pattern, Pattern Description – Mapper code
    generating data pattern, Data Organization Patterns, Pattern Description – RecordReader code
    partitioning pattern, Pattern Description – Reducer code
    shuffling pattern, Pattern Description – Reducer code
    structured to hierarchical pattern, Pattern Description – Reducer code
    total order sorting pattern, Pattern Description – Order reducer code
Date class, Mapper code
Dean, Jeffrey, MapReduce History
deduplication, Motivation
design patterns
    about, Design Patterns
    data organization patterns, Data Organization Patterns – Reducer code
    effects of YARN, The Effects of YARN
    filtering patterns, Filtering Patterns – Combiner optimization
    importance of, Pig and Hive
    input and output patterns, Input and Output Patterns – Driver code
    join patterns, Join Patterns – Mapper code
    as libraries or components, Patterns as a Library or Component
    MapReduce and, Design Patterns and MapReduce – Design Patterns
    metapatterns, Metapatterns – Driver code
    sharing, How You Can Help
    summarization patterns, Summarization Patterns – Driver code
    trends in nature of data, Trends in the Nature of Data – Streaming Data
DISTINCT operation (Pig), Resemblances
distinct pattern
    description, Pattern Description – Performance analysis
    examples, Distinct user IDs – Combiner optimization
distributed grep, Known uses, Distributed grep
DistributedCache class
    Bloom filtering examples, Mapper code, Mapper Code, Reputable user and comment join
    chain folding example, Bin users by reputation, Driver code
    generating data examples, RecordReader code
    job chaining examples, Basic job chaining, Driver code
    reduce side join examples, Reputable user and comment join
    replicated join examples, Replicated user comment example
DocumentBuilder class, Reducer code

F
FileInputFormat class
    customizing input and output, InputFormat, OutputFormat
    “Word Count” program example, Hadoop Example: Word Count
FileOutputCommitter class, OutputFormat
FileOutputFormat class
    customizing input and output, OutputFormat
    external source output examples, Writing to Redis instances
    “Word Count” program example, Hadoop Example: Word Count
FileSystem class, Bloom filter training, OutputFormat
FILTER keyword (Pig), Resemblances
filtering pattern
    description, Pattern Description – Performance analysis
    examples, Distributed grep – Mapper Code
filtering patterns
    Bloom filtering pattern, Pattern Description – Mapper Code
    distinct pattern, Pattern Description – Combiner optimization
    filtering pattern, Pattern Description – Mapper Code
    top ten pattern, Pattern Description – Reducer code
FOREACH … GENERATE expression (Pig), Resemblances
FSDataInputStream class, InputFormat
full outer joins, A Refresher on Joins, A Refresher on Joins

G
“The Gang of Four” book, Preface, Design Patterns
generating data pattern
    about, Data Organization Patterns
    description, Pattern Description – Performance analysis
    examples, Generating random StackOverflow comments – RecordReader code
Ghemawat, Sanjay, MapReduce History
Google BigTable design, Google BigTable
grep tool, Known uses, Distributed grep
GROUP BY clause (SQL), Resemblances
GROUP … BY expression (Pig), Resemblances

H
Hadoop
    about, The Examples in This Book
    design patterns and, Design Patterns
    historical overview, MapReduce History
    map tasks, MapReduce and Hadoop Refresher – MapReduce and Hadoop Refresher
    reduce tasks, MapReduce and Hadoop Refresher – MapReduce and Hadoop Refresher
    “Word Count” program example, Hadoop Example: Word Count – Hadoop Example: Word Count
Hadoop Distributed File System (HDFS), MapReduce and Hadoop Refresher, Structure
HashMap class
    about, The Examples in This Book
    numerical summarizations example, Combiner optimization
    Redis hash and, Writing to Redis instances
    replicated join examples, Mapper code
HBase database
    Bloom filter example, HBase Query using a Bloom filter – Mapper Code
    updating data and, Motivation
HDFS (Hadoop Distributed File System), MapReduce and Hadoop Refresher, Structure
Hive data warehouse, Pig and Hive
hot list of keywords example, Hot list – Mapper code
HStreaming product, Streaming Data

I
identity reducers, Structure
IdentityMapper class, Structure
images, trends in nature of data, Images, Audio, and Video
inner joins
    about, A Refresher on Joins
    protecting against explosions, Known uses
input and output patterns
    about, Input and Output Patterns
    customizing input and output, Customizing Input and Output in Hadoop – RecordWriter
    external source input pattern, Pattern Description – Driver code
    external source output pattern, Pattern Description – Driver Code
    generating data pattern, Pattern Description – RecordReader code
    partition pruning pattern, Pattern Description – Driver code
input format, MapReduce and Hadoop Refresher, InputFormat
input splits, MapReduce and Hadoop Refresher, InputFormat
InputFormat class
    about, Customizing Input and Output in Hadoop – InputFormat
    createRecordReader method, InputFormat
    external source input examples, Structure, InputFormat code
    generating data examples, Structure, InputFormat code
    getSplits method, InputFormat, Structure
    partition pruning examples, InputFormat code
InputSampler class, Driver code
InputSplit class
    about, InputFormat
    external source input examples, Structure, InputSplit code
    partition pruning examples, InputSplit code
IntWritable class, Hadoop Example: Word Count
inverted index pattern
    description, Pattern Description – Performance analysis
    examples, Wikipedia reference inverted index – Combiner optimization

J
job chaining
    about, Job Chaining
    examples, Basic job chaining – Driver code
    with job control, With JobControl – Helper methods
    with master drivers, With the Driver
    parallel, Parallel job chaining – Driver code
    with shell scripting, With Shell Scripting – Sample run
Job class
    about, Hadoop Example: Word Count
    isComplete method, With the Driver
    setCombinerClass method, Hadoop Example: Word Count
    setNumReduceTasks method, Reducer code
    submit method, With the Driver, Driver code
    waitForCompletion method, With the Driver, Driver code
job merging
    about, Metapatterns, Job Merging – Job Merging
    examples, Anonymous comments and distinct users – Driver code
JobConf class, Driver code
JobControl class, With the Driver, With JobControl – Helper methods
join operations
    about, A Refresher on Joins
    antijoins, A Refresher on Joins
    Cartesian products, A Refresher on Joins
    inner joins, A Refresher on Joins
    outer joins, A Refresher on Joins – A Refresher on Joins
join patterns
    about, Join Patterns
    Cartesian product pattern, Pattern Description – Mapper code
    composite join pattern, Pattern Description – Reducer and combiner
    reduce side join pattern, Pattern Description – Comment mapper code
    replicated join pattern, Pattern Description – Mapper code

M
Map class, Mapper code
map function, Mapper Code
map phase (Hadoop), MapReduce and Hadoop Refresher, Chain Folding
map tasks (Hadoop)
    about, MapReduce and Hadoop Refresher
    combiner phase, MapReduce and Hadoop Refresher
    map phase, MapReduce and Hadoop Refresher, Chain Folding
    partitioner phase, MapReduce and Hadoop Refresher
    record reader phase, MapReduce and Hadoop Refresher
    reduce tasks and, MapReduce and Hadoop Refresher
mapred API, The Examples in This Book, Driver code
MapReduce
    about, Design Patterns and MapReduce
    design patterns and, Design Patterns and MapReduce – Design Patterns
    historical overview, MapReduce History
    Pig and Hive considerations, Pig and Hive
mapreduce API, The Examples in This Book, Driver code
maximum value of a field, Minimum, maximum, and count example – Data flow diagram
median, calculating, Median and standard deviation – Data flow diagram
metapatterns
    about, Metapatterns
    chain folding, Chain Folding – Driver code
    job chaining, Job Chaining – Helper methods
    job merging, Job Merging – Driver code
minimum value of a field, Minimum, maximum, and count example – Data flow diagram
modulus operation, MapReduce and Hadoop Refresher
MongoDB database, Known uses
MRDPUtils.transformXmlToMap helper function, Hadoop Example: Word Count
multidimensional data, Images, Audio, and Video
MultipleInputs class, Structure, Driver code, Driver code
MultipleOutputs class
    about, Patterns as a Library or Component
    binning pattern and, Structure, Driver code
    chain folding example, Binning mapper code, Driver code
    job chaining examples, Job two mapper, Driver code
    job merging examples, Job Merging, Merged reducer code

N
NullOutputFormat class
    binning examples, Mapper code
    chain folding examples, Driver code
    partition pruning examples, OutputFormat code
NullWritable class
    job chaining examples, Mapper code
    job merging examples, Merged reducer code
    top ten examples, Reducer code
    total order sorting examples, Order reducer code
Numerical Aggregation pattern, Resemblances
numerical summarizations pattern
    description, Pattern Description – Performance analysis
    examples, Minimum, maximum, and count example – Data flow diagram

O
Oozie project, Job Chaining
outer joins, A Refresher on Joins – A Refresher on Joins
outlier analysis, Known uses
output committers, OutputFormat, Consequences
output format phase (Hadoop), MapReduce and Hadoop Refresher
output patterns (see input and output patterns)
OutputFormat class
    about, Customizing Input and Output in Hadoop, OutputFormat
    checkOutputSpecs method, OutputFormat
    external source output examples, Structure, OutputFormat code
    getOutputCommitter method, OutputFormat
    getRecordWriter method, OutputFormat, RecordWriter
    partition pruning examples, OutputFormat code, OutputFormat code

P
parallel job chaining, Parallel job chaining – Driver code
partition pruning pattern
    description, Pattern Description
    examples, Partitioning by last access date to Redis instances – Driver code
partitioner phase (Hadoop), MapReduce and Hadoop Refresher
partitioning pattern
    description, Pattern Description – Performance analysis
    examples, Partitioning users by last access date – Reducer code
Path interface, Driver code
patterns (see design patterns)
Pig language
    about, Pig and Hive
    COGROUP method, Resemblances
    CROSS statement, Resemblances
    DISTINCT operation, Resemblances
    FILTER keyword, Resemblances
    FOREACH … GENERATE expression, Resemblances
    GROUP … BY expression, Resemblances
    hierarchical data structures and, Resemblances
    join operations, Resemblances, Resemblances
    ordering in, Resemblances
    shuffling data in, Resemblances
    SPLIT operation, Resemblances
    top ten pattern considerations, Resemblances
posts
    about, The Examples in This Book
    building on StackOverflow, Post/comment building on StackOverflow – Reducer code
pruning partitions, Known uses, Pattern Description – Driver code

R
random sampling of data, Known uses, Simple Random Sampling
RandomSampler class, Driver code
record counts
    counting with counters example, Motivation, Known uses – Driver code
    numerical summarizations example, Known uses
record reader phase (Hadoop), MapReduce and Hadoop Refresher
RecordReader class
    about, Customizing Input and Output in Hadoop – RecordReader
    close method, RecordReader
    external source input examples, Structure, RecordReader code
    generating data examples, Structure, RecordReader code
    getCurrentKey method, RecordReader
    getCurrentValue method, RecordReader
    getProgress method, RecordReader
    initialize method, RecordReader
    nextKeyValue method, RecordReader
    partition pruning examples, Structure, RecordReader code
records, filtering out, Known uses
RecordWriter class
    about, Customizing Input and Output in Hadoop, RecordWriter
    close method, RecordWriter
    external source output examples, Structure, RecordWriter code
    partition pruning examples, RecordWriter code
    write method, RecordWriter
Redis key-value store
    external source input examples, Reading from Redis Instances – Driver code
    external source output examples, Writing to Redis instances – Driver Code
    partition pruning examples, Partitioning by last access date to Redis instances
reduce function, MapReduce and Hadoop Refresher, Hadoop Example: Word Count
reduce phase (Hadoop), MapReduce and Hadoop Refresher
reduce side join pattern
    with Bloom filter, Reputable user and comment join – Comment mapper code
    description, Pattern Description – Performance analysis
    examples, User and comment join – Combiner optimization
reduce tasks (Hadoop)
    about, MapReduce and Hadoop Refresher
    map tasks and, MapReduce and Hadoop Refresher
    output format phase, MapReduce and Hadoop Refresher
    reduce phase, MapReduce and Hadoop Refresher
    shuffle phase, MapReduce and Hadoop Refresher
    sort phase, MapReduce and Hadoop Refresher
replicated join pattern
    description, Pattern Description – Performance analysis
    examples, Replicated user comment example – Mapper code
right outer joins, A Refresher on Joins, A Refresher on Joins

S
sampling data, Filtering Patterns, Known uses, Simple Random Sampling
SciDB database, Images, Audio, and Video
SELECT DISTINCT statement (SQL), Resemblances
self-joining comments, Comment Comparison – Mapper code
SequenceFile class, Consequences, Analyze mapper code
SequenceFileOutputFormat class, Driver code
setup function, Mapper code, Mapper Code
sharding data, Known uses
shell scripts, job chaining in, With Shell Scripting – Sample run
shuffle phase (Hadoop), MapReduce and Hadoop Refresher
shuffling pattern
    description, Pattern Description – Performance analysis
    examples, Anonymizing StackOverflow comments – Reducer code
simple random sampling (SRS), Known uses, Simple Random Sampling
sort phase (Hadoop), MapReduce and Hadoop Refresher
SortedMap interface, Reducer code
SortedMapWritable class, Mapper code – Data flow diagram
sorting pattern
    description, Pattern Description – Performance analysis
    examples, Sort users by last visit – Order reducer code
SPLIT operation (Pig), Resemblances
SQL
    GROUP BY clause, Resemblances
    hierarchical data structures and, Resemblances
    join operations, Resemblances
    ordering data by random value, Resemblances
    ordering in, Resemblances
    partition pruning and, Resemblances
    SELECT DISTINCT statement, Resemblances
    top ten pattern considerations, Resemblances
    WHERE clause, Resemblances, Resemblances
SRS (simple random sampling), Known uses, Simple Random Sampling
StackOverflow
    about, The Examples in This Book
    anonymizing comments, Anonymizing StackOverflow comments, Anonymous comments and distinct users
    comments table, The Examples in This Book
    generating random comments, Generating random StackOverflow comments – RecordReader code
    post/comment building on, Post/comment building on StackOverflow – Reducer code
    posts table, The Examples in This Book
    question/answer building on, Question/answer building on StackOverflow – Reducer code
    self-joining comments, Comment Comparison – Mapper code
    updating data and, Motivation
    user and comment joins, User and comment join – Combiner optimization
    users table, The Examples in This Book
standard deviation, calculating, Median and standard deviation – Data flow diagram
streaming data, Streaming Data
String class
    composite join example, Driver code
    inverted index example, Wikipedia reference inverted index
    job merging examples, TaggedText WritableComparable
StringTokenizer class, Hadoop Example: Word Count
structured to hierarchical pattern
    description, Pattern Description – Performance analysis
    examples, Post/comment building on StackOverflow – Reducer code
summarization patterns
    counting with counters pattern, Pattern Description – Driver code
    inverted index pattern, Pattern Description – Combiner optimization
    numerical summarizations pattern, Pattern Description – Data flow diagram

T
temporary files, Job Chaining
Text class
    composite join examples, Composite user comment join, Mapper code
    job merging examples, TaggedText WritableComparable, TaggedText WritableComparable
    “Word Count” program example, Hadoop Example: Word Count
TextInputFormat class
    customizing input and output, InputFormat, RecordReader
    “Word Count” program example, Hadoop Example: Word Count
TextOutputFormat class
    composite join examples, Composite user comment join
    customizing input and output, OutputFormat
    “Word Count” program example, Hadoop Example: Word Count
top ten pattern
    description, Pattern Description – Performance analysis
    examples, Top ten users by reputation – Reducer code
total order sorting pattern
    description, Pattern Description – Performance analysis
    examples, Sort users by last visit – Order reducer code
TotalOrderPartitioner class
    about, Patterns as a Library or Component
    total order sorting pattern and, Structure, Driver code, Analyze mapper code
tracking threads of events, Known uses
TreeMap class
    numerical summarizations example, Reducer code
    top ten example, Mapper code
TupleWritable class, Mapper code

W
WHERE clause (SQL), Resemblances, Resemblances
White, Tom, MapReduce and Hadoop Refresher
Wikipedia reference inverted index example, Wikipedia reference inverted index – Combiner optimization
“Word Count” program example (Hadoop), Hadoop Example: Word Count – Hadoop Example: Word Count
word counts
    numerical summarizations example, Known uses
    “Word Count” program example, Hadoop Example: Word Count – Hadoop Example: Word Count
WordCountMapper class, Hadoop Example: Word Count
Writable interface, InputSplit code
WritableComparable interface
    about, RecordReader
    job merging examples, TaggedText WritableComparable
    partition pruning examples, Custom WritableComparable code
Writeable interface
    numerical summarization example, MinMaxCountTuple code
    “Word Count” program example, Hadoop Example: Word Count