Index
A
- --as-avrodatafile argument / There's more...
- --as-sequencefile argument / There's more...
- -a arguments / How it works...
- Accumulator interface / How it works...
- Accumulo
- row key designing, to store geographic events / Designing a row key to store geographic events in Accumulo, How to do it..., How it works...
- geographic event data bulk importing, MapReduce used / Using MapReduce to bulk import geographic event data into Accumulo, How to do it..., How it works...
- custom field constraint setting, to input geographic event data / Setting a custom field constraint for inputting geographic event data in Accumulo, How to do it..., How it works...
- SumCombiner, using / Counting fatalities for different versions of the same key using SumCombiner, How to do it..., How it works...
- used for enforcing cell-level security, on scans / Enforcing cell-level security on scans using Accumulo, How to do it..., How it works...
- sources aggregating, MapReduce used / Aggregating sources in Accumulo using MapReduce, How to do it..., How it works...
- AccumuloFileOutputFormat
- AccumuloInputFormat class / How it works...
- AccumuloOutputFormat
- AccumuloTableAssistant.java / AccumuloTableAssistant.java
- ACLED
- ACLEDIngestReducer.java class / How to do it...
- ACLEDSourceReducer static inner class / How to do it...
- addCacheArchive() static method / There's more...
- Apache Avro
- Apache Giraph
- Apache Hive
- Apache logs
- Apache Mahout
- Apache Mahout 0.6
- Apache Pig
- about / Using Apache Pig to filter bot traffic from web server logs, There's more...
- using to sort web server log data, by timestamp / Using Apache Pig to sort web server log data by timestamp, See also
- used, for sorting data / How to do it...
- using, to view sessionize web server log data / Using Apache Pig to sessionize web server log data, How to do it..., See also
- functionality extending, Python used / Using Python to extend Apache Pig functionality, How it works...
- SELECT operation, performing with GROUP BY operation / Using Pig to load a table and perform a SELECT operation with GROUP BY, How it works...
- using, to load table / Using Pig to load a table and perform a SELECT operation with GROUP BY, How it works...
- replicated join, used for joining data / Joining data using Apache Pig replicated join, How it works..., There's more...
- merge join, used for joining data / Joining sorted data using Apache Pig merge join, How it works...
- skewed join, used for joining skewed data / Joining skewed data using Apache Pig skewed join, How to do it..., How it works...
- record with malformed IP address, example / How to do it...
- Apache Pig 0.10
- Apache Thrift
- apache_clf.txt dataset / Getting ready
- associative / How it works...
- Audioscrobbler dataset
- AvroWrapper class / How it works...
- AvroWriter job / There's more...
- AvroWriter MapReduce job / How it works...
B
C
D
- -D argument/flag / How it works...
- data
- moving between clusters, distributed copy used / Moving data efficiently between clusters using Distributed Copy, There's more...
- importing from MySQL into HDFS, Sqoop used / Importing data from MySQL into HDFS using Sqoop, Getting ready, How it works..., There's more...
- exporting from MySQL into HDFS, Sqoop used / Exporting data from HDFS into MySQL using Sqoop, Getting ready, How it works...
- compressing, LZO used / Compressing data using LZO, How to do it...
- sorting, Apache Pig used / How to do it...
- joining in mapper, MapReduce used / Joining data in the Mapper using MapReduce, How to do it..., How it works..., There's more...
- joining, Apache Pig replicated join used / Joining data using Apache Pig replicated join, How it works..., There's more...
- dataflow programs
- example data generating, URL for / See also
- Datafu
- data locality / Introduction
- Datanode / Introduction
- data serialization / Using Apache Avro to serialize data
- datediff() argument / How it works...
- date format strings / Date format strings follow Java SimpleDateFormat guidelines
- debugging information
- dfs.block.size property / How it works...
- dfs.replication property / How it works..., How it works...
- distcp command / There's more...
- DistinctCounterJob / How it works...
- distinct IPs
- DISTRIBUTE BY / SORT BY versus DISTRIBUTE BY versus CLUSTER BY versus ORDER BY
- distributed breadth-first search
- DistributedCache class / There's more...
- distributed cache mechanism / How it works...
- distributed copy
- DistributedLzoIndexer / How it works..., There's more...
- DROP temporary tables / DROP temporary tables
- dump command / How it works...
E
F
G
- Ganglia
- Ganglia meta daemon (gmetad) / Getting ready
- Ganglia monitoring daemon (gmond) / Getting ready
- geographical event data
- cleaning, Hive used / Using Hive and Python to clean and transform geographical event data, How to do it..., How it works..., There's more...
- transforming, Hive used / Using Hive and Python to clean and transform geographical event data, How to do it..., How it works..., There's more...
- cleaning, Python used / Using Hive and Python to clean and transform geographical event data, How to do it..., How it works..., There's more...
- transforming, Python used / Using Hive and Python to clean and transform geographical event data, How to do it..., How it works..., There's more...
- reading, by creating custom Hadoop Writable / Creating custom Hadoop Writable and InputFormat to read geographical event data, How to do it..., How it works...
- reading, by creating custom InputFormat / Creating custom Hadoop Writable and InputFormat to read geographical event data, How to do it..., How it works...
- geographic event data
- events transforming, Hive date UDFs used / Using Hive date UDFs to transform and sort event dates from geographic event data, How to do it...
- events sorting, Hive date UDFs used / Using Hive date UDFs to transform and sort event dates from geographic event data, How to do it...
- per-month report of fatalities building over, Hive used / Using Hive to build a per-month report of fatalities over geographic event data, How it works..., Date reformatting code template
- bulk importing into Accumulo, MapReduce used / Using MapReduce to bulk import geographic event data into Accumulo, How to do it..., How it works...
- inputting in Accumulo, by setting custom field constraint / Setting a custom field constraint for inputting geographic event data in Accumulo, How to do it..., How it works...
- geographic events
- get command / There's more...
- getCurrentVertex() method / How to do it...
- getmerge command / There's more...
- getRecordReader() method / How it works...
- getReverseTime() function / How it works...
- getRowID() / How to do it...
- getZOrderedCurve() method / How to do it..., How it works...
- Git Client
- GitHub
- Google BigTable design
- Google BigTable design approach
- Google Pregel paper
- Greenplum external table
- GroupComparator class / How it works...
- GzipCodec / Reading and writing data to SequenceFiles
H
- $HADOOP_BIN / Importing and exporting data into HDFS using Hadoop shell commands
- Hadoop
- about / Introduction, Introduction, Developing and testing MapReduce jobs with MRUnit
- URL / Getting ready
- streaming job, executing / How to do it...
- starting, in pseudo-distributed mode / Starting Hadoop in pseudo-distributed mode, How to do it..., How it works..., There's more...
- starting, in fully-distributed mode / Starting Hadoop in distributed mode, How to do it..., How it works..., There's more...
- new nodes, adding to existing cluster / Getting ready, There's more...
- rebalancing / There's more...
- cluster monitoring, Ganglia used / Monitoring cluster health using Ganglia, Getting ready, How it works...
- hadoop-streaming.jar file / How to do it...
- Hadoop Distributed Copy (distcp) tool / Moving data efficiently between clusters using Distributed Copy
- hadoop fs -COMMAND / Importing and exporting data into HDFS using Hadoop shell commands
- Hadoop FS shell / Getting ready
- hadoop mradmin -refreshNodes command / How it works...
- Hadoop shell commands
- hadoop shell script / Importing and exporting data into HDFS using Hadoop shell commands
- Hadoop streaming
- Hadoop Writable
- HashSet instance / How it works...
- HDFS
- about / Introduction, Introduction
- data importing, Hadoop shell commands used / Importing and exporting data into HDFS using Hadoop shell commands, How to do it..., How it works...
- data exporting, Hadoop shell commands used / Importing and exporting data into HDFS using Hadoop shell commands, How to do it..., How it works...
- data importing from MySQL, Sqoop used / Importing data from MySQL into HDFS using Sqoop, Getting ready, How it works..., There's more...
- data exporting from MySQL, Sqoop used / Exporting data from HDFS into MySQL using Sqoop, Getting ready, How it works...
- data exporting, into MongoDB / Exporting data from HDFS into MongoDB, How to do it..., How it works...
- data, importing from MongoDB / Importing data from MongoDB into HDFS, How to do it...
- data exporting into MongoDB, Pig used / Exporting data from HDFS into MongoDB using Pig, How to do it..., How it works...
- using, in Greenplum external table / Using HDFS in a Greenplum external table, How it works..., There's more...
- data loading, Flume used / Using Flume to load data into HDFS, How it works...
- services / Introduction
- data, reading to / Reading and writing data to HDFS, How to do it..., How it works...
- data, writing to / Reading and writing data to HDFS, How to do it..., How it works...
- replication factor, setting / Setting the replication factor for HDFS, How it works...
- block size, setting / Setting the block size for HDFS, How it works...
- external table over weblog data, mapping / Using Hive to map an external table over weblog data in HDFS, How it works...
- external table, mapping / How to do it...
- HDFS, services
- hdfs-site.xml file / Getting ready, How it works...
- HdfsReader class / There's more...
- HdfsWriter class / How it works..., There's more...
- Hive
- used, for transforming geographical event data / Using Hive and Python to clean and transform geographical event data, How to do it..., How it works..., There's more...
- used, for cleaning geographical event data / Using Hive and Python to clean and transform geographical event data, How to do it..., How it works..., There's more...
- used for mapping external table over weblog, in HDFS / Using Hive to map an external table over weblog data in HDFS, How it works...
- using, to create tables from weblog query results / Using Hive to dynamically create tables from the results of a weblog query, How to do it..., There's more...
- using to intersect weblog IPs and determine country / Using Hive to intersect weblog IPs and determine the country, How to do it...
- multitable join support / Hive supports multitable joins
- ON operator / The ON operator for inner joins does not support inequality conditions
- using to build per-month report of fatalities, over geographic event data / Using Hive to build a per-month report of fatalities over geographic event data, How it works..., Date reformatting code template
- custom UDF, implementing / Implementing a custom UDF in Hive to help validate source reliability over geographic event data, Getting ready, How to do it..., How it works...
- existing UDFs, checking out / Check out the existing UDFs
- used, for marking non-violence longest period / Getting ready, How to do it..., How it works...
- Hive date UDFs
- Hive query language
- Hive string UDFs
I
J
K
L
M
- --maxIter parameter / How it works...
- -mapper location_regains_mapper.py argument / How it works...
- -m argument / How it works...
- -md arguments / How it works...
- -ml arguments / How it works...
- main() method / How to do it..., How to do it...
- map() function / How to do it..., How to do it..., How to do it..., How it works...
- map() method / How it works...
- map-side join
- MapDriver class / How it works...
- Map input records counter / Using Counters in a MapReduce job to track bad records
- map-only jobs / How it works...
- Mapper class / How it works...
- mapred-site.xml configuration file / Getting ready, There's more...
- mapred.cache.files property / How it works...
- mapred.child.java.opts property / How to do it...
- mapred.compress.map.output property / How to do it...
- mapred.job.reuse.jvm.num.tasks property / How to do it...
- mapred.job.tracker property / How to do it..., How it works...
- mapred.map.child.java.opts property / How to do it...
- mapred.map.output.compression.codec property / How to do it...
- mapred.map.tasks.speculative.execution property / How to do it...
- mapred.output.compression.codec property / How to do it...
- mapred.output.compression.type property / How to do it...
- mapred.output.compress property / How to do it...
- mapred.reduce.child.java.opts property / How to do it...
- mapred.reduce.tasks.speculative.execution property / How to do it...
- mapred.reduce.tasks property / How to do it...
- mapred.skip.attempts.to.start.skipping property / There's more...
- mapred.skip.map.auto.incr.proc.count property / There's more...
- mapred.skip.map.max.skip.records property / There's more...
- mapred.skip.out.dir property / There's more...
- mapred.skip.reduce.auto.incr.proc.count property / There's more...
- mapred.tasktracker.reduce.tasks.maximum property / There's more...
- mapred.textoutputformat.separator property / There's more...
- MapReduce
- about / How it works...
- used, for transforming Apache logs into TSV format / Transforming Apache logs into TSV format using MapReduce, How to do it..., How it works..., There's more...
- using, to calculate page views / Using MapReduce and secondary sort to calculate page views, How to do it..., How it works...
- page views calculating, secondary sort used / Using MapReduce and secondary sort to calculate page views, How to do it..., How it works...
- output files naming, MultipleOutputs, using / Using MultipleOutputs in MapReduce to name output files, How to do it..., How it works...
- distributed cache, using to find lines with matching keywords over news archives / Using the distributed cache in MapReduce to find lines that contain matching keywords over news archives, How it works..., Distributed cache does not work in local jobrunner mode
- used, for joining data in mapper / Joining data in the Mapper using MapReduce, How to do it..., How it works..., There's more...
- used, for counting distinct IPs in weblog data / Counting distinct IPs in weblog data using MapReduce and Combiners, How to do it..., How it works...
- used, for counting distinct IPs / How to do it...
- used for bulk importing geographic event data, into Accumulo / Using MapReduce to bulk import geographic event data into Accumulo, How to do it..., How it works...
- used for aggregating sources in Accumulo / Aggregating sources in Accumulo using MapReduce, How to do it..., How it works...
- MapReduce job
- MapReduce job, properties
- MapReduce jobs
- MapReduce running jobs
- MapReduce used
- mapred_excludes file / How it works...
- masters configuration file / There's more...
- Maven 2.2
- merge join, Apache Pig
- Microsoft SQL Server
- min() operator / The Combiner does not always have to be the same class as your Reducer
- Mockito
- MongoDB
- Mongo Hadoop Adaptor / Getting ready
- Mongo Java Driver
- MRUnit
- MultipleOutputs
- MySQL
- mysql.user table / How it works...
- MySQL JDBC driver JAR file / Getting ready
N
O
P
- --password option / How it works...
- PageRank
- page views
- per-month report of fatalities
- Pig
- play counts
- prev_date / How it works...
- protobufRecord object / How it works...
- ProtobufWritable class / How it works...
- ProtobufWritable instance / How it works...
- Protocol Buffers
- pseudo-distributed mode
- Python
- using, to extend Apache Pig functionality / Using Python to extend Apache Pig functionality, How it works...
- used, for cleaning geographical event data / Using Hive and Python to clean and transform geographical event data, How to do it..., How it works..., There's more...
- used, for transforming geographical event data / Using Hive and Python to clean and transform geographical event data, How to do it..., How it works..., There's more...
- AS keyword, used for type casting values / Type casting values using the AS keyword
- Python streaming
Q
R
S
T
U
V
W
X
Z