How to do it...

  1. Create a table named person in MySQL using the following DDL:
        CREATE TABLE 'person' ( 
'person_id' int(11) NOT NULL AUTO_INCREMENT,
'first_name' varchar(30) DEFAULT NULL,
'last_name' varchar(30) DEFAULT NULL,
'gender' char(1) DEFAULT NULL,
'age' tinyint(4) DEFAULT NULL,
PRIMARY KEY ('person_id')
)
  1. Insert some data:
        Insert into person values('Barack','Obama','M',55); 
Insert into person values('Bill','Clinton','M',70);
Insert into person values('Hillary','Clinton','F',69);
Insert into person values('Bill','Gates','M',61);
Insert into person values('Michelle','Obama','F',52);
  1. Download mysql-connector-java-x.x.xx-bin.jar from http://dev.mysql.com/downloads/connector/j/.
  2. Make the MySQL driver available to the spark-shell and launch it:
        $ spark-shell --driver-class-path/path-to-mysql-jar/mysql-
connector-java-5.1.34-bin.jar
Note that path-to-mysql-jar is not the actual path name. You need to use your pathname. This is just a syntax.
  1. Construct a JDBC URL:
        scala> val url="jdbc:mysql://localhost:3306/hadoopdb"
  1. Create a connection properties object with a username and password:
        scala> val prop = new java.util.Properties
scala> prop.setProperty("user","hduser")
scala> prop.setProperty("password","********")
  1. Load the DataFrame with the JDBC data source (url, table name, and properties):
        scala> val people = sqlContext.read.jdbc(url,"person",prop)
  1. Show the results in a nice tabular format by executing the following command:
        scala> people.show
  1. This has loaded the whole table. What if I would like to only load males (url, table name, predicates, and properties)? To do this, run the following command:
        scala> val males = spark.read.jdbc
(url,"person",Array("gender='M'"),prop)

scala> males.show
  1. Show only the first names by executing the following command:
        scala> val first_names = people.select("first_name")
scala> first_names.show
  1. Show only people below 60 years of age by executing the following command:
        scala> val below60 = people.filter(people("age") < 60)
scala> below60.show
  1. Group people by gender as follows:
        scala> val grouped = people.groupBy("gender")
  1. Find the number of males and females by executing the following command:
        scala> val gender_count = grouped.count
scala> gender_count.show
  1. Find the average age of males and females by executing the following command:
        scala> val avg_age = grouped.avg("age")
scala> avg_age.show
  1. Now if you'd like to save this avg_age data to a new table, run the following command:
        scala> gender_count.write.jdbc(url,"gender_count",prop)
  1. Save the people DataFrame in the Parquet format:
        scala> people.write.parquet("people.parquet")
  1. Save the people DataFrame in the JSON format:
        scala> people.write.json("people.json")
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset