Now let's move on to extraction. The print command doesn't display the data in a very useful format, so instead of printing the Spark DataFrame directly, we can use pandas, the open source data analysis library, to create a pandas DataFrame that presents the data as a table.
Now we can look at an example that will make our SQL coders happy.
Import the pandas library and use the .toPandas() method to show the SQL query results:
import pandas as pd
sqlContext.sql("select STATION, METRIC from MyWeather limit 2").toPandas()
Running the preceding commands results in the following output:
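If you want to try the same query outside of a Spark environment, a minimal sketch using Python's built-in sqlite3 module behaves the same way. The table layout and the rows below are made up for illustration; only the SQL statement matches the example above:

```python
import sqlite3

# Build a hypothetical MyWeather table in memory; the station IDs
# and metric names are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("create table MyWeather (STATION text, METRIC text)")
conn.executemany(
    "insert into MyWeather values (?, ?)",
    [
        ("GHCND:US1CASF0001", "PRCP"),
        ("GHCND:US1CASF0001", "TMAX"),
        ("GHCND:US1NYNY0002", "PRCP"),
    ],
)

# Same SQL as the Spark example: fetch the first two rows.
rows = conn.execute(
    "select STATION, METRIC from MyWeather limit 2"
).fetchall()
print(rows)
```

In the Spark version, .toPandas() performs the equivalent of fetchall() and wraps the rows in a pandas DataFrame for tabular display.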
Here is another example of simple SQL query execution. This time we count the number of metrics recorded for each weather station, and then list the weather stations in descending order of their metric record counts:
query = """
select
STATION,
count(*) as metric_count
from MyWeather
group by STATION
order by count(*) desc
"""
sqlContext.sql(query).toPandas()
The preceding code gives us the following output:
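The grouped result can also be reproduced outside Spark. Here is a sketch with sqlite3 and made-up rows (the station names "A" and "B" are hypothetical), running the same GROUP BY query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table MyWeather (STATION text, METRIC text)")
# Hypothetical rows: station A records three metrics, station B one.
conn.executemany(
    "insert into MyWeather values (?, ?)",
    [("A", "PRCP"), ("A", "TMAX"), ("A", "TMIN"), ("B", "PRCP")],
)

query = """
select
  STATION,
  count(*) as metric_count
from MyWeather
group by STATION
order by count(*) desc
"""
# Station A comes first because it has the most metric records.
result = conn.execute(query).fetchall()
print(result)  # [('A', 3), ('B', 1)]
```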
You are encouraged to experiment with additional variations of SQL statements and then review the results in real time.
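For example, one variation worth trying adds a WHERE clause to filter the rows before grouping. Sketched again with sqlite3 and hypothetical data, this counts only the precipitation records per station:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table MyWeather (STATION text, METRIC text)")
# Invented rows for illustration.
conn.executemany(
    "insert into MyWeather values (?, ?)",
    [("A", "PRCP"), ("A", "TMAX"), ("B", "PRCP"), ("B", "SNOW")],
)

# Filter to precipitation records, then count per station.
query = """
select STATION, count(*) as prcp_count
from MyWeather
where METRIC = 'PRCP'
group by STATION
order by STATION
"""
result = conn.execute(query).fetchall()
print(result)  # [('A', 1), ('B', 1)]
```

The same SQL string could be passed to sqlContext.sql() against the real MyWeather table, with .toPandas() rendering the result as before.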