Creating a Spark-enabled notebook

To use Spark in Watson Studio, you need to create a notebook and associate a Spark version with it by performing the following steps:

  1. Create the notebook the same way as in previous chapters: from within the project, locate the Notebook section and click on New Notebook. On the New notebook page, provide a name and description:

  2. Notice that, in the preceding screenshot, Python 3.5 is the selected language. This is fine, but if we scroll down, we will see the Spark version* field. From its drop-down list, you can select the runtime environment for the notebook. For our example, we can select Default Spark Python 3.5 XS (Driver with 1 vCPU and 4GB, 2 executors with 1 vCPU and 4 GB RAM each):

  3. Once you click on Create Notebook, the notebook environment will be instantiated and you will be ready to begin entering Spark commands.
  4. Once your Spark-enabled notebook is created, you can run Python commands and execute Spark jobs. For instance, you can register a DataFrame as a temporary view and run a Spark SQL query against it, as shown in the following example:
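# df_data_2 is assumed to be a DataFrame already loaded earlier in the notebook
df_data_2.createOrReplaceTempView("station")
# Query the temporary view with Spark SQL; the result is itself a DataFrame
sqlDF = spark.sql("SELECT * FROM station WHERE VALUE > 200")
sqlDF.show()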

Don't worry about the details of the preceding code for now; in the next sections, we will use our Spark-enabled notebook to create a Spark ML pipeline.
