Introducing TensorFrames

At the time of writing, TensorFrames is an experimental binding for Apache Spark; it was introduced in early 2016, shortly after the release of TensorFlow. With TensorFrames, you can manipulate Spark DataFrames with TensorFlow programs. The tensor diagram from the previous section has been updated to show how Spark DataFrames work with TensorFlow, as shown in the following diagram:

[Figure: Introducing TensorFrames]

As noted in the preceding diagram, TensorFrames provides a bridge between Spark DataFrames and TensorFlow. This allows you to feed your DataFrames as input to your TensorFlow computation graph. TensorFrames also allows you to take the output of the TensorFlow computation graph and push it back into DataFrames so you can continue your downstream Spark processing.
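To make the round trip concrete, the following is a minimal pure-Python sketch of the idea only; it does not use Spark, TensorFlow, or the TensorFrames API. The names `apply_in_blocks` and `add_three` are hypothetical, a list of dictionaries stands in for a DataFrame, and an ordinary function stands in for a computation graph. Under those assumptions, it shows a column being read in blocks, passed through a computation, and written back as a new column:

```python
def apply_in_blocks(rows, column, graph_fn, out_column, block_size=4):
    """Feed one column to graph_fn block by block and append the output as a new column.

    Hypothetical stand-in: `rows` plays the role of a DataFrame and
    `graph_fn` plays the role of a computation graph.
    """
    result = []
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]
        inputs = [row[column] for row in block]   # DataFrame column -> graph input
        outputs = graph_fn(inputs)                # run the "graph" on the block
        for row, value in zip(block, outputs):    # graph output -> new column
            new_row = dict(row)                   # copy so the input rows stay unchanged
            new_row[out_column] = value
            result.append(new_row)
    return result


def add_three(xs):
    """Toy computation standing in for a graph: add 3 to every value."""
    return [x + 3.0 for x in xs]


df = [{"x": float(i)} for i in range(10)]         # stand-in for a one-column DataFrame
df2 = apply_in_blocks(df, "x", add_three, "z")    # same rows plus a new column "z"
print(df2[:3])
```

Because the result keeps the original column alongside the new one, any downstream processing can continue on `df2` just as it would on the input.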

Common usage scenarios for TensorFrames typically include the following:

Utilize TensorFlow with your data

The integration of TensorFlow and Apache Spark with TensorFrames allows data scientists to expand their analytics, streaming, graph, and machine learning capabilities to include Deep Learning via TensorFlow. This allows you to both train and deploy models at scale.

Parallel training to determine optimal hyperparameters

When building deep learning models, several configuration parameters (that is, hyperparameters) affect how the model is trained. Common hyperparameters in Deep Learning/artificial neural networks include the learning rate (with a high rate the model learns quickly, but it may not account for highly variable input; that is, it will not learn well if both the rate and the variability in the data are too high) and the number of neurons in each layer of your neural network (too many neurons result in noisy estimates, while too few prevent the network from learning well).

As observed in Deep Learning with Apache Spark and TensorFlow (https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.html), using Spark with TensorFlow to help find the best set of hyperparameters for neural network training resulted in an order of magnitude reduction in training time and a 34% lower error rate for the handwritten digit recognition dataset.
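As a rough illustration of why such a search parallelizes well, here is a toy sketch in plain Python. It is not the setup from the blog post: the loss function `(w - 3)^2`, the candidate learning rates, and the helper `final_loss` are all assumptions chosen for illustration, and a local thread pool stands in for a Spark cluster. Each candidate is trained independently, which is what lets the trials be distributed:

```python
from concurrent.futures import ThreadPoolExecutor


def final_loss(learning_rate, steps=50):
    """Run plain gradient descent on the toy loss (w - 3)^2 and return the final loss."""
    w = 0.0
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)      # derivative of (w - 3)^2 with respect to w
        w -= learning_rate * gradient
    return (w - 3.0) ** 2


# Hypothetical candidate values for a single hyperparameter.
learning_rates = [0.01, 0.1, 0.5, 1.1]

# Every trial is independent of the others, so they can run side by side.
with ThreadPoolExecutor(max_workers=4) as pool:
    losses = list(pool.map(final_loss, learning_rates))

results = dict(zip(learning_rates, losses))
best_lr = min(results, key=results.get)  # candidate with the lowest final loss
print(best_lr, results[best_lr])
```

In this toy problem the smallest rate has not converged after 50 steps, while the largest rate overshoots on every step and diverges, so the best candidate lies in between. A real search would also vary other hyperparameters, such as the number of neurons per layer.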

For more information on Deep Learning and hyperparameters, please refer to:

At the time of writing, TensorFrames is officially supported as of Apache Spark 1.6 (Scala 2.10), though most contributions are currently focused on Spark 2.0 (Scala 2.11). The easiest way to use TensorFrames is to access it via Spark Packages (https://spark-packages.org).
