Predicting prices and evaluating the model

ShortTermPredictionServiceImpl is the class that actually performs the prediction with the given model and data. First, it calls transformPriceData(priceData: PriceData) to convert PriceData into a Spark DataFrame whose schema matches the one used for training. Then it calls mlModel.transform(df), extracts the columns it needs, writes them to the debug log, and returns them to the caller:

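override def predictPriceDeltaLabel(priceData: PriceData, mlModel: org.apache.spark.ml.Transformer): (String, Row) = {
  // Convert the incoming price data into a DataFrame with the training schema
  val df = transformPriceData(priceData)
  val prediction = mlModel.transform(df)
  // Keep only the model output columns; head() returns the first row of the result
  val predictionData = prediction.select("probability", "prediction", "rawPrediction").head()
  // Column 1 is "prediction", a Double such as 1.0; return it as the label string "1"
  (predictionData.getDouble(1).toInt.toString, predictionData)
}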
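The last line of the method converts the prediction from a Double to an Int before turning it into a string. The following minimal sketch shows why that intermediate step matters on the JVM; LabelSketch and toLabel are hypothetical names used only for illustration and are not part of the application:

```scala
object LabelSketch {
  // Hypothetical helper: the same conversion that predictPriceDeltaLabel
  // applies to the "prediction" column.
  def toLabel(prediction: Double): String = prediction.toInt.toString

  // Calling toString directly on the Double keeps the fractional part
  // on the JVM, so the result would be "1.0" rather than "1".
  def naiveLabel(prediction: Double): String = prediction.toString
}
```

With this conversion, a prediction of 1.0 becomes the label "1" instead of "1.0". That is presumably what allows the SQL later in this section to compare PREDICTED_LABEL against '1' and '0'.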

While running, the application records the outcome of each prediction: the predicted label and the actual price delta. This information is used to build the root web page, which displays statistics such as TPR (true positive rate), FPR (false positive rate), TNR (true negative rate), and FNR (false negative rate), which were described earlier.

These statistics are computed on the fly from the SHORT_TERM_PREDICTION_BINARY table. Using the CASE WHEN construct, we add four new indicator columns: TPR, FPR, TNR, and FNR. They are defined as follows:

  • TPR with value 1 if the predicted label was 1 and price delta was > 0, and value 0 otherwise
  • FPR with value 1 if the predicted label was 1 and price delta was <= 0, and value 0 otherwise
  • TNR with value 1 if the predicted label was 0 and price delta was <= 0, and value 0 otherwise
  • FNR with value 1 if the predicted label was 0 and price delta was > 0, and value 0 otherwise
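The four rules above amount to a small decision table over the predicted label and the sign of the price delta. A minimal sketch in plain Scala, assuming the label is stored as the string "0" or "1"; OutcomeSketch, Outcome, and classify are hypothetical names, not taken from the application:

```scala
object OutcomeSketch {
  // Hypothetical outcome type, one case per rule in the list above.
  sealed trait Outcome
  case object TruePositive extends Outcome
  case object FalsePositive extends Outcome
  case object TrueNegative extends Outcome
  case object FalseNegative extends Outcome

  // Label "1" predicts a price rise and label "0" predicts no rise.
  // A delta of exactly 0 counts as "no rise", as in the rules above.
  def classify(predictedLabel: String, actualPriceDelta: Double): Option[Outcome] =
    (predictedLabel, actualPriceDelta > 0) match {
      case ("1", true)  => Some(TruePositive)
      case ("1", false) => Some(FalsePositive)
      case ("0", false) => Some(TrueNegative)
      case ("0", true)  => Some(FalseNegative)
      case _            => None // any other label matches none of the four rules
    }
}
```

For example, OutcomeSketch.classify("1", 0.0) yields Some(FalsePositive), because a price that did not move is treated as not having risen.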

Then, all records are grouped by model name, and the TPR, FPR, TNR, and FNR columns are summed, giving us the total counts for each model. Here is the SQL code responsible for this:

SELECT MODEL,
       SUM(TPR) AS TPR, SUM(FPR) AS FPR, SUM(TNR) AS TNR, SUM(FNR) AS FNR,
       COUNT(*) AS TOTAL
FROM (SELECT *,
             CASE WHEN PREDICTED_LABEL = '1' AND ACTUAL_PRICE_DELTA > 0 THEN 1 ELSE 0 END AS TPR,
             CASE WHEN PREDICTED_LABEL = '1' AND ACTUAL_PRICE_DELTA <= 0 THEN 1 ELSE 0 END AS FPR,
             CASE WHEN PREDICTED_LABEL = '0' AND ACTUAL_PRICE_DELTA <= 0 THEN 1 ELSE 0 END AS TNR,
             CASE WHEN PREDICTED_LABEL = '0' AND ACTUAL_PRICE_DELTA > 0 THEN 1 ELSE 0 END AS FNR
      FROM SHORT_TERM_PREDICTION_BINARY)
GROUP BY MODEL
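Note that, despite the column names, the query returns counts rather than rates. The sketch below mirrors the same grouping and summing in plain Scala and then derives two rates from the counts using the standard definitions TP / (TP + FN) and FP / (FP + TN). It is only an illustration under assumptions: ModelStatsSketch, PredictionRecord, ModelStats, and summarize are hypothetical names, and the application itself may compute its rates differently.

```scala
object ModelStatsSketch {
  // Hypothetical record type; the fields mirror the table columns used in the query.
  final case class PredictionRecord(model: String, predictedLabel: String, actualPriceDelta: Double)

  // Per-model totals, matching the columns the query returns.
  final case class ModelStats(tp: Int, fp: Int, tn: Int, fn: Int, total: Int) {
    // Rates derived from the counts; None when the denominator is zero.
    def truePositiveRate: Option[Double] = ratio(tp, tp + fn)
    def falsePositiveRate: Option[Double] = ratio(fp, fp + tn)

    private def ratio(numerator: Int, denominator: Int): Option[Double] =
      if (denominator == 0) None else Some(numerator.toDouble / denominator)
  }

  // Equivalent of GROUP BY MODEL with the four SUM(CASE WHEN ...) columns and COUNT(*).
  def summarize(records: Seq[PredictionRecord]): Map[String, ModelStats] =
    records.groupBy(_.model).map { case (model, rows) =>
      val tp = rows.count(r => r.predictedLabel == "1" && r.actualPriceDelta > 0)
      val fp = rows.count(r => r.predictedLabel == "1" && r.actualPriceDelta <= 0)
      val tn = rows.count(r => r.predictedLabel == "0" && r.actualPriceDelta <= 0)
      val fn = rows.count(r => r.predictedLabel == "0" && r.actualPriceDelta > 0)
      model -> ModelStats(tp, fp, tn, fn, rows.size)
    }

  // Made-up sample data for illustration only.
  val sample: Seq[PredictionRecord] = Seq(
    PredictionRecord("modelA", "1", 2.5),  // true positive
    PredictionRecord("modelA", "1", -1.0), // false positive
    PredictionRecord("modelA", "0", 0.0),  // true negative
    PredictionRecord("modelA", "0", 3.0),  // false negative
    PredictionRecord("modelB", "1", 1.0)   // true positive
  )
}
```

Returning the rates as Option makes the empty-denominator case explicit: a model that has not yet seen any actual price drops has no defined false positive rate.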