Turning the movie review classifier into a web application

Now that we are somewhat familiar with the basics of Flask web development, let's take the next step and embed our movie classifier in a web application. In this section, we will develop a web application that first prompts the user to enter a movie review, as shown in the following screenshot:

[Screenshot: the start page with the movie review entry form]

After the review has been submitted, the user will see a new page that shows the predicted class label and the probability of the prediction. Furthermore, the user will be able to provide feedback about this prediction by clicking on the Correct or Incorrect button, as shown in the following screenshot:

[Screenshot: the results page with the predicted class label, its probability, and the Correct and Incorrect buttons]

If the user clicks either the Correct or Incorrect button, our classification model will be updated with respect to the user's feedback. We will also store the movie review text provided by the user, together with the class label inferred from the button click, in a SQLite database for future reference. (Alternatively, the user can skip the update step and click the Submit another review button to submit another review.)

The third page that the user will see after clicking on one of the feedback buttons is a simple thank you screen with a Submit another review button that redirects the user back to the start page. This is shown in the following screenshot:

[Screenshot: the thank-you page with the Submit another review button]

Note

Before we take a closer look at the code implementation of this web application, I encourage you to take a look at the live demo that I uploaded at http://raschkas.pythonanywhere.com to get a better understanding of what we are trying to accomplish in this section.

Files and folders – looking at the directory tree

To start with the big picture, let's take a look at the directory tree that we are going to create for this movie classification application, which is shown here:

[Figure: directory tree of the movie classification application]

In the previous section of this chapter, we already created the vectorizer.py file, the SQLite database reviews.sqlite, and the pkl_objects subdirectory with the pickled Python objects.

The app.py file in the main directory is the Python script that contains our Flask code, and we will use the reviews.sqlite database file (which we created earlier in this chapter) to store the movie reviews that are submitted to our web application. The templates subdirectory contains the HTML templates that will be rendered by Flask and displayed in the browser, and the static subdirectory will contain a simple CSS file to adjust the look of the rendered HTML code.
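As a quick sanity check, the layout described above can be verified with a few lines of Python. This is only a sketch: the missing_files helper is a hypothetical name, and the list covers just the files named in this section, so the actual project may contain more.

```python
from pathlib import Path

# Files named in this section, relative to the application's main directory.
# The real project may contain additional files.
EXPECTED_FILES = [
    'app.py',
    'vectorizer.py',
    'reviews.sqlite',
    'pkl_objects/classifier.pkl',
    'templates/_formhelpers.html',
    'templates/reviewform.html',
    'templates/results.html',
    'templates/thanks.html',
    'static/style.css',
]

def missing_files(root):
    """Return the expected files that do not exist below root."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]

if __name__ == '__main__':
    # Run from the application's main directory; an empty list means
    # that every file named in this section is in place.
    print(missing_files('.'))
```

Running the script before starting Flask makes a missing template or pickle file visible immediately instead of surfacing later as a runtime error.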

Note

A separate directory containing the movie review classifier application with the code discussed in this section is provided with the code examples for this book, which you can either obtain directly from Packt or download from GitHub at https://github.com/rasbt/python-machine-learning-book-2nd-edition/. The code in this section can be found in the .../code/ch09/movieclassifier subdirectory.

Implementing the main application as app.py

Since the app.py file is rather long, we will conquer it in two steps. The first section of app.py imports the Python modules and objects that we are going to need and contains the code to unpickle and set up our classification model:

from flask import Flask, render_template, request
from wtforms import Form, TextAreaField, validators
import pickle
import sqlite3
import os
import numpy as np

# import HashingVectorizer from local dir
from vectorizer import vect

app = Flask(__name__)

######## Preparing the Classifier
cur_dir = os.path.dirname(__file__)
clf = pickle.load(open(os.path.join(cur_dir,
                 'pkl_objects',
                 'classifier.pkl'), 'rb'))
db = os.path.join(cur_dir, 'reviews.sqlite')

def classify(document):
    label = {0: 'negative', 1: 'positive'}
    X = vect.transform([document])
    y = clf.predict(X)[0]
    proba = np.max(clf.predict_proba(X))
    return label[y], proba

def train(document, y):
    X = vect.transform([document])
    clf.partial_fit(X, [y])

def sqlite_entry(path, document, y):
    conn = sqlite3.connect(path)
    c = conn.cursor()
    c.execute("INSERT INTO review_db (review, sentiment, date)"
    " VALUES (?, ?, DATETIME('now'))", (document, y))
    conn.commit()
    conn.close()

This first part of the app.py script should look very familiar to us by now. We simply imported the HashingVectorizer and unpickled the logistic regression classifier. Next, we defined a classify function to return the predicted class label as well as the corresponding probability prediction of a given text document. The train function can be used to update the classifier, given that a document and a class label are provided.
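To make the return value of classify concrete, the following sketch reproduces its post-processing step in plain Python. The summarize_prediction helper and the probability values are made up for illustration; in app.py, the probabilities come from clf.predict_proba.

```python
label = {0: 'negative', 1: 'positive'}

def summarize_prediction(class_probabilities):
    """Map one row of class probabilities to (label, percentage)."""
    # Index of the most probable class, analogous to clf.predict(X)[0]
    y = max(range(len(class_probabilities)),
            key=lambda i: class_probabilities[i])
    # Probability of that class, analogous to np.max(clf.predict_proba(X))
    proba = class_probabilities[y]
    # The results view rounds the percentage to two decimals
    return label[y], round(proba * 100, 2)

print(summarize_prediction([0.25, 0.75]))  # ('positive', 75.0)
```

Note that the reported probability always belongs to the predicted class, so for a two-class problem it will never fall below 50 percent.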

Using the sqlite_entry function, we can store a submitted movie review in our SQLite database along with its class label and timestamp for our personal records. Note that the clf object will be reset to its original, pickled state if we restart the web application. At the end of this chapter, you will learn how to use the data that we collect in the SQLite database to update the classifier permanently.
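The sqlite_entry function can be tried out in isolation against a throwaway database. The sketch below assumes that review_db has the three columns used in the INSERT statement (review, sentiment, and date) with the column types shown; the fetch_reviews helper and the demo file name are hypothetical and exist only for this example.

```python
import os
import sqlite3
import tempfile

def sqlite_entry(path, document, y):
    # Same logic as in app.py: store the review, its label, and a timestamp
    conn = sqlite3.connect(path)
    c = conn.cursor()
    c.execute("INSERT INTO review_db (review, sentiment, date)"
              " VALUES (?, ?, DATETIME('now'))", (document, y))
    conn.commit()
    conn.close()

def fetch_reviews(path):
    # Hypothetical helper: read back the stored reviews in insertion order
    conn = sqlite3.connect(path)
    rows = conn.execute("SELECT review, sentiment FROM review_db"
                        " ORDER BY rowid").fetchall()
    conn.close()
    return rows

# Create a throwaway database; the column types are an assumption
demo_db = os.path.join(tempfile.mkdtemp(), 'demo_reviews.sqlite')
conn = sqlite3.connect(demo_db)
conn.execute("CREATE TABLE review_db"
             " (review TEXT, sentiment INTEGER, date TEXT)")
conn.commit()
conn.close()

sqlite_entry(demo_db, 'I loved this movie', 1)
sqlite_entry(demo_db, 'I disliked this movie', 0)
rows = fetch_reviews(demo_db)
print(rows)
```

The ? placeholders let sqlite3 handle quoting of the user-supplied review text, which is why the function does not build the SQL string by hand.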

The concepts in the second part of the app.py script should also look quite familiar to us:

######## Flask
class ReviewForm(Form):
    moviereview = TextAreaField('',
                                [validators.DataRequired(),
                                validators.length(min=15)])

@app.route('/')
def index():
    form = ReviewForm(request.form)
    return render_template('reviewform.html', form=form)

@app.route('/results', methods=['POST'])
def results():
    form = ReviewForm(request.form)
    if request.method == 'POST' and form.validate():
        review = request.form['moviereview']
        y, proba = classify(review)
        return render_template('results.html',
                                content=review,
                                prediction=y,
                                probability=round(proba*100, 2))
    return render_template('reviewform.html', form=form)

@app.route('/thanks', methods=['POST'])
def feedback():
    feedback = request.form['feedback_button']
    review = request.form['review']
    prediction = request.form['prediction']

    inv_label = {'negative': 0, 'positive': 1}
    y = inv_label[prediction]
    if feedback == 'Incorrect':
        y = int(not(y))
    train(review, y)
    sqlite_entry(db, review, y)
    return render_template('thanks.html')

if __name__ == '__main__':
    app.run(debug=True)

We defined a ReviewForm class that instantiates a TextAreaField, which will be rendered in the reviewform.html template file (the landing page of our web application). This, in turn, is rendered by the index function. With the validators.length(min=15) validator, we require the user to enter a review that contains at least 15 characters. Inside the results function, we fetch the contents of the submitted web form and pass them on to our classifier to predict the sentiment of the movie review, which will then be displayed in the rendered results.html template.

The feedback function in the second part of app.py may look a little bit complicated at first glance. It essentially fetches the predicted class label that the form in the results.html template submits when a user clicks the Correct or Incorrect feedback button, and transforms the predicted sentiment back into an integer class label (flipped if the user clicked Incorrect) that will be used to update the classifier via the train function, which we implemented in the first section of the app.py script. Also, a new entry to the SQLite database will be made via the sqlite_entry function if feedback was provided, and eventually the thanks.html template will be rendered to thank the user for the feedback.
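The label handling inside feedback can be isolated into a small pure function, which makes the four possible cases easy to check. This is a sketch only; feedback_to_label is a hypothetical helper name that does not appear in app.py.

```python
def feedback_to_label(prediction, feedback):
    """Return the integer class label implied by the user's button click."""
    inv_label = {'negative': 0, 'positive': 1}
    y = inv_label[prediction]
    # 'Incorrect' means the true label is the opposite of the prediction
    if feedback == 'Incorrect':
        y = int(not y)
    return y

for prediction in ('negative', 'positive'):
    for button in ('Correct', 'Incorrect'):
        print(prediction, button, feedback_to_label(prediction, button))
```

A prediction of positive that the user marks as Incorrect therefore yields the label 0, and this is the label that is passed to both train and sqlite_entry.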

Setting up the review form

Next, let's take a look at the reviewform.html template, which constitutes the starting page of our application:

<!doctype html>
<html>
  <head>
    <title>Movie Classification</title>
      <link rel="stylesheet"
       href="{{ url_for('static', filename='style.css') }}">
  </head>
  <body>

    <h2>Please enter your movie review:</h2>

    {% from "_formhelpers.html" import render_field %}

    <form method=post action="/results">
      <dl>
        {{ render_field(form.moviereview, cols='30', rows='10') }}
      </dl>
      <div>
        <input type=submit value='Submit review'
        name='submit_btn'>
      </div>
    </form>

  </body>
</html>

Here, we simply imported the same _formhelpers.html template that we defined in the Form validation and rendering section earlier in this chapter. The render_field macro from this template is used to render a TextAreaField where a user can provide a movie review and submit it via the Submit review button displayed at the bottom of the page. This TextAreaField is 30 columns wide and 10 rows tall, and would look like this:

[Screenshot: the rendered review form with the text area and the Submit review button]

Creating a results page template

Our next template, results.html, looks a little bit more interesting:

<!doctype html>
<html>
  <head>
    <title>Movie Classification</title>
      <link rel="stylesheet"
      href="{{ url_for('static', filename='style.css') }}">
  </head>
  <body>

    <h3>Your movie review:</h3>
    <div>{{ content }}</div>

    <h3>Prediction:</h3>
    <div>This movie review is <strong>{{ prediction }}</strong>
    (probability: {{ probability }}%).</div>

    <div id='button'>
      <form action="/thanks" method="post">
        <input type=submit value='Correct'
        name='feedback_button'>
        <input type=submit value='Incorrect'
        name='feedback_button'>
        <input type=hidden value='{{ prediction }}'
        name='prediction'>
        <input type=hidden value='{{ content }}' name='review'>
      </form>
    </div>

    <div id='button'>
      <form action="/">
       <input type=submit value='Submit another review'>
      </form>
    </div>

  </body>
</html>

First, we inserted the submitted review, as well as the results of the prediction, in the corresponding fields {{ content }}, {{ prediction }}, and {{ probability }}. You may notice that we used the {{ content }} and {{ prediction }} placeholder variables a second time in the form that contains the Correct and Incorrect buttons. This is a workaround to POST those values back to the server to update the classifier and store the review in case the user clicks on one of those two buttons.
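Because the review text is echoed into the value attribute of a hidden input, any quotes or angle brackets in it must be escaped. Flask enables Jinja2 autoescaping for .html templates, so {{ content }} is escaped automatically. The sketch below uses html.escape from the Python standard library as a stand-in to illustrate the effect; the exact entities that Jinja2 emits may differ.

```python
import html

# A review containing characters that would otherwise break the attribute
review = "It's a <b>great</b> movie"

# Stand-in for template autoescaping: replace &, <, >, and quote characters
escaped = html.escape(review, quote=True)

hidden_input = "<input type=hidden value='{}' name='review'>".format(escaped)
print(hidden_input)
```

The browser decodes the entities again when the form is submitted, so the feedback function receives the original review text.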

Furthermore, we imported a CSS file (style.css) at the beginning of the results.html file. The setup of this file is quite simple; it limits the width of the contents of this web application to 600 pixels and adds 20 pixels of padding above the div elements with the ID button, which moves the buttons they contain down by 20 pixels:

body{
  width:600px;
}

#button{
  padding-top: 20px;
}

This CSS file is merely a placeholder, so please feel free to adjust the look and feel of the web application to your liking.

The last HTML file we will implement for our web application is the thanks.html template. As the name suggests, it simply provides a nice thank you message to the user after providing feedback via the Correct or Incorrect button. Furthermore, we will put a Submit another review button at the bottom of this page, which will redirect the user to the starting page. The contents of the thanks.html file are as follows:

<!doctype html>
<html>
  <head>
    <title>Movie Classification</title>
      <link rel="stylesheet"
      href="{{ url_for('static', filename='style.css') }}">
  </head>
  <body>

    <h3>Thank you for your feedback!</h3>

    <div id='button'>
      <form action="/">
        <input type=submit value='Submit another review'>
      </form>
    </div>

  </body>
</html>

Now, it would be a good idea to start the web application locally from our Terminal via the following command before we advance to the next subsection and deploy it on a public web server:

python3 app.py

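After we have finished testing our application, we also shouldn't forget to remove the debug=True argument in the app.run() command of our app.py script, since debug mode exposes an interactive debugger that should never be reachable on a public server.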
