Setting up your daily personal newsletter

To set up a personal email with news stories, we're going to use IFTTT again. As in Chapter 3, Build an App to Find Cheap Airfares, we'll use the Webhooks channel to send a POST request; this time, the payload will be our news stories. If you haven't set up the Webhooks channel, do so now, following the instructions in Chapter 3, Build an App to Find Cheap Airfares. You should also set up the Gmail channel. Once that is complete, we'll add a recipe to combine the two. Follow these steps to set up IFTTT:

  1. First, click New Applet from the IFTTT home page and then click +this. Then, search for the Webhooks channel:

  2. Select it, and then select Receive a web request:

  3. Then, give the request a name. I'm using news_event:

  4. Finish by clicking Create trigger. Next, click on +that to set up the email piece. Search for Gmail and click on that:

  5. Once you have clicked Gmail, click Send yourself an email. From there, you can customize your email message:

Input a subject line, and include {{Value1}} in the email body. We will pass our story title and link into this with our POST request. Click on Create action and then Finish to finalize it.
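Before wiring the webhook into the full pipeline, it's worth confirming that the recipe fires end to end. The following is a minimal smoke test, assuming you kept the news_event trigger name; YOUR_IFTTT_KEY is a placeholder for your own key, which is shown on the Webhooks channel's documentation page:

import requests 
 
# send a single test value through the news_event trigger 
payload = {"value1": "Test story title\nhttps://example.com/story"} 
r = requests.post('https://maker.ifttt.com/trigger/news_event/with/key/YOUR_IFTTT_KEY', data=payload) 
 
# IFTTT responds with a confirmation message on success 
print(r.text) 

If everything is configured correctly, you should receive an email containing the test value within a minute or so.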

Now, we're ready to generate the script that will run on a schedule, automatically sending us articles of interest. We're going to create a separate script for this, but one last thing we need to do in our existing code is serialize our vectorizer and our model, as demonstrated in the following code block:

import pickle 
 
pickle.dump(model, open(r'/input/a/path/here/to/news_model_pickle.p', 'wb')) 
 
pickle.dump(vect, open(r'/input/a/path/here/to/news_vect_pickle.p', 'wb')) 
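Before relying on these files in the scheduled script, it's worth verifying that they load back cleanly. A quick sanity check, assuming the same paths used above:

import pickle 
 
# reload the serialized objects and confirm they unpickle cleanly 
model = pickle.load(open(r'/input/a/path/here/to/news_model_pickle.p', 'rb')) 
vect = pickle.load(open(r'/input/a/path/here/to/news_vect_pickle.p', 'rb')) 
 
print(type(vect), type(model)) 

Note that unpickling scikit-learn objects requires the same (or a compatible) scikit-learn version in the environment where the scheduled script will run.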

With that, we have saved everything we need from our model. In our new script, we will read those files back in to generate our new predictions. We're going to use the same scheduling library to run the code that we used in Chapter 3, Build an App to Find Cheap Airfares. Putting it all together, we have the following script:

import pandas as pd 
import numpy as np 
from sklearn.feature_extraction.text import TfidfVectorizer 
from sklearn.svm import LinearSVC 
import schedule 
import time 
import pickle 
import json 
import gspread 
from oauth2client.service_account import ServiceAccountCredentials 
import requests 
from bs4 import BeautifulSoup 
 
 
def fetch_news(): 
 
    try: 
        # load the serialized vectorizer and model 
        vect = pickle.load(open(r'/your/path/to/news_vect_pickle.p', 'rb')) 
        model = pickle.load(open(r'/your/path/to/news_model_pickle.p', 'rb')) 
 
        JSON_API_KEY = r'/your/path/to/API KEY.json' 
 
        scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive'] 
 
        credentials = ServiceAccountCredentials.from_json_keyfile_name(JSON_API_KEY, scope) 
        gc = gspread.authorize(credentials) 
 
        ws = gc.open("NewStories") 
        sh = ws.sheet1 
        zd = list(zip(sh.col_values(2),sh.col_values(3), sh.col_values(4))) 
        zf = pd.DataFrame(zd, columns=['title','urls','html']) 
        zf.replace('', np.nan, inplace=True) 
        zf.dropna(inplace=True) 
        # reset the index so predictions align with rows when merging below 
        zf.reset_index(drop=True, inplace=True) 
 
 
        def get_text(x): 
            soup = BeautifulSoup(x, 'html5lib') 
            text = soup.get_text() 
            return text 
 
        zf.loc[:,'text'] = zf['html'].map(get_text) 
 
        tv = vect.transform(zf['text']) 
        res = model.predict(tv) 
 
        rf = pd.DataFrame(res, columns=['wanted']) 
        rez = pd.merge(rf, zf, left_index=True, right_index=True) 
 
        rez = rez.iloc[:20,:] 
 
        # build the email body: one title/URL pair per story predicted 'y' 
        news_str = '' 
        for t, u in zip(rez[rez['wanted']=='y']['title'], rez[rez['wanted']=='y']['urls']): 
            news_str = news_str + t + '\n' + u + '\n' 
 
        payload = {"value1" : news_str} 
        r = requests.post('https://maker.ifttt.com/trigger/news_event/with/key/bNHFwiZx0wMS7EnD425n3T', data=payload) 
 
        # clean up worksheet 
        lenv = len(sh.col_values(1)) 
        cell_list = sh.range('A1:F' + str(lenv)) 
        for cell in cell_list: 
            cell.value = "" 
        sh.update_cells(cell_list) 
        print(r.text) 
 
    except Exception as e: 
        print('Action Failed:', e) 
 
schedule.every(480).minutes.do(fetch_news) 
 
while True: 
    schedule.run_pending() 
    time.sleep(1) 

This script will run every eight hours (480 minutes): it pulls down the news stories from Google Sheets, runs them through the model, sends a POST request to IFTTT to generate an email containing the stories predicted to be of interest, and finally clears out the spreadsheet so that only new stories are sent in the next email.
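One optional tweak: as written, the first email won't arrive until the first 480-minute interval has elapsed. If you'd rather fire the job immediately at startup, the schedule library provides run_all(), which you can call just before the while loop:

# optionally run all registered jobs once at startup 
schedule.run_all() 

After that, run_pending() takes over and the job repeats on its normal interval.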

Congratulations! You now have your own personalized news feed!
