To set up a personal email feed of news stories, we'll use IFTTT again. As in Chapter 3, Build an App to Find Cheap Airfares, we'll use the Webhooks channel to send a POST request; this time, the payload will be our news stories. If you haven't set up the Webhooks channel yet, do so now; instructions can be found in Chapter 3, Build an App to Find Cheap Airfares. You should also set up the Gmail channel. Once both are complete, we'll add a recipe that combines the two. Follow these steps to set up IFTTT:
- First, click New Applet from the IFTTT home page and then click +this. Then, search for the Webhooks channel:
- Select that, and then select Receive a web request:
- Then, give the request a name. I'm using news_event:
- Finish by clicking Create trigger. Next, click on +that to set up the email piece. Search for Gmail and click on that:
- Once you have clicked Gmail, click Send yourself an email. From there, you can customize your email message:
Input a subject line, and include {{Value1}} in the email body. We will pass our story title and link into this with our POST request. Click on Create action and then Finish to finalize it.
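As a rough sketch of what that POST request looks like, the snippet below builds the `value1` payload from a list of (title, URL) pairs and shows the trigger URL format. The key and event name here are placeholders (your own key comes from the Webhooks channel's documentation page), and the actual network call is left commented out:

```python
import requests

# Placeholder values -- substitute your own Webhooks key and event name
IFTTT_KEY = 'your_webhooks_key_here'
EVENT = 'news_event'

def build_payload(stories):
    """Join (title, url) pairs into the single string IFTTT expects in value1."""
    body = ''
    for title, url in stories:
        body = body + title + ' ' + url + ' '
    return {"value1": body}

payload = build_payload([('Sample headline', 'http://example.com/story')])
print(payload)

# To fire the applet, POST the payload to the trigger URL:
# requests.post('https://maker.ifttt.com/trigger/{}/with/key/{}'.format(EVENT, IFTTT_KEY),
#               data=payload)
```

Whatever string is passed as `value1` is substituted wherever `{{Value1}}` appears in the email body you configured.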
Now, we're ready to generate the script that will run on a schedule, automatically sending us articles of interest. We're going to create a separate script for this, but one last thing we need to do in our existing code is serialize our vectorizer and our model, as demonstrated in the following code block:
```python
import pickle

pickle.dump(model, open(r'/input/a/path/here/to/news_model_pickle.p', 'wb'))
pickle.dump(vect, open(r'/input/a/path/here/to/news_vect_pickle.p', 'wb'))
```
With that, we have saved everything we need from our model. In our new script, we will read those in to generate our new predictions. We're going to use the same scheduling library to run the code as we used in Chapter 3, Build an App to Find Cheap Airfares. Putting it all together, we have the following script:
```python
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
import schedule
import time
import pickle
import json
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import requests
from bs4 import BeautifulSoup


def fetch_news():
    try:
        # Load the serialized vectorizer and model
        vect = pickle.load(open(r'/your/path/to/news_vect_pickle.p', 'rb'))
        model = pickle.load(open(r'/your/path/to/news_model_pickle.p', 'rb'))

        # Authorize access to the Google Sheet holding the stories
        JSON_API_KEY = r'/your/path/to/API KEY.json'
        scope = ['https://spreadsheets.google.com/feeds',
                 'https://www.googleapis.com/auth/drive']
        credentials = ServiceAccountCredentials.from_json_keyfile_name(JSON_API_KEY, scope)
        gc = gspread.authorize(credentials)

        # Pull down the stories and drop empty rows
        ws = gc.open("NewStories")
        sh = ws.sheet1
        zd = list(zip(sh.col_values(2), sh.col_values(3), sh.col_values(4)))
        zf = pd.DataFrame(zd, columns=['title', 'urls', 'html'])
        zf.replace('', np.nan, inplace=True)
        zf.dropna(inplace=True)
        # Reset the index so it lines up with the predictions below
        zf.reset_index(drop=True, inplace=True)

        # Extract the plain text from each story's HTML
        def get_text(x):
            soup = BeautifulSoup(x, 'html5lib')
            return soup.get_text()

        zf.loc[:, 'text'] = zf['html'].map(get_text)

        # Vectorize the text and predict which stories are of interest
        tv = vect.transform(zf['text'])
        res = model.predict(tv)

        rf = pd.DataFrame(res, columns=['wanted'])
        rez = pd.merge(rf, zf, left_index=True, right_index=True)
        rez = rez.iloc[:20, :]

        # Build the email body from the wanted titles and URLs
        news_str = ''
        for t, u in zip(rez[rez['wanted'] == 'y']['title'],
                        rez[rez['wanted'] == 'y']['urls']):
            news_str = news_str + t + ' ' + u + ' '

        payload = {"value1": news_str}
        r = requests.post('https://maker.ifttt.com/trigger/news_event/with/key/bNHFwiZx0wMS7EnD425n3T',
                          data=payload)

        # Clean up the worksheet so only new stories go out next time
        lenv = len(sh.col_values(1))
        cell_list = sh.range('A1:F' + str(lenv))
        for cell in cell_list:
            cell.value = ""
        sh.update_cells(cell_list)
        print(r.text)
    except Exception as e:
        print('Action Failed:', e)


schedule.every(480).minutes.do(fetch_news)

while True:
    schedule.run_pending()
    time.sleep(1)
```
This script runs every 8 hours (480 minutes). Each run pulls the news stories down from Google Sheets, runs them through the model, sends a POST request to IFTTT to email the stories predicted to be of interest, and finally clears out the spreadsheet so that only new stories are sent in the next email.
Congratulations! You now have your own personalized news feed!