Summary of the projects

Let's start with Chapter 1, The Python Machine Learning Ecosystem.

In the first chapter, we began with an overview of ML with Python. We started with the ML workflow, which includes acquisition, inspection, preparation, modeling, evaluation, and deployment. Then we studied the various Python libraries and functions needed for each step of the workflow. Lastly, we set up our ML environment to execute the projects.
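As a quick illustration, here is a tiny pandas sketch of the acquisition, inspection, and preparation stages of that workflow; the data is made up for the example.

import pandas as pd

# Acquisition: in the projects this comes from a CSV, an API, or scraped HTML;
# here we use a small inline DataFrame so the sketch runs on its own
df = pd.DataFrame({'price': [2100, 2900, None, 3900],
                   'sqft': [550, 800, 900, 1200]})

# Inspection: look at types, missing values, and basic statistics
df.info()
print(df.describe())

# Preparation: drop incomplete rows before moving on to modeling
df = df.dropna()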

Chapter 2, Building an App to Find Underpriced Apartments, was, as the name suggests, about building an app to find underpriced apartments. We began by identifying a source of apartment listings for the location we were interested in. We then inspected, prepared, and visualized the data before performing regression modeling. Linear regression is a type of supervised ML; supervised, in this context, simply means we provide the output values for our training set.
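A minimal sketch of that kind of supervised regression is shown below; the listing columns and values are hypothetical, not the chapter's actual dataset. Listings priced well below what the model predicts are the candidate "underpriced" apartments.

import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical listing data: two features and the observed rent
df = pd.DataFrame({
    'bedrooms': [1, 2, 2, 3, 3],
    'sqft':     [550, 800, 900, 1200, 1350],
    'rent':     [2100, 2900, 3100, 3900, 4300],
})

X, y = df[['bedrooms', 'sqft']], df['rent']
model = LinearRegression().fit(X, y)

# Negative residuals mean the listing is cheaper than the model expects
df['predicted_rent'] = model.predict(X)
df['residual'] = df['rent'] - df['predicted_rent']
print(df.sort_values('residual').head())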

We then spent the remainder of our time exploring the options we had identified, and we created an application that made finding the right apartment just a little bit easier.

In Chapter 3, Building an App to Find Cheap Airfare, we built an app similar to the one in Chapter 2, Building an App to Find Underpriced Apartments, but this time to find cheap airfare. We started by sourcing airfare prices on the web, using web scraping to retrieve the fare data. To parse the DOM of our Google page, we used the BeautifulSoup library. Then, we used anomaly detection techniques to identify outlier fares, so that cheaper airfare could be found and real-time text alerts sent to us using IFTTT.
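The sketch below shows those two pieces in miniature: parsing fares out of HTML with BeautifulSoup, then flagging unusually low ones. The HTML, class name, and the simple standard-deviation cutoff are illustrative stand-ins for the chapter's scraping and anomaly-detection steps.

from bs4 import BeautifulSoup
import numpy as np

# Hypothetical HTML snippet standing in for a scraped results page
html = """
<div class="fare">$412</div>
<div class="fare">$398</div>
<div class="fare">$405</div>
<div class="fare">$129</div>
"""

soup = BeautifulSoup(html, 'html.parser')
fares = np.array([float(tag.text.strip('$')) for tag in soup.select('div.fare')])

# Naive outlier check: anything well below the mean is a candidate cheap fare
cutoff = fares.mean() - 1.5 * fares.std()
print(fares[fares < cutoff])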

In Chapter 4, Forecasting the IPO Market Using Logistic Regression, we looked at how the IPO market works. First, we discussed what an Initial Public Offering (IPO) is and what the research tells us about this market. After that, we discussed a number of strategies we could apply to predict the IPO market, which involved data cleansing and feature engineering. We then implemented binary classification using logistic regression, and finally evaluated the resulting model.
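A minimal sketch of binary classification with logistic regression follows; the features are synthetic stand-ins for the engineered IPO features, not data from the chapter.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))   # e.g. offer price, price revisions, market tone, and so on
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))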

We also examined which features had an impact on our model by using the feature importances that come out of a random forest classifier, which more accurately reflect the true impact of a given feature.
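Here is a small sketch of pulling feature importances from a random forest classifier; the feature names and data are hypothetical.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

feature_names = ['offer_price', 'price_revision', 'market_return', 'sector_heat']  # hypothetical
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = pd.Series(rf.feature_importances_, index=feature_names).sort_values(ascending=False)
print(importances)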

Chapter 5, Create a Custom Newsfeed, was mostly for avid news readers who are interested in knowing what's going on around the globe. By creating a custom newsfeed, you can decide what news updates you get on your devices. In this chapter, you learned how to build a system that understands your taste in news and will send you a tailored newsletter each day. We started by creating a supervised training set with the Pocket app, and then leveraged the Pocket API to retrieve the stories. We used the Embedly API to extract story bodies.

Then, we studied the basics of natural language processing (NLP) and Support Vector Machines (SVMs). We integrated If This Then That (IFTTT) with RSS feeds and Google Sheets so that we could stay up to date with notifications, emails, and more. Lastly, we set up a daily personal newsletter, using the Webhooks channel to send a POST request.
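A minimal sketch of the NLP-plus-SVM step looks something like this: vectorize story text with TF-IDF and train a linear SVM on liked/not-liked labels. The example stories and labels are made up.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = [
    "New open-source machine learning library released",
    "Celebrity gossip roundup of the week",
    "Deep learning beats benchmark on image tasks",
    "Royal family attends gala event",
]
labels = [1, 0, 1, 0]   # 1 = a story I'd want in my newsletter

vec = TfidfVectorizer(ngram_range=(1, 2), stop_words='english')
X = vec.fit_transform(docs)
clf = LinearSVC().fit(X, labels)

print(clf.predict(vec.transform(["Open-source deep learning framework announced"])))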

The script runs every four hours, pulls down the news stories from Google Sheets, runs the stories through the model, generates an email by sending a POST request to IFTTT for the stories that are predicted to be of interest, and then, finally, clears out the stories in the spreadsheet so that only new stories are sent in the next email. And that's how we get our very own personalized newsfeed.
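The delivery step can be sketched as a single POST request to an IFTTT Webhooks trigger, roughly as follows; the event name and key are placeholders, and the value1/value2/value3 payload follows the convention the Webhooks service expects.

import requests

EVENT = 'news_event'        # hypothetical event name configured in IFTTT
KEY = 'YOUR_IFTTT_KEY'      # placeholder for your Webhooks key

url = f'https://maker.ifttt.com/trigger/{EVENT}/with/key/{KEY}'
payload = {
    'value1': 'Story title',
    'value2': 'https://example.com/story',
    'value3': 'Predicted: interesting',
}

# IFTTT passes value1/value2/value3 through to the email applet
resp = requests.post(url, json=payload)
print(resp.status_code)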

In Chapter 6, Predict Whether Your Content Will Go Viral, we examined some of the most-shared content and attempted to find the common elements that differentiate this content from the content people were less inclined to share. The chapter started by providing an understanding of what exactly virality means. We also looked at what the research tells us about virality.

Then, as in the other chapters, we sourced the share counts and content. We used a dataset collected from a now-defunct website called ruzzit.com. When it was active, this site tracked the most-shared content over time, which was exactly what we needed for this project. We then explored the features of shareability, which included exploring the image data, clustering, exploring the headlines, and exploring the stories' content.

The last, but most important, part was building the predictive content-scoring model using an algorithm called random forest regression. We built the model, evaluated it, and then added some features to enhance it.
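A minimal sketch of random forest regression for content scoring might look like this; the features and share counts are synthetic stand-ins for the chapter's engineered features.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 5))   # e.g. headline length, image count, word count, and so on
y = 1000 + 300 * X[:, 0] + 150 * X[:, 1] + rng.normal(scale=100, size=400)   # share counts

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_train, y_train)
print(mean_absolute_error(y_test, reg.predict(X_test)))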

In Chapter 7, Use Machine Learning to Forecast the Stock Market, we learned how to build and test a trading strategy. We also learned how not to do it. There are countless pitfalls to avoid when trying to devise your own system, and it's a nearly impossible task, but it can be a lot of fun, and sometimes it can even be profitable. That said, don't do dumb things, such as risking money you can't afford to lose.

When you're ready to risk your money, you might as well learn some tricks and tips to avoid losing too much of it. Who likes to lose, whether it's money or just a game?

We concentrated most of our attention on stocks and the stock market. Initially, we looked at the different types of markets and then reviewed the research on the stock market; it's always better to have some prior knowledge before risking anything. We began developing our strategy by focusing on the technical aspects, working with the S&P 500 over the last few years and using pandas to import our data, which gave us access to several sources of stock data, including Yahoo! and Google.
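A minimal sketch of that data-import step with pandas-datareader is shown below. Note that source availability has shifted over the years, so the 'yahoo' feed may need a substitute (the yfinance package is a common one) depending on when you run this, and the ticker used is the SPY ETF as a stand-in for the index.

import datetime
from pandas_datareader import data as pdr

start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2020, 1, 1)

# Pull daily prices for an S&P 500 proxy; the 'yahoo' source may require a substitute today
spy = pdr.DataReader('SPY', 'yahoo', start, end)
print(spy[['Open', 'Close']].head())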

Then we built the regression model. We started with a very basic model using only the stock's prior closing values to predict the next day's close, and built it using a support vector regression. Lastly, we evaluated the performance of our model and the trades that were carried out.
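The basic setup described above can be sketched as follows: predict the next day's close from the prior close with support vector regression, here on synthetic prices rather than real market data.

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
closes = 100 + np.cumsum(rng.normal(scale=1.0, size=250))   # synthetic daily closing prices

X = closes[:-1].reshape(-1, 1)   # today's close
y = closes[1:]                   # tomorrow's close

model = SVR(kernel='linear', C=1.0).fit(X, y)
print(model.predict([[closes[-1]]]))   # forecast for the next session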

Long before Siri was released with the iPhone 4S, we had chatbots that were used widely across multiple applications. In Chapter 9, Building a Chatbot, we learned about the Turing Test and its origins. Then we looked at a program called ELIZA. If ELIZA was an early example of chatbots, what have we seen since then? In recent years, there has been an explosion of new chatbots—the most notable of these is Cleverbot.

Then, we looked at the interesting part: designing these chatbots.

But what about more advanced bots? How are they built?

Surprisingly, most chatbots you're likely to encounter don't use ML; they're what's known as retrieval-based models. This means responses are predefined according to the question and the context. The most common architecture for these bots is something called Artificial Intelligence Markup Language (AIML). AIML is an XML-based schema for representing how the bot should interact given the user's input. It's really just a more advanced version of how ELIZA works.
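As a toy illustration of the retrieval-based idea (this is not AIML itself, just the same concept in plain Python), responses are predefined and selected by matching the user's input against known patterns:

import re

# Predefined pattern-to-response pairs; a real AIML bot has far richer matching
RULES = [
    (r'\bhello\b|\bhi\b', "Hello! How can I help you today?"),
    (r'\bhow are you\b', "I'm just a program, but thanks for asking."),
    (r'\bbye\b', "Goodbye!"),
]

def respond(message):
    for pattern, reply in RULES:
        if re.search(pattern, message.lower()):
            return reply
    return "I'm not sure I follow. Can you rephrase that?"

print(respond("Hi there"))
print(respond("Tell me a joke"))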

Lastly, we did sequence-to-sequence modeling for chatbots. This is frequently used in machine translation and question-answering applications as it allows us to map an input sequence of any length to an output sequence of any length.
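A compact Keras-style sketch of the encoder-decoder structure behind sequence-to-sequence models is shown below; the vocabulary sizes and layer width are arbitrary, and it defines the model only, without the chapter's data or training loop.

from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

num_encoder_tokens = 70    # hypothetical input vocabulary size
num_decoder_tokens = 90    # hypothetical output vocabulary size
latent_dim = 256

# Encoder: read the input sequence and keep only its final state
encoder_inputs = Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: generate the output sequence, conditioned on the encoder's state
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_outputs, _, _ = LSTM(latent_dim, return_sequences=True, return_state=True)(
    decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.summary()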

In Chapter 8, Classifying Images with Convolutional Neural Networks, we looked at building a Convolutional Neural Network (CNN) to classify images in the Zalando Research dataset using Keras.

We started by extracting the images' features. Then, working with CNNs, we covered the network topology, the various convolutional layers and filters, and what max-pooling layers are.
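A minimal Keras sketch in that spirit follows: a small CNN with convolutional, max-pooling, and dense layers trained on the Zalando (Fashion-MNIST) images. The layer sizes and epoch count are illustrative rather than the book's exact topology.

from tensorflow.keras import layers, models
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.utils import to_categorical

# Load and normalize the Zalando Research (Fashion-MNIST) images
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=3, validation_split=0.1)
print(model.evaluate(x_test, y_test))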

Try building deeper models or grid searching over the many hyperparameters we used in our models. Assess your classifier's performance as you would with any other model: try building a confusion matrix to understand which classes it predicts well and which ones it struggles with!
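Continuing the CNN sketch above (and assuming its model, x_test, and y_test names), the confusion-matrix check can be as short as this:

import numpy as np
from sklearn.metrics import confusion_matrix

# Convert softmax outputs and one-hot labels back to class indices
y_pred = np.argmax(model.predict(x_test), axis=1)
y_true = np.argmax(y_test, axis=1)
print(confusion_matrix(y_true, y_pred))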

In Chapter 10, Build a Recommendation Engine, we explored different varieties of recommendation systems. We saw how they're implemented commercially and how they work. Then we implemented our own recommendation engine for finding GitHub repositories.

We started with collaborative filtering. Collaborative filtering is based on the idea that, somewhere out there in the world, you have a taste doppelganger—someone who has the same feelings about how good Star Wars is and how awful Love Actually is. 

Then we also studied what content-based filtering and hybrid systems are.

Lastly, we used the GitHub API to create a recommendation engine based on collaborative filtering. The plan was to get all of the repositories that I'd starred over time, and then to get all of the creators of those repositories and find out what repositories they'd starred. This enabled us to find out which users' starred repositories were most similar to mine.
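A minimal sketch of that idea is shown below: pull the starred repositories for a few users from the GitHub API and compare them with Jaccard similarity. The usernames are placeholders, and authentication and pagination are omitted for brevity.

import requests

def starred_repos(user):
    # GitHub's public endpoint for a user's starred repositories (first page only)
    resp = requests.get(f'https://api.github.com/users/{user}/starred')
    resp.raise_for_status()
    return {repo['full_name'] for repo in resp.json()}

def jaccard(a, b):
    # Overlap between two sets of starred repositories
    return len(a & b) / len(a | b) if a | b else 0.0

me = starred_repos('my-username')                      # placeholder account
for other in ['other-user-1', 'other-user-2']:         # placeholder candidate users
    print(other, round(jaccard(me, starred_repos(other)), 3))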
