Summary

This chapter introduced the problem of record linkage and explained why it matters. We used the R package RecordLinkage to solve record linkage problems. We started by generating string-based and phonetic features for record pairs so that later stages of the pipeline could use them to deduplicate records. We then covered expectation maximization and weights-based methods for deduplicating our record pairs. Finally, we introduced machine learning methods for deduplication: as an unsupervised method we discussed K-means clustering, and we then used the output of the K-means algorithm to train a supervised model.

In the next chapter, we will look at streaming data and its challenges, and build a stream clustering algorithm for a given data stream.
