Evaluating the embeddings

Let's now evaluate what our model has learned and how well it has captured the semantics of the text. The gensim library provides the most_similar function, which returns the words most similar to a given word.

As you can see in the following code, given san_diego as input, we get the names of other cities that are most similar to it:

model.most_similar('san_diego')

[(u'san_antonio', 0.8147615790367126),
 (u'indianapolis', 0.7657858729362488),
 (u'austin', 0.7620342969894409),
 (u'memphis', 0.7541092038154602),
 (u'phoenix', 0.7481759786605835),
 (u'seattle', 0.7471771240234375),
 (u'dallas', 0.7407466769218445),
 (u'san_francisco', 0.7373261451721191),
 (u'la', 0.7354192137718201),
 (u'boston', 0.7213659286499023)]
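
The scores returned by most_similar are cosine similarities between word vectors. As a minimal sketch, assuming the trained model from the previous sections is available as model (on recent gensim versions the vectors live under model.wv, while older releases also allow indexing the model directly), we can reproduce one of these scores by hand; the helper name cosine_similarity is just for illustration:

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two vectors.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

vec_sd = model.wv['san_diego']
vec_sa = model.wv['san_antonio']

# Should be close to the 0.8147... score reported by most_similar.
print(cosine_similarity(vec_sd, vec_sa))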

We can also apply arithmetic operations to our word vectors to check how well they capture semantic relationships, as follows:

model.most_similar(positive=['woman', 'king'], negative=['man'], topn=1)

[(u'queen', 0.7255150675773621)]
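
To see what this analogy means at the vector level, here is a sketch of the same computation as explicit arithmetic, again assuming the trained model is available as model. Note that most_similar normalizes each input vector before combining them, so its reported score can differ slightly from this raw arithmetic, although the nearest neighbour is usually the same:

import numpy as np

# Form the analogy vector: king - man + woman.
analogy = model.wv['king'] - model.wv['man'] + model.wv['woman']

# Cosine similarity between the analogy vector and the vector for 'queen'.
queen = model.wv['queen']
score = np.dot(analogy, queen) / (np.linalg.norm(analogy) * np.linalg.norm(queen))
print(score)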

We can also find the word that does not belong in a given set of words. For instance, in the following list called text, every word except holiday is a city name. Since Word2Vec has learned this distinction, it returns holiday as the word that does not match the others, as shown:

text = ['los_angeles','indianapolis', 'holiday', 'san_antonio','new_york']

model.doesnt_match(text)

'holiday'
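
The idea behind this check is simple: average the word vectors of the set and return the word that is least similar to that average. The following is a rough sketch of that approach, assuming the same model and the text list defined previously; the helper name odd_one_out is hypothetical and is not part of the gensim API:

import numpy as np

def odd_one_out(model, words):
    # Normalize each word vector to unit length.
    vecs = np.array([model.wv[w] / np.linalg.norm(model.wv[w]) for w in words])
    # The mean vector represents the theme shared by the words.
    mean = vecs.mean(axis=0)
    mean = mean / np.linalg.norm(mean)
    # Cosine similarity of each word to the mean; the least similar
    # word is the odd one out.
    sims = vecs.dot(mean)
    return words[int(np.argmin(sims))]

print(odd_one_out(model, text))  # expected: 'holiday'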