Solving our initial challenge

We will now put everything together and demonstrate our system on the following new post, which we assign to the new_post variable:

new_post = '''
Disk drive problems. Hi, I have a problem with my hard disk.
After 1 year it is working only sporadically now.
I tried to format it, but now it doesn't boot any more.
Any ideas? Thanks. '''

As you learned earlier, you will first have to vectorize this post before you predict
its label:

>>> new_post_vec = vectorizer.transform([new_post])
>>> new_post_label = km.predict(new_post_vec)[0]
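The same vectorize-then-predict step can be tried on a toy corpus. This is only a minimal sketch: the four posts, the plain TfidfVectorizer settings, the KMeans parameters, and all toy_ names are assumptions made for illustration, not the vectorizer and model built earlier in the chapter.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up training posts: two about disks, two about space.
toy_posts = [
    "My hard disk does not boot after I format it",
    "The hard disk drive makes noise and fails to boot",
    "The shuttle launch to orbit was delayed by NASA",
    "NASA plans a new orbit for the space shuttle",
]

# Fit the vectorizer and the clustering on the training posts only.
toy_vectorizer = TfidfVectorizer(stop_words="english")
toy_vectorized = toy_vectorizer.fit_transform(toy_posts)
toy_km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy_vectorized)

# A new post must go through the same fitted vectorizer (transform, not
# fit_transform) so that its columns line up with the training vocabulary.
toy_new_post = "Disk drive problems: my hard disk does not boot any more"
toy_new_vec = toy_vectorizer.transform([toy_new_post])
toy_new_label = toy_km.predict(toy_new_vec)[0]

print(toy_km.labels_, toy_new_label)
```

The cluster numbers themselves are arbitrary. What matters is that the new post receives the same label as the two disk-related training posts.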

Now that we have the clustering, we do not need to compare new_post_vec to all post vectors. Instead, we can focus only on the posts of the same cluster. Let's fetch their indices in the original data set:

>>> similar_indices = (km.labels_ == new_post_label).nonzero()[0]

The comparison in the bracket results in a Boolean array, and nonzero converts that array into a smaller array containing the indices of the True elements.
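To see the two steps separately, here is a small sketch on a made-up label array (toy_labels and toy_label are hypothetical stand-ins for km.labels_ and new_post_label):

```python
import numpy as np

# Hypothetical cluster label for each of six posts.
toy_labels = np.array([2, 0, 2, 1, 0, 2])
toy_label = 2

# Step 1: the comparison yields one Boolean per post.
mask = toy_labels == toy_label

# Step 2: nonzero() returns a tuple with one index array per dimension;
# [0] picks the array for the only dimension we have.
toy_similar_indices = mask.nonzero()[0]

print(mask)
print(toy_similar_indices)  # the posts at indices 0, 2 and 5
```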

Using similar_indices, we then simply have to build a list of posts together with their similarity scores (here, the Euclidean distance to new_post_vec, so a smaller value means a more similar post):

>>> import scipy.linalg
>>> similar = []
>>> for i in similar_indices:
...    dist = scipy.linalg.norm((new_post_vec - vectorized[i]).toarray())
...    similar.append((dist, train_data.data[i]))
>>> similar = sorted(similar)
>>> print("Count similar: %i" % len(similar))
Count similar: 56
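The ranking idea behind this loop can be sketched with a few made-up dense vectors. The values and names below are assumptions for illustration, and numpy.linalg.norm stands in for the scipy.linalg.norm call used above:

```python
import numpy as np

# Toy dense vectors standing in for rows of the vectorized posts.
toy_new_vec = np.array([1.0, 0.0, 1.0])
toy_candidates = {
    "post A": np.array([1.0, 0.0, 1.0]),  # identical to the new vector
    "post B": np.array([1.0, 1.0, 1.0]),  # differs in one component
    "post C": np.array([0.0, 2.0, 0.0]),  # differs in every component
}

# Build (distance, post) tuples; sorting tuples compares the distance
# first, so the closest post ends up at the front of the list.
toy_ranked = sorted(
    (float(np.linalg.norm(toy_new_vec - vec)), name)
    for name, vec in toy_candidates.items()
)

for dist, name in toy_ranked:
    print("%.3f  %s" % (dist, name))
```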

We found 56 posts in the cluster of our post. To give the user a quick idea of what kind of similar posts are available, we can now present the most similar post (show_at_1), and two less similar but still related ones, all from the same cluster:

>>> show_at_1 = similar[0]
>>> show_at_2 = similar[len(similar) // 10]
>>> show_at_3 = similar[len(similar) // 2]
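For the 56 posts found here, the integer divisions resolve to fixed positions in the sorted list. A quick sketch with a placeholder list (toy_similar is a hypothetical stand-in for the real list of tuples):

```python
# Placeholder for the 56 sorted (dist, post) tuples.
toy_similar = list(range(56))

n = len(toy_similar)
positions = [0, n // 10, n // 2]
print(positions)  # [0, 5, 28]
```

Position 5 sits at roughly the 10% mark of the ranking and position 28 at its midpoint.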

The following table shows the posts together with their similarity values:

Position | Similarity | Excerpt from post

1 | 1.038 | BOOT PROBLEM with IDE controller: Hi, I've got a Multi I/O card (IDE controller + serial/parallel interface) and two floppy drives (5 1/4, 3 1/2) and a Quantum ProDrive 80AT connected to it. I was able to format the hard disk, but I could not boot from it. I can boot from drive A: (which disk drive does not matter) but if I remove the disk from drive A and press the reset switch, the LED of drive A: continues to glow, and the hard disk is not accessed at all. I guess this must be a problem of either the Multi I/O card or floppy disk drive settings (jumper configuration?) Does anyone have any idea what the reason for this could be. [...]

2 | 1.150 | Booting from B drive: I have a 5 1/4" drive as drive A. How can I make the system boot from my 3 1/2" B drive? (Optimally, the computer would be able to boot from either A or B, checking them in order for a bootable disk. But if I have to switch cables around and simply switch the drives so that it can't boot 5 1/4" disks, that's OK. Also, boot_b won't do the trick for me. [...]

3 | 1.280 | IBM PS/1 vs TEAC FD: Hello, I already tried our national news group without success. I tried to replace a friend's original IBM floppy disk in his PS/1-PC with a normal TEAC drive. I already identified the power supply on pins 3 (5V) and 6 (12V), shorted pin 6 (5.25"/3.5" switch), and inserted pullup resistors (2K2) on pins 8, 26, 28, 30, and 34. The computer doesn't complain about a missing FD, but the FD's light stays on all the time. The drive spins up ok. when I insert a disk, but I can't access it. The TEAC works fine in a normal PC. Are there any points I missed? [...]

It is interesting how the posts reflect the similarity score. The first post contains all the salient words from our new post. The second also revolves around booting problems, but is about floppy disks rather than hard disks. Finally, the third is about neither hard disks nor booting problems. Still, we would say that all three posts belong to the same domain as the new post.
