From a machine learning point of view, raw text is useless. If we manage to transform it into meaningful numbers, we can then feed it into our machine learning algorithms, such as clustering. This is also true for more mundane operations on text, such as similarity measurement.