The tsvector data type is a sorted list of distinct lexemes. A lexeme is the fundamental unit of a word; simply put, one can see a lexeme as the root of a word, without suffixes, inflectional forms, or grammatical variants. The following example casts a text value to tsvector:
car_portal=# SELECT 'A wise man always has something to say, whereas a fool always needs to say something'::tsvector;
tsvector
--------------------------------------------------------------------------------------------
'A' 'a' 'always' 'fool' 'has' 'man' 'needs' 'say' 'say,' 'something' 'to' 'whereas' 'wise'
(1 row)
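As a minimal sketch of what the plain cast does and does not do, the following queries can be run in any psql session. The sample strings are made up for illustration and are not part of the car_portal example. Note that case and punctuation are kept as typed, and that a tsvector literal may also carry explicit positions.

```sql
-- The cast splits on whitespace, removes duplicates, and sorts the result;
-- it does not fold case or strip punctuation, so 'Say', 'say' and 'say,'
-- are treated as three different lexemes.
SELECT 'Say say say, say'::tsvector;

-- Positions can be written explicitly in the literal as lexeme:position.
-- The lexemes come back sorted, each with the position it was given.
SELECT 'a:1 fat:2 cat:3'::tsvector;
```

The second query shows that positions are part of the tsvector value itself, not something only to_tsvector() can produce.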
Casting text to tsvector does not fully normalize the document: the cast only splits the text on whitespace, sorts the tokens, and removes duplicates, without applying any linguistic rules. To normalize the preceding example properly, one can use the to_tsvector() function, as follows:
car_portal=# SELECT to_tsvector('english', 'A wise man always has something to say, whereas a fool always needs to say something');
to_tsvector
---------------------------------------------------------------------------------------
'alway':4,12 'fool':11 'man':3 'need':13 'say':8,15 'someth':6,16 'wherea':9 'wise':2
(1 row)
As shown in the preceding example, the to_tsvector function reduced words to their stems (for example, it stripped the s from always), dropped punctuation and stop words such as a, has, and to, and also recorded the integer position of each lexeme, which can be used for proximity ranking.
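To see these effects in isolation, the following is a small sketch. It assumes the built-in english_stem dictionary that ships with PostgreSQL, and the phrase operator <-> requires PostgreSQL 9.6 or later. The query terms are chosen only for illustration.

```sql
-- Ask the stemmer dictionary directly how it treats a single word.
SELECT ts_lexize('english_stem', 'always');   -- the stem seen above as 'alway'
SELECT ts_lexize('english_stem', 'a');        -- a stop word yields an empty array

-- Lexeme positions make phrase matching possible: <-> requires its two
-- operands to appear next to each other, in that order.
SELECT to_tsvector('english', 'A wise man always has something to say')
       @@ to_tsquery('english', 'wise <-> man');
```

Since wise and man occupy positions 2 and 3 in the vector, the last query should report a match; swapping the two terms in the query should not.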