The tsvector data type

The tsvector data type is a sorted list of distinct lexemes. A lexeme is the fundamental unit of a word; simply put, it is the word root, stripped of suffixes, inflectional forms, and grammatical variants. The following example casts a text value to tsvector:

car_portal=# SELECT 'A wise man always has something to say, whereas a fool always needs to say something'::tsvector;
tsvector
--------------------------------------------------------------------------------------------
'A' 'a' 'always' 'fool' 'has' 'man' 'needs' 'say' 'say,' 'something' 'to' 'whereas' 'wise'
(1 row)
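The cast also accepts lexemes with explicit integer positions, which makes the sorted, distinct nature of the type easy to see. The following is a minimal sketch; the input string is made up for illustration and is not taken from the car_portal database:

```sql
-- The duplicate lexeme 'fat' is stored once, with both of its positions,
-- and the lexemes come back in sorted order.
SELECT 'fat:2 cat:3 fat:5'::tsvector;
-- 'cat':3 'fat':2,5
```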

Casting text to tsvector does not normalize the document, because the cast applies no linguistic rules: letter case and punctuation are preserved, as the separate entries for 'A' and 'a' and for 'say' and 'say,' show. To normalize the text properly, use the to_tsvector() function, as follows:

car_portal=# SELECT to_tsvector('english', 'A wise man always has something to say, whereas a fool always needs to say something');
to_tsvector
---------------------------------------------------------------------------------------
'alway':4,12 'fool':11 'man':3 'need':13 'say':8,15 'someth':6,16 'wherea':9 'wise':2
(1 row)
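The first argument of to_tsvector names the text search configuration that supplies the linguistic rules. As a rough illustration, the sketch below compares the built-in simple and english configurations on a made-up sentence; it assumes both configurations are installed with their default settings:

```sql
-- 'simple' only lowercases the words; nothing is stemmed or dropped.
SELECT to_tsvector('simple', 'The cars are running');
-- 'are':3 'cars':2 'running':4 'the':1

-- 'english' removes stop words and reduces the remaining words to stems.
SELECT to_tsvector('english', 'The cars are running');
-- 'car':2 'run':4
```

Note that the positions in the second result still count the removed stop words.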

As shown in the preceding example, the to_tsvector function reduced words to their stems (for example, it stripped the s from always), dropped stop words such as a, has, and to, and recorded the integer position of each lexeme, which can be used for proximity ranking.
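To illustrate what the positions are used for, the following sketch runs two queries on a shortened version of the sentence. It assumes PostgreSQL 9.6 or later, where the tsquery phrase operator <N> is available; the strip() function and the @@ match operator are standard, but they are introduced here only for illustration:

```sql
-- strip() removes the positions and keeps only the lexemes.
SELECT strip(to_tsvector('english', 'A wise man always has something to say'));
-- 'alway' 'man' 'say' 'someth' 'wise'

-- 'someth' is at position 6 and 'say' at position 8, so a phrase query
-- requiring a distance of exactly 2 matches.
SELECT to_tsvector('english', 'A wise man always has something to say')
       @@ to_tsquery('english', 'something <2> say');
-- t
```

Without the positions, as in the stripped vector, such distance-based matching is not possible.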
