Full-text Search using Postgres
I was working on a search feature that performs a linguistic search, and searches ignoring parenthesis, hyphens and other special characters. The search would be performed on a few fields in a PostgreSQL 10 database. Since Postgres supports full-text search, I decided to use it.
PostgreSQL full-text search
Full-text search is an indexing and search technique that does not just grep the text for certain keywords which may be a word or part of a word, but takes into account linguistic features as well.
As an example, let us consider this scenario. You're trying to search for the word
bunny in the column sentence which may have values like 'There was one bunny' in table essays, a SQL statement like this would be good. We will use a case insensitive search by using the ILIKE operator.
SELECT sentence FROM essays WHERE sentence ILIKE '%bunny%';
Now, consider these two scenarios. The simple text search will not work on them. Even a regular expression would not be successful.
1) If I wanted to search for
bunny and wanted to include sentences like 'There were many bunnies' as well as 'There was one bunny'.
2) If I wanted to search for
monkeys and wanted to include sentences like 'There were many monkeys' as well as 'There was one monkey'.
In this case, we apply full-text search. Full-text search has a function
to_tsvector that breaks up the haystack into tokens. Another function
to_tsquery performs a query on these tokens.
Start the psql console and type this:
select to_tsvector('There were many bunnies') @@ to_tsquery('bunny');
We get these results:
?column? ---------- t (1 row)
select to_tsvector('There was one bunny') @@ to_tsquery('bunny');
That select statement returns true as well.
If you're using a German database which contains German words, set the language to 'german'. By default, the language is set to 'english'.
select to_tsvector('german', 'Ich möchte ein Buch') @@ to_tsquery('german', 'Buch'); select to_tsvector('german', 'Ich will Buch') @@ to_tsquery('german', 'Bucher');
Now, what if the search is for something like 'IT personnel'? This will fail with English as the default language because words like 'a', 'it', 'the', 'is' are considered stopwords. You can configure the database to ignore the stopword. Alternatively, if you don't have access to the database configuration files, just use 'simple', like this:
select to_tsvector('german', 'They all worked in IT jobs') @@ to_tsquery('it');
That will return true.
You can also do a full-text search and include AND or OR operators. To search for sentences with
pigeons we use the OR operator represented by |
select to_tsvector('The crows sat on the tree') @@ to_tsquery('crows|pigeons');
Similarly, to search for sentences with
pigeons, we use the AND operator, represented by &
select to_tsvector('One pigeon and five crows sat on the tree') @@ to_tsquery('crows&pigeons');
This is a very basic introduction to full-text search. For more, check out the documentation corresponding to the database and version that you use.
Created on 21 March 2020
If you liked this article, follow us on Facebook and Twitter (@aruljohn).