lots of tech info here

Full-text Search using Postgres

Published on 21 March 2020 | No Comments

I was working on a search feature that performs a linguistic search, and searches ignoring parenthesis, hyphens and other special characters. The search would be performed on a few fields in a PostgreSQL 10 database. Since Postgres supports full-text search, I decided to use it.

postgresql full-text search

PostgreSQL full-text search

Full-text search is an indexing and search technique that does not just grep the text for certain keywords which may be a word or part of a word, but takes into account linguistic features as well.

As an example, let us consider this scenario. You're trying to search for the word bunny in the column sentence which may have values like 'There was one bunny' in table essays, a SQL statement like this would be good. We will use a case insensitive search by using the ILIKE operator.

SELECT sentence FROM essays WHERE sentence ILIKE '%bunny%';

Now, consider these two scenarios. The simple text search will not work on them. Even a regular expression would not be successful.

1) If I wanted to search for bunny and wanted to include sentences like 'There were many bunnies' as well as 'There was one bunny'.

2) If I wanted to search for monkeys and wanted to include sentences like 'There were many monkeys' as well as 'There was one monkey'.

In this case, we apply full-text search. Full-text search has a function to_tsvector that breaks up the haystack into tokens. Another function to_tsquery performs a query on these tokens.

Start the psql console and type this:

 select to_tsvector('There were many bunnies') @@ to_tsquery('bunny');

We get these results:

 ?column? 
----------
 t
(1 row)

Type this:

select to_tsvector('There was one bunny') @@ to_tsquery('bunny');

That select statement returns true as well.

If you're using a German database which contains German words, set the language to 'german'. By default, the language is set to 'english'.

select to_tsvector('german', 'Ich möchte ein Buch') @@ to_tsquery('german', 'Buch');
select to_tsvector('german', 'Ich will Buch') @@ to_tsquery('german', 'Bucher');

Now, what if the search is for something like 'IT personnel'? This will fail with English as the default language because words like 'a', 'it', 'the', 'is' are considered stopwords. You can configure the database to ignore the stopword. Alternatively, if you don't have access to the database configuration files, just use 'simple', like this:

select to_tsvector('german', 'They all worked in IT jobs') @@ to_tsquery('it');

That will return true.

You can also do a full-text search and include AND or OR operators. To search for sentences with crows or pigeons we use the OR operator represented by |

select to_tsvector('The crows sat on the tree') @@ to_tsquery('crows|pigeons');

Similarly, to search for sentences with crows and pigeons, we use the AND operator, represented by &

select to_tsvector('One pigeon and five crows sat on the tree') @@ to_tsquery('crows&pigeons');

This is a very basic introduction to full-text search. For more, check out the documentation corresponding to the database and version that you use.

Created on 21 March 2020

If you liked this article, follow us on Facebook and Twitter (@aruljohn).

Share this with Others

Subscribe

Tags