Jens Krämer

Faster Redmine Search With PostgreSQL Trigram Indexes

 |  redmine, postgresql

Here’s an easy way to speed up Redmine’s search if you’re using PostgreSQL.

How Redmine Search Works

If you know Redmine, you know there’s a slew of different kinds of objects, from projects and issues over commit messages and news to forum messages and wiki pages. Each of these data types is stored in their own table. When you run a search, Redmine builds SQL queries that look like this:

SELECT DISTINCT `issues`.`created_on`, `issues`.`id`
FROM `issues`
INNER JOIN `projects` ON `projects`.`id` = `issues`.`project_id`
WHERE (projects.status <> 9
  AND EXISTS (SELECT 1 AS one FROM enabled_modules em WHERE em.project_id = projects.id
              AND em.name='issue_tracking'))
  AND (issues.project_id IN (2, 3, 4))
  AND ((
    (subject LIKE '%query%') OR (issues.description LIKE '%query%')
  ))
ORDER BY `issues`.`created_on` DESC,`issues`.`id` DESC

As you can see, besides checking for project ids and status and whether the project actually has the Issue Tracking module enabled, the query has LIKE statements for the subject and description fields of the issues table.

That is just the query for issues, there will be one like this for every entity you are searching through, plus one for custom field values, and if you have ever worked with relational databases you’re starting to see the problem: Search is done using LIKE '%query%', and such queries usually cannot make use of any indexes, leading to a full scan of all rows that match the other criteria in the query. Ouch!

How to make it faster?

There are a few possible solutions to the problem:

You could rework the way search is implemented and use a fulltext indexing engine, either built into your database (MySQL and PostgreSQL both can do that), or an external engine like Solr or ElasticSearch.

I took the DB-Engine approach with the Redmine PostgreSQL Full Text Search Plugin before. It works great in some scenarios where it can return more relevant search results in shorter times, but it comes with a few problems as well. First of all, it does not support multi-lingual data sets well because you have to decide on a language definition for tokenization / stemming when configuring it. Whatever language you pick, chances are words from another language will get the wrong treatment, leading to false results.

This approach in general does not play nicely with new data types introduced by other plugins and, because rewriting / monkey patching of the core search is necessaary, is brittle when core Redmine changes anyhting about the search in new versions.

Get rid of the postfix part in the LIKE

A somewhat creative solution to the problem is to have two conventional B-tree indexes per column, i.e. one over subject, and one over the reverse value of it. Searching for subject LIKE 'foo%' OR reverse(subject) LIKE 'oof%' will then lead to fast index based searches results.

One problem here is that MySQL cannot index expressions but only ‘real’ columns, so you have to actually store the reverse value in it’s own column, which, together with the additional index, will roughly double the size of your database. Regardless of the database being used, Redmine’s search code will have to be changed or monkey patched, leading to the same problems with other plugins and Redmine updates I already mentioned above.

Trigram Indexes

A better approach would be to find a way to speed up that exact query on the database side, without having to change anything about the way Redmine search is implemented. That would solve the performance issue, while keeping full compatibility with other plugins and future Redmine updates.

Guess what, PostgreSQL indeed comes with a solution to this problem: The trigram index, implemented in the pg_trgm module. Trigrams are the result of breaking up text into 3-letter groups like this:

redmine=# select show_trgm('query');
            show_trgm
---------------------------------
 {"  q"," qu",ery,que,"ry ",uer}

After enabling the pg_trgm extension for your DB, you can create a trigram index like that:

CREATE INDEX index_issues_on_description_trgm
    ON issues USING gin (description gin_trgm_ops);

That’s it. From now on, whenever there’s an issues.description LIKE '%query%' kind of query, that index can and will be used by PostgreSQL to speed up the search.

Armed with that knowledge, you’re ready to…

Speed Up Your Redmine Search in Two Minutes

All you have to do is add the pg_trgm extension to your database and create trigram indexes for all the columns that are queried by Redmine search:

CREATE EXTENSION 'pg_trgm';

CREATE INDEX CONCURRENTLY index_changesets_on_comments_trgm
  ON changesets USING gin (comments gin_trgm_ops);
CREATE INDEX CONCURRENTLY index_documents_on_title_trgm
  ON documents USING gin (title gin_trgm_ops);
CREATE INDEX CONCURRENTLY index_documents_on_description_trgm
  ON documents USING gin (description gin_trgm_ops);
CREATE INDEX CONCURRENTLY index_issues_on_subject_trgm
  ON issues USING gin (subject gin_trgm_ops);
CREATE INDEX CONCURRENTLY index_issues_on_description_trgm
  ON issues USING gin (description gin_trgm_ops);
CREATE INDEX CONCURRENTLY index_projects_on_name_trgm
  ON projects USING gin (name gin_trgm_ops);
CREATE INDEX CONCURRENTLY index_projects_on_identifier_trgm
  ON projects USING gin (identifier gin_trgm_ops);
CREATE INDEX CONCURRENTLY index_projects_on_description_trgm
  ON projects USING gin (description gin_trgm_ops);
CREATE INDEX CONCURRENTLY index_messages_on_subject_trgm
  ON messages USING gin (subject gin_trgm_ops);
CREATE INDEX CONCURRENTLY index_messages_on_content_trgm
  ON messages USING gin (content gin_trgm_ops);
CREATE INDEX CONCURRENTLY index_news_on_title_trgm
  ON news USING gin (title gin_trgm_ops);
CREATE INDEX CONCURRENTLY index_news_on_summary_trgm
  ON news USING gin (summary gin_trgm_ops);
CREATE INDEX CONCURRENTLY index_news_on_description_trgm
  ON news USING gin (description gin_trgm_ops);
CREATE INDEX CONCURRENTLY index_attachments_on_filename_trgm
  ON attachments USING gin (filename gin_trgm_ops);
CREATE INDEX CONCURRENTLY index_attachments_on_description_trgm
  ON attachments USING gin (description gin_trgm_ops);
CREATE INDEX CONCURRENTLY index_wiki_pages_on_title_trgm
  ON wiki_pages USING gin (title gin_trgm_ops);
CREATE INDEX CONCURRENTLY index_wiki_contents_on_text_trgm
  ON wiki_contents USING gin (text gin_trgm_ops);

There’s also a rake task which generates and runs the above statements. Just put the file into lib/tasks of your Redmine installation, run bundle exec rake redmine:pg_trgm:setup and you’re done.

Note: While not strictly necessary I added the concurrently keyword which will run the actual index creation in the background. Thus, on large datasets, it might take a few minutes or more until the indizes are actually available.

If you are using any plugins that come with their own searchable data stuctures, you can of course add similar indexes for these as well.

Comments

You can use Markdown here.

For the sake of spam checking any data you submit, including your IP address, will be transferred to the US based Akismet web service (akismet.com). If that's not acceptable for you, you can also reach me by other means.