Tuesday, 30 July 2019

How the MATCH() Function Works in MySQL

In MySQL, the MATCH() function performs a full-text search. It accepts a comma separated list of table columns to be searched.
The table/s must have a FULLTEXT index before you can do a full-text search against them (although boolean queries against a MyISAM search index can work — albeit slowly — even without a FULLTEXT index).
You can create a FULLTEXT index when creating the table (using the CREATE TABLEstatement), or you can use the ALTER TABLE statement or the CREATE INDEX statement if the table already exists.
By default, the search is case-insensitive. To perform a case-sensitive search, use a case-sensitive or binary collation for the indexed columns.

Syntax

The syntax for the MATCH() function goes like this:
MATCH (col1,col2,...) AGAINST (expr [search_modifier])
Where col1,col2,... is the comma-separated list of columns to search, and expr is the input string/expression.
The optional search_modifier argument allows you to specify the search type. It can be any of the following values:
  • IN NATURAL LANGUAGE MODE
  • IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION
  • IN BOOLEAN MODE
  • WITH QUERY EXPANSION
The default mode is IN NATURAL LANGUAGE MODE.

Example 1 – Basic Usage

Here’s an example of how to use this function:
SELECT AlbumId, AlbumName
FROM Albums
WHERE MATCH(AlbumName) AGAINST('cool');
Result:
+---------+--------------------+
| AlbumId | AlbumName          |
+---------+--------------------+
|       5 | Casualties of Cool |
+---------+--------------------+
Here’s the full table that the above query was run against:
SELECT AlbumId, AlbumName
FROM Albums;
Result:
+---------+--------------------------+
| AlbumId | AlbumName                |
+---------+--------------------------+
|       1 | Powerslave               |
|       2 | Powerage                 |
|       3 | Singing Down the Lane    |
|       4 | Ziltoid the Omniscient   |
|       5 | Casualties of Cool       |
|       6 | Epicloud                 |
|       7 | Somewhere in Time        |
|       8 | Piece of Mind            |
|       9 | Killers                  |
|      10 | No Prayer for the Dying  |
|      11 | No Sound Without Silence |
|      12 | Big Swing Face           |
|      13 | Blue Night               |
|      14 | Eternity                 |
|      15 | Scandinavia              |
|      16 | Long Lost Suitcase       |
|      17 | Praise and Blame         |
|      18 | Along Came Jones         |
|      19 | All Night Wrong          |
|      20 | The Sixteen Men of Tain  |
+---------+--------------------------+

Example 2 – Error: “Can’t find FULLTEXT index”

InnoDB tables must have a FULLTEXT index before it will return results from a full-text search. If it doesn’t have a FULLTEXT index, you’ll likely get the following error:
ERROR 1191 (HY000): Can't find FULLTEXT index matching the column list
If you get that error, you’ll need to add a FULLTEXT index for all columns that you’re trying to search (see next example).
The exception to this could be if you’re running a boolean query against a MyISAM search index.
To be more specific, the MySQL documentation for boolean full-text searches states the following:
InnoDB tables require a FULLTEXT index on all columns of the MATCH()expression to perform boolean queries. Boolean queries against a MyISAMsearch index can work even without a FULLTEXT index, although a search executed in this fashion would be quite slow.

Example 3 – Adding a FULLTEXT Index to an Existing Table

Here’s an example of adding a FULLTEXT index to an existing table:
ALTER TABLE Albums  
ADD FULLTEXT(AlbumName);
In this case, I indexed the contents of the AlbumName column.
To index multiple columns, separate them with a comma (see next example).

Example 4 – Searching Multiple Columns

If you think you’ll need to search more than one column you’ll need to create an index that includes all columns to be searched. To do this, simply include each column as a comma-separated list.
Here’s an example where I add a FULLTEXT index to the film table (which is part of the Sakila sample database).
ALTER TABLE film 
ADD FULLTEXT(title, description);
In this case, I index the contents of the title and description columns.
Now that we’ve created a FULLTEXT index for both columns, we can do a full-text search against them:
SELECT title, description
FROM film
WHERE MATCH(title, description) AGAINST('vertigo');
Result:
+-------------------+-----------------------------------------------------------------------------------------------------------+
| title             | description                                                                                               |
+-------------------+-----------------------------------------------------------------------------------------------------------+
| VERTIGO NORTHWEST | A Unbelieveable Display of a Mad Scientist And a Mad Scientist who must Outgun a Mad Cow in Ancient Japan |
+-------------------+-----------------------------------------------------------------------------------------------------------+
Here’s another search, where the exact key-phrase does not match, but each keyword within that phrase does:
SELECT title, description
FROM film
WHERE MATCH(title, description) AGAINST('Iron Maiden');
Result:
+-------------+---------------------------------------------------------------------------------------------------------+
| title       | description                                                                                             |
+-------------+---------------------------------------------------------------------------------------------------------+
| IRON MOON   | A Fast-Paced Documentary of a Mad Cow And a Boy who must Pursue a Dentist in A Baloon                   |
| MAIDEN HOME | A Lacklusture Saga of a Moose And a Teacher who must Kill a Forensic Psychologist in A MySQL Convention |
+-------------+---------------------------------------------------------------------------------------------------------+
If you only want the exact phrase to match, place double quotes around it:
SELECT title, description
FROM film
WHERE MATCH(title, description) AGAINST('"Iron Maiden"');
Result:
Empty set (0.00 sec)
In this case, none of the columns contain that exact phrase.

Example 5 – Return the Relevance Score

Whenever you use the MATCH() function, each row in the table is assigned a relevance value. In other words, each row gets a score that determines how relevant it is to the search term. The results are then ordered by relevance (highest relevance first).
Relevance values are nonnegative floating-point numbers. Zero relevance means no similarity. Relevance is computed based on the number of words in the row (document), the number of unique words in the row, the total number of words in the collection, and the number of rows that contain a particular word.
To return the relevance of each result, simply include the MATCH() function in your list of columns to select.
Example:
SELECT 
  MATCH(title, description) AGAINST('Iron Maiden') AS Relevance,
  title, 
  description
FROM film
WHERE MATCH(title, description) AGAINST('Iron Maiden');
Result:
+-----------+-------------+---------------------------------------------------------------------------------------------------------+
| Relevance | title       | description                                                                                             |
+-----------+-------------+---------------------------------------------------------------------------------------------------------+
|         9 | IRON MOON   | A Fast-Paced Documentary of a Mad Cow And a Boy who must Pursue a Dentist in A Baloon                   |
|         9 | MAIDEN HOME | A Lacklusture Saga of a Moose And a Teacher who must Kill a Forensic Psychologist in A MySQL Convention |
+-----------+-------------+---------------------------------------------------------------------------------------------------------+
In this case the relevance score is very high for both rows.
Here’s another one where the relevance is lower:
SELECT 
  MATCH(title, description) AGAINST('Saga of a Moose') AS Relevance,
  title, 
  description
FROM film
WHERE MATCH(title, description) AGAINST('Saga of a Moose')
LIMIT 15;
Result:
+--------------------+------------------------+---------------------------------------------------------------------------------------------------------+
| Relevance          | title                  | description                                                                                             |
+--------------------+------------------------+---------------------------------------------------------------------------------------------------------+
| 2.4431142807006836 | CAPER MOTIONS          | A Fateful Saga of a Moose And a Car who must Pursue a Woman in A MySQL Convention                       |
| 2.4431142807006836 | DATE SPEED             | A Touching Saga of a Composer And a Moose who must Discover a Dentist in A MySQL Convention             |
| 2.4431142807006836 | DELIVERANCE MULHOLLAND | A Astounding Saga of a Monkey And a Moose who must Conquer a Butler in A Shark Tank                     |
| 2.4431142807006836 | FLASH WARS             | A Astounding Saga of a Moose And a Pastry Chef who must Chase a Student in The Gulf of Mexico           |
| 2.4431142807006836 | HAROLD FRENCH          | A Stunning Saga of a Sumo Wrestler And a Student who must Outrace a Moose in The Sahara Desert          |
| 2.4431142807006836 | MAIDEN HOME            | A Lacklusture Saga of a Moose And a Teacher who must Kill a Forensic Psychologist in A MySQL Convention |
| 2.4431142807006836 | SHANE DARKNESS         | A Action-Packed Saga of a Moose And a Lumberjack who must Find a Woman in Berlin                        |
| 2.4431142807006836 | SLEEPLESS MONSOON      | A Amazing Saga of a Moose And a Pastry Chef who must Escape a Butler in Australia                       |
| 2.4431142807006836 | WAKE JAWS              | A Beautiful Saga of a Feminist And a Composer who must Challenge a Moose in Berlin                      |
| 2.4431142807006836 | WONKA SEA              | A Brilliant Saga of a Boat And a Mad Scientist who must Meet a Moose in Ancient India                   |
| 1.2399028539657593 | AIRPLANE SIERRA        | A Touching Saga of a Hunter And a Butler who must Discover a Butler in A Jet Boat                       |
| 1.2399028539657593 | ALASKA PHANTOM         | A Fanciful Saga of a Hunter And a Pastry Chef who must Vanquish a Boy in Australia                      |
| 1.2399028539657593 | ARMY FLINTSTONES       | A Boring Saga of a Database Administrator And a Womanizer who must Battle a Waitress in Nigeria         |
| 1.2399028539657593 | BEAR GRACELAND         | A Astounding Saga of a Dog And a Boy who must Kill a Teacher in The First Manned Space Station          |
| 1.2399028539657593 | BERETS AGENT           | A Taut Saga of a Crocodile And a Boy who must Overcome a Technical Writer in Ancient China              |
+--------------------+------------------------+---------------------------------------------------------------------------------------------------------+
Note that the result set would’ve been much larger if I hadn’t used LIMIT 15 to limit the number of results to 15.

Example 6 – Only Return Results Above a Certain Relevance Score

We can take the previous example a step further and filter out only those results with a certain relevance score. In this case I specify that the relevance score must be higher than 2.
Be careful when doing this though. As seen above, relevance values can be very high or very low, depending on factors such as how much text is in the column, how many other rows match the search term, etc.
SELECT 
  MATCH(title, description) AGAINST('Saga of a Moose') AS Relevance,
  title, 
  description
FROM film
WHERE MATCH(title, description) AGAINST('Saga of a Moose') > 2;
Result:
+--------------------+------------------------+---------------------------------------------------------------------------------------------------------+
| Relevance          | title                  | description                                                                                             |
+--------------------+------------------------+---------------------------------------------------------------------------------------------------------+
| 2.4431142807006836 | CAPER MOTIONS          | A Fateful Saga of a Moose And a Car who must Pursue a Woman in A MySQL Convention                       |
| 2.4431142807006836 | DATE SPEED             | A Touching Saga of a Composer And a Moose who must Discover a Dentist in A MySQL Convention             |
| 2.4431142807006836 | DELIVERANCE MULHOLLAND | A Astounding Saga of a Monkey And a Moose who must Conquer a Butler in A Shark Tank                     |
| 2.4431142807006836 | FLASH WARS             | A Astounding Saga of a Moose And a Pastry Chef who must Chase a Student in The Gulf of Mexico           |
| 2.4431142807006836 | HAROLD FRENCH          | A Stunning Saga of a Sumo Wrestler And a Student who must Outrace a Moose in The Sahara Desert          |
| 2.4431142807006836 | MAIDEN HOME            | A Lacklusture Saga of a Moose And a Teacher who must Kill a Forensic Psychologist in A MySQL Convention |
| 2.4431142807006836 | SHANE DARKNESS         | A Action-Packed Saga of a Moose And a Lumberjack who must Find a Woman in Berlin                        |
| 2.4431142807006836 | SLEEPLESS MONSOON      | A Amazing Saga of a Moose And a Pastry Chef who must Escape a Butler in Australia                       |
| 2.4431142807006836 | WAKE JAWS              | A Beautiful Saga of a Feminist And a Composer who must Challenge a Moose in Berlin                      |
| 2.4431142807006836 | WONKA SEA              | A Brilliant Saga of a Boat And a Mad Scientist who must Meet a Moose in Ancient India                   |
+--------------------+------------------------+---------------------------------------------------------------------------------------------------------+

Example 7 – Include Zero Relevance Results

Here’s an example of listing out the relevance values for each row, even if the relevance value is zero. We can do this by not using the MATCH() function in the WHERE clause.
In this example, I don’t actually use a WHERE clause. I only use a LIMIT clause to limit the number of results.
SELECT 
  MATCH(title, description) AGAINST('Scientist') AS Relevance,
  title, 
  description
FROM film
LIMIT 15;
Result:
+-------------------+------------------+-----------------------------------------------------------------------------------------------------------------------+
| Relevance         | title            | description                                                                                                           |
+-------------------+------------------+-----------------------------------------------------------------------------------------------------------------------+
| 1.026631474494934 | ACADEMY DINOSAUR | A Epic Drama of a Feminist And a Mad Scientist who must Battle a Teacher in The Canadian Rockies                      |
|                 0 | ACE GOLDFINGER   | A Astounding Epistle of a Database Administrator And a Explorer who must Find a Car in Ancient China                  |
|                 0 | ADAPTATION HOLES | A Astounding Reflection of a Lumberjack And a Car who must Sink a Lumberjack in A Baloon Factory                      |
|                 0 | AFFAIR PREJUDICE | A Fanciful Documentary of a Frisbee And a Lumberjack who must Chase a Monkey in A Shark Tank                          |
|                 0 | AFRICAN EGG      | A Fast-Paced Documentary of a Pastry Chef And a Dentist who must Pursue a Forensic Psychologist in The Gulf of Mexico |
|                 0 | AGENT TRUMAN     | A Intrepid Panorama of a Robot And a Boy who must Escape a Sumo Wrestler in Ancient China                             |
|                 0 | AIRPLANE SIERRA  | A Touching Saga of a Hunter And a Butler who must Discover a Butler in A Jet Boat                                     |
|                 0 | AIRPORT POLLOCK  | A Epic Tale of a Moose And a Girl who must Confront a Monkey in Ancient India                                         |
| 2.053262948989868 | ALABAMA DEVIL    | A Thoughtful Panorama of a Database Administrator And a Mad Scientist who must Outgun a Mad Scientist in A Jet Boat   |
|                 0 | ALADDIN CALENDAR | A Action-Packed Tale of a Man And a Lumberjack who must Reach a Feminist in Ancient China                             |
|                 0 | ALAMO VIDEOTAPE  | A Boring Epistle of a Butler And a Cat who must Fight a Pastry Chef in A MySQL Convention                             |
|                 0 | ALASKA PHANTOM   | A Fanciful Saga of a Hunter And a Pastry Chef who must Vanquish a Boy in Australia                                    |
|                 0 | ALI FOREVER      | A Action-Packed Drama of a Dentist And a Crocodile who must Battle a Feminist in The Canadian Rockies                 |
|                 0 | ALICE FANTASIA   | A Emotional Drama of a A Shark And a Database Administrator who must Vanquish a Pioneer in Soviet Georgia             |
| 1.026631474494934 | ALIEN CENTER     | A Brilliant Drama of a Cat And a Mad Scientist who must Battle a Feminist in A MySQL Convention                       |
+-------------------+------------------+-----------------------------------------------------------------------------------------------------------------------+

Example 8 – Boolean Mode

MySQL allows us to run full-text searches in boolean mode. To do this, add the IN BOOLEAN MODE modifier to your query.
Boolean mode allows you to use operators such as + and - to specify whether a particular word or phrase must or must not be present.
In the following example, I prefix each word with a plus sign (+) to indicate that both words must be present.
SELECT 
  title, 
  description
FROM film
WHERE MATCH(title, description) AGAINST('+Saga +Moose' IN BOOLEAN MODE)
LIMIT 3;
Result:
+------------------------+---------------------------------------------------------------------------------------------+
| title                  | description                                                                                 |
+------------------------+---------------------------------------------------------------------------------------------+
| CAPER MOTIONS          | A Fateful Saga of a Moose And a Car who must Pursue a Woman in A MySQL Convention           |
| DATE SPEED             | A Touching Saga of a Composer And a Moose who must Discover a Dentist in A MySQL Convention |
| DELIVERANCE MULHOLLAND | A Astounding Saga of a Monkey And a Moose who must Conquer a Butler in A Shark Tank         |
+------------------------+---------------------------------------------------------------------------------------------+
In the next example, I change one of the plus signs to a minus sign (-). This means that only those rows that contain the word Saga will be returned, but only if they don’t also contain Moose:
SELECT 
  title, 
  description
FROM film
WHERE MATCH(title, description) AGAINST('+Saga -Moose' IN BOOLEAN MODE)
LIMIT 3;
Result:
+------------------+-------------------------------------------------------------------------------------------------+
| title            | description                                                                                     |
+------------------+-------------------------------------------------------------------------------------------------+
| AIRPLANE SIERRA  | A Touching Saga of a Hunter And a Butler who must Discover a Butler in A Jet Boat               |
| ALASKA PHANTOM   | A Fanciful Saga of a Hunter And a Pastry Chef who must Vanquish a Boy in Australia              |
| ARMY FLINTSTONES | A Boring Saga of a Database Administrator And a Womanizer who must Battle a Waitress in Nigeria |
+------------------+-------------------------------------------------------------------------------------------------+
The minus sign is used to exclude results that would otherwise be returned. Therefore, if all our search words are prefixed with a minus sign, an empty set is returned.
SELECT 
  title, 
  description
FROM film
WHERE MATCH(title, description) AGAINST('-Saga -Moose' IN BOOLEAN MODE)
LIMIT 3;
Result:
Empty set (0.00 sec)
There are many more operators that can be used with boolean searches, such as ~<>*, and more. For more detailed information on using boolean mode, see the MySQL documentation for Boolean Full-Text Searches.

Example 9 – With Blind Query Expansion

You can use the WITH QUERY EXPANSION or the IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION modifiers to perform blind query expansion. This can be useful for when the search phrase is very short, and where the results returned could be too narrow (thus excluding potentially relevant documents).
Query expansion can broaden the search by returning rows that would otherwise not be returned. In particular, if a non-matching row contains words that are also contained in a matching row, that non-matching row could become a matching row. In other words, a non-matching row can still be returned, simply because it shares other words with a matching row.
To demonstrate, here’s a normal search without query expansion:
SELECT AlbumId, AlbumName
FROM Albums
WHERE MATCH(AlbumName) AGAINST('Blue' IN NATURAL LANGUAGE MODE);
Result:
+---------+------------+
| AlbumId | AlbumName  |
+---------+------------+
|      13 | Blue Night |
+---------+------------+
I explicitly stated IN NATURAL LANGUAGE MODE but this is the default mode, so I could also have omitted this modifier if I so chose to.
And here’s the same search with query expansion:
SELECT AlbumId, AlbumName
FROM Albums
WHERE MATCH(AlbumName) AGAINST('Blue' WITH QUERY EXPANSION);
Result:
+---------+-----------------+
| AlbumId | AlbumName       |
+---------+-----------------+
|      13 | Blue Night      |
|      19 | All Night Wrong |
+---------+-----------------+
In this case, two results are returned. Notice how the second result doesn’t contain the search phrase (blue). But it does contain the word Night which also happens to be in the first result. So the expanded query deems that good enough to be a match.
Album names probably aren’t a very good use case for query expansion mode. A better use case might be a search for say, database, where an expanded query might also return documents that contain names like MySQLOracle, etc even if they don’t contain the phrase database.
Here’s another example. However, this example uses double quotes to specify that the whole search term should be present.
If we do the search without query expansion:
SELECT 
  title, 
  description
FROM film
WHERE MATCH(title, description) AGAINST('"Feminist And a Mad Scientist"')
LIMIT 3;
Result:
+------------------+--------------------------------------------------------------------------------------------------+
| title            | description                                                                                      |
+------------------+--------------------------------------------------------------------------------------------------+
| ACADEMY DINOSAUR | A Epic Drama of a Feminist And a Mad Scientist who must Battle a Teacher in The Canadian Rockies |
+------------------+--------------------------------------------------------------------------------------------------+
Only one result is returned (it’s the only result that contains the whole phrase, exactly as typed).
But if we use blind query expansion, here’s what happens:
SELECT 
  title, 
  description
FROM film
WHERE MATCH(title, description) AGAINST('"Feminist And a Mad Scientist"' WITH QUERY EXPANSION)
LIMIT 3;
Result:
+--------------------+------------------------------------------------------------------------------------------------------+
| title              | description                                                                                          |
+--------------------+------------------------------------------------------------------------------------------------------+
| ACADEMY DINOSAUR   | A Epic Drama of a Feminist And a Mad Scientist who must Battle a Teacher in The Canadian Rockies     |
| DINOSAUR SECRETARY | A Action-Packed Drama of a Feminist And a Girl who must Reach a Robot in The Canadian Rockies        |
| VICTORY ACADEMY    | A Insightful Epistle of a Mad Scientist And a Explorer who must Challenge a Cat in The Sahara Desert |
+--------------------+------------------------------------------------------------------------------------------------------+
The first row is the same one as when we didn’t use query expansion. However, the query then goes and returns rows that only contain parts of our search term. This result is limited to three only because I used LIMIT 3. The actual results are much greater:
SELECT COUNT(*)
FROM film
WHERE MATCH(title, description) AGAINST('"Feminist And a Mad Scientist"' WITH QUERY EXPANSION);
Result:
+----------+
| COUNT(*) |
+----------+
|     1000 |
+----------+
It’s quite possible that many of those results are completely irrelevant to the search term. Therefore, blind query expansion usually works best for shorter search terms.
For more information on using blind query expansion, see the MySQL documentation: Full-Text Searches with Query Expansion.

0 comments:

Post a Comment