In MySQL, the MATCH() function performs a full-text search. It accepts a comma separated list of table columns to be searched.
The table/s must have a
FULLTEXT
index before you can do a full-text search against them (although boolean queries against a MyISAM
search index can work — albeit slowly — even without a FULLTEXT
index).
You can create a
FULLTEXT
index when creating the table (using the CREATE TABLE
statement), or you can use the ALTER TABLE
statement or the CREATE INDEX
statement if the table already exists.
By default, the search is case-insensitive. To perform a case-sensitive search, use a case-sensitive or binary collation for the indexed columns.
Syntax
The syntax for the
MATCH()
function goes like this:MATCH (col1,col2,...) AGAINST (expr [search_modifier])
Where
col1,col2,...
is the comma-separated list of columns to search, and expr
is the input string/expression.
The optional
search_modifier
argument allows you to specify the search type. It can be any of the following values:IN NATURAL LANGUAGE MODE
IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION
IN BOOLEAN MODE
WITH QUERY EXPANSION
The default mode is
IN NATURAL LANGUAGE MODE
.Example 1 – Basic Usage
Here’s an example of how to use this function:
SELECT AlbumId, AlbumName FROM Albums WHERE MATCH(AlbumName) AGAINST('cool');
Result:
+---------+--------------------+ | AlbumId | AlbumName | +---------+--------------------+ | 5 | Casualties of Cool | +---------+--------------------+
Here’s the full table that the above query was run against:
SELECT AlbumId, AlbumName FROM Albums;
Result:
+---------+--------------------------+ | AlbumId | AlbumName | +---------+--------------------------+ | 1 | Powerslave | | 2 | Powerage | | 3 | Singing Down the Lane | | 4 | Ziltoid the Omniscient | | 5 | Casualties of Cool | | 6 | Epicloud | | 7 | Somewhere in Time | | 8 | Piece of Mind | | 9 | Killers | | 10 | No Prayer for the Dying | | 11 | No Sound Without Silence | | 12 | Big Swing Face | | 13 | Blue Night | | 14 | Eternity | | 15 | Scandinavia | | 16 | Long Lost Suitcase | | 17 | Praise and Blame | | 18 | Along Came Jones | | 19 | All Night Wrong | | 20 | The Sixteen Men of Tain | +---------+--------------------------+
Example 2 – Error: “Can’t find FULLTEXT index”
InnoDB
tables must have a FULLTEXT
index before it will return results from a full-text search. If it doesn’t have a FULLTEXT
index, you’ll likely get the following error:ERROR 1191 (HY000): Can't find FULLTEXT index matching the column list
If you get that error, you’ll need to add a
FULLTEXT
index for all columns that you’re trying to search (see next example).
The exception to this could be if you’re running a boolean query against a
MyISAM
search index.
To be more specific, the MySQL documentation for boolean full-text searches states the following:
InnoDB
tables require aFULLTEXT
index on all columns of theMATCH()
expression to perform boolean queries. Boolean queries against aMyISAM
search index can work even without aFULLTEXT
index, although a search executed in this fashion would be quite slow.
Example 3 – Adding a FULLTEXT Index to an Existing Table
Here’s an example of adding a
FULLTEXT
index to an existing table:ALTER TABLE Albums ADD FULLTEXT(AlbumName);
In this case, I indexed the contents of the
AlbumName
column.
To index multiple columns, separate them with a comma (see next example).
Example 4 – Searching Multiple Columns
If you think you’ll need to search more than one column you’ll need to create an index that includes all columns to be searched. To do this, simply include each column as a comma-separated list.
Here’s an example where I add a
FULLTEXT
index to the film
table (which is part of the Sakila sample database).ALTER TABLE film ADD FULLTEXT(title, description);
In this case, I index the contents of the
title
and description
columns.
Now that we’ve created a
FULLTEXT
index for both columns, we can do a full-text search against them:SELECT title, description FROM film WHERE MATCH(title, description) AGAINST('vertigo');
Result:
+-------------------+-----------------------------------------------------------------------------------------------------------+ | title | description | +-------------------+-----------------------------------------------------------------------------------------------------------+ | VERTIGO NORTHWEST | A Unbelieveable Display of a Mad Scientist And a Mad Scientist who must Outgun a Mad Cow in Ancient Japan | +-------------------+-----------------------------------------------------------------------------------------------------------+
Here’s another search, where the exact key-phrase does not match, but each keyword within that phrase does:
SELECT title, description FROM film WHERE MATCH(title, description) AGAINST('Iron Maiden');
Result:
+-------------+---------------------------------------------------------------------------------------------------------+ | title | description | +-------------+---------------------------------------------------------------------------------------------------------+ | IRON MOON | A Fast-Paced Documentary of a Mad Cow And a Boy who must Pursue a Dentist in A Baloon | | MAIDEN HOME | A Lacklusture Saga of a Moose And a Teacher who must Kill a Forensic Psychologist in A MySQL Convention | +-------------+---------------------------------------------------------------------------------------------------------+
If you only want the exact phrase to match, place double quotes around it:
SELECT title, description FROM film WHERE MATCH(title, description) AGAINST('"Iron Maiden"');
Result:
Empty set (0.00 sec)
In this case, none of the columns contain that exact phrase.
Example 5 – Return the Relevance Score
Whenever you use the
MATCH()
function, each row in the table is assigned a relevance value. In other words, each row gets a score that determines how relevant it is to the search term. The results are then ordered by relevance (highest relevance first).
Relevance values are nonnegative floating-point numbers. Zero relevance means no similarity. Relevance is computed based on the number of words in the row (document), the number of unique words in the row, the total number of words in the collection, and the number of rows that contain a particular word.
To return the relevance of each result, simply include the
MATCH()
function in your list of columns to select.
Example:
SELECT MATCH(title, description) AGAINST('Iron Maiden') AS Relevance, title, description FROM film WHERE MATCH(title, description) AGAINST('Iron Maiden');
Result:
+-----------+-------------+---------------------------------------------------------------------------------------------------------+ | Relevance | title | description | +-----------+-------------+---------------------------------------------------------------------------------------------------------+ | 9 | IRON MOON | A Fast-Paced Documentary of a Mad Cow And a Boy who must Pursue a Dentist in A Baloon | | 9 | MAIDEN HOME | A Lacklusture Saga of a Moose And a Teacher who must Kill a Forensic Psychologist in A MySQL Convention | +-----------+-------------+---------------------------------------------------------------------------------------------------------+
In this case the relevance score is very high for both rows.
Here’s another one where the relevance is lower:
SELECT MATCH(title, description) AGAINST('Saga of a Moose') AS Relevance, title, description FROM film WHERE MATCH(title, description) AGAINST('Saga of a Moose') LIMIT 15;
Result:
+--------------------+------------------------+---------------------------------------------------------------------------------------------------------+ | Relevance | title | description | +--------------------+------------------------+---------------------------------------------------------------------------------------------------------+ | 2.4431142807006836 | CAPER MOTIONS | A Fateful Saga of a Moose And a Car who must Pursue a Woman in A MySQL Convention | | 2.4431142807006836 | DATE SPEED | A Touching Saga of a Composer And a Moose who must Discover a Dentist in A MySQL Convention | | 2.4431142807006836 | DELIVERANCE MULHOLLAND | A Astounding Saga of a Monkey And a Moose who must Conquer a Butler in A Shark Tank | | 2.4431142807006836 | FLASH WARS | A Astounding Saga of a Moose And a Pastry Chef who must Chase a Student in The Gulf of Mexico | | 2.4431142807006836 | HAROLD FRENCH | A Stunning Saga of a Sumo Wrestler And a Student who must Outrace a Moose in The Sahara Desert | | 2.4431142807006836 | MAIDEN HOME | A Lacklusture Saga of a Moose And a Teacher who must Kill a Forensic Psychologist in A MySQL Convention | | 2.4431142807006836 | SHANE DARKNESS | A Action-Packed Saga of a Moose And a Lumberjack who must Find a Woman in Berlin | | 2.4431142807006836 | SLEEPLESS MONSOON | A Amazing Saga of a Moose And a Pastry Chef who must Escape a Butler in Australia | | 2.4431142807006836 | WAKE JAWS | A Beautiful Saga of a Feminist And a Composer who must Challenge a Moose in Berlin | | 2.4431142807006836 | WONKA SEA | A Brilliant Saga of a Boat And a Mad Scientist who must Meet a Moose in Ancient India | | 1.2399028539657593 | AIRPLANE SIERRA | A Touching Saga of a Hunter And a Butler who must Discover a Butler in A Jet Boat | | 1.2399028539657593 | ALASKA PHANTOM | A Fanciful Saga of a Hunter And a Pastry Chef who must Vanquish a Boy in Australia | | 1.2399028539657593 | ARMY FLINTSTONES | A Boring Saga of a Database Administrator And a Womanizer who must Battle a Waitress in Nigeria | | 1.2399028539657593 | BEAR GRACELAND | A Astounding Saga of a Dog And a Boy who must Kill a Teacher in The First Manned Space Station | | 1.2399028539657593 | BERETS AGENT | A Taut Saga of a Crocodile And a Boy who must Overcome a Technical Writer in Ancient China | +--------------------+------------------------+---------------------------------------------------------------------------------------------------------+
Note that the result set would’ve been much larger if I hadn’t used
LIMIT 15
to limit the number of results to 15.Example 6 – Only Return Results Above a Certain Relevance Score
We can take the previous example a step further and filter out only those results with a certain relevance score. In this case I specify that the relevance score must be higher than 2.
Be careful when doing this though. As seen above, relevance values can be very high or very low, depending on factors such as how much text is in the column, how many other rows match the search term, etc.
SELECT MATCH(title, description) AGAINST('Saga of a Moose') AS Relevance, title, description FROM film WHERE MATCH(title, description) AGAINST('Saga of a Moose') > 2;
Result:
+--------------------+------------------------+---------------------------------------------------------------------------------------------------------+ | Relevance | title | description | +--------------------+------------------------+---------------------------------------------------------------------------------------------------------+ | 2.4431142807006836 | CAPER MOTIONS | A Fateful Saga of a Moose And a Car who must Pursue a Woman in A MySQL Convention | | 2.4431142807006836 | DATE SPEED | A Touching Saga of a Composer And a Moose who must Discover a Dentist in A MySQL Convention | | 2.4431142807006836 | DELIVERANCE MULHOLLAND | A Astounding Saga of a Monkey And a Moose who must Conquer a Butler in A Shark Tank | | 2.4431142807006836 | FLASH WARS | A Astounding Saga of a Moose And a Pastry Chef who must Chase a Student in The Gulf of Mexico | | 2.4431142807006836 | HAROLD FRENCH | A Stunning Saga of a Sumo Wrestler And a Student who must Outrace a Moose in The Sahara Desert | | 2.4431142807006836 | MAIDEN HOME | A Lacklusture Saga of a Moose And a Teacher who must Kill a Forensic Psychologist in A MySQL Convention | | 2.4431142807006836 | SHANE DARKNESS | A Action-Packed Saga of a Moose And a Lumberjack who must Find a Woman in Berlin | | 2.4431142807006836 | SLEEPLESS MONSOON | A Amazing Saga of a Moose And a Pastry Chef who must Escape a Butler in Australia | | 2.4431142807006836 | WAKE JAWS | A Beautiful Saga of a Feminist And a Composer who must Challenge a Moose in Berlin | | 2.4431142807006836 | WONKA SEA | A Brilliant Saga of a Boat And a Mad Scientist who must Meet a Moose in Ancient India | +--------------------+------------------------+---------------------------------------------------------------------------------------------------------+
Example 7 – Include Zero Relevance Results
Here’s an example of listing out the relevance values for each row, even if the relevance value is zero. We can do this by not using the
MATCH()
function in the WHERE
clause.
In this example, I don’t actually use a
WHERE
clause. I only use a LIMIT
clause to limit the number of results.SELECT MATCH(title, description) AGAINST('Scientist') AS Relevance, title, description FROM film LIMIT 15;
Result:
+-------------------+------------------+-----------------------------------------------------------------------------------------------------------------------+ | Relevance | title | description | +-------------------+------------------+-----------------------------------------------------------------------------------------------------------------------+ | 1.026631474494934 | ACADEMY DINOSAUR | A Epic Drama of a Feminist And a Mad Scientist who must Battle a Teacher in The Canadian Rockies | | 0 | ACE GOLDFINGER | A Astounding Epistle of a Database Administrator And a Explorer who must Find a Car in Ancient China | | 0 | ADAPTATION HOLES | A Astounding Reflection of a Lumberjack And a Car who must Sink a Lumberjack in A Baloon Factory | | 0 | AFFAIR PREJUDICE | A Fanciful Documentary of a Frisbee And a Lumberjack who must Chase a Monkey in A Shark Tank | | 0 | AFRICAN EGG | A Fast-Paced Documentary of a Pastry Chef And a Dentist who must Pursue a Forensic Psychologist in The Gulf of Mexico | | 0 | AGENT TRUMAN | A Intrepid Panorama of a Robot And a Boy who must Escape a Sumo Wrestler in Ancient China | | 0 | AIRPLANE SIERRA | A Touching Saga of a Hunter And a Butler who must Discover a Butler in A Jet Boat | | 0 | AIRPORT POLLOCK | A Epic Tale of a Moose And a Girl who must Confront a Monkey in Ancient India | | 2.053262948989868 | ALABAMA DEVIL | A Thoughtful Panorama of a Database Administrator And a Mad Scientist who must Outgun a Mad Scientist in A Jet Boat | | 0 | ALADDIN CALENDAR | A Action-Packed Tale of a Man And a Lumberjack who must Reach a Feminist in Ancient China | | 0 | ALAMO VIDEOTAPE | A Boring Epistle of a Butler And a Cat who must Fight a Pastry Chef in A MySQL Convention | | 0 | ALASKA PHANTOM | A Fanciful Saga of a Hunter And a Pastry Chef who must Vanquish a Boy in Australia | | 0 | ALI FOREVER | A Action-Packed Drama of a Dentist And a Crocodile who must Battle a Feminist in The Canadian Rockies | | 0 | ALICE FANTASIA | A Emotional Drama of a A Shark And a Database Administrator who must Vanquish a Pioneer in Soviet Georgia | | 1.026631474494934 | ALIEN CENTER | A Brilliant Drama of a Cat And a Mad Scientist who must Battle a Feminist in A MySQL Convention | +-------------------+------------------+-----------------------------------------------------------------------------------------------------------------------+
Example 8 – Boolean Mode
MySQL allows us to run full-text searches in boolean mode. To do this, add the
IN BOOLEAN MODE
modifier to your query.
Boolean mode allows you to use operators such as
+
and -
to specify whether a particular word or phrase must or must not be present.
In the following example, I prefix each word with a plus sign (
+
) to indicate that both words must be present.SELECT title, description FROM film WHERE MATCH(title, description) AGAINST('+Saga +Moose' IN BOOLEAN MODE) LIMIT 3;
Result:
+------------------------+---------------------------------------------------------------------------------------------+ | title | description | +------------------------+---------------------------------------------------------------------------------------------+ | CAPER MOTIONS | A Fateful Saga of a Moose And a Car who must Pursue a Woman in A MySQL Convention | | DATE SPEED | A Touching Saga of a Composer And a Moose who must Discover a Dentist in A MySQL Convention | | DELIVERANCE MULHOLLAND | A Astounding Saga of a Monkey And a Moose who must Conquer a Butler in A Shark Tank | +------------------------+---------------------------------------------------------------------------------------------+
In the next example, I change one of the plus signs to a minus sign (
-
). This means that only those rows that contain the word Saga
will be returned, but only if they don’t also contain Moose
:SELECT title, description FROM film WHERE MATCH(title, description) AGAINST('+Saga -Moose' IN BOOLEAN MODE) LIMIT 3;
Result:
+------------------+-------------------------------------------------------------------------------------------------+ | title | description | +------------------+-------------------------------------------------------------------------------------------------+ | AIRPLANE SIERRA | A Touching Saga of a Hunter And a Butler who must Discover a Butler in A Jet Boat | | ALASKA PHANTOM | A Fanciful Saga of a Hunter And a Pastry Chef who must Vanquish a Boy in Australia | | ARMY FLINTSTONES | A Boring Saga of a Database Administrator And a Womanizer who must Battle a Waitress in Nigeria | +------------------+-------------------------------------------------------------------------------------------------+
The minus sign is used to exclude results that would otherwise be returned. Therefore, if all our search words are prefixed with a minus sign, an empty set is returned.
SELECT title, description FROM film WHERE MATCH(title, description) AGAINST('-Saga -Moose' IN BOOLEAN MODE) LIMIT 3;
Result:
Empty set (0.00 sec)
There are many more operators that can be used with boolean searches, such as
~
, <
, >
, *
, and more. For more detailed information on using boolean mode, see the MySQL documentation for Boolean Full-Text Searches.Example 9 – With Blind Query Expansion
You can use the
WITH QUERY EXPANSION
or the IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION
modifiers to perform blind query expansion. This can be useful for when the search phrase is very short, and where the results returned could be too narrow (thus excluding potentially relevant documents).
Query expansion can broaden the search by returning rows that would otherwise not be returned. In particular, if a non-matching row contains words that are also contained in a matching row, that non-matching row could become a matching row. In other words, a non-matching row can still be returned, simply because it shares other words with a matching row.
To demonstrate, here’s a normal search without query expansion:
SELECT AlbumId, AlbumName FROM Albums WHERE MATCH(AlbumName) AGAINST('Blue' IN NATURAL LANGUAGE MODE);
Result:
+---------+------------+ | AlbumId | AlbumName | +---------+------------+ | 13 | Blue Night | +---------+------------+
I explicitly stated
IN NATURAL LANGUAGE MODE
but this is the default mode, so I could also have omitted this modifier if I so chose to.
And here’s the same search with query expansion:
SELECT AlbumId, AlbumName FROM Albums WHERE MATCH(AlbumName) AGAINST('Blue' WITH QUERY EXPANSION);
Result:
+---------+-----------------+ | AlbumId | AlbumName | +---------+-----------------+ | 13 | Blue Night | | 19 | All Night Wrong | +---------+-----------------+
In this case, two results are returned. Notice how the second result doesn’t contain the search phrase (
blue
). But it does contain the word Night
which also happens to be in the first result. So the expanded query deems that good enough to be a match.
Album names probably aren’t a very good use case for query expansion mode. A better use case might be a search for say, database, where an expanded query might also return documents that contain names like MySQL, Oracle, etc even if they don’t contain the phrase database.
Here’s another example. However, this example uses double quotes to specify that the whole search term should be present.
If we do the search without query expansion:
SELECT title, description FROM film WHERE MATCH(title, description) AGAINST('"Feminist And a Mad Scientist"') LIMIT 3;
Result:
+------------------+--------------------------------------------------------------------------------------------------+ | title | description | +------------------+--------------------------------------------------------------------------------------------------+ | ACADEMY DINOSAUR | A Epic Drama of a Feminist And a Mad Scientist who must Battle a Teacher in The Canadian Rockies | +------------------+--------------------------------------------------------------------------------------------------+
Only one result is returned (it’s the only result that contains the whole phrase, exactly as typed).
But if we use blind query expansion, here’s what happens:
SELECT title, description FROM film WHERE MATCH(title, description) AGAINST('"Feminist And a Mad Scientist"' WITH QUERY EXPANSION) LIMIT 3;
Result:
+--------------------+------------------------------------------------------------------------------------------------------+ | title | description | +--------------------+------------------------------------------------------------------------------------------------------+ | ACADEMY DINOSAUR | A Epic Drama of a Feminist And a Mad Scientist who must Battle a Teacher in The Canadian Rockies | | DINOSAUR SECRETARY | A Action-Packed Drama of a Feminist And a Girl who must Reach a Robot in The Canadian Rockies | | VICTORY ACADEMY | A Insightful Epistle of a Mad Scientist And a Explorer who must Challenge a Cat in The Sahara Desert | +--------------------+------------------------------------------------------------------------------------------------------+
The first row is the same one as when we didn’t use query expansion. However, the query then goes and returns rows that only contain parts of our search term. This result is limited to three only because I used
LIMIT 3
. The actual results are much greater:SELECT COUNT(*) FROM film WHERE MATCH(title, description) AGAINST('"Feminist And a Mad Scientist"' WITH QUERY EXPANSION);
Result:
+----------+ | COUNT(*) | +----------+ | 1000 | +----------+
It’s quite possible that many of those results are completely irrelevant to the search term. Therefore, blind query expansion usually works best for shorter search terms.
For more information on using blind query expansion, see the MySQL documentation: Full-Text Searches with Query Expansion.
0 comments:
Post a Comment