We have a simple search on our site that uses MySQL fulltext search, and for some reason it doesn't return the results we expect. I don't know if it's an issue with Amazon RDS (where our database server resides) or with the query itself.

Here is the structure of the database table:

CREATE TABLE `items` (
  `object_id` int(9) unsigned NOT NULL DEFAULT '0',
  `slug` varchar(100) DEFAULT NULL,
  `name` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`object_id`),
  FULLTEXT KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

And here is a simple fulltext search query on this table and the returned results:

select object_id, slug, name from items where MATCH (name) AGAINST ('+ski*' IN BOOLEAN MODE) order by name;

+-----------+-----------------------------------+------------------+
| object_id | slug                              | name             |
+-----------+-----------------------------------+------------------+
|  10146041 | us/new-hampshire/dartmouth-skiway | Dartmouth Skiway |
+-----------+-----------------------------------+------------------+

If I use LIKE instead, I get a different set of results:

select object_id,slug,name from items where name LIKE "%ski%" order by name;

+-----------+------------------------------------------+----------------------------------+
| object_id | slug                                     | name                             |
+-----------+------------------------------------------+----------------------------------+
|  10146546 | us/new-york/brantling-ski                | Brantling Ski                    |
|  10146548 | us/new-york/buffalo-ski-club             | Buffalo Ski Club                 |
|  10146041 | us/new-hampshire/dartmouth-skiway        | Dartmouth Skiway                 |
|  10146352 | us/montana/discover-ski                  | Discover Ski                     |
|  10144882 | us/california/donner-ski-ranch           | Donner Ski Ranch                 |
|  10146970 | us/new-york/hickory-ski-center           | Hickory Ski Center               |
|  10146973 | us/new-york/holimont-ski-area            | Holimont Ski Area                |
|  10146283 | us/minnesota/hyland-ski                  | Hyland Ski                       |
|  10145911 | us/nevada/las-vegas-ski-snowboard-resort | Las Vegas Ski & Snowboard Resort |
|  10146977 | us/new-york/maple-ski-ridge              | Maple Ski Ridge                  |
|  10146774 | us/oregon/mount-hood-ski-bowl            | Mt. Hood Ski Bowl                |
|  10145949 | us/new-mexico/sipapu-ski                 | Sipapu Ski                       |
|  10145952 | us/new-mexico/ski-apache                 | Ski Apache                       |
|  10146584 | us/north-carolina/ski-beech              | Ski Beech                        |
|  10147973 | canada/quebec/ski-bromont                | Ski Bromont                      |
|  10146106 | us/michigan/ski-brule                    | Ski Brule                        |
|  10145597 | us/massachusetts/ski-butternut           | Ski Butternut                    |
|  10145117 | us/colorado/ski-cooper                   | Ski Cooper                       |
|  10146917 | us/pennsylvania/ski-denton               | Ski Denton                       |
|  10145954 | us/new-mexico/ski-santa-fe               | Ski Santa Fe                     |
|  10146918 | us/pennsylvania/ski-sawmill              | Ski Sawmill                      |
|  10145299 | us/illinois/ski-snowstar                 | Ski Snowstar                     |
|  10145138 | us/connecticut/ski-sundown               | Ski Sundown                      |
|  10145598 | us/massachusetts/ski-ward                | Ski Ward                         |
+-----------+------------------------------------------+----------------------------------+

I'm at a complete loss as to why the query using fulltext search is not working. I'm hoping that some MySQL expert out there can point out the error in our query.

Thanks in advance for your help!

2 Answers

From the MySQL docs:

  • + A leading plus sign indicates that this word must be present in each row that is returned.

  • * The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it should be appended to the word to be affected. Words match if they begin with the word preceding the * operator.

    If a word is specified with the truncation operator, it is not stripped from a boolean query, even if it is too short (as determined from the ft_min_word_len setting) or a stopword. This occurs because the word is not seen as too short or a stopword, but as a prefix that must be present in the document in the form of a word that begins with the prefix.
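These two rules interact with the default ft_min_word_len of 4, and comparing both forms of the search directly makes that visible. The following is a sketch against the items table from the question. The expected results in the comments are what the documented rules predict for the rows shown, not output captured from the poster's server.

```sql
-- Without *, "ski" is treated as a plain word. At 3 characters it is shorter
-- than ft_min_word_len (4 by default), so it is stripped from the boolean
-- query and the search is expected to return no rows.
SELECT object_id, name
FROM items
WHERE MATCH (name) AGAINST ('+ski' IN BOOLEAN MODE);

-- With *, "ski" is kept as a prefix, but it can only match words that are
-- actually stored in the FULLTEXT index, i.e. words of 4+ characters such
-- as "Skiway". The standalone word "Ski" was never indexed.
SELECT object_id, name
FROM items
WHERE MATCH (name) AGAINST ('+ski*' IN BOOLEAN MODE);
```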

In Context:

MATCH(...) AGAINST(...)

MATCH (name) AGAINST ('+ski*' IN BOOLEAN MODE) means that you're searching for rows whose name column contains at least one word beginning with ski (Ski, Skiway, and so on).

On paper, every name in your LIKE result satisfies that rule, so the difference comes from what is stored in the FULLTEXT index rather than from the query syntax. The index only contains words that are at least ft_min_word_len characters long (4 by default), so the three-letter word "Ski" is never indexed.

The * operator keeps ski in the query as a prefix, but the only indexed word in your data that begins with it is "Skiway". That is why Dartmouth Skiway is the only row returned by your boolean search.

As ajreal suggests, try decreasing the ft_min_word_len setting in my.cnf. Your search might be failing to return the results you expect because of the default value, which is 4 characters. Try reducing it to 3, then restart the server and rebuild the FULLTEXT index.
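Before changing anything, it may help to confirm which value the server is actually using. A minimal check, run from any MySQL client connected to the same server:

```sql
-- Lists ft_min_word_len, ft_max_word_len and the other ft_* settings.
SHOW VARIABLES LIKE 'ft%';
```

If ft_min_word_len comes back as 4, three-letter words like "Ski" are not being indexed.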

WHERE column LIKE '%text%'

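WHERE name LIKE "%ski%" searches for rows whose name contains the substring ski anywhere: as a whole word (Ski), at the start of a word (Skiway), or in the middle of a word. It does not use the FULLTEXT index at all, so it is unaffected by ft_min_word_len.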
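One way to see exactly which rows the FULLTEXT search misses is to evaluate both conditions side by side. This sketch assumes the items table from the question; in boolean mode MATCH yields a relevance of 0 for rows that do not match, so the second column flags each miss.

```sql
-- Every row that LIKE finds, with its FULLTEXT relevance next to it.
-- A relevance of 0 means the '+ski*' search would not return that row.
SELECT name,
       MATCH (name) AGAINST ('+ski*' IN BOOLEAN MODE) AS ft_relevance
FROM items
WHERE name LIKE '%ski%'
ORDER BY name;
```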

  • Thanks for the thorough explanation! Can you suggest how I would update my original MATCH(...) AGAINST(...) query to perform an identical search to the LIKE query? I now understand the problem but am still not clear on the solution.
    – Russell C.
    Commented Jan 30, 2011 at 1:15
  • If you aren't worried about whether anything comes after ski, then you could try removing the *. This should mean "find rows with name columns that contain the whole word ski". Commented Jan 30, 2011 at 1:20
  • The problem I'm having is that no matter how I update the query, it returns 0 results except in the form posted above. Could this have something to do with the ft_min_word_len setting, as @ajreal suggested below?
    – Russell C.
    Commented Jan 30, 2011 at 1:22
  • As ajreal said, could you give it a try? You'll need to add the lines he mentions to my.cnf (found at /etc/my.cnf on standard Linux MySQL installations), then restart MySQL and rebuild the FULLTEXT index. Let us know if it works. Commented Jan 30, 2011 at 2:12

The minimum and maximum lengths of words to be indexed are defined by the ft_min_word_len and ft_max_word_len system variables. (See Section 5.1.4, “Server System Variables”.) The default minimum value is four characters; the default maximum is version dependent. If you change either value, you must rebuild your FULLTEXT indexes. For example, if you want three-character words to be searchable, you can set the ft_min_word_len variable by putting the following lines in an option file:

[mysqld]
ft_min_word_len=3

Source: http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html
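After the option file change and a restart of mysqld, the index still has to be rebuilt. For a MyISAM table like items, the MySQL manual says a QUICK repair is enough to do that. The sketch below assumes the new setting is already active. If the server is on Amazon RDS, as in the question, there is no my.cnf to edit; the variable would instead be set in the instance's DB parameter group followed by a reboot (worth confirming against the RDS documentation).

```sql
-- Rebuild the MyISAM FULLTEXT index so three-letter words like "Ski" get indexed.
REPAIR TABLE items QUICK;

-- Re-run the original search against the rebuilt index.
SELECT object_id, slug, name
FROM items
WHERE MATCH (name) AGAINST ('+ski*' IN BOOLEAN MODE)
ORDER BY name;
```

For the rows shown in the question, every name containing ski has a word that starts with ski, so this should return the same set as the LIKE query. In general the two still differ: FULLTEXT matches whole words or word prefixes, so it cannot find ski in the middle of a word the way LIKE '%ski%' can.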
  • The lines mentioned should go in my.cnf, which can be found at /etc/my.cnf on standard Linux MySQL installations. Commented Jan 30, 2011 at 2:14
