Scripting an internal search engine requires:
- a client form to submit requests
- database fields to search posted keywords
- database queries to match the form request
I want to focus on the database side of such a search.
I used to build queries of the following type:
SELECT id, title, content FROM table_name WHERE title LIKE '%".$search."%' OR content LIKE '%".$search."%';
which performs a pattern match against every row. Because the pattern starts with a wildcard, MySQL cannot use an ordinary index and has to scan the whole table, which can lead to resource problems (large number of rows parsed, large number of results found, ...).
Under MySQL, a FULLTEXT search may solve this resource problem, since it runs the search against indexed data. This feature provides:
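One way to see the cost is to ask MySQL for its execution plan. The sketch below assumes the same table_name table and uses a literal keyword, database, in place of the PHP variable; both are placeholders.

```sql
-- Ask MySQL how it would execute the LIKE search.
-- A pattern that starts with % cannot use an ordinary B-tree index,
-- so the plan usually reports type = ALL (a full table scan).
EXPLAIN
SELECT id, title, content
FROM table_name
WHERE title LIKE '%database%'
   OR content LIKE '%database%';
```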
- stopwords (words which will be excluded by the server itself)
- boolean searches (using + or -, for example, to require a word or to exclude it from the search)
- relevancy scoring
Type of table that allows fulltext indexing
FULLTEXT search is available in MyISAM tables. The InnoDB storage engine offers transactions (commit and rollback), but it only gained FULLTEXT indexes in MySQL 5.6, and older versions lack the feature. So, let’s stick to MyISAM tables.
You can define a FULLTEXT index on one or more columns of type CHAR, VARCHAR or TEXT. As the MySQL reference manual points out, inserting rows into a table that already has a FULLTEXT index takes longer. So, if many rows need to be loaded (which is exactly where a FULLTEXT index pays off), it is better to insert the rows first, then create the index.
You can add a FULLTEXT index on the 2 fields “title” and “content” with the following query:
ALTER TABLE table_name ADD FULLTEXT index_name (title, content);
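Putting this advice together, a minimal sketch of the load-then-index order could look like the following. The column definitions and the sample rows are assumptions for illustration; only the title and content columns and the index itself come from the text.

```sql
-- Hypothetical table layout; adjust the column types to your data.
CREATE TABLE table_name (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title   VARCHAR(255) NOT NULL,
    content TEXT NOT NULL,
    PRIMARY KEY (id)
) ENGINE = MyISAM;

-- 1. Load the rows first, while no FULLTEXT index has to be maintained.
INSERT INTO table_name (title, content) VALUES
    ('How MySQL works', 'An overview of storage engines and indexes.'),
    ('Searching tables', 'Keyword search with LIKE and FULLTEXT.');

-- 2. Build the FULLTEXT index once, after the bulk load.
ALTER TABLE table_name ADD FULLTEXT index_name (title, content);

-- 3. Check that the index exists (look for Index_type = FULLTEXT).
SHOW INDEX FROM table_name;
```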
Once the table has a FULLTEXT index defined, the query becomes:
SELECT id, title, content FROM table_name WHERE MATCH(title, content) AGAINST ('".$search."');
The columns listed in MATCH() have to be exactly the same as those defined in the table’s FULLTEXT index.
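The query above splices $search straight into the SQL text, which breaks as soon as the keyword contains a quote. One possible alternative is a server-side prepared statement, sketched here in plain SQL. The statement name search_stmt and the user variable @search are made up for the example, and a PHP database layer with bound parameters would play the same role.

```sql
-- The ? placeholder keeps the search string out of the SQL text.
PREPARE search_stmt FROM
    'SELECT id, title, content
     FROM table_name
     WHERE MATCH(title, content) AGAINST (?)';

-- Hypothetical keyword; in the application this is the posted form value.
SET @search = 'database';
EXECUTE search_stmt USING @search;

-- Release the statement when it is no longer needed.
DEALLOCATE PREPARE search_stmt;
```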
Fulltext search parameters
MySQL is configured with a set of default parameters:
- the results are ordered by the relevancy score MySql gives to each row found
- words with fewer than 4 characters are ignored
- the fulltext index contains complete words only
- a built-in list of stopwords (common English words by default) is automatically ignored
- the score for each row depends on how often the word occurs in the table. If a word occurs in more than half of the rows, it is ignored
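To look at the relevancy score itself, the same MATCH() expression can also be selected as a column. A small sketch, assuming the FULLTEXT index on title and content defined earlier; the alias score and the keyword database are placeholders.

```sql
-- MATCH() in the SELECT list returns the relevance value for each row;
-- rows that do not match get a score of 0 and are filtered out by WHERE.
SELECT id, title,
       MATCH(title, content) AGAINST ('database') AS score
FROM table_name
WHERE MATCH(title, content) AGAINST ('database')
ORDER BY score DESC;
```

The expression appears twice, but the MySQL manual notes that the optimizer recognises identical MATCH() calls and runs the full-text search only once.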
Changing these defaults is only possible if you are allowed to modify the configuration of your MySQL server.
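If you do have that access, the MyISAM settings behind these defaults can be inspected from SQL. This is a sketch for a MyISAM setup; the value 3 is only an example, and the option file name depends on your installation.

```sql
-- Show the current full-text settings (minimum word length, stopword file, ...).
SHOW VARIABLES LIKE 'ft%';

-- ft_min_word_len cannot be changed at runtime. It is set in the server
-- option file (for example my.cnf) and needs a server restart:
--   [mysqld]
--   ft_min_word_len = 3

-- After the restart, rebuild the FULLTEXT index so it uses the new setting.
REPAIR TABLE table_name QUICK;
```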
Boolean mode search
The MATCH ... AGAINST ... statement also lets you perform a Boolean search:
SELECT id, title, content FROM table_name WHERE MATCH(title, content) AGAINST ('".$search."' IN BOOLEAN MODE);
MySQL will then return all rows matching the searched words, even if more than 50% of the rows contain the keyword.
Boolean searches allow you to use operators like + or - to force MySQL to return rows containing 'works' and hide rows containing 'tables':
SELECT id, title, content FROM table_name WHERE MATCH(title, content) AGAINST ('+works -tables' IN BOOLEAN MODE);
You can also append the * operator to a word in Boolean mode, as a wildcard that matches any word starting with those letters:
SELECT id, title, content FROM table_name WHERE MATCH(title, content) AGAINST ('peopl*' IN BOOLEAN MODE);
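These operators can be combined in one search string. The sketch below is an illustration with made-up keywords; the double-quoted phrase is a further Boolean-mode operator that matches the words as an exact sequence.

```sql
-- +     : the word must be present
-- -     : the word must not be present
-- *     : matches any word starting with the given prefix
-- "..." : the words must appear together as a phrase
SELECT id, title, content
FROM table_name
WHERE MATCH(title, content)
      AGAINST ('+works -tables peopl* "search engine"' IN BOOLEAN MODE);
```

Unlike the default mode, Boolean mode does not sort rows by relevance automatically, so add an ORDER BY clause if the order matters.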