How FULLTEXT search works and what you can expect from it
The FULLTEXT search function matches a natural language query against a text collection
(which is simply the set of columns covered by a FULLTEXT index).
For every row in a table, it returns a relevance value: a similarity measure between the text in that row
(in the columns that are part of the collection) and the query.
The rows returned are automatically sorted in order of decreasing relevance.
Relevance is a non-negative floating-point number. Zero relevance means no similarity.
Relevance is computed based on the number of words in the row, the number of unique words in that row,
the total number of words in the collection, and the number of documents (rows) that contain a particular word.
Any "word" that is present in the stopword list or is too short (3 characters or fewer) is ignored.
Every remaining word in the collection and in the query is weighted according to its significance in the query
or collection. This way, a word that is present in many documents will have lower weight
(and may even have a zero weight), because it has lower semantic value in this particular collection.
Otherwise, if the word is rare, it will receive a higher weight. The weights of the words are then combined
to compute the relevance of the row. Such a technique works best with large collections
(in fact, it was carefully tuned this way). For very small tables, the word distribution does not
adequately reflect the semantic value of the words, and this model may sometimes produce bizarre results.
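
The weighting idea described above can be illustrated with a small Python sketch. It is not the formula the search engine actually uses. It is a simplified inverse-document-frequency model, and the stopword list, the `tokenize` and `relevance` helpers, and the logarithmic weight are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical stopword list; the real engine's list is different and longer.
STOPWORDS = {"the", "and", "with", "from", "that", "this"}
MIN_WORD_LEN = 4  # words of 3 characters or fewer are ignored


def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords and short words."""
    words = (w.strip('.,;:!?"()').lower() for w in text.split())
    return [w for w in words if len(w) >= MIN_WORD_LEN and w not in STOPWORDS]


def relevance(query, rows):
    """Return one non-negative score per row; 0.0 means no similarity."""
    docs = [Counter(tokenize(row)) for row in rows]
    n_docs = len(docs)
    query_words = set(tokenize(query))
    scores = []
    for doc in docs:
        total = sum(doc.values())  # number of indexed words in this row
        score = 0.0
        for word in query_words:
            df = sum(1 for d in docs if word in d)  # rows containing the word
            if df == 0 or total == 0:
                continue
            # Assumed weight: rare words score high, a word found in
            # every row gets a weight of exactly zero.
            weight = math.log(n_docs / df)
            score += (doc[word] / total) * weight
        scores.append(score)
    return scores
```

With the three rows "apple juice recipe", "apple strudel recipe" and "banana bread recipe", a query for "juice" scores only the first row above zero, while a query for "recipe" scores zero everywhere because the word appears in every row. This is the kind of surprising result the paragraph above warns about for very small tables.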
Using keywords and boolean operators
This FULLTEXT search engine supports boolean mode. This means that keywords can be combined with boolean operators. The boolean FULLTEXT search capability supports the following operators:
Examples
apple banana
    find rows that contain at least one of these words.
+apple +juice
    find rows that contain both words.
+apple macintosh
    ... word "apple", but rank it higher if it also contains "macintosh".
+apple -macintosh
    ... word "apple" but not "macintosh".
+apple +(>pie <strudel)
    ... "apple" and "pie", or "apple" and "strudel" (in any order), but rank "apple pie" higher than "apple strudel".
apple*
    ... "apple", "apples", "applesauce", "applet", ...
"some words"
    ... "some words of wisdom", but not "some noise words".
+"methane conversion" +"animal production" +methane -cattle
    ... "methane conversion" and "animal production" and methane, but not cattle.
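
To make the matching rules in these examples concrete, here is a minimal Python sketch. It is not the search engine's implementation. The hypothetical `matches` function only decides whether a row qualifies, and it covers just four forms: `+word`, `-word`, a bare word, and `word*`. Ranking with `>` and `<`, parentheses, quoted phrases, and punctuation handling are deliberately left out.

```python
def matches(query, text):
    """Check a row against a query using only +word, -word, word and word*."""
    words = set(text.lower().split())

    def present(term):
        # A trailing * matches any word that starts with the given prefix.
        if term.endswith("*"):
            return any(w.startswith(term[:-1]) for w in words)
        return term in words

    has_required = False
    optional_hit = False
    for token in query.lower().split():
        if token.startswith("+"):
            has_required = True
            if not present(token[1:]):
                return False  # a required word is missing
        elif token.startswith("-"):
            if present(token[1:]):
                return False  # an excluded word is present
        else:
            optional_hit = optional_hit or present(token)
    # Without any required term, at least one optional word must match.
    return has_required or optional_hit
```

For instance, `matches("+apple -macintosh", "apple macintosh computer")` is false because the excluded word is present, while `matches("apple*", "homemade applesauce")` is true through the prefix match.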