我有一个搜索表单。如果用户打错了ager
而不是anger
,它仍然会显示相关的结果,而不是显示0个找到的结果。
我遇到了PHP的levenshtein函数和例子,他们已经给出了数组正是我想要的[除了用户可以输入一个句子而不是一个单词],但我想实现它与数据库,但不知道如何去实现它与数据库。
这是我的代码:
if(!empty($search))
{
try {
$query = $this->_db->prepare($sql);
$query->execute();
if(!$query->rowCount()==0)
{
$foundRows = $this->_db->query("SELECT FOUND_ROWS()")->fetchColumn();
while($row = $query->fetch(PDO::FETCH_ASSOC))
{
$cQuote = $this->highlightWords(htmlspecialchars($row['cQuotes']),$search);
$search_result[] = array('success' => true, 'totalRows' => $foundRows, 'cQuotes' => $cQuote, 'vAuthor' => $this->h($row['vAuthor']), 'vBookName' => $this->h($row['vBookName']), 'vRef' => $this->h($row['vRef']));
}
$response = json_encode($search_result);
echo $response;
return TRUE;
}
else
{
$ex = "No results found for " .$search;
$this->errorMsg($ex);
}
$query->closeCursor();
}
catch (Exception $ex){
$ex = "Problem: " .$ex;
$this->errorMsg($ex);
}
}
else
{
$ex = "Please enter something";
$this->errorMsg($ex);
}
我应该补充一下,我正在使用MySQL + PDO。
要做到这一点,您需要三个条件:
- MySQL端的Levenshtein-distance实现
- 在每个文本中找到的单词索引
- 将搜索查询拆分为单词,每个单词一个
LEFT JOIN
和一个HAVING
子句
数据库模式示例:
+---------+----------------------------------------------+
| text_id | text |
+---------+----------------------------------------------+
| 1 | The quick brown fox jumps over the lazy dog |
| 2 | The slow brown foxes jump over the lazy dogs |
+---------+----------------------------------------------+
<<p> 词/strong> +-------+---------+
| word | text_id |
+-------+---------+
| fox | 1 |
| foxes | 2 |
| dog | 1 |
| dogs | 2 |
+-------+---------+
一旦你有了这个,假设有人搜索"foxs dogg
",你会建立一个像这样的查询:
SELECT text FROM text
LEFT JOIN word w1 ON w1.text_id = text.text_id AND LEVENSHTEIN(w1.word, "foxs") < 3
LEFT JOIN word w2 ON w2.text_id = text.text_id AND LEVENSHTEIN(w2.word, "dogg") < 3
GROUP BY text.text_id
HAVING COUNT(*) = 2
…地点:
- 每个单词有一个
LEFT JOIN
(例如:foxs
和dogg
) - 您有一个
HAVING
子句,包含单词总数(例如:HAVING COUNT(*) = 2
) - 指定每个单词的最大距离(例如:
LEVENSHTEIN(...) < 3
)
上面的操作将同时返回两个条目。
下面是另一个例子:
SELECT text FROM text
LEFT JOIN word w1 ON w1.text_id = text.text_id AND LEVENSHTEIN(w1.word, "foxs") < 3
LEFT JOIN word w2 ON w2.text_id = text.text_id AND LEVENSHTEIN(w2.word, "slows") < 3
GROUP BY text.text_id
HAVING COUNT(*) = 2
上面将只返回text_id = 2
。
虽然这是一个工作示例,但您确实应该寻找已经实现的搜索算法,如Solr的SpellCheck组件。