I'm trying to build a searchable database of the languages my users speak.
For example:
$john = array("english", "french", "spanish");
$jack = array("french", "spanish");
$jill = array("english", "spanish");
I want to save these to a MySQL database so that I can later run something along the lines of (pseudocode):
SELECT * FROM users WHERE spoken_languages = "french" and "spanish"
I know that if I had speaks_english, speaks_french and speaks_spanish columns, I could search with
SELECT * FROM users WHERE speaks_french = "true" and speaks_spanish = "true"
But adding a new column every time I come across a new language doesn't scale well. I also considered a table like this:
john | english
john | french
john | spanish
jack | french
jack | spanish
jill | english
jill | spanish
because then, to get the languages a user speaks, I could at least run
SELECT * FROM spoken_languages WHERE user = "jack"
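But to find the people who speak both French and Spanish, I would need to query all users who speak French, then all users who speak Spanish, and then compute the intersection. That seems very inefficient.
So I ask you: how can I store this array of spoken languages so that I can search the database later without crushing the server?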
Your problem has a proper solution: a person_language table that looks like this:
john | english
john | french
jack | spanish
You can query it like this:
SELECT person
FROM person_language
WHERE language IN ( 'english', 'spanish')
GROUP BY person
HAVING COUNT(*) = 2
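Put an index on (language, person) and it will scale up nicely. Note that HAVING COUNT(*) = 2 relies on each (person, language) pair appearing only once in the table.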
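To make the GROUP BY / HAVING pattern above easy to try out, here is a minimal sketch that generalizes it to any number of languages. It runs against an in-memory SQLite database via Python's sqlite3 module purely as a stand-in for MySQL; the sample rows are the ones from the question, and the helper name speakers_of_all and the index name are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the query itself is plain SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_language (
        person   TEXT NOT NULL,
        language TEXT NOT NULL,
        PRIMARY KEY (person, language)  -- one row per person/language pair
    );
    -- Index suggested in the answer (the name is an assumption).
    CREATE INDEX idx_language_person ON person_language (language, person);
""")
conn.executemany(
    "INSERT INTO person_language (person, language) VALUES (?, ?)",
    [
        ("john", "english"), ("john", "french"), ("john", "spanish"),
        ("jack", "french"), ("jack", "spanish"),
        ("jill", "english"), ("jill", "spanish"),
    ],
)


def speakers_of_all(conn, languages):
    """Return the people who speak every language in `languages`."""
    wanted = sorted(set(languages))  # de-duplicate so the count is right
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        "SELECT person FROM person_language "
        f"WHERE language IN ({placeholders}) "
        "GROUP BY person "
        "HAVING COUNT(*) = ? "  # must match all requested languages
        "ORDER BY person"
    )
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


french_and_spanish = speakers_of_all(conn, ["french", "spanish"])
```

The count in the HAVING clause is derived from the de-duplicated list, so passing the same language twice does not change the result.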
If you want everyone who speaks Spanish and at least one other language, you can do this:
SELECT a.person
FROM person_language AS a
JOIN ( SELECT person
FROM person_language
GROUP BY person
HAVING COUNT(*) >= 2
) AS b ON a.person = b.person
WHERE a.language = 'spanish'
This uses a JOIN to take the intersection of the people who speak Spanish and the people who speak two or more languages.
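As a quick check of the "Spanish plus at least one other language" query, the sketch below runs it on an in-memory SQLite database (a stand-in for MySQL). The rows are the question's data plus one hypothetical person, jane, who speaks only Spanish, added so that the filter has something to exclude; the helper name is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person_language ("
    " person TEXT NOT NULL, language TEXT NOT NULL,"
    " PRIMARY KEY (person, language))"
)
conn.executemany(
    "INSERT INTO person_language (person, language) VALUES (?, ?)",
    [
        ("john", "english"), ("john", "french"), ("john", "spanish"),
        ("jack", "french"), ("jack", "spanish"),
        ("jill", "english"), ("jill", "spanish"),
        ("jane", "spanish"),  # hypothetical: speaks Spanish only
    ],
)

# Same shape as the query above: people with two or more languages,
# joined back to the rows for the one required language.
SQL = """
SELECT a.person
FROM person_language AS a
JOIN ( SELECT person
       FROM person_language
       GROUP BY person
       HAVING COUNT(*) >= 2
     ) AS b ON a.person = b.person
WHERE a.language = ?
ORDER BY a.person
"""


def speaks_plus_one_other(conn, language):
    """People who speak `language` and at least one other language."""
    return [row[0] for row in conn.execute(SQL, (language,))]


spanish_plus = speaks_plus_one_other(conn, "spanish")
```

With this data jane is filtered out, because she has only one row in the table.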
You can use a self-join to run this query efficiently:
SELECT * FROM users u1
JOIN users u2 USING (user)
WHERE (u1.lang, u2.lang) = ('french', 'spanish')
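See my presentation, SQL Query Patterns, Optimized, for examples of solutions to relational division.
With the right indexes in place, my tests showed this self-join solution to be 20 times faster than the GROUP BY solution.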
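The self-join idea extends to more than two languages by adding one join per extra language. The sketch below builds such a query dynamically, assuming a users table with (user, lang) columns as in the query above; it uses in-memory SQLite as a stand-in for MySQL, writes the join with explicit ON conditions instead of USING and the row comparison, and makes no attempt to reproduce the timing claim. The function and index names are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user TEXT NOT NULL,
        lang TEXT NOT NULL,
        PRIMARY KEY (user, lang)
    );
    CREATE INDEX idx_lang_user ON users (lang, user);  -- assumed index
""")
conn.executemany(
    "INSERT INTO users (user, lang) VALUES (?, ?)",
    [
        ("john", "english"), ("john", "french"), ("john", "spanish"),
        ("jack", "french"), ("jack", "spanish"),
        ("jill", "english"), ("jill", "spanish"),
    ],
)


def speakers_of_all_by_join(conn, languages):
    """Self-join variant: one table alias per requested language."""
    wanted = sorted(set(languages))
    if not wanted:
        return []
    # u0 matches the first language; each further alias u1, u2, ...
    # must find a row for the same user with the next language.
    joins = "".join(
        f" JOIN users u{i} ON u{i}.user = u0.user AND u{i}.lang = ?"
        for i in range(1, len(wanted))
    )
    sql = f"SELECT u0.user FROM users u0{joins} WHERE u0.lang = ? ORDER BY u0.user"
    # Placeholders appear in the JOINs first, then in the WHERE clause.
    params = (*wanted[1:], wanted[0])
    return [row[0] for row in conn.execute(sql, params)]


french_and_spanish = speakers_of_all_by_join(conn, ["french", "spanish"])
```

Whether this beats the GROUP BY form on your data is something to measure yourself, for example with EXPLAIN on the real tables.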
I would go with a three-table setup like this:
CREATE TABLE languages
(
`id` int not null auto_increment primary key,
`language` varchar(32) unique
);
CREATE TABLE users
(
`id` int not null auto_increment primary key,
`name` varchar(32)
);
CREATE TABLE user_language
(
`user_id` int,
`language_id` int,
primary key (user_id, language_id)
);
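IMHO, unless you have millions of users and every possible language, go for flexibility rather than fighting for milliseconds, especially if you check for more than two languages at a time. You can do that with MAX() or SUM() aggregates in the HAVING clause.
Here are some sample queries: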
-- Speaks both French AND Spanish
SELECT u.name
FROM user_language ul JOIN languages l
ON ul.language_id = l.id JOIN users u
ON ul.user_id = u.id
GROUP BY u.id
HAVING MAX(l.language = 'french') = 1
AND MAX(l.language = 'spanish') = 1;
Output:
| name |
|------|
| John |
| Jack |

-- Speaks either French OR Spanish
SELECT u.name
FROM user_language ul JOIN languages l
ON ul.language_id = l.id JOIN users u
ON ul.user_id = u.id
GROUP BY u.id
HAVING MAX(l.language = 'french') +
MAX(l.language = 'spanish') > 0;
Output:
| name |
|------|
| John |
| Jack |
| Jill |

-- Speaks French OR Spanish BUT NOT English
SELECT u.name
FROM user_language ul JOIN languages l
ON ul.language_id = l.id JOIN users u
ON ul.user_id = u.id
GROUP BY u.id
HAVING MAX(l.language = 'french') +
MAX(l.language = 'spanish') > 0
AND MAX(l.language = 'english') = 0;
Output:
| name |
|------|
| Jack |

-- Speaks any language but English
SELECT u.name
FROM user_language ul JOIN languages l
ON ul.language_id = l.id JOIN users u
ON ul.user_id = u.id
GROUP BY u.id
HAVING MAX(l.language = 'english') = 0;
Output:
| name |
|------|
| Jack |

-- What languages does Jack speak
SELECT l.language
FROM user_language ul JOIN languages l
ON ul.language_id = l.id JOIN users u
ON ul.user_id = u.id
WHERE u.name = 'Jack';
Output:
| language |
|----------|
|   french |
|  spanish |

-- How many languages do users speak
SELECT u.name, COUNT(ul.language_id) no_of_languages -- counts 0 for users with no languages
FROM users u LEFT JOIN user_language ul
ON u.id = ul.user_id
GROUP BY u.id;
Output:
| name | no_of_languages |
|------|-----------------|
| John |               3 |
| Jack |               2 |
| Jill |               2 |

-- How many users speak a particular language
SELECT l.language, COUNT(ul.user_id) no_of_users -- counts 0 for languages nobody speaks
FROM languages l LEFT JOIN user_language ul
ON l.id = ul.language_id
GROUP BY l.id;
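Output:
| language | no_of_users |
|----------|-------------|
|  english |           2 |
|   french |           2 |
|  spanish |           3 |

Now, in a real application you most likely won't work with language names or user names, but with ids coming from the UI (a dropdown or similar). That lets you eliminate one join, and your queries will look like this: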
-- Speaks both French AND Spanish with Ids
SELECT u.name
FROM user_language ul JOIN users u
ON ul.user_id = u.id
GROUP BY u.id
HAVING MAX(ul.language_id = 2) = 1
AND MAX(ul.language_id = 3) = 1;
Here is a SQLFiddle demo.
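The answer mentions SUM() but only shows MAX(), so here is a minimal sketch of the SUM() form working on ids for any number of languages. It uses an in-memory SQLite database as a stand-in for MySQL. The language ids (1 = english, 2 = french, 3 = spanish) follow the id-based query above; the user ids and the helper name users_speaking_all are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE languages (id INTEGER PRIMARY KEY, language TEXT UNIQUE);
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_language (
        user_id INTEGER,
        language_id INTEGER,
        PRIMARY KEY (user_id, language_id)
    );
    INSERT INTO languages (id, language)
        VALUES (1, 'english'), (2, 'french'), (3, 'spanish');
    -- User ids are assumed; only the names come from the question.
    INSERT INTO users (id, name) VALUES (1, 'John'), (2, 'Jack'), (3, 'Jill');
    INSERT INTO user_language (user_id, language_id) VALUES
        (1, 1), (1, 2), (1, 3),   -- John: english, french, spanish
        (2, 2), (2, 3),           -- Jack: french, spanish
        (3, 1), (3, 3);           -- Jill: english, spanish
""")


def users_speaking_all(conn, language_ids):
    """Names of users who speak every language id in `language_ids`."""
    ids = sorted(set(language_ids))
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    sql = (
        "SELECT u.name "
        "FROM user_language ul JOIN users u ON ul.user_id = u.id "
        "GROUP BY u.id "
        # The IN test yields 1 or 0 per row, so SUM counts the matches.
        f"HAVING SUM(ul.language_id IN ({placeholders})) = ? "
        "ORDER BY u.name"
    )
    return [row[0] for row in conn.execute(sql, (*ids, len(ids)))]


french_and_spanish = users_speaking_all(conn, [2, 3])
```

Because the filter lives in HAVING rather than WHERE, further conditions such as "but not English" can be added as extra aggregate terms in the same clause.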