Is this SQLite query efficient?

I had some fun writing a small visitor counter. It is a PHP5/SQLite3 mix.

I made two database tables, one for visitors and one for hits. Structure and sample data:

CREATE TABLE 'visitors' (
'id' INTEGER DEFAULT NULL PRIMARY KEY AUTOINCREMENT,
'ip' TEXT DEFAULT NULL,
'hash' TEXT DEFAULT NULL,
UNIQUE(ip)
);
INSERT INTO "visitors" ("id","ip","hash") VALUES ('1','1.2.3.4','f9702c362aa9f1b05002804e3a65280b');
INSERT INTO "visitors" ("id","ip","hash") VALUES ('2','1.2.3.5','43dc8b0a4773e45deab131957684867b');
INSERT INTO "visitors" ("id","ip","hash") VALUES ('3','1.2.3.6','9ae1c21fc74b2a3c1007edf679c3f144');
CREATE TABLE 'hits' (
'id' INTEGER DEFAULT NULL PRIMARY KEY AUTOINCREMENT,
'time' INTEGER DEFAULT NULL,
'visitor_id' INTEGER DEFAULT NULL,
'host' TEXT DEFAULT NULL,
'location' TEXT DEFAULT NULL
);
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('1','1418219548','1','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('2','1418219550','1','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('3','1418219553','1','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('4','1418219555','2','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('5','1418219557','1','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('6','1418219558','3','localhost','/some/path/example.php');

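I now want to fetch the visitor data, but only for visitors who have been active within the last 30 seconds. I need the following data as output, shown here for user ID 1: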

$visitor = Array(
    [id] => 1
    [ip] => 1.2.3.4
    [hash] => f9702c362aa9f1b05002804e3a65280b
    [first_hit] => 1418219548
    [last_hit] => 1418219557
    [last_host] => localhost
    [last_location] => /some/path/example.php
    [total_hits] => 4
    [idle_since] => 11
)

My current query returns exactly this and everything works, but as you can see, it needs a lot of subselects:

SELECT 
   visitors.id,
   visitors.ip,
   visitors.hash,
   (SELECT hits.time FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id ASC LIMIT 1) AS first_hit,
   (SELECT hits.time FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id DESC LIMIT 1) AS last_hit,
   (SELECT hits.host FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id DESC LIMIT 1) AS last_host,
   (SELECT hits.location FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id DESC LIMIT 1) AS last_location,
   (SELECT COUNT(hits.id) FROM hits WHERE hits.visitor_id = visitors.id) AS total_hits,
   (SELECT strftime('%s','now') - hits.time FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id DESC LIMIT 1) AS idle_since
FROM visitors
WHERE idle_since < 30
ORDER BY last_hit DESC

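So, is this OK for my use case, or do you know a better way to get this data out of these two tables? I have already tried JOINs, but no matter how I tweaked them, COUNT() gave me wrong output, for example only one total hit for user ID 1.

I think I might have to remodel the database if I want to use JOINs properly.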


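Update: Based on AeroX's answer, I have built the new query. It had basically just one small bug: you cannot use MAX() in a WHERE clause, so I now use HAVING after the GROUP BY. I also tested the old and the new query with EXPLAIN and EXPLAIN QUERY PLAN, and the new one looks much better. Thank you!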

SELECT
    V.id,
    V.ip,
    V.hash,
    MIN(H.time) AS first_hit,
    MAX(H.time) AS last_hit,
    strftime('%s','now') - MAX(H.time) AS idle_since,
    COUNT(H.id) AS total_hits,
    LH.host AS last_host,
    LH.location AS last_location
FROM visitors AS V
INNER JOIN hits AS H ON (V.id = H.visitor_id)
INNER JOIN (
    SELECT visitor_id, MAX(id) AS id
    FROM hits
    GROUP BY visitor_id
) AS L ON (V.id = L.visitor_id)
INNER JOIN hits AS LH ON (L.id = LH.id)
GROUP BY V.id, V.ip, V.hash, LH.host, LH.location
HAVING idle_since < 30
ORDER BY last_hit DESC
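The WHERE versus HAVING point from the update can be reproduced in isolation. What follows is only a minimal sketch using Python's built-in sqlite3 module rather than the PHP from the question; the three-column `hits` table, the sample times, and the fixed `now` value are assumptions made so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hits (id INTEGER PRIMARY KEY, time INTEGER, visitor_id INTEGER);
    INSERT INTO hits (time, visitor_id) VALUES (100, 1), (130, 1), (95, 2);
""")

now = 140  # hypothetical fixed "current time" instead of strftime('%s','now')

# WHERE filters individual rows before grouping, so an aggregate is rejected there.
try:
    conn.execute(
        "SELECT visitor_id FROM hits WHERE ? - MAX(time) < 30 GROUP BY visitor_id",
        (now,),
    )
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

# HAVING filters whole groups after aggregation, so MAX() is allowed.
active = conn.execute(
    "SELECT visitor_id, ? - MAX(time) AS idle_since "
    "FROM hits GROUP BY visitor_id HAVING idle_since < 30",
    (now,),
).fetchall()

print(where_error)  # an error message about misuse of an aggregate
print(active)       # [(1, 10)]
```

Visitor 2 is dropped because its only hit is 45 seconds old, while visitor 1 was last seen 10 seconds before the assumed `now`.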
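You might want to clean it up, but this should give you an idea of how to do the joins and how to use a GROUP BY statement to aggregate the hits table for each visitor. This should be more efficient than using a large number of subqueries.

I have added comments on the joins so you can see why I am making them.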

SELECT 
  V.id,
  V.ip,
  V.hash,
  MIN(H.time) AS first_hit,
  MAX(H.time) AS last_hit,
  COUNT(H.id) AS total_hits,
  strftime('%s','now') - MAX(H.time) AS idle_since,
  LH.host AS last_host,
  LH.location AS last_location
FROM visitors AS V
-- Join hits table so we can calculate aggregates (MIN/MAX/COUNT)
INNER JOIN hits AS H ON (V.id = H.visitor_id)
-- Join a sub-query as a table which contains the most recent hit.id for each visitor.id
INNER JOIN (
  SELECT visitor_id, MAX(id) AS id
  FROM hits
  GROUP BY visitor_id
) AS L ON (V.id = L.visitor_id)
-- Use the most recent hit.id for each visitor.id to fetch that most recent row (for last_host/last_location)
INNER JOIN hits AS LH ON (L.id = LH.id)
GROUP BY V.id, V.ip, V.hash, LH.host, LH.location
HAVING idle_since < 30
ORDER BY last_hit DESC
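To check that the joined query really returns the numbers the question asks for, it can be run against the sample data. This is a sketch only: it uses Python's sqlite3 module instead of PHP, simplifies the column declarations, and replaces strftime('%s','now') with a bound `:now` parameter set to an assumed timestamp (11 seconds after visitor 1's last hit) so that the output does not depend on the wall clock.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE visitors (id INTEGER PRIMARY KEY AUTOINCREMENT, ip TEXT UNIQUE, hash TEXT);
    CREATE TABLE hits (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        time INTEGER, visitor_id INTEGER, host TEXT, location TEXT
    );
    INSERT INTO visitors (id, ip, hash) VALUES
        (1, '1.2.3.4', 'f9702c362aa9f1b05002804e3a65280b'),
        (2, '1.2.3.5', '43dc8b0a4773e45deab131957684867b'),
        (3, '1.2.3.6', '9ae1c21fc74b2a3c1007edf679c3f144');
    INSERT INTO hits (id, time, visitor_id, host, location) VALUES
        (1, 1418219548, 1, 'localhost', '/some/path/example.php'),
        (2, 1418219550, 1, 'localhost', '/some/path/example.php'),
        (3, 1418219553, 1, 'localhost', '/some/path/example.php'),
        (4, 1418219555, 2, 'localhost', '/some/path/example.php'),
        (5, 1418219557, 1, 'localhost', '/some/path/example.php'),
        (6, 1418219558, 3, 'localhost', '/some/path/example.php');
""")

NOW = 1418219568  # assumed "current time": 11 s after visitor 1's last hit

QUERY = """
SELECT V.id, V.ip, V.hash,
       MIN(H.time) AS first_hit,
       MAX(H.time) AS last_hit,
       COUNT(H.id) AS total_hits,
       :now - MAX(H.time) AS idle_since,
       LH.host AS last_host,
       LH.location AS last_location
FROM visitors AS V
INNER JOIN hits AS H ON (V.id = H.visitor_id)
INNER JOIN (
    SELECT visitor_id, MAX(id) AS id FROM hits GROUP BY visitor_id
) AS L ON (V.id = L.visitor_id)
INNER JOIN hits AS LH ON (L.id = LH.id)
GROUP BY V.id, V.ip, V.hash, LH.host, LH.location
HAVING idle_since < 30
ORDER BY last_hit DESC
"""

rows = [dict(r) for r in conn.execute(QUERY, {"now": NOW})]
for row in rows:
    print(row["id"], row["total_hits"], row["idle_since"])
```

With the assumed timestamp, all three visitors count as active, and visitor 1 comes back with 4 total hits and an idle time of 11 seconds, matching the array shown in the question.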

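One of the best ways to measure query performance is to use EXPLAIN.

From the SQLite documentation:

The EXPLAIN QUERY PLAN SQL command is used to obtain a high-level description of the strategy or plan that SQLite uses to implement a specific SQL query. Most significantly, EXPLAIN QUERY PLAN reports on the way in which the query uses database indices. This document is a guide to understanding and interpreting the EXPLAIN QUERY PLAN output. Background information is available separately: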

Notes on the query optimizer.
How indexing works.
The next generation query planner. 

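An EXPLAIN QUERY PLAN command returns zero or more rows of four columns each. The column names are "selectid", "order", "from", "detail". The first three columns contain an integer value. The final column, "detail", contains a text value which carries most of the useful information.

EXPLAIN QUERY PLAN is most useful on a SELECT statement, but may also appear with other statements that read data from database tables (e.g. UPDATE, DELETE, INSERT INTO ... SELECT).

An example of an EXPLAIN query is as follows: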

EXPLAIN SELECT * FROM COMPANY WHERE Salary >= 20000;

http://www.tutorialspoint.com/sqlite/sqlite_explain.htm

The following is a more complex usage example:

How can I analyse a Sqlite query execution?
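As a small illustration of how EXPLAIN QUERY PLAN reports index usage, the sketch below compares the plan for a per-visitor lookup on `hits` before and after adding an index. It uses Python's sqlite3 module; the reduced `hits` table, the `plan` helper, and the index name `idx_hits_visitor` are hypothetical and not part of the original schema. The exact wording of the detail text differs between SQLite versions, so no fixed output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (id INTEGER PRIMARY KEY, time INTEGER, visitor_id INTEGER)"
)

QUERY = "SELECT COUNT(*) FROM hits WHERE visitor_id = 1"


def plan(sql):
    """Return the detail text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Without an index on visitor_id, SQLite has to walk the whole table.
before = plan(QUERY)

# Hypothetical index; the name is chosen for this example only.
conn.execute("CREATE INDEX idx_hits_visitor ON hits (visitor_id)")

# With the index in place, the plan should mention it by name.
after = plan(QUERY)

print(before)
print(after)
```

If the second plan names the index, the correlated lookups on `hits.visitor_id` in the queries above would likely benefit from the same index, though that is worth confirming against the real data.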