我正在尝试规范化一个表,以前的开发人员设计了一个包含管道分隔ID的列,这些ID链接到同一表中的其他行。
客户表
id | aliases (VARCHAR)
----------------------------
1 | |4|58|76
2 |
3 |
4 | |1|58|76
... |
58 | |1|4|76
... |
76 | |1|4|58
所以客户 1、4、58 和 76 都是彼此的"别名"。 客户 2 和 3 没有别名,因此该字段包含一个空字符串。
我想取消整个"别名"系统,并对数据进行规范化,以便我可以将这些其他客户全部映射到一条记录。因此,我希望客户 1、4、58 和 76 的相关表数据都仅映射到客户 1。
我想我会填充一个新表,稍后我可以加入并在其他表上执行更新。
联接表
id | customer_id | alias_id
-------------------------------
1 | 1 | 4
2 | 1 | 58
3 | 1 | 76
如何将第一个表中的数据转换为上述格式?如果这在纯SQL中绝对是一场噩梦,我将编写一个PHP脚本来尝试完成这项工作并插入数据。
当我开始回答这个问题时,我认为它会快速而简单,因为我曾经在SQL Server中做过非常相似的事情,但是证明翻译的概念迅速发展到这个完整的解决方案中。
从您的问题中不清楚的一个警告是,您是否有条件声明主 id 与别名 id。例如,此解决方案将允许 1 的别名为 4,4 的别名为 1,这与简化示例问题中提供的数据一致。
为了设置此示例的数据,我使用了以下结构:
CREATE TABLE notnormal_customers (
id INT NOT NULL PRIMARY KEY,
aliases VARCHAR(10)
);
INSERT INTO notnormal_customers (id,aliases)
VALUES
(1,'|4|58|76'),
(2,''),
(3,''),
(4,'|1|58|76'),
(58,'|1|4|76'),
(76,'|1|4|58');
首先,为了表示一个客户到多个别名的一对多关系,我创建了这个表:
CREATE TABLE customer_aliases (
primary_id INT NOT NULL,
alias_id INT NOT NULL,
FOREIGN KEY (primary_id) REFERENCES notnormal_customers(id),
FOREIGN KEY (alias_id) REFERENCES notnormal_customers(id),
/* clustered primary key prevents duplicates */
PRIMARY KEY (primary_id,alias_id)
)
最重要的是,我们将使用自定义SPLIT_STR
函数:
CREATE FUNCTION SPLIT_STR(
x VARCHAR(255),
delim VARCHAR(12),
pos INT
)
RETURNS VARCHAR(255)
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
LENGTH(SUBSTRING_INDEX(x, delim, pos -1)) + 1),
delim, '');
然后,我们将创建一个存储过程来完成所有工作。代码使用对源引用的注释进行批注。
DELIMITER $$
CREATE PROCEDURE normalize_customers()
BEGIN
DECLARE cust_id INT DEFAULT 0;
DECLARE al_id INT UNSIGNED DEFAULT 0;
DECLARE alias_str VARCHAR(10) DEFAULT '';
/* set the value of the string delimiter */
DECLARE string_delim CHAR(1) DEFAULT '|';
DECLARE count_aliases INT DEFAULT 0;
DECLARE i INT DEFAULT 1;
/*
use cursor to iterate through all customer records
http://burnignorance.com/mysql-tips/how-to-loop-through-a-result-set-in-mysql-strored-procedure/
*/
DECLARE done INT DEFAULT 0;
DECLARE cur CURSOR FOR
SELECT `id`, `aliases`
FROM `notnormal_customers`;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN cur;
read_loop: LOOP
/*
Fetch one record from CURSOR and set to customer id and alias string.
If not found then `done` will be set to 1 by continue handler.
*/
FETCH cur INTO cust_id, alias_str;
IF done THEN
/* If done set to 1 then exit the loop, else continue. */
LEAVE read_loop;
END IF;
/* skip to next record if no aliases */
IF alias_str = '' THEN
ITERATE read_loop;
END IF;
/*
get number of aliases
https://pisceansheart.wordpress.com/2008/04/15/count-occurrence-of-character-in-a-string-using-mysql/
*/
SET count_aliases = LENGTH(alias_str) - LENGTH(REPLACE(alias_str, string_delim, ''));
/* strip off the first pipe to make it compatible with our SPLIT_STR function */
SET alias_str = SUBSTR(alias_str, 2);
/*
iterate and get each alias from custom split string function
https://stackoverflow.com/questions/18304857/split-delimited-string-value-into-rows
*/
WHILE i <= count_aliases DO
/* get the next alias id */
SET al_id = CAST(SPLIT_STR(alias_str, string_delim, i) AS UNSIGNED);
/* REPLACE existing values instead of insert to prevent errors on primary key */
REPLACE INTO customer_aliases (primary_id,alias_id) VALUES (cust_id,al_id);
SET i = i+1;
END WHILE;
SET i = 1;
END LOOP;
CLOSE cur;
END$$
DELIMITER ;
最后,您可以通过调用以下命令简单地运行它:
CALL normalize_customers();
然后,您可以在控制台中检查数据:
mysql> select * from customer_aliases;
+------------+----------+
| primary_id | alias_id |
+------------+----------+
| 4 | 1 |
| 58 | 1 |
| 76 | 1 |
| 1 | 4 |
| 58 | 4 |
| 76 | 4 |
| 1 | 58 |
| 4 | 58 |
| 76 | 58 |
| 1 | 76 |
| 4 | 76 |
| 58 | 76 |
+------------+----------+
12 rows in set (0.00 sec)
更新 2(单查询解决方案(
假设别名列表始终排序,则只需一个查询即可获得结果:
CREATE TABLE aliases (
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
customer_id INT UNSIGNED NOT NULL,
alias_id INT UNSIGNED NOT NULL
) AS
SELECT NULL AS id, c1.id AS customer_id, c2.id AS alias_id
FROM customers c1
JOIN customers c2
ON c2.aliases LIKE CONCAT('|', c1.id , '|%') -- c1.id is the first alias of c2.id
WHERE c1.id < (SUBSTRING(c1.aliases,2)+0) -- c1.id is smaller than the first alias of c2.id
如果对aliases
列编制索引,它也会快得多,因此范围搜索将支持 JOIN。
SQLfiddle
原答案
如果将管道替换为逗号,则可以使用 FIND_IN_SET 函数。
我将首先创建一个临时表(不需要是技术上的临时表(来存储逗号分隔的别名列表:
CREATE TABLE tmp (`id` int, `aliases` varchar(50));
INSERT INTO tmp(`id`, `aliases`)
SELECT id, REPLACE(aliases, '|', ',') AS aliases
FROM customers;
然后使用 JOINs ON 子句中的FIND_IN_SET填充规范化表:
CREATE TABLE aliases (`id` int, `customer_id` int, `alias_id` int) AS
SELECT t.id as customer_id, c.id AS alias_id
FROM tmp t
JOIN customers c ON find_in_set(c.id, t.aliases);
如果需要 - 删除具有较高customer_id的重复项(仅保留最低(:
DELETE FROM aliases
WHERE customer_id IN (SELECT * FROM(
SELECT DISTINCT a1.customer_id
FROM aliases a1
JOIN aliases a2
ON a2.customer_id = a1.alias_id
AND a1.customer_id = a2.alias_id
AND a1.customer_id > a1.alias_id
)derived);
如果需要 - 创建AUTO_INCREMENT id:
ALTER TABLE aliases ADD column id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
aliases
表现在将如下所示:
| id | customer_id | alias_id |
|----|-------------|----------|
| 1 | 1 | 4 |
| 2 | 1 | 58 |
| 3 | 1 | 76 |
SQLfiddle
不要忘记定义正确的索引。
更新 1
您可以跳过创建临时表并使用 LIKE 而不是FIND_IN_SET填充aliases
表:
CREATE TABLE aliases (`customer_id` int, `alias_id` int) AS
SELECT c2.id as customer_id, c1.id AS alias_id
FROM customers c1
JOIN customers c2
ON CONCAT(c1.aliases, '|') LIKE CONCAT('%|', c2.id , '|%');
SQLfiddle
使用整数表 (0-9( - 尽管您可以使用(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3...etc.)
实现相同的目标...
SELECT DISTINCT id old_id /* the technique below inevitably creates duplicates. */
/* DISTINCT discards them. */
, SUBSTRING_INDEX(
SUBSTRING_INDEX(SUBSTR(aliases,2),'|',i+1) /* isolate text between */
,'|',-1) x /* each pipe and the next */
FROM customers
, ints /* do this for the first 10 pipes in each string */
ORDER
BY id,x+0 /* implicit CASTING */
+--------+------+
| old_id | x |
+--------+------+
| 1 | 4 |
| 1 | 58 |
| 1 | 76 |
| 2 | NULL |
| 3 | NULL |
| 4 | 1 |
| 4 | 58 |
| 4 | 76 |
| 58 | 1 |
| 58 | 4 |
| 58 | 76 |
| 76 | 1 |
| 76 | 4 |
| 76 | 58 |
+--------+------+
(编辑:添加了行内注释(