Slow PHP script which formats, sorts, and removes duplicates from list

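I have a PHP script, called via jQuery.post, that takes a submitted list of email addresses, performs some operations on the list (removing duplicates, sorting, formatting, etc.), and then stores each address in a MySQL database.

The problem I'm having is that this takes a very long time on a large list. I'm testing with a list of 15,000 email addresses, and in 300 seconds (5 minutes) it only added about 5,000 of them.

Is there anything in my code that would take a long time to process? Here it is. I know I'm doing a lot of formatting, but that's only because some of the email addresses contain odd characters and so on.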

// form posts
$addresses = $_POST['email_addresses'];
// cleanse and format
$addresses = trim($addresses);
$addresses = trim($addresses, "\xC2\xA0");
$addresses = str_replace(" ", "", $addresses);
$addresses = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $addresses);
$addresses = str_replace("\n", ",", $addresses);
$addresses = preg_replace('/[^(\x20-\x7F)]*/','', $addresses);
$addresses = strtolower($addresses);
$array_addresses = explode(",", $addresses);
// get unique values
$unique_addresses = array();
foreach($array_addresses as $key => $value) {
    if(filter_var($value, FILTER_VALIDATE_EMAIL)){ 
        $unique_addresses[$value] = $value;
    }
}
sort($unique_addresses);
foreach($unique_addresses as $arr) {
    if ($insert_addresses_stmt = $mysqli->prepare("INSERT INTO email_addresses (lid, email_addresses) VALUES (?, ?)")) {
        $insert_addresses_stmt->bind_param("ss", $new_lid, $arr);
        $insert_addresses_stmt->execute();
        $insert_addresses_stmt->close();
    }
}

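I wanted to post this as a comment, but I can't yet. Anyway, plain logic suggests that such a huge POST, run through multiple filter/check/replace passes, will strain the server's CPU and RAM and may exhaust PHP's memory, which is something I have run into with large batches of queries. So let's give it some air.

You can give this a try: do the work one email at a time instead of on the whole bunch of 15,000. Expect to wait around 30 seconds for it to finish.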

// form posts
$add = $_POST['email_addresses']; // about 15,000 emails in this, I suppose
// do the walk-in-the-park work first
$add = trim($add, "\xC2\xA0");
// let's get the emails one by one; split on commas and line breaks,
// since the list may use either as a separator
$addr = preg_split('/[,\r\n]+/', $add, -1, PREG_SPLIT_NO_EMPTY);
unset($add); // free the raw string
// $mysqli and $new_lid are the variables from the question's code;
// prepare once here instead of once per address
$stmt = $mysqli->prepare("INSERT INTO email_addresses (lid, email_addresses) VALUES (?, ?)");
foreach ($addr as $add) {
    // cleanse and format one by one
    $address = str_replace(' ', '', $add);
    $address = preg_replace('/[^(\x20-\x7F)]*/', '', $address);
    // finally, lowercase
    $address = strtolower($address);
    // check email
    if (filter_var($address, FILTER_VALIDATE_EMAIL)) {
        // query the DB first to see whether this email already exists; if not...
        // -> insert this email into the DB
        $stmt->bind_param("ss", $new_lid, $address);
        $stmt->execute();
    }
}
$stmt->close();
// done
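
If it helps, the per-address cleanup can also be pulled out into a small function so it can be tried on a few sample strings without touching the database. This is only a sketch: `clean_address()` is a name I made up, and it assumes that stripping every byte outside printable ASCII (spaces included) is acceptable, which matches what the replace calls do but will mangle addresses containing non-ASCII letters.

```php
<?php
/**
 * Hypothetical helper (not part of the original script): normalizes one raw
 * address and returns null when the result is not a valid email address.
 */
function clean_address(string $raw): ?string
{
    // Remove every byte outside \x21-\x7E: spaces, tabs, line breaks and
    // non-ASCII bytes such as the UTF-8 non-breaking space (\xC2\xA0).
    $address = (string) preg_replace('/[^\x21-\x7E]/', '', $raw);
    // Lowercase so that differently cased copies compare as duplicates.
    $address = strtolower($address);
    // filter_var() returns false for anything that is not a valid address.
    return filter_var($address, FILTER_VALIDATE_EMAIL) !== false ? $address : null;
}
```

Using the cleaned address as an array key, for example `$unique[$clean] = true;`, would then drop duplicates within one submission before anything is sent to MySQL.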

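Of course, this "method" does not call sort() before the DB insert, because the addresses are inserted one by one rather than as a single 15k-entry string in one DB row (where sort() would be essential). I don't think sorting is necessary here, since we can pull the data out in order later.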
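
Since the ordering can be left to the query that reads the data back, the insert side only has to be quick. One way to cut down the round trips is a single transaction with multi-row INSERT statements. Take the following as a rough sketch rather than a drop-in fix: `build_insert_sql()` and `insert_addresses()` are names I invented, the chunk size of 500 is an arbitrary guess, and it assumes mysqli is set to throw exceptions on errors (the default since PHP 8.1), so no return values are checked.

```php
<?php
// Hypothetical helper: builds "INSERT ... VALUES (?, ?), (?, ?), ..." for $rowCount rows.
// Table and column names are the ones from the question.
function build_insert_sql(int $rowCount): string
{
    $placeholders = implode(', ', array_fill(0, $rowCount, '(?, ?)'));
    return "INSERT INTO email_addresses (lid, email_addresses) VALUES $placeholders";
}

// Hypothetical helper: inserts already cleaned addresses in chunks inside one
// transaction. Assumes mysqli throws on failure, so errors are not checked here.
function insert_addresses(mysqli $db, string $lid, array $addresses, int $chunkSize = 500): void
{
    $db->begin_transaction();
    foreach (array_chunk($addresses, $chunkSize) as $chunk) {
        // Flatten the chunk into lid, address, lid, address, ...
        $params = [];
        foreach ($chunk as $address) {
            $params[] = $lid;
            $params[] = $address;
        }
        $stmt = $db->prepare(build_insert_sql(count($chunk)));
        // One "s" per bound value; the spread passes each element to bind_param().
        $stmt->bind_param(str_repeat('s', count($params)), ...$params);
        $stmt->execute();
        $stmt->close();
    }
    $db->commit();
}
```

Reading the list back in order is then a matter of something like `SELECT email_addresses FROM email_addresses WHERE lid = ? ORDER BY email_addresses`, so nothing needs to be sorted in PHP first.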