PHP - Create new file containing all lines from file1 that do not contain any of the text from lines in file2

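I have read a lot of posts on StackExchange but can't find exactly what I need. Note: this is not just about removing duplicates. I need to go through File1.csv and create a new file, Results.csv, containing every line that does not contain any of the lines in File2.txt.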

File1.csv contains personal details and an email address, one person per line:

"mr","Happy","Man","mrhappy@example.com"
"mr","Sad","Man","mrsad@example.com"
"mr","Grumpy","Man","mrgrumpy@example.com"
"mr","Strong","Man","mrstrong@example.com"

File2.txt contains email addresses, one per line:

mrhappy@example.com
mrsomeoneelse@example.com
mrsomeoneelse2@example.com

Expected result: Results.csv should contain:

"mr","Sad","Man","mrsad@example.com"
"mr","Grumpy","Man","mrgrumpy@example.com"
"mr","Strong","Man","mrstrong@example.com"

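The confusing part is that my code works fine when File2.txt contains a single line. But when it contains several lines, Results.csv contains every line from File1.csv (including the ones that should be removed), and each of those lines is repeated as many times as there are lines in File2.txt.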

My code:

<?php
$to_be_searched = "File1.csv";
$items_to_catch = file("File2.txt");
// create empty array to store lines we want to keep - i.e. lines that dont contain emails we're checking for
$good_lines = array();
// open $to_be_searched
$handle = fopen($to_be_searched, "r");
if ($handle) {
  // go line by line until end of file
  while (($line = fgets($handle)) !== false) {
    // check if line contains any items from $items_to_catch
    foreach($items_to_catch as $key => $value) {
      if(strpos($line, $value) === false) {
        // email wasn't found on the line so we want this line in the results file, therefore add to $good_lines array
        $good_lines[] = $line;
      } 
    }
  }
  fclose($handle);
} else {
  echo "Couldn't open " . $to_be_searched;
  exit();
}
// write $array_of_good_lines into new file
$new_file = "Results.csv";
foreach($good_lines as $key => $value) {
    file_put_contents($new_file, $value, FILE_APPEND | LOCK_EX);
}
?>

What am I doing wrong?

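It doesn't work at the moment because the inner foreach adds the current line to $good_lines once for every email the line does not contain. A line that contains one email still fails to contain the others, so it is added anyway, and a line that contains none of them is added once per email.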
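To see the duplication concretely, here is a minimal reproduction of the original inner loop run on two in-memory sample lines instead of files. `buggyFilter` is a hypothetical name. The needles are passed once exactly as file() returns them (newline still attached) and once trimmed, which reproduces both the "every line, repeated" symptom and the "matched line still kept" symptom.

```php
<?php
// Hypothetical helper: the original (buggy) inner loop, run on in-memory data.
function buggyFilter(array $lines, array $items_to_catch): array
{
    $good_lines = [];
    foreach ($lines as $line) {
        foreach ($items_to_catch as $value) {
            // Appends the line once for every email it does NOT contain
            if (strpos($line, $value) === false) {
                $good_lines[] = $line;
            }
        }
    }
    return $good_lines;
}

$lines = [
    "\"mr\",\"Happy\",\"Man\",\"mrhappy@example.com\"\n",
    "\"mr\",\"Sad\",\"Man\",\"mrsad@example.com\"\n",
];
// Needles as file('File2.txt') returns them: the newline is still attached
$withNewlines = ["mrhappy@example.com\n", "mrsomeoneelse@example.com\n", "mrsomeoneelse2@example.com\n"];
$trimmed = array_map('trim', $withNewlines);

echo count(buggyFilter($lines, $withNewlines)), "\n"; // 6: nothing matches, each line added 3 times
echo count(buggyFilter($lines, $trimmed)), "\n";      // 5: Happy still added twice, Sad three times
```

In File1.csv each email is followed by a closing quote rather than a newline, so a needle that ends in "\n" can never match; fixing only that still leaves the Happy line in the output twice.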

To fix this, add a flag variable inside the loop:

while (($line = fgets($handle)) !== false) {
    // Declare our flag variable as false by default
    $found = false;
    // Loop through each item to see if the email has been found
    foreach($items_to_catch as $key => $value) {
        // If the email was found, stop looping in the second file
        if(strpos($line, $value) !== false){
            $found = true;
            break;
        } 
    }
    // If the email was not found in the second file, add it to the good_lines array
    if(!$found)
        $good_lines[] = $line;
}
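The flag-and-break pattern can also be pulled into a small function, which makes the "contains any of these emails" check easy to test on its own. This is a sketch: `lineContainsAny` is a hypothetical name, and it assumes the needles already have their newlines removed. It also skips empty needles, because since PHP 8 `strpos($line, '')` returns 0 rather than false.

```php
<?php
// Hypothetical helper: true if $line contains at least one of $needles.
// Same logic as the flag-and-break loop: it stops at the first match.
function lineContainsAny(string $line, array $needles): bool
{
    foreach ($needles as $needle) {
        // Ignore empty needles; on PHP 8 they would "match" every line
        if ($needle !== '' && strpos($line, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$needles = ['mrhappy@example.com', 'mrsomeoneelse@example.com', 'mrsomeoneelse2@example.com'];
$lines = [
    '"mr","Happy","Man","mrhappy@example.com"',
    '"mr","Sad","Man","mrsad@example.com"',
];

// Keep lines that contain none of the needles; each line is kept at most once
$good_lines = array_values(array_filter($lines, fn($line) => !lineContainsAny($line, $needles)));
print_r($good_lines);
```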

Update

Besides the loop, there is a second problem in how File2.txt is read: file() keeps the newline at the end of each line, so the later strpos comparison never matches. The fix:

$items_to_catch = file("File2.txt", FILE_IGNORE_NEW_LINES);

Here is the var_dump of $items_to_catch without the flag:

array (size=3)
    0 => string 'mrhappy@example.com
    ' (length=20)
    1 => string 'mrsomeoneelse@example.com
    ' (length=26)
    2 => string 'mrsomeoneelse2@example.com
    ' (length=27)

And here is the var_dump of $items_to_catch with the flag:

array (size=3)
    0 => string 'mrhappy@example.com' (length=19)
    1 => string 'mrsomeoneelse@example.com' (length=25)
    2 => string 'mrsomeoneelse2@example.com' (length=26)

Note the extra character at the end of each email in the first dump (length 20 instead of 19, and so on): the newline.
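The effect of the flag can be checked on a throwaway file. This sketch assumes a File2.txt that ends with a stray blank line, which FILE_IGNORE_NEW_LINES alone turns into an empty-string needle; adding FILE_SKIP_EMPTY_LINES (which only has an effect together with FILE_IGNORE_NEW_LINES) drops it. The temporary file stands in for the real File2.txt.

```php
<?php
// Throwaway stand-in for File2.txt, with an extra blank line at the end
$path = tempnam(sys_get_temp_dir(), 'emails');
file_put_contents($path, "mrhappy@example.com\nmrsomeoneelse@example.com\n\n");

$raw = file($path);                                                   // newline kept on each element
$noNewlines = file($path, FILE_IGNORE_NEW_LINES);                     // newlines stripped, blank line becomes ''
$clean = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);  // blank line dropped as well

var_dump($raw, $noNewlines, $clean);
unlink($path);
```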

file() returns every line of the file, including its line ending. If you inspect $items_to_catch with Symfony's VarDumper component, you'll see something like:

array:3 [
   0 => "mrhappy@example.com\n"
   1 => "mrsomeoneelse@example.com\n"
   2 => "mrsomeoneelse2@example.com\n"
]

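That's not what you want: in File1.csv each email is followed by a closing quote, not a newline, so a needle with a trailing newline never matches. As an aside, Symfony's VarDumper component is orders of magnitude nicer to use than print_r or var_dump; I strongly recommend adding it to your project.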

So trim the line endings off with:

$items_to_catch = array_map('trim', file('File2.txt'));
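trim() removes more than "\n": it also strips "\r" (Windows line endings) and surrounding spaces, but a blank line in File2.txt still becomes ''. A sketch that filters those out as well, with `loadEmails` as a hypothetical helper name:

```php
<?php
// Hypothetical helper: one email per line, trimmed, with blank lines removed.
function loadEmails(string $path): array
{
    // trim() strips "\n", "\r", spaces and tabs from both ends
    $trimmed = array_map('trim', file($path));
    // strlen('') is 0 (falsy), so array_filter() drops empty entries;
    // array_values() re-indexes the result
    return array_values(array_filter($trimmed, 'strlen'));
}

// Usage on a throwaway file with a Windows line ending, a trailing space and a blank line
$path = tempnam(sys_get_temp_dir(), 'emails');
file_put_contents($path, "mrhappy@example.com \r\nmrsomeoneelse@example.com\n\n");
$emails = loadEmails($path);
print_r($emails);
unlink($path);
```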

A minimal working example that parses File1.csv as CSV and compares the email column exactly:

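<?php
// One email per line; trim() strips the trailing newline that file() keeps
$excludedLinesWithTheseEmails = array_map('trim', file('File2.txt'));
$out = fopen('Results.csv', 'w') or die('Cannot open Results.csv');
$in = fopen('File1.csv', 'r') or die('Cannot open File1.csv');
// fgetcsv() splits each line into fields, so the email is $row[3]
while (false !== ($row = fgetcsv($in))) {
    // Skip blank lines (no 4th field) and rows whose email is excluded
    if (isset($row[3]) && !in_array($row[3], $excludedLinesWithTheseEmails, true)) {
        fputcsv($out, $row);
    }
}
fclose($out);
fclose($in);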
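One side effect of the example above: fputcsv() only encloses fields that need it (those containing the delimiter, a quote, whitespace or a line break), so Results.csv ends up with lines like mr,Sad,Man,mrsad@example.com instead of the quoted form shown in the question. If the original formatting matters, one option is to parse each line only to find the email and then write the raw line back. The sketch below does that and also flips the exclusion list into array keys, so each lookup is a hash check rather than an in_array() scan. `filterByEmail` is a hypothetical name, and it assumes the email is always the 4th field and that no quoted field spans several lines (it reads line by line with fgets()).

```php
<?php
// Hypothetical helper: copies each line of $inPath whose 4th CSV field (the email)
// is not listed in $excludePath, writing the raw line so its quoting is unchanged.
// Returns the number of lines kept.
function filterByEmail(string $inPath, string $excludePath, string $outPath): int
{
    // email => index map, so each lookup is a hash check instead of a linear scan
    $excluded = array_flip(array_filter(array_map('trim', file($excludePath)), 'strlen'));

    $in = fopen($inPath, 'r');
    $out = fopen($outPath, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open input or output file');
    }

    $kept = 0;
    while (($line = fgets($in)) !== false) {
        // Parse only to locate the email; CSV control characters passed explicitly
        $row = str_getcsv(rtrim($line, "\r\n"), ',', '"', '\\');
        if (!isset($row[3]) || isset($excluded[$row[3]])) {
            continue; // blank or short line, or an excluded email
        }
        fwrite($out, $line);
        $kept++;
    }
    fclose($in);
    fclose($out);
    return $kept;
}

// Usage with the sample data from the question, written to temporary files
$dir = sys_get_temp_dir();
$file1 = tempnam($dir, 'f1');
$file2 = tempnam($dir, 'f2');
$results = tempnam($dir, 'res');

$sample = <<<'CSV'
"mr","Happy","Man","mrhappy@example.com"
"mr","Sad","Man","mrsad@example.com"
"mr","Grumpy","Man","mrgrumpy@example.com"
"mr","Strong","Man","mrstrong@example.com"
CSV;
file_put_contents($file1, $sample . "\n");
file_put_contents($file2, "mrhappy@example.com\nmrsomeoneelse@example.com\nmrsomeoneelse2@example.com\n");

$kept = filterByEmail($file1, $file2, $results);
$output = file_get_contents($results);
echo $output;
array_map('unlink', [$file1, $file2, $results]);
```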