How do I check a CSV file for a specific value in a row, use that row as the header row, and remove any duplicates of it?

I use the following function to import a CSV file into MySQL:

function csv_2_mysql($source_file, $target_table, $max_line_length=10000) {
    if (($handle = fopen("$source_file", "r")) !== FALSE) {
        $columns = fgetcsv($handle, $max_line_length, ",");
        foreach ($columns as &$column) {
            $column = preg_replace('/[^a-z0-9]/i', '', $column);
        }
        $insert_query_prefix = "INSERT INTO $target_table (".join(",",$columns).")\nVALUES";
        while (($data = fgetcsv($handle, $max_line_length, ",")) !== FALSE) {
            while (count($data)<count($columns))
                array_push($data, NULL);
            $query = "$insert_query_prefix (".join(",",quote_all_array($data)).");";
            mysql_query($query);
        }
        fclose($handle);
    }
}
function quote_all_array($values) {
    foreach ($values as $key=>$value)
        if (is_array($value))
            $values[$key] = quote_all_array($value);
        else
            $values[$key] = quote_all($value);
    return $values;
}
function quote_all($value) {
    if (is_null($value))
        return "NULL";
    $value = "'" . mysql_real_escape_string($value) . "'";
    return $value;
}

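The problem is that, because the CSV files are cut and merged at the source, the header is sometimes not on the first line. A file may end up looking like this, for example: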

value1,value2,value3,value4
value1,value2,value3,value4
value1,value2,value3,value4
header1,header2,header3,header4
value1,value2,value3,value4
value1,value2,value3,value4
value1,value2,value3,value4
value1,value2,value3,value4
header1,header2,header3,header4
value1,value2,value3,value4
value1,value2,value3,value4
value1,value2,value3,value4

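value1 is unique, so I know there are never duplicate rows apart from the headers. How can I adjust the function so that duplicate header rows are removed (if present) and the remaining header row is used for $columns? I would just set the column values manually, except that each CSV may have a different number of columns (apart from header1 and value1, which are always present because that column is a unique timestamp).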

Update:

OK, I figured it out, but using both fopen and file_get_contents feels wrong. Will I run into problems with this on large CSV files?

function csv_2_mysql($source_file, $target_table, $uid, $nid, $max_line_length=10000) {
    if (($handle = fopen("$source_file", "r")) !== FALSE) {
        $handle2 = file_get_contents($source_file) or exit;
        $handle_row = explode("\n", $handle2);
        foreach ($handle_row as $key => $val) {
            $row_array = explode(',', $val);
            foreach ($row_array as $key => $val) {
                $row_array[$key] = trim(str_replace('"', '', $val));
            }
            if (!in_array('header1', $row_array)) {
                unset($row_array);
            }
            else {
                $columns = $row_array;
            }
        }
        foreach ($columns as &$column) {
            $column = preg_replace('/[^a-z0-9]/i', '', $column);
        }
        $insert_query_prefix = "INSERT INTO $target_table (".join(",",$columns).")\nVALUES";
        while (($data = fgetcsv($handle, $max_line_length, ",")) !== FALSE) {
            while (count($data)<count($columns))
                array_push($data, NULL);
            $query = "$insert_query_prefix (".join(",",quote_all_array($data)).");";
            mysql_query($query);
        }
        }
        fclose($handle);
    }
}
function quote_all_array($values) {
    foreach ($values as $key=>$value)
        if (is_array($value))
            $values[$key] = quote_all_array($value);
        else
            $values[$key] = quote_all($value);
    return $values;
}
function quote_all($value) {
    if (is_null($value))
        return "NULL";
    $value = "'" . mysql_real_escape_string($value) . "'";
    return $value;
}

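I think you could use an array to collect the first column's values (since you said they are unique) and check each row against it. If a row's first value has already been seen, it is a duplicate: ignore that row and continue.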
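
The idea above could be sketched roughly as follows. This is only a minimal illustration, not a drop-in replacement: the function name `read_csv_dedup` and the `$header_marker` parameter are made up for this example, and it assumes the header row can be recognised by its first cell being exactly `header1`. It reads the file once with `fgetcsv`, so `file_get_contents` is not needed.

```php
<?php
// Hypothetical helper (not part of the original post): reads the CSV in one
// pass, picks out the header row wherever it appears, and skips any row whose
// first-column value has already been seen (which covers repeated headers).
function read_csv_dedup($source_file, $header_marker = 'header1', $max_line_length = 10000) {
    $columns = null;    // stays null if no header row is found
    $rows    = array(); // data rows, in file order
    $seen    = array(); // first-column values already encountered

    if (($handle = fopen($source_file, 'r')) === false) {
        return false;
    }
    while (($data = fgetcsv($handle, $max_line_length, ',', '"', '\\')) !== false) {
        if ($data === array(null)) {
            continue; // fgetcsv returns array(null) for a blank line
        }
        $first = trim($data[0]);
        if (isset($seen[$first])) {
            continue; // duplicate header or duplicate row: ignore it
        }
        $seen[$first] = true;
        if ($first === $header_marker) {
            $columns = array_map('trim', $data); // first header row becomes the column list
        } else {
            $rows[] = $data;
        }
    }
    fclose($handle);
    return array($columns, $rows);
}
```

The returned `$columns` and `$rows` could then be fed into the existing column clean-up and INSERT loop. Note that this sketch buffers all data rows in memory, because the header may appear after some of them; for very large files, one alternative would be to scan once for the header and then read the file a second time for the inserts.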