How to read multibyte characters from a CSV file using PHP

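I have a CSV file containing a mix of English and Chinese characters (it is a contact list exported from the Mozilla Thunderbird email client). I am trying to write a function that extracts the information from this file. The function fgetcsv() doesn't seem to support multibyte characters. Because I'm running PHP 5.2, I don't have access to str_getcsv().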

Although the case above involves English and Chinese, I'm looking for a solution that works for any language.

At the moment I use a function, namecards_import_str_getcsv(), as my CSV parser; it tries to mimic str_getcsv().

function namecards_import_str_getcsv($input, $delimiter = ',', $enclosure = '"', $escape = '\\', $eol = '\n') {
  if (!function_exists('str_getcsv')) {
    if (is_string($input) && !empty($input)) {
      $output = array();
      $tmp    = preg_split("/".$eol."/",$input);
      if (is_array($tmp) && !empty($tmp)) {
        while (list($line_num, $line) = each($tmp)) {
          if (preg_match("/" . $escape . $enclosure . "/", $line)) {
            while ($strlen = strlen($line)) {
              $pos_delimiter = strpos($line, $delimiter);
              $pos_enclosure_start = strpos($line, $enclosure);
              if (is_int($pos_delimiter) && is_int($pos_enclosure_start) && ($pos_enclosure_start < $pos_delimiter)) {
                $enclosed_str = substr($line, 1);
                $pos_enclosure_end = strpos($enclosed_str, $enclosure);
                $enclosed_str = substr($enclosed_str, 0, $pos_enclosure_end);
                $output[$line_num][] = $enclosed_str;
                $offset = $pos_enclosure_end + 3;
              } 
              else {
                if (empty($pos_delimiter) && empty($pos_enclosure_start)) {
                  $output[$line_num][] = substr($line, 0);
                  $offset = strlen($line);
                } 
                else {
                  $output[$line_num][] = substr($line,0,$pos_delimiter);
                  $offset = (!empty($pos_enclosure_start) && ($pos_enclosure_start < $pos_delimiter))? $pos_enclosure_start : $pos_delimiter + 1;
                }
              }
              $line = substr($line,$offset);
            }
          } 
          else {
            $line = preg_split("/" . $delimiter . "/", $line);
            /*
             * Validating against pesky extra line breaks creating false rows.
            */
            if (is_array($line) && !empty($line[0])) {
              $output[$line_num] = $line;
            }
          }
        }
        return $output;
      } 
      else {
        return false;
      }
    } 
    else {
      return false;
    }
  }
  else {
    return str_getcsv($input);
  }
}

This function is called by the following lines of code:

  $file = $_SESSION['namecards_csv_file'];
  if (file_exists($file->uri)) {
    // Load raw csv content into a handler variable.
    $handle = fopen($file->uri, "r");
    $cardinfo = array();
    while (($data = fgets($handle)) !== FALSE) {
      $data = namecards_import_str_getcsv($data);
      dsm($data);
      $cardinfo[] = $data[0];
    }
    fclose($handle);
  }
  else {
    drupal_set_message(t('CSV file doesn\'t exist'), 'error');
  }

In the resulting array, the Chinese characters show up as garbled symbols (for example, "С��").

Another approach I tried before this was simply to use fgetcsv() (see the example below). In that case, however, the elements of the returned array are empty.

$file = $_SESSION['namecards_csv_file'];
if (file_exists($file->uri)) {
  // Load raw csv content into a handler variable.
  $handle = fopen($file->uri, "r");
  $cardinfo = array();
  while (($data = fgetcsv($handle, 5000, ",")) !== FALSE) {
    dsm($data);
    $cardinfo[] = $data;
  }
  fclose($handle);
}
else {
  drupal_set_message(t('CSV file doesn\'t exist'), 'error');
}

In case you're interested, here are the contents of the CSV file:

First Name,Last Name,Display Name,Nickname,Primary Email,Secondary Email,Screen Name,Work Phone,Home Phone,Fax Number,Pager Number,Mobile Number,Home Address,Home Address 2,Home City,Home State,Home ZipCode,Home Country,Work Address,Work Address 2,Work City,Work State,Work ZipCode,Work Country,Job Title,Department,Organization,Web Page 1,Web Page 2,Birth Year,Birth Month,Birth Day,Custom 1,Custom 2,Custom 3,Custom 4,Notes,
Ben,Gunn,Ben Gunn,Benny,ben1@asdf.com,ben2@asdf.com,,+94 (10) 11111111,+94 (10) 22222222,+94 (10) 33333333,,+94 44444444444,12 Benny Lane,,Beijing,Beijing,100028,China,13 asdfsdfs,,sdfsf,sdfsdf,134323,China,Manager,Sales,Benny Inc,,,,,,,,,,,
乔,康,乔 康,小康,,,,,,,,,,,,,,,北京市朝阳区,,,,,,,,,,,,,,,,,,,

Just writing up what was said in the comments as an answer:

fgetcsv() is locale-sensitive, so make sure to call setlocale() with a UTF-8 locale before reading the file.
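
A minimal sketch of that advice, assuming the CSV file itself is encoded as UTF-8. The helper names namecards_use_utf8_locale() and namecards_read_csv() are made up for illustration, and the locale names in the list are guesses: which ones exist depends on the operating system (on Linux, `locale -a` lists them). setlocale() returns FALSE when none of the candidates is installed, so it is worth checking that return value when the output still looks wrong.

```php
<?php
// Hypothetical helper: switch LC_CTYPE to the first UTF-8 locale the system offers.
// The candidate names are assumptions; adjust them to what your server has installed.
function namecards_use_utf8_locale() {
  return setlocale(LC_CTYPE, array('en_US.UTF-8', 'en_US.utf8', 'C.UTF-8', 'zh_CN.UTF-8'));
}

// Hypothetical helper: read every row of a UTF-8 encoded CSV file with fgetcsv().
function namecards_read_csv($path) {
  // Passing "0" only queries the current locale so it can be restored afterwards.
  $previous = setlocale(LC_CTYPE, '0');
  namecards_use_utf8_locale();

  $rows = array();
  $handle = fopen($path, 'r');
  if ($handle !== FALSE) {
    while (($data = fgetcsv($handle, 5000, ',')) !== FALSE) {
      // fgetcsv() reports a blank line as array(NULL); skip those rows.
      if ($data !== array(NULL)) {
        $rows[] = $data;
      }
    }
    fclose($handle);
  }

  // Put the caller's locale back so other code is not affected.
  if ($previous !== FALSE) {
    setlocale(LC_CTYPE, $previous);
  }
  return $rows;
}
```

In the code from the question, the while loop around fgetcsv() could be replaced by something like `$cardinfo = namecards_read_csv($file->uri);`. Only LC_CTYPE is changed here rather than LC_ALL, so number and date formatting elsewhere in the request stay as they were.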