PHP's str_getcsv breaking on tab-separated list with no enclosure and individual double quotes

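I'm using str_getcsv to parse tab-separated values returned from a NoSQL query, but I've run into a problem, and the only solution I've found makes no logical sense.

Here's some sample code to demonstrate (FYI, the tabs don't seem to be preserved when displayed here)...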

$data = '0  16  Gruesome Public Executions In North Korea - 80 Killed       http://www.youtube.com/watch?v=Dtx30AQpcjw&feature=youtube_gdata        "North Korea staged gruesome public executions of 80 people this month, some for offenses as minor as watching South Korean entertainment videos or being fou...    1384357511  http://gdata.youtube.com/feeds/api/videos/Dtx30AQpcjw   0   The Young Turks                 1   2013-11-13 12:53:31 9ab8f5607183ed258f4f98bb80f947b4    35afc4001e1a50fb463dac32de1d19e7';
$data = str_getcsv($data, "\t", NULL);
echo '<pre>'.print_r($data,TRUE).'</pre>';

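Pay particular attention to the fact that one column (the one starting with "North Korea...") actually begins with a double quote " but does not end with one. That is why I pass NULL as the third parameter (enclosure): to override the default " enclosure value.

Here is the result: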

Array
(
[0] => 0
[1] => 16
[2] => Gruesome Public Executions In North Korea - 80 Killed
[3] => http://www.youtube.com/watch?v=Dtx30AQpcjw&feature=youtube_gdata
[4] => 
[5] => North Korea staged gruesome public executions of 80 people this month, some for offenses as minor as watching South Korean entertainment videos or being fou...  1384357511  http://gdata.youtube.com/feeds/api/videos/Dtx30AQpcjw   0   The Young Turks                 1   2013-11-13 12:53:31 9ab8f5607183ed258f4f98bb80f947b4    35afc4001e1a50fb463dac32de1d19e7
)

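As you can see, the quote is breaking the function. Logically, I thought I could pass either NULL or an empty string '' as the third parameter (enclosure) of str_getcsv, but neither worked?!?!

The only thing I could use to get str_getcsv to work properly was a space character ' '. That makes no sense to me, since none of the columns start and/or end with a space.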

$data = '0  16  Gruesome Public Executions In North Korea - 80 Killed       http://www.youtube.com/watch?v=Dtx30AQpcjw&feature=youtube_gdata        "North Korea staged gruesome public executions of 80 people this month, some for offenses as minor as watching South Korean entertainment videos or being fou...    1384357511  http://gdata.youtube.com/feeds/api/videos/Dtx30AQpcjw   0   The Young Turks                 1   2013-11-13 12:53:31 9ab8f5607183ed258f4f98bb80f947b4    35afc4001e1a50fb463dac32de1d19e7';
$data = str_getcsv($data, "\t", ' ');
echo '<pre>'.print_r($data,TRUE).'</pre>';

Now the result is:

Array
(
[0] => 0
[1] => 16
[2] => Gruesome Public Executions In North Korea - 80 Killed
[3] => http://www.youtube.com/watch?v=Dtx30AQpcjw&feature=youtube_gdata
[4] => 
[5] => "North Korea staged gruesome public executions of 80 people this month, some for offenses as minor as watching South Korean entertainment videos or being fou...
[6] => 1384357511
[7] => http://gdata.youtube.com/feeds/api/videos/Dtx30AQpcjw
[8] => 0
[9] => The Young Turks
[10] => 
[11] => 
[12] => 
[13] => 
[14] => 1
[15] => 2013-11-13 12:53:31
[16] => 9ab8f5607183ed258f4f98bb80f947b4
[17] => 35afc4001e1a50fb463dac32de1d19e7
)

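So my question is: why does it work with a space as the enclosure, but not with NULL or an empty string? And are there any repercussions to this?

UPDATE 1: This seems to have reduced the number of errors I was getting in my logs, but it hasn't eliminated them, so I'm guessing that the space I used as the enclosure has caused unintended side effects, albeit less troublesome ones than the previous problem. My question remains the same: why can't I use NULL or an empty string as the enclosure, and secondly, is there a better way to handle this?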

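Just to give you a starting point...

You might want to consider working on the string itself, splitting it on the tabs, rather than using a function like str_getcsv.

Be aware, though, that if you go this route (it may be your only option) there are at least a few pitfalls:

  • Handling of escape characters
  • Line breaks inside the data (ones not used as row delimiters)

If you know that your string contains no tabs other than the ones that end a field, and no line breaks other than the ones that separate rows, then you can probably do this: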

$data = explode("\n", $the_whole_csv_string_block);
foreach ($data as $line)
{
    $arr = explode("\t", $line);
    // $arr[0] will have every first field of every row, $arr[1] the 2nd, ...
    // Usually this is what I want when working with a csv file
    // But if you rather want a multidimensional array, you can simply add 
    // $arr to a different array and after this loop you are good to go.
}

Otherwise, treat this as just a starting point to adapt to your own situation. Hope it helps.
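
To make the multidimensional-array variant mentioned in the comments concrete, here is a minimal sketch. The function name parse_tsv and the sample data are made up for illustration, and it only holds under the assumptions stated above: no tabs or line breaks inside a field, and no quoting or escaping to honor.

```php
<?php
// Hypothetical helper (not part of PHP): splits a TSV block into rows of fields.
// Assumes no tabs or line breaks inside a field and no quoting or escaping.
function parse_tsv(string $block): array
{
    $rows = [];
    // Accept both "\n" and "\r\n" line endings.
    foreach (preg_split('/\r?\n/', $block) as $line) {
        if ($line === '') {
            continue; // skip blank lines, e.g. after a trailing newline
        }
        // Double quotes have no special meaning here; they stay in the field.
        $rows[] = explode("\t", $line);
    }
    return $rows;
}

// Made-up sample: two rows, the first with an unbalanced double quote.
$block = "0\t16\t\"North Korea staged...\t1384357511\n"
       . "1\t17\tAnother title\t1384357512\n";

$rows = parse_tsv($block);
echo $rows[0][2], "\n"; // prints: "North Korea staged...
```

Because explode never interprets quotes, the stray " simply remains part of the third field.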

Simply use chr(0) as the enclosure and the escape:

$data = str_getcsv($data, "\t", chr(0), chr(0));
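
A quick way to check this on a shortened, made-up row (the sample string below is an assumption, not the original data). The idea, as I understand it, is that a NUL byte should never occur in the data, so no field is ever treated as enclosed and the stray double quote is kept as an ordinary character:

```php
<?php
// Shortened, made-up sample row; "\t" inside double quotes is a real tab.
$line = "0\t16\t\"North Korea staged...\t1384357511";

// NUL byte as enclosure and escape: assumed never to appear in the data,
// so the unbalanced double quote does not start an enclosed field.
$fields = str_getcsv($line, "\t", chr(0), chr(0));

print_r($fields);
```

Note that the delimiter has to be written as "\t" in double quotes; in single quotes PHP would pass a literal backslash followed by t.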