Extracting meaningful data from this complicated string in PHP

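I'm receiving some structured data for my PHP application, but the format is somewhat unpredictable and hard to work with. I have no say in the initial format of the data. What I get is a single string (example below).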

[9484,'Víctor Valdés',8,[[['accurate_pass',[15]],['touches',[42]],['saves',[4]],['total_pass',[24]],['good_high_claim',[2]],['formation_place',[1]]]],1,'GK',1,0,0,'GK',31,183,78],[1320,'Carles Puyol',7.76,[[['accurate_pass',[50]],['touches',[75]],['aerial_won',[3]],['total_pass',[55]],['total_tackle',[1]],['formation_place',[6]]]],2,'DC',5,0,0,'D(CLR)',35,178,80],[5780,'Dani Alves',8.21,[[['accurate_pass',[58]],['touches',[99]],['total_scoring_att',[1]],['total_pass',[66]],['total_tackle',[6]],['aerial_lost',[1]],['fouls',[4]],['formation_place',[2]]]],2,'DR',22,0,0,'D(CR)',30,173,64],[83686,'Marc Bartra',8.31,[[['accurate_pass',[64]],['touches',[88]],['won_contest',[1]],['total_scoring_att',[1]],['aerial_won',[1]],['total_pass',[66]],['total_tackle',[5]],['aerial_lost',[1]],['fouls',[1]],['formation_place',[5]]]],2,'DC',15,0,0,'D(C)',22,181,70],[13471,'Adriano',6.72,[[['accurate_pass',[16]],['touches',[28]],['aerial_won',[2]],['total_pass',[18]],['total_tackle',[1]],['formation_place',[3]]]],2,'DL',21,1,31,'D(CLR),M(LR)',29,172,67]

The above is data for 5 football players. This is what I need to get:

[9484,'Víctor Valdés',8,[[['accurate_pass',[15]],['touches',[42]],['saves',[4]],['total_pass',[24]],['good_high_claim',[2]],['formation_place',[1]]]],1,'GK',1,0,0,'GK',31,183,78]
[1320,'Carles Puyol',7.76,[[['accurate_pass',[50]],['touches',[75]],['aerial_won',[3]],['total_pass',[55]],['total_tackle',[1]],['formation_place',[6]]]],2,'DC',5,0,0,'D(CLR)',35,178,80]
[5780,'Dani Alves',8.21,[[['accurate_pass',[58]],['touches',[99]],['total_scoring_att',[1]],['total_pass',[66]],['total_tackle',[6]],['aerial_lost',[1]],['fouls',[4]],['formation_place',[2]]]],2,'DR',22,0,0,'D(CR)',30,173,64]
[83686,'Marc Bartra',8.31,[[['accurate_pass',[64]],['touches',[88]],['won_contest',[1]],['total_scoring_att',[1]],['aerial_won',[1]],['total_pass',[66]],['total_tackle',[5]],['aerial_lost',[1]],['fouls',[1]],['formation_place',[5]]]],2,'DC',15,0,0,'D(C)',22,181,70]
[13471,'Adriano',6.72,[[['accurate_pass',[16]],['touches',[28]],['aerial_won',[2]],['total_pass',[18]],['total_tackle',[1]],['formation_place',[3]]]],2,'DL',21,1,31,'D(CLR),M(LR)',29,172,67]

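Now, what I did manually in the example above needs to be done reliably in PHP. As you can see, there is one set of data per player. To split the big string into individual players I can't just explode it on "],[", because that substring also appears an unpredictable number of times inside each player's data.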

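Each player has a certain number of stats (accurate_pass, touches, etc.), but they don't all have the same ones. For example, player #1 has "saves" while the others don't. Player #4 has "won_contest" and the others don't. There is no way of knowing who will have which stats. This means I can't just count commas until a new player starts, or anything like that.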

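Each player's name is preceded by a number, but that number has an unpredictable number of digits, and there is no way to tell it apart from other numbers that may appear in the string.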

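The one thing I think is common to all players is the final part: right before the last closing bracket there are always 3 integers separated by commas. This kind of substring (INT,INT,INT]) doesn't seem to appear anywhere else. Maybe this could be of some use?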

A "hard" way to do it is bracket counting (less common in PHP, more common in text-parsing languages)...

<?php
$str = "[9484,'Víctor Valdés',8,[[['accurate_pass',[15]],['touches',[42]],['saves',[4]],['total_pass',[24]],['good_high_claim',[2]],['formation_place',[1]]]],1,'GK',1,0,0,'GK',31,183,78],[1320,'Carles Puyol',7.76,[[['accurate_pass',[50]],['touches',[75]],['aerial_won',[3]],['total_pass',[55]],['total_tackle',[1]],['formation_place',[6]]]],2,'DC',5,0,0,'D(CLR)',35,178,80],[5780,'Dani Alves',8.21,[[['accurate_pass',[58]],['touches',[99]],['total_scoring_att',[1]],['total_pass',[66]],['total_tackle',[6]],['aerial_lost',[1]],['fouls',[4]],['formation_place',[2]]]],2,'DR',22,0,0,'D(CR)',30,173,64],[83686,'Marc Bartra',8.31,[[['accurate_pass',[64]],['touches',[88]],['won_contest',[1]],['total_scoring_att',[1]],['aerial_won',[1]],['total_pass',[66]],['total_tackle',[5]],['aerial_lost',[1]],['fouls',[1]],['formation_place',[5]]]],2,'DC',15,0,0,'D(C)',22,181,70],[13471,'Adriano',6.72,[[['accurate_pass',[16]],['touches',[28]],['aerial_won',[2]],['total_pass',[18]],['total_tackle',[1]],['formation_place',[3]]]],2,'DL',21,1,31,'D(CLR),M(LR)',29,172,67]";
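$line = ',';      // seeded with a dummy separator so substr($line, 1) also works for the first record
$paren_count = 0; // current bracket nesting depth
$lines = array();
$len = strlen($str);
for ($i = 0; $i < $len; $i++)
{
    $line .= $str[$i]; // the $str{$i} syntax was removed in PHP 8; use square brackets
    if ($str[$i] == '[') $paren_count++;
    elseif ($str[$i] == ']')
    {
        $paren_count--;
        if ($paren_count == 0)
        {
            // back at depth 0: one complete record; drop the leading comma
            $lines[] = substr($line, 1);
            $line = '';
        }
    }
}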
print_r($lines);
?>

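It looks like @Boundless's answer is right and you can use json_decode, but you first need to perform a couple of operations on the string you get so that it becomes a valid JSON string.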

This works for me:

<?php
$str = "[9484,'Víctor Valdés',8,[[['accurate_pass',[15]],['touches',[42]],['saves',[4]],['total_pass',[24]],['good_high_claim',[2]],['formation_place',[1]]]],1,'GK',1,0,0,'GK',31,183,78],[1320,'Carles Puyol',7.76,[[['accurate_pass',[50]],['touches',[75]],['aerial_won',[3]],['total_pass',[55]],['total_tackle',[1]],['formation_place',[6]]]],2,'DC',5,0,0,'D(CLR)',35,178,80],[5780,'Dani Alves',8.21,[[['accurate_pass',[58]],['touches',[99]],['total_scoring_att',[1]],['total_pass',[66]],['total_tackle',[6]],['aerial_lost',[1]],['fouls',[4]],['formation_place',[2]]]],2,'DR',22,0,0,'D(CR)',30,173,64],[83686,'Marc Bartra',8.31,[[['accurate_pass',[64]],['touches',[88]],['won_contest',[1]],['total_scoring_att',[1]],['aerial_won',[1]],['total_pass',[66]],['total_tackle',[5]],['aerial_lost',[1]],['fouls',[1]],['formation_place',[5]]]],2,'DC',15,0,0,'D(C)',22,181,70],[13471,'Adriano',6.72,[[['accurate_pass',[16]],['touches',[28]],['aerial_won',[2]],['total_pass',[18]],['total_tackle',[1]],['formation_place',[3]]]],2,'DL',21,1,31,'D(CLR),M(LR)',29,172,67]";
$str = '[' . $str . ']'; // wrap the records in one outer array
$str = str_replace("'", '"', $str); // JSON strings need double quotes

//convert string to array
$arr = json_decode($str);
//now it's a php array so you can access any value
//echo '<pre>';
//print_r( $arr );
//echo '</pre>';
echo $arr[0][1]; //prints "Víctor Valdés"
?>

Your string looks like JSON, but it is not valid JSON, so json_decode() will not work on it as-is.

Your specific case can be turned into valid JSON by wrapping the string in a pair of [] and replacing the single quotes with double quotes:

$string = str_replace("'", '"', $your_string);
var_dump(json_decode('[' . $string . ']'));

See this example.

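Of course, the best solution is to make sure you are given valid JSON in the first place, because this will break easily if your text strings contain, for example, double quotes.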
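Once the converted string decodes, pulling out named values is mostly array indexing. The following is a minimal sketch on a shortened two-player sample in the same shape as the question's string. The key names 'id', 'name' and 'rating' are my own guesses at what the first three fields mean, not something the source states, and the error check is only one possible way to react to a failed decode.

```php
<?php
// Shortened sample in the same shape as the question's string (two players only).
$raw = "[9484,'Víctor Valdés',8,[[['accurate_pass',[15]],['saves',[4]]]],1,'GK',1,0,0,'GK',31,183,78],"
     . "[1320,'Carles Puyol',7.76,[[['accurate_pass',[50]],['aerial_won',[3]]]],2,'DC',5,0,0,'D(CLR)',35,178,80]";

// Same conversion as above: outer brackets plus double quotes.
$json = '[' . str_replace("'", '"', $raw) . ']';
$decoded = json_decode($json, true);
if (!is_array($decoded)) {
    // json_decode() returns null on invalid input, e.g. a quote inside a name.
    throw new RuntimeException('Not valid JSON after conversion: ' . json_last_error_msg());
}

$players = array();
foreach ($decoded as $p) {
    // $p[3][0] is the list of ['stat_name', [value]] pairs; flatten it to name => value.
    $stats = array();
    foreach ($p[3][0] as $pair) {
        $stats[$pair[0]] = $pair[1][0];
    }
    // The key names below are assumptions about what the fields mean.
    $players[] = array('id' => $p[0], 'name' => $p[1], 'rating' => $p[2], 'stats' => $stats);
}

echo $players[0]['name'], ' saves: ', $players[0]['stats']['saves'], "\n";
```

Because the stats end up keyed by name, a missing stat can be detected with isset() instead of relying on position.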

Try parsing it as JSON and then extracting what you want. Assuming the data comes in chunks of 4, you could try:

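$arr = json_decode($str); // requires $str to be valid JSON
$chunks = array();        // separate result array; appending to $arr inside the loop would never terminate
$n = count($arr);
for($i = 0; $i < $n - 3; $i += 4)
{
  // plain array(...); "new array(...)" is not valid PHP
  $chunks[] = array($arr[$i], $arr[$i + 1], $arr[$i + 2], $arr[$i + 3]);
}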

Why not loop and count the [? Here's a quick untested loop to get you started.

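$output = array();
$brackets = 0; // current bracket nesting depth
$current = '';
foreach (str_split($input) as $ch) {
    if ($ch == '[') {
        $brackets++;
    }
    if ($brackets === 0) {
        continue; // skip the commas that separate top-level records
    }
    $current .= $ch;
    if ($ch == ']') {
        $brackets--;
        if ($brackets === 0) {
            $output[] = $current; // a complete top-level record
            $current = '';
        }
    }
}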

Not very elegant, though...
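The observation in the question that every record ends in three comma-separated integers can also be used directly with a regular expression. This is only a sketch on a shortened two-player sample: it assumes each record starts with an integer followed by a quoted name, and that INT,INT,INT] never occurs inside a record, which holds for the sample data but is not guaranteed by anything in the source.

```php
<?php
// Shortened sample in the same shape as the question's string (two players only).
$input = "[9484,'Víctor Valdés',8,[[['touches',[42]],['saves',[4]]]],1,'GK',1,0,0,'GK',31,183,78],"
       . "[1320,'Carles Puyol',7.76,[[['touches',[75]]]],2,'DC',5,0,0,'D(CLR)',35,178,80]";

// A record: "[", an integer, a comma and an opening quote, then as little as
// possible up to ",INT,INT,INT]" that is followed by ",[" or the end of the string.
$pattern = '/\[\d+,\'.*?,\d+,\d+,\d+\](?=,\[|$)/u';

if (preg_match_all($pattern, $input, $matches) === false) {
    throw new RuntimeException('Regular expression failed');
}
$records = $matches[0]; // one string per player

print_r($records);
```

Compared with bracket counting this is shorter but more fragile, since it depends on the shape of the data rather than on its nesting.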