Filter useful data from an input file

I have a fairly large and very messy data file from which I want to filter the useful data. Its structure looks like this:

    !bla bla
    more bla
    some useless data
    something interesting
     something interesting
     something interesting
    some useless data
    something interesting
     something interesting
    some useless data
    bla bla

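My plan is to read the file with file_get_contents(), then use str_replace() to replace some of the data and use the replacements as markers. Next, I try to delete the useless data from the beginning of the file up to marker1, then from marker2 to marker3, then from marker4 to the end of the file, so that only the useful data remains in the output (I am not yet sure whether I need the markers in the data). I tried using strstr(), but could not get it to work.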

    !bla bla
    more bla
    some useless data
    ==marker1==
    something interesting
     something interesting
     something interesting
    ==marker2==
    some useless data
    ==marker3==
    something interesting
     something interesting
    ==marker4==
    some useless data
    bla bla

I will then use explode() to split the resulting useful data and transfer it to my database.
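As a rough illustration of that last step, here is a minimal sketch of splitting an already extracted block into rows with explode(). The `$useful` string and the `$rows` name are assumptions made up for the example, and the database insert itself is left out because the question does not describe the schema.

```php
<?php
// Hypothetical extracted block, as it might look once the markers are cut away.
$useful = "something interesting\n something interesting\n something interesting";

// Split on newlines, trim each line, and drop any empty lines.
$rows = array_values(array_filter(array_map('trim', explode("\n", $useful)), 'strlen'));

print_r($rows);
```

If the file uses Windows line endings, the trim() call also removes the trailing "\r" left on each line after splitting on "\n".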

Edit: this is how I solved it.

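    // $input holds the file contents with the markers already inserted.
    // Lazy match: capture everything between marker1 and the nearest marker2.
    preg_match('/(==marker1==)(.*?)(==marker2==)/s', $input, $marker1to2);
    $marker1to2 = trim($marker1to2[2]);
    // Put marker1 back in front of the first "something " (limit 1).
    $marker1to2 = preg_replace('/something /', '==marker1== something ', $marker1to2, 1);
    echo $marker1to2;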

You need a regular expression:

    $data = "!bla bla
    more bla
    some useless data
    ==marker1==
    something interesting
     something interesting
     something interesting
    ==marker2==
    some useless data
    ==marker3==
    something interesting
     something interesting
    ==marker4==
    some useless data
    bla bla";
    // The s flag lets . match newlines; .*? is lazy, so each match stops at
    // the nearest closing marker even if that marker occurs more than once.
    preg_match("/(==marker1==)(.*?)(==marker2==)/s", $data, $marker1to2);
    $marker1to2 = trim($marker1to2[2]);
    preg_match("/(==marker3==)(.*?)(==marker4==)/s", $data, $marker3to4);
    $marker3to4 = trim($marker3to4[2]);
    echo "Marker 1 to 2:\n$marker1to2\n\n";
    echo "Marker 3 to 4:\n$marker3to4\n\n";

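Output:

    Marker 1 to 2:
    something interesting
     something interesting
     something interesting

    Marker 3 to 4:
    something interesting
     something interesting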
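If there are more marker pairs, the same idea could be wrapped in a small helper. The following is only a sketch: the function name `extractBetween` and the shortened sample data are assumptions for illustration, not part of the original answer. It relies on preg_quote() to escape the markers and on preg_match_all() to collect every block between a given pair.

```php
<?php
// Hypothetical helper (not from the original answer): returns every block
// found between a start marker and an end marker, trimmed.
function extractBetween(string $text, string $start, string $end): array
{
    // preg_quote() escapes regex metacharacters in the markers;
    // .*? is lazy, so each match stops at the nearest end marker;
    // the s flag lets . match newlines.
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';
    preg_match_all($pattern, $text, $matches);
    return array_map('trim', $matches[1]);
}

// Shortened sample data, assumed for the example.
$data = "junk\n==marker1==\nsomething interesting\n something interesting\n==marker2==\n"
      . "junk\n==marker3==\nmore interesting\n==marker4==\njunk";

$blocks = array_merge(
    extractBetween($data, '==marker1==', '==marker2=='),
    extractBetween($data, '==marker3==', '==marker4==')
);
print_r($blocks);
```

Because the match is lazy and global, the helper also copes with a file in which the same marker pair appears several times.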