Regex Help - PHP / XML



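I need to split out each phrase between the pipes (|).
I already have some code that strips some useless junk from the beginning, but I don't know where to go from here.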

My code:

<style>
table, th, td {
border:1px solid black;
}
</style>
<?php
echo 'OK';
$xmlstr = file_get_contents('http://api.wolframalpha.com/v2/query?input=planes+seen+from+dallas&appid=2UJ62E-Q6RT3T89P8');

$queryresult = new SimpleXMLElement($xmlstr);
echo $queryresult->pod[0]->subpod[0]->plaintext . "<hr>"; //assumption / input
$work1 = $queryresult->pod[1]->subpod[0]->plaintext . "<hr>"; //result plaintext
$work2 = substr($work1, 19);
$work3 = utf8_decode($work2);
$work4 = str_replace(utf8_decode('Â'), '', $work3);
echo $work4;
//echo "<table >" . "<tr><th>Plane Name</th><th>Altitude</th><th>Angle</th></tr>" . "<tr><td>Sample</td></tr></table>"; //Late
echo "<br><br><h6>" . $queryresult . "</h6>";
?>



Here is the XML file. Note: the content I need is inside <plaintext>.

<queryresult success="true" error="false" numpods="3" datatypes="Flight" timedout="" timedoutpods="" timing="3.39" parsetiming="0.391" parsetimedout="false" recalculate="" id="MSPa1551i7b4400e01ci84e0000651c4h328e38277i" host="http://www4b.wolframalpha.com" server="7" related="http://www4b.wolframalpha.com/api/v2/relatedQueries.jsp?id=MSPa1561i7b4400e01ci84e00005a6da8807558i27f&s=7" version="2.6"><pod title="Input interpretation" scanner="Identity" id="Input" position="100" error="false" numsubpods="1"><subpod title=""><plaintext>flights seen from Dallas, Texas</plaintext><img src="http://www4b.wolframalpha.com/Calculate/MSP/MSP1571i7b4400e01ci84e0000145h82ai4h4a9hae?MSPStoreType=image/gif&s=7" alt="flights seen from Dallas, Texas" title="flights seen from Dallas, Texas" width="199" height="18"/></subpod></pod><pod title="Result" scanner="Data" id="Result" position="200" error="false" numsubpods="1" primary="true"><subpod title=""><plaintext> | altitude | angle
ENY flight 3056 | 11500 feet | 23° up
Republic Airlines flight 4302 | 30000 feet | 11° up
Southwest Airlines flight 4966 | 7200 feet | 10° up
Delta Air Lines flight 1115 | 32000 feet | 6.9° up
NetJets flight 579 | 5900 feet | 6.9° up
 | type | slant distance
ENY flight 3056 | Embraer ERJ-145 | 5.5 miles NNW
Republic Airlines flight 4302 | Embraer 175 | 30 miles SE
Southwest Airlines flight 4966 | Boeing 737-800 | 7.7 miles N
Delta Air Lines flight 1115 | Boeing 757-200 | 48 miles ENE
NetJets flight 579 | Cessna Citation Excel | 9.3 miles ESE
(locations based on projections of delayed data)
(angles with respect to nominal horizon)</plaintext><img src="http://www4b.wolframalpha.com/Calculate/MSP/MSP1581i7b4400e01ci84e00003ea5887iecaa0i2a?MSPStoreType=image/gif&s=7" alt=" | altitude | angle ENY flight 3056 | 11500 feet | 23° up Republic Airlines flight 4302 | 30000 feet | 11° up Southwest Airlines flight 4966 | 7200 feet | 10° up Delta Air Lines flight 1115 | 32000 feet | 6.9° up NetJets flight 579 | 5900 feet | 6.9° up  | type | slant distance ENY flight 3056 | Embraer ERJ-145 | 5.5 miles NNW Republic Airlines flight 4302 | Embraer 175 | 30 miles SE Southwest Airlines flight 4966 | Boeing 737-800 | 7.7 miles N Delta Air Lines flight 1115 | Boeing 757-200 | 48 miles ENE NetJets flight 579 | Cessna Citation Excel | 9.3 miles ESE (locations based on projections of delayed data) (angles with respect to nominal horizon)" title=" | altitude | angle ENY flight 3056 | 11500 feet | 23° up Republic Airlines flight 4302 | 30000 feet | 11° up Southwest Airlines flight 4966 | 7200 feet | 10° up Delta Air Lines flight 1115 | 32000 feet | 6.9° up NetJets flight 579 | 5900 feet | 6.9° up  | type | slant distance ENY flight 3056 | Embraer ERJ-145 | 5.5 miles NNW Republic Airlines flight 4302 | Embraer 175 | 30 miles SE Southwest Airlines flight 4966 | Boeing 737-800 | 7.7 miles N Delta Air Lines flight 1115 | Boeing 757-200 | 48 miles ENE NetJets flight 579 | Cessna Citation Excel | 9.3 miles ESE (locations based on projections of delayed data) (angles with respect to nominal horizon)" width="496" height="456"/></subpod><states count="2"><state name="More" input="Result__More"/><state name="Show metric" input="Result__Show metric"/></states></pod><pod title="Sky map" scanner="Data" id="SkyMap:FlightData" position="300" error="false" numsubpods="1"><subpod title=""><plaintext/><img src="http://www4b.wolframalpha.com/Calculate/MSP/MSP1591i7b4400e01ci84e000035g7ag9dd130609a?MSPStoreType=image/gif&s=7" alt="" title="" width="400" height="400"/></subpod></pod><assumptions 
count="1"><assumption type="SubCategory" word="dallas" template="Assuming ${desc1}. Use ${desc2} instead" count="9"><value name="{Dallas, Texas, UnitedStates}" desc="Dallas (Texas, USA)" input="*DPClash.CityE.dallas-_**Dallas.Texas.UnitedStates--"/><value name="{Dallas, Georgia, UnitedStates}" desc="Dallas (Georgia, USA)" input="*DPClash.CityE.dallas-_**Dallas.Georgia.UnitedStates--"/><value name="{Dallas, Oregon, UnitedStates}" desc="Dallas (Oregon, USA)" input="*DPClash.CityE.dallas-_**Dallas.Oregon.UnitedStates--"/><value name="{Dallas, NorthCarolina, UnitedStates}" desc="Dallas (North Carolina, USA)" input="*DPClash.CityE.dallas-_**Dallas.NorthCarolina.UnitedStates--"/><value name="{Dallas, Pennsylvania, UnitedStates}" desc="Dallas (Pennsylvania, USA)" input="*DPClash.CityE.dallas-_**Dallas.Pennsylvania.UnitedStates--"/><value name="{Dallas, BritishColumbia, Canada}" desc="Dallas (Canada)" input="*DPClash.CityE.dallas-_**Dallas.BritishColumbia.Canada--"/><value name="{Dallas, Wisconsin, UnitedStates}" desc="Dallas (Wisconsin, USA)" input="*DPClash.CityE.dallas-_**Dallas.Wisconsin.UnitedStates--"/><value name="{Dallas, Maine, UnitedStates}" desc="Dallas (Maine, USA)" input="*DPClash.CityE.dallas-_**Dallas.Maine.UnitedStates--"/><value name="{Dallas, SouthDakota, UnitedStates}" desc="Dallas (South Dakota, USA)" input="*DPClash.CityE.dallas-_**Dallas.SouthDakota.UnitedStates--"/></assumption></assumptions><sources count="2"><source url="http://www.wolframalpha.com/sources/CityDataSourceInformationNotes.html" text="City data"/><source url="http://www.wolframalpha.com/sources/FlightDataSourceInformationNotes.html" text="Flight data"/></sources></queryresult>

[EDIT] It seems the entity problem was only caused by a copy/paste typo, as @ThW noticed. So the way to go is to use XMLReader to extract the data.

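Advantages: it is one of the fastest XML parsers available in PHP, because it is a pull parser that moves a cursor through the document instead of building a DOM tree. It may be a bit slower than the regex approach, but it streams the input, so it uses less memory than loading the whole document as a string or a DOM tree.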
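To show what "pull parser" means in practice, here is a minimal sketch that walks a hypothetical, trimmed-down response held in a string (using XMLReader::XML() instead of open()) and records each element name as the cursor reaches it. Only the current node is ever inspected; no tree is built.

```php
<?php
// Minimal sketch: XMLReader is a cursor ("pull") parser. Each read() advances
// to the next node; no tree of the whole document is kept in memory.
// $sample is a hypothetical, trimmed-down WolframAlpha-like response.
$sample = '<queryresult><pod title="Input"><subpod><plaintext>in</plaintext></subpod></pod>'
        . '<pod title="Result"><subpod><plaintext>out</plaintext></subpod></pod></queryresult>';

$reader = new XMLReader();
$reader->XML($sample);               // parse from a string instead of a URL
$elementNames = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT) {
        $elementNames[] = $reader->name;   // name of the node under the cursor
    }
}
$reader->close();

echo implode(' ', $elementNames), "\n";
// queryresult pod subpod plaintext pod subpod plaintext
```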

Extracting the string:

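$url = 'http://api.wolframalpha.com/v2/query?input=planes+seen+from+dallas&appid=2UJ62E-Q6RT3T89P8';
$str = null; // stays null if no "Result" pod is found
$parser = new XMLReader;
$parser->open($url);
while ($parser->read()) {
    if ($parser->nodeType === XMLReader::ELEMENT) {
        // skip every <pod> whose title is not "Result" without descending into it
        while ($parser->name === 'pod' && $parser->getAttribute('title') !== 'Result') {
            if (!$parser->next('pod')) // jump to the next sibling <pod> node
                break 2;               // no more pods: give up
        }
        // the first <plaintext> reached is the one inside the "Result" pod
        if ($parser->name === 'plaintext') {
            $str = $parser->readString();
            break;
        }
    }
}
$parser->close();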
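If you want to try the extraction without calling the API, here is a sketch of the same skip-to-"Result" logic wrapped in a function and fed from a string via XMLReader::XML(). The helper name extractResultPlaintext() and the sample XML are hypothetical.

```php
<?php
// Sketch: same logic as above, but reading from a string.
// extractResultPlaintext() is a hypothetical helper name.
function extractResultPlaintext(string $xml): ?string
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $text = null;
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        // Skip whole <pod> subtrees whose title is not "Result".
        while ($reader->name === 'pod' && $reader->getAttribute('title') !== 'Result') {
            if (!$reader->next('pod')) {
                break 2;   // no further <pod> sibling: nothing found
            }
        }
        if ($reader->name === 'plaintext') {
            $text = $reader->readString();   // text content of <plaintext>
            break;
        }
    }
    $reader->close();
    return $text;
}

$sample = '<queryresult><pod title="Input"><subpod><plaintext>in</plaintext></subpod></pod>'
        . '<pod title="Result"><subpod><plaintext>out</plaintext></subpod></pod></queryresult>';
var_dump(extractResultPlaintext($sample)); // string(3) "out"
```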

Then you can build whatever result you need from the extracted string, for example a multidimensional array using explode():

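$lines = explode("\n", $str);
$result = array();
$cols = array(); // current column names, taken from the header lines
foreach ($lines as $line) {
    $fields = explode(' | ', $line);
    $flight = array_shift($fields);
    if ($flight === '')           // header line such as " | altitude | angle"
        $cols = $fields;
    elseif (isset($fields[1])) {  // data line; footnote lines contain no " | "
        $result[$flight][$cols[0]] = $fields[0];
        $result[$flight][$cols[1]] = $fields[1];
    } 
}
print_r($result);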
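The question's commented-out code suggests the end goal is an HTML table (Plane Name / Altitude / Angle). Here is a sketch that renders an array shaped like $result above; renderFlightTable() and its $cols parameter are hypothetical, and values are escaped with htmlspecialchars().

```php
<?php
// Sketch: render an array shaped like $result above as an HTML table.
// renderFlightTable() is a hypothetical helper; $cols picks which columns to show.
function renderFlightTable(array $result, array $cols): string
{
    $html = '<table><tr><th>Plane Name</th>';
    foreach ($cols as $col) {
        $html .= '<th>' . htmlspecialchars(ucfirst($col)) . '</th>';
    }
    $html .= '</tr>';
    foreach ($result as $flight => $data) {
        $html .= '<tr><td>' . htmlspecialchars($flight) . '</td>';
        foreach ($cols as $col) {
            // A missing value becomes an empty cell.
            $html .= '<td>' . htmlspecialchars($data[$col] ?? '') . '</td>';
        }
        $html .= '</tr>';
    }
    return $html . '</table>';
}

$result = ['ENY flight 3056' => ['altitude' => '11500 feet', 'angle' => '23° up']];
echo renderFlightTable($result, ['altitude', 'angle']);
```

Passing all four column names (altitude, angle, type, slant distance) shows both halves of the WolframAlpha output in one table.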

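[Old answer] Unfortunately, there is a problem with the entities (the & characters in the attributes) that makes XMLReader fail (and XMLReader is your best approach).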
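For context, a bare & in an attribute (as in the pasted XML, e.g. "...gif&s=7") is not well-formed XML, so libxml-based parsers reject the document, while &amp; parses fine. A minimal sketch; SimpleXML is used here only because it signals failure with a plain false, and the example.com URL is made up.

```php
<?php
// Sketch: unescaped "&" vs "&amp;" in an attribute value.
// The example.com URL is hypothetical.
libxml_use_internal_errors(true);   // collect parse errors instead of emitting warnings

$broken = '<img src="http://example.com/a.gif?MSPStoreType=image/gif&s=7"/>';
$fixed  = '<img src="http://example.com/a.gif?MSPStoreType=image/gif&amp;s=7"/>';

var_dump(simplexml_load_string($broken));   // bool(false): not well-formed
$img = simplexml_load_string($fixed);
echo (string) $img['src'], "\n";            // entity decoded back to "&s=7"
```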

So, a quick and dirty way:

$pattern = '~title="Result".*?<plaintext>\K[^<]+~s';
if (preg_match($pattern, $xml, $m)) { // $xml holds the raw XML string
    // split on newlines and pipes, swallowing the spaces around them
    $result = preg_split('~\s*[\n|]\s*~', $m[0], -1, PREG_SPLIT_NO_EMPTY);
    print_r($result);
}
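Here is a sketch of that regex approach run on a shortened, made-up copy of the response. The helper name extractResultFields() is hypothetical. \K resets the start of the reported match, so $m[0] holds only the text that follows <plaintext>.

```php
<?php
// Sketch: regex extraction + split, wrapped in a hypothetical helper.
function extractResultFields(string $xml): array
{
    if (!preg_match('~title="Result".*?<plaintext>\K[^<]+~s', $xml, $m)) {
        return [];
    }
    // Split on newlines and pipes, swallowing the spaces around them.
    return preg_split('~\s*[\n|]\s*~', $m[0], -1, PREG_SPLIT_NO_EMPTY);
}

$xml = '<pod title="Input"><subpod><plaintext>flights</plaintext></subpod></pod>'
     . '<pod title="Result"><subpod><plaintext> | altitude | angle' . "\n"
     . 'ENY flight 3056 | 11500 feet | 23° up</plaintext></subpod></pod>';
print_r(extractResultFields($xml));
```

The result is a flat list (altitude, angle, ENY flight 3056, 11500 feet, 23° up), so the row structure is lost; that is the main trade-off against the explode() version.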

Or, for better efficiency, you can change the pattern to:

~title="Result"(?:[^<]+|<(?!plaintext))*+<plaintext>\K[^<]+~s
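A quick sketch (on a made-up snippet) checking that both patterns extract the same text. The possessive *+ never gives back what it consumed, and each iteration eats either a whole run of non-< characters or a single < that does not start <plaintext, so the engine avoids the character-by-character stepping of .*?.

```php
<?php
// Sketch: the lazy and the possessive pattern should capture the same text.
// $xml is a hypothetical, shortened snippet.
$xml = '<pod title="Result"><subpod><plaintext> | altitude | angle</plaintext></subpod></pod>';

$lazy       = '~title="Result".*?<plaintext>\K[^<]+~s';
$possessive = '~title="Result"(?:[^<]+|<(?!plaintext))*+<plaintext>\K[^<]+~s';

preg_match($lazy, $xml, $a);
preg_match($possessive, $xml, $b);
var_dump($a[0] === $b[0]); // bool(true)
```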