XML attributes as array indices in PHP

I have some XML that holds a lot of its information in attributes. Here is a small example.

<?xml version="1.0" encoding="UTF-8"?>
 <collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>04170npc a22003613u 4500</leader>
    <controlfield tag="001">vtls003932502</controlfield>
    <controlfield tag="003">WlAbNL</controlfield>
    <datafield tag="035" ind1=" " ind2=" ">
        <subfield code="a">(WlAbNL)1002</subfield>
    </datafield>
    <datafield tag="040" ind1=" " ind2=" ">
        <subfield code="a">WlAbNL</subfield>
        <subfield code="b">eng</subfield>
        <subfield code="c">WlAbNL</subfield>
    </datafield>
    <datafield tag="245" ind1="0" ind2="0">
        <subfield code="a">Scott Blair Collection,</subfield>
        <subfield code="f">1910 -</subfield>
    </datafield>
    <datafield tag="653" ind1=" " ind2=" ">
        <subfield code="a">rheology</subfield>
    </datafield>
  </record>
  <record>
    <leader>04229npc a22005893u 4500</leader>
    <controlfield tag="001">vtls003932503</controlfield>
    <datafield tag="035" ind1=" " ind2=" ">
        <subfield code="a">(WlAbNL)1004</subfield>
    </datafield>
    <datafield tag="040" ind1=" " ind2=" ">
       <subfield code="a">WlAbNL</subfield>
       <subfield code="b">eng</subfield>
       <subfield code="c">WlAbNL</subfield>
    </datafield>
    <datafield tag="245" ind1="0" ind2="0">
       <subfield code="a">Celtic Collection,</subfield>
       <subfield code="f">17th century -</subfield>
    </datafield>
    <datafield tag="653" ind1=" " ind2=" ">
        <subfield code="a">Scottish Gaelic language</subfield>
    </datafield>
 </record>
</collection>

At the moment I have a PHP script that simply loads the whole document:

$xml = simplexml_load_file("Mapping_coll_wales.xml");
$records = $xml->record;

This creates an array of records that looks like this (I have cut it down to a single record):

  SimpleXMLElement Object
(
[leader] => 04170npc a22003613u 4500
[controlfield] => Array
    (
        [0] => vtls003932502
        [1] => WlAbNL
    )
 [datafield] => Array
    (
        [0] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [tag] => 035
                        [ind1] =>  
                        [ind2] =>  
                    )
                [subfield] => (WlAbNL)1002
            )
        [1] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [tag] => 040
                        [ind1] =>  
                        [ind2] =>  
                    )
                [subfield] => Array
                    (
                        [0] => WlAbNL
                        [1] => eng
                        [2] => WlAbNL
                    )
            )
        [2] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [tag] => 245
                        [ind1] => 0
                        [ind2] => 0
                    )
                [subfield] => Array
                    (
                        [0] => Scott Blair Collection,
                        [1] => 1910 -
                    )
            )
        [3] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [tag] => 653
                        [ind1] =>  
                        [ind2] =>  
                    )
                [subfield] => rheology
            )
    )
)

For now I pull out the fields I need by assuming their position in the array and looping over the records (there are about 500):

for ($i =0; $i <5; $i++) {
echo '<strong>Title</strong> = : ' . $records[$i]->datafield[2]->subfield . '<br />';
echo '<strong>tag</strong>  = :' . $records[$i]->datafield[3]->subfield . '<br />';

echo '<br />------------------------------------------------------------------------<br />';
}

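However, the XML may contain other tags, so I don't want to rely on a field sitting at index 2 and so on. Ideally, I would like to be able to access it with something like this: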

echo '<strong>Title</strong> = : ' . $records[$i]->datafield[245][a] . '<br />';

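I'm sure it is fairly simple and I'm just missing something, but it would be good to be able to load the tags as array indices, or to have some way of getting a datafield directly by its tag and a subfield by its code, since those will not change.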

Hope that makes sense.

Paul

You can use XPath to match elements that satisfy particular conditions.

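However, because the nodes are namespaced, you must register the namespace on every node on which you want to call xpath() with a namespaced path expression.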
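To illustrate why the registration matters, here is a minimal sketch using a trimmed copy of the sample XML. The variable names ($source, $foundWithoutPrefix, $foundWithPrefix) are made up for this illustration, and the prefix "marc" is an arbitrary choice; only the namespace URI has to match the document.

```php
<?php
// Trimmed copy of the sample document (one record, one datafield).
$source = <<<'XML'
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">Scott Blair Collection,</subfield>
    </datafield>
  </record>
</collection>
XML;
$xml = simplexml_load_string($source);

// An unprefixed name in XPath means "element in no namespace",
// so this does not match the namespaced <datafield> elements.
$unprefixed = $xml->xpath('//datafield[@tag="245"]');
$foundWithoutPrefix = !empty($unprefixed);

// After binding a prefix to the document's namespace URI, the same
// path written with that prefix matches.
$xml->registerXPathNamespace('marc', 'http://www.loc.gov/MARC21/slim');
$prefixed = $xml->xpath('//marc:datafield[@tag="245"]');
$foundWithPrefix = !empty($prefixed);

var_dump($foundWithoutPrefix, $foundWithPrefix);
```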

See the example below, which works inside a loop.

$nsp = 'marc';
$nsuri = 'http://www.loc.gov/MARC21/slim';

$records = $xml->record;

foreach($records as $record) {
    $record->registerXPathNamespace($nsp, $nsuri);
    $datafields = $record->xpath('marc:datafield[@tag=245]');
    foreach ($datafields as $datafield) {
        $datafield->registerXPathNamespace($nsp, $nsuri);
        $subfields = $datafield->xpath('marc:subfield[@code="a"]');
        var_dump($subfields);
    }
}
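The loop above could also be wrapped in a small helper so that each lookup reads closer to the "by tag, by code" access the question asks for. This is only a sketch: marcValue() is a hypothetical function name, not part of SimpleXML, and it assumes the tag and code values contain no quote characters, since they are interpolated directly into the path expression.

```php
<?php
// Hypothetical helper: return the first value for a tag/code pair
// within one <record>, or null if the record has no such subfield.
function marcValue(SimpleXMLElement $record, string $tag, string $code): ?string
{
    // Register the namespace on the node we are about to query.
    $record->registerXPathNamespace('marc', 'http://www.loc.gov/MARC21/slim');
    $matches = $record->xpath(
        "marc:datafield[@tag='$tag']/marc:subfield[@code='$code']"
    );
    return empty($matches) ? null : (string) $matches[0];
}

// Trimmed copy of the sample document with two records.
$source = <<<'XML'
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">Scott Blair Collection,</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">Celtic Collection,</subfield>
    </datafield>
  </record>
</collection>
XML;
$xml = simplexml_load_string($source);

// Collect the title (tag 245, code a) of every record.
$titles = [];
foreach ($xml->record as $record) {
    $titles[] = marcValue($record, '245', 'a');
}
print_r($titles);
```

Returning null for a missing pair lets the caller tell "not present" apart from an empty value.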

Alternatively, you can use XPath alone rather than descending recursively through SimpleXML object access. Here are two methods that give the same result:

// Note: xpath() called on $xml->record is evaluated against the first <record> only
$records = $xml->record;
$records->registerXPathNamespace($nsp, $nsuri);
$tags = array('245', '653');
$codes = array('a', 'f');
// METHOD 1: run an xpath for each tag/code combination
$desiredfields = array();
foreach ($tags as $tag) {
    $desiredsubfields = array();
    foreach($codes as $code) {
        $subfields = $records->xpath("marc:datafield[@tag='$tag']/marc:subfield[@code='$code']");
        // Skip combinations that do not exist (e.g. tag 653 has no code "f")
        if (!empty($subfields)) {
            $desiredsubfields[$code] = (string) $subfields[0];
        }
    }
    $desiredfields[$tag] = $desiredsubfields;
}
var_export($desiredfields);
// METHOD 2: create a single xpath expression that matches every subfield you want
// Then visit each subfield retrieving tag from parent
$tagexpr = implode(' or ', array_map(function($t){return "@tag='{$t}'";}, $tags));
$codeexpr = implode(' or ', array_map(function($c){return "@code='{$c}'";}, $codes));
$xpath = "marc:datafield[{$tagexpr}]/marc:subfield[{$codeexpr}]";
$desiredfields = array();
$subfields = $records->xpath($xpath);
foreach ($subfields as $subfield) {
    $datafield = $subfield->xpath('..');
    $datafieldcode = (string) $datafield[0]['tag'];
    $desiredfields[$datafieldcode][(string) $subfield['code']] = (string) $subfield;
}
var_export($desiredfields);
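If the goal is literally an array indexed by tag and then by code, a third option is to skip XPath and build that array by iterating over each record's children. The sketch below is an assumption-laden illustration rather than part of the methods above: indexRecord() is a hypothetical helper name, and values are collected into lists because MARC tags and subfield codes may repeat within a record.

```php
<?php
// Hypothetical helper: index one <record> by datafield tag, then by
// subfield code. Each leaf is a list of strings, since tags and codes
// can occur more than once in a record.
function indexRecord(SimpleXMLElement $record): array
{
    $fields = [];
    foreach ($record->datafield as $datafield) {
        $tag = (string) $datafield['tag'];
        foreach ($datafield->subfield as $subfield) {
            $code = (string) $subfield['code'];
            $fields[$tag][$code][] = (string) $subfield;
        }
    }
    return $fields;
}

// Trimmed copy of the first sample record.
$source = <<<'XML'
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="040" ind1=" " ind2=" ">
      <subfield code="a">WlAbNL</subfield>
      <subfield code="b">eng</subfield>
      <subfield code="c">WlAbNL</subfield>
    </datafield>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">Scott Blair Collection,</subfield>
      <subfield code="f">1910 -</subfield>
    </datafield>
  </record>
</collection>
XML;
$xml = simplexml_load_string($source);

foreach ($xml->record as $record) {
    $fields = indexRecord($record);
    // Access by tag and code, independent of element order.
    echo 'Title = ' . $fields['245']['a'][0] . "\n";
}
```

Note that PHP stores a key such as '245' as the integer 245, while '035' stays a string because of its leading zero; lookups with the quoted tag work in both cases.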