How to remove entity declarations from XML in PHP

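I am trying to remove the <!ENTITY definitions from an XML file, without success. I thought that with the following snippet the output would contain no trace of the entity definitions, but I was wrong.

How can I achieve this?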

The only message I get is this warning:

DOMDocument::loadXML(): xmlns: URI &ns_svg; is not absolute in Entity

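A little context:

I am embedding one SVG inside another, but the <!ENTITY declarations give me all kinds of problems, so I am thinking of using LIBXML_NOENT and removing all the <!ENTITY definitions.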

PHP:

<?php
header('Content-Type: text/plain');
$str = file_get_contents(dirname(__FILE__) . '/test2.svg');

$document = new DOMDocument();
$document->loadXML($str);
foreach ($document->doctype->entities as $entity) {
    $entity->parentNode->removeChild($entity); // I thought this would remove the <!ENTITY declaration
}
echo $document->saveXML(); // --> I want the XML without <!ENTITY ns_svg "http://www.w3.org/2000/svg"> and <!ENTITY ns_xlink "http://www.w3.org/1999/xlink">

XML:

<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 12.0.1, SVG Export Plug-In . SVG Version: 6.00 Build 51448)  -->
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd" [
    <!ENTITY ns_svg "http://www.w3.org/2000/svg">
    <!ENTITY ns_xlink "http://www.w3.org/1999/xlink">
]>
<svg  version="1.1"
     id="Bundesschild" sodipodi:version="0.32" xmlns:cc="http://web.resource.org/cc/" xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd" inkscape:version="0.43" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:svg="http://www.w3.org/2000/svg" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape" sodipodi:docbase="D:'Kuvat'Wikipedia" sodipodi:docname="Flag_of_Germany_(state).svg"
     xmlns="&ns_svg;" xmlns:xlink="&ns_xlink;" width="250" height="275" viewBox="0 0 250 275"
     overflow="visible" enable-background="new 0 0 250 275" xml:space="preserve">
<path id="Schild" fill="#FFCE00" stroke="#000000" d="M235.885,2.558c0,0,0,131.825,0,171.735
    c0,54.121-50.504,98.265-112.501,98.265c-61.996,0-112.5-44.144-112.5-98.265c0-39.91,0-171.735,0-171.735H235.885z"/>
</svg>

Output:

<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 12.0.1, SVG Export Plug-In . SVG Version: 6.00 Build 51448)  -->
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd" [
<!ENTITY ns_svg "http://www.w3.org/2000/svg"> <!-- Why this is still here???? -->
<!ENTITY ns_xlink "http://www.w3.org/1999/xlink"> <!-- Why this is still here???? -->
]>
<svg xmlns:cc="http://web.resource.org/cc/" xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:svg="http://www.w3.org/2000/svg" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape" xmlns="&ns_svg;" xmlns:xlink="&ns_xlink;" version="1.1" id="Bundesschild" sodipodi:version="0.32" inkscape:version="0.43" sodipodi:docbase="D:'Kuvat'Wikipedia" sodipodi:docname="Flag_of_Germany_(state).svg" width="250" height="275" viewBox="0 0 250 275" overflow="visible" enable-background="new 0 0 250 275" xml:space="preserve">
<path id="Schild" fill="#FFCE00" stroke="#000000" d="M235.885,2.558c0,0,0,131.825,0,171.735  c0,54.121-50.504,98.265-112.501,98.265c-61.996,0-112.5-44.144-112.5-98.265c0-39.91,0-171.735,0-171.735H235.885z"/>
</svg>

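You can only remove the whole document type node; the entities do not appear to be child nodes of it. Before removing it, substitute the entity references by loading the document with LIBXML_NOENT:
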
$document = new DOMDocument();
$document->loadXML($xml, LIBXML_NOENT);
$document->removeChild($document->doctype);
echo $document->saveXML();
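
To check the effect on a small input, here is a minimal sketch of the same two steps. The inline SVG string is a hypothetical stand-in for test2.svg and declares only one entity; it is an assumption made for illustration, not the file from the question.

```php
<?php
// Hypothetical sample input: a tiny SVG that declares one entity and uses it
// for the default namespace, mirroring the structure of the file in the question.
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg [
    <!ENTITY ns_svg "http://www.w3.org/2000/svg">
]>
<svg xmlns="&ns_svg;" width="10" height="10"/>
XML;

$document = new DOMDocument();
// LIBXML_NOENT makes the parser substitute entity references while loading.
$document->loadXML($xml, LIBXML_NOENT);

// Removing the doctype node removes the internal subset (and its
// <!ENTITY declarations) along with it.
if ($document->doctype !== null) {
    $document->removeChild($document->doctype);
}

$output = $document->saveXML();

// Inspect the result: no <!ENTITY should remain, and the namespace of the
// root element should be the substituted URI rather than "&ns_svg;".
var_dump(strpos($output, '<!ENTITY') === false);
var_dump($document->documentElement->namespaceURI);
```

The order matters: if the doctype were removed without substituting first, any entity references kept in the document would no longer have a declaration to resolve against.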

The only way I found to keep the document type is to create a new document with a document type and copy all nodes into it.

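$document = new DOMDocument();
$document->loadXML($xml, LIBXML_NOENT); // substitute entity references while parsing
$existingDocumentType = $document->doctype;
if (null !== $existingDocumentType) {
  $implementation = new DOMImplementation();
  // Recreate the doctype from its name and identifiers only; the internal
  // subset with the <!ENTITY declarations is not carried over.
  $newDocumentType = $implementation->createDocumentType(
    $existingDocumentType->name,
    $existingDocumentType->publicId,
    $existingDocumentType->systemId
  );
  // An empty qualified name creates a document without a root element.
  $newDocument = $implementation->createDocument(null, '', $newDocumentType);
  foreach ($document->childNodes as $node) {
    if ($node instanceof DOMDocumentType) {
      continue; // the old doctype cannot be imported, the new one replaces it
    }
    $copy = $newDocument->importNode($node, true);
    if ($copy) {
      $newDocument->appendChild($copy);
    }
  }
  $document = $newDocument;
}
echo $document->saveXML();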

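A document type node cannot be imported into another document; DOMDocument::importNode() returns FALSE for it.

If you look at the standard, DOMDocumentType::entities is a DOMNamedNodeMap, which should have a removeNamedItem() method. But if you call it, PHP outputs: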

Warning: DOMNamedNodeMap::removeNamedItem(): Not yet implemented in ... on line ...