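I am creating an XML document from user input. One of the XML nodes has a CDATA section. If one of the characters inserted into the CDATA section is "special" (a control character, I think), the whole XML document becomes invalid.
Example: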
$dom = new DOMDocument('1.0', 'utf-8');
$dom->appendChild($dom->createElement('root'))
->appendChild($dom->createCDATASection(
"This is some text with a SOH char \x01."
));
$test = new DOMDocument;
$test->loadXml($dom->saveXML());
echo $test->saveXml();
This gives:
Warning: DOMDocument::loadXML(): CData section not finished
This is some text with a SOH cha in Entity, line: 2 in /newfile.php on line 17
Warning: DOMDocument::loadXML(): PCDATA invalid Char value 1 in Entity, line: 2 in /newfile.php on line 17
Warning: DOMDocument::loadXML(): Sequence ']]>' not allowed in content in Entity, line: 2 in /newfile.php on line 17
Warning: DOMDocument::loadXML(): Sequence ']]>' not allowed in content in Entity, line: 2 in /newfile.php on line 17
Warning: DOMDocument::loadXML(): internal errorExtra content at the end of the document in Entity, line: 2 in /newfile.php on line 17
<?xml version="1.0"?>
Is there a good way in PHP to make sure the CDATA section is valid?
The allowed range of characters for a CDATA section is
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
So you have to sanitize your string so that it contains only those characters.
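As a rough sketch of how that range can be checked before building the document: the helper name is_valid_xml_text is hypothetical, and the snippet assumes the input is meant to be UTF-8 (with the /u modifier, preg_match() fails on invalid UTF-8, which is treated as "not valid" here).

```php
<?php
// Hypothetical helper: true when $text is valid UTF-8 and every code point
// falls inside the XML 1.0 character range quoted above.
function is_valid_xml_text(string $text): bool
{
    // /u enables \x{...} code points above 0xFF; /D makes $ match only at the very end.
    $pattern = '/^[\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]*$/Du';
    return preg_match($pattern, $text) === 1;
}

$plain = is_valid_xml_text("plain text\n");
$soh   = is_valid_xml_text("This is some text with a SOH char \x01.");

var_dump($plain); // bool(true)
var_dump($soh);   // bool(false)
```

A check like this only reports the problem; whether to reject the input or strip the offending characters is a separate decision.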
The problem is that "\x01" is not a printable character. You can work around it like this:
$dom = new DOMDocument('1.0', 'utf-8');
$dom->appendChild($dom->createElement('root'))
->appendChild($dom->createCDATASection(
urlencode("This is some text with a SOH char \x01.")
));
$test = new DOMDocument;
$test->loadXml($dom->saveXML());
echo urldecode($test->saveXml());
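A variation on the same idea, offered only as a sketch: instead of decoding the whole serialized document, decode just the node value after loading. It swaps in rawurlencode()/rawurldecode() for urlencode()/urldecode(), and the variable names are illustrative.

```php
<?php
$raw = "This is some text with a SOH char \x01.";

$dom = new DOMDocument('1.0', 'utf-8');
$dom->appendChild($dom->createElement('root'))
    // The encoded value is plain ASCII, so the CDATA section stays well-formed.
    ->appendChild($dom->createCDATASection(rawurlencode($raw)));

$test = new DOMDocument;
$loaded = $test->loadXML($dom->saveXML());

// Decode only the text of the root element, not the XML markup around it.
$decoded = rawurldecode($test->documentElement->textContent);

var_dump($decoded === $raw); // bool(true)
```

The trade-off is that every consumer of the XML has to know the value is encoded and decode it again.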
Based on Gordon's answer, I wrote:
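/**
 * Removes invalid characters from an HTML string
 *
 * @param string $content
 *
 * @return string
 */
function sanitize_html($content) {
    // Compare against '' so that the string "0" is not discarded.
    if ((string) $content === '') return '';
    // Everything outside the allowed XML range; the /u modifier is required
    // for \x{...} code points above 0xFF and expects valid UTF-8 input.
    $invalid_characters = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    return preg_replace($invalid_characters, '', $content);
}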
Usage:
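A minimal usage sketch, assuming a sanitizer along the lines of the function above. It is redefined here under the hypothetical name strip_invalid_xml_chars so the snippet runs on its own, and it falls back to an empty string if preg_replace() rejects the input as invalid UTF-8.

```php
<?php
// Hypothetical stand-in for the sanitizer above.
function strip_invalid_xml_chars(string $content): string
{
    $invalid = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    // preg_replace() returns null when the input is not valid UTF-8.
    return preg_replace($invalid, '', $content) ?? '';
}

$input = "This is some text with a SOH char \x01.";

$dom = new DOMDocument('1.0', 'utf-8');
$dom->appendChild($dom->createElement('root'))
    ->appendChild($dom->createCDATASection(strip_invalid_xml_chars($input)));

// Reload the serialized document to confirm that it now parses.
$test = new DOMDocument;
$loaded = $test->loadXML($dom->saveXML());
echo $test->saveXML();
```

Note that stripping is lossy: the SOH byte is gone from the stored text, which is usually acceptable for user-entered content but not for binary data.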
Take a look at simplexml_load_file
(http://php.net/manual/en/function.simplexml-load-file.php) and the LIBXML_NOCDATA
option (http://www.php.net/manual/en/libxml.constants.php). That will most likely answer your question.
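For reference, a small sketch of what passing that option looks like. It uses simplexml_load_string() instead of simplexml_load_file() so that no file is needed, and the sample XML is made up for illustration. As far as I can tell, LIBXML_NOCDATA only changes how CDATA is represented after parsing (merged into text nodes), so the input still has to contain valid characters to parse at all.

```php
<?php
// Illustrative input; the element names are arbitrary.
$xml = '<root><item><![CDATA[a < b]]></item></root>';

// Third argument: libxml options. LIBXML_NOCDATA merges CDATA into text nodes.
$doc = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

$value = (string) $doc->item;
var_dump($value); // string(5) "a < b"
```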