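This question seems to have been asked many times before, but I have not found a solution that works for data that is very long and contains special characters such as "<" or "{".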

I am submitting some large XML data to PHP on the server, like this:

<root><id>1</id><text>Here is a very long text with
line breaks, white-spaces and many very unsual charchaters, e.g. < % & }
the text can be more then 5000 characters long
</text></root>

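On the server side, I am trying to get the "raw data" between the text tags. The raw data inside the text tags can include anything you can imagine: whitespace, line breaks, odd characters. What I submit is source code and text, formatted by CKEditor and a code syntax highlighter.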

I read through this post, and basically everyone says to use an XML parser such as DOMDocument rather than regular expressions.

First, I tried several regular expressions; the one below is not the only one I tried. It fails when the data contains angle brackets and is too long:

//#<text[^>]*>[\s\S]*?</text>#
$regex = "#<".$element_name."[^>]*>[\s\S]*?</".$element_name.">#";
$found = preg_match($regex, $xml, $matches);
if ($found != false) 
{
    $result = $matches[0];
    return $result;
}

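Second, I tried this, which works as long as the data inside the tags is not too unusual. I think the parser does not like the "<" bracket and considers the XML invalid.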

 function getTextBetweenTags($tag, $html, $strict=0)
{
    /*** a new dom object ***/
    $dom = new domDocument;
    /*** load the html into the object ***/
    if($strict==0)
    {
        $dom->loadXML($html);
    }
    else
    {
        $dom->loadHTML($html);
    }
    /*** discard white space ***/
    $dom->preserveWhiteSpace = false;
    /*** the tag by its tag name ***/
    $content = $dom->getElementsByTagname($tag);
    /*** the array to return ***/
    //$out = array();
    foreach ($content as $item)
    {
        /*** add node value to the out array ***/
        //$out[] = $item->nodeValue;
        /*** return only the first found element value ***/
        return $item->nodeValue;
    }
    /*** return empty string if nothing found ***/
    return "";
}
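To see why the parser rejects such input, the libxml errors can be collected and inspected instead of being emitted as warnings. The following is a minimal sketch, assuming a shortened sample payload (the `$xml` string is made up for illustration); it only shows how to read the parser's complaints, not how to fix the data.

```php
<?php
// Shortened sample payload (assumption): a raw "<" and "&" inside <text>
// make the document not well-formed.
$xml = "<root><id>1</id><text>a < b & c }</text></root>";

// Collect libxml errors internally instead of raising PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$ok = $dom->loadXML($xml); // false when the XML is not well-formed

$messages = [];
foreach (libxml_get_errors() as $error) {
    // Each entry is a LibXMLError carrying line, column and message.
    $messages[] = sprintf(
        "line %d, column %d: %s",
        $error->line,
        $error->column,
        trim($error->message)
    );
}
libxml_clear_errors();

var_dump($ok);
print_r($messages);
```

The reported messages point at the exact position where the unescaped character breaks the document.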

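So my question is:

If I know for certain that the data contains exactly one opening and one closing "text" tag, what is the best way to read the raw data with PHP?

It would be great if someone could give me a working regular expression or code snippet.

Sorry for my mediocre English.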

=== Response to the answers ===

OK, both BogdanM's and Steven's answers work, but I like BogdanM's answer best.

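What I did to make it work:

I build my own XML on the client side and now use CDATA to tell the parser where the data starts and ends.
On the server side, I use SimpleXML to parse the data. With CDATA, parsing works without problems, no matter how "strange" the data is.
I eliminated a common "newbie mistake": sending large data with HTTP GET. I now simply use HTTP POST, which avoids the length limits of GET.

Thanks again for your help.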
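The CDATA plus SimpleXML approach described above can be sketched as follows. This is a minimal illustration under assumptions, not the code actually used: the XML is built in PHP here only to keep the example in one language, `wrapInCdata()` is a hypothetical helper, and the sample text is made up. The one subtle point is that the sequence `]]>` inside the data has to be split, because it would otherwise end the CDATA section early.

```php
<?php
// Hypothetical helper: wrap arbitrary text in a CDATA section.
// Any "]]>" in the text is split across two sections so it cannot
// terminate the section prematurely.
function wrapInCdata(string $text): string
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

// Made-up sample data containing markup-like characters and "]]>".
$raw = "line one\nif (a < b && c > 0) { echo \"<text>\"; } ]]> done";

$xml = '<root><id>1</id><text>' . wrapInCdata($raw) . '</text></root>';

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$root = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($root === false) {
    throw new RuntimeException('XML could not be parsed');
}

// Casting the element to string yields its text content.
$text = (string) $root->text;
var_dump($text === $raw);
```

CDATA does not help with characters that XML forbids outright (for example most control characters), so those still need to be removed or encoded before sending.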

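Are you generating the XML yourself as well? If so, you should put the text data inside a CDATA section. Then load your XML with SimpleXML or a parser of your choice and read the content of the text tag. Make sure the data is valid UTF-8 and contains no characters that are not allowed in XML at all: http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char

Otherwise, you can do this: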

preg_match('#<text>(.+?)</text>#is', $xml, $matches);
echo $matches[1]; // your data between <text> and </text>
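Since the question states that the tag occurs exactly once, plain string functions are another option; note that this swaps the regular expression for `strpos`/`strrpos`/`substr` and is not part of the original answer. A minimal sketch, assuming a hypothetical helper `rawBetweenTags()`, a case-sensitive tag name, and no other tag whose name starts with the same characters:

```php
<?php
// Hypothetical helper: return the raw text between the first "<tag ...>"
// and the last "</tag>", or null when either one is missing.
// Assumes the tag occurs exactly once and that no attribute value
// in the opening tag contains ">".
function rawBetweenTags(string $xml, string $tag): ?string
{
    $open = strpos($xml, '<' . $tag);
    if ($open === false) {
        return null;
    }
    // End of the opening tag (skips any attributes).
    $start = strpos($xml, '>', $open);
    if ($start === false) {
        return null;
    }
    $start++;
    // Search for the closing tag from the end, so a "</tag"-like
    // fragment inside the data does not cut the result short.
    $end = strrpos($xml, '</' . $tag . '>');
    if ($end === false || $end < $start) {
        return null;
    }
    return substr($xml, $start, $end - $start);
}

$xml = "<root><id>1</id><text>a < b & c }\nmore</text></root>";
var_dump(rawBetweenTags($xml, 'text'));
```

This never parses the payload, so it does not care whether the XML as a whole is well-formed.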

First of all, your original regex pattern is fine and should work:

#<".$item_name."[^>]*>([\s\S]*?)</".$item_name.">#

However, you can change it to make it more readable, more functional, and so on.

Possibilities

Regex 1

#<text>(.*)</text>#is

Simply captures everything between the text tags. The i modifier allows both TEXT and text tags, and the s modifier makes . match newlines.

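Regex 2

#<text.*?>(.*)</text>#is

Your original regex implies that you expect extra characters inside the opening text tag. The .*? in the opening tag allows for this; the ? makes it stop at the first >.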

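Regex 3

#<(text).*?>(.*)</\1>#is

Since the opening and closing tag names are the same (i.e. text), you can put parentheses around the name in the opening tag to make it a capture group, and then simply reference \1 in the closing tag, because it is the first capture group.

That means one less chance of a typo!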

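Regex 4

#<('.$item_name.').*?>(.*)</\1>#is

Makes it more dynamic. You can replace the word text with a variable (as in your original). Combine that with the capture group and back-reference from regex 3, and you only need to insert the variable once, which gives cleaner, more readable code.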
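When the tag name comes from a variable, it may be worth escaping it before it goes into the pattern. The sketch below wraps regex 4 in a hypothetical helper `matchBetweenTags()` and adds `preg_quote()`, which is an addition not present in the pattern above; the sample input and tag name are made up.

```php
<?php
// Hypothetical helper built on regex 4: returns the content between
// "<tag ...>" and "</tag>", or null if there is no match.
function matchBetweenTags(string $xml, string $itemName): ?string
{
    // preg_quote() escapes regex metacharacters in the tag name;
    // the second argument is the pattern delimiter, here "#".
    $pattern = '#<(' . preg_quote($itemName, '#') . ').*?>(.*)</\1>#is';
    if (preg_match($pattern, $xml, $matches) !== 1) {
        return null;
    }
    // Group 1 is the tag name, group 2 is the content.
    return $matches[2];
}

// "my.text" contains a dot, which would match any character if unescaped.
$xml = "<root><my.text>a < b\nsecond line</my.text></root>";
var_dump(matchBetweenTags($xml, 'my.text'));
```

Without the escaping, a name such as `my.text` would also match a tag like `myXtext`.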

Comparison with the original

#<('.$item_name.').*?>(.*)</\1>#is
#<".$item_name."[^>]*>([\s\S]*?)</".$item_name.">#

Working example

Using regex 4 above

$string = "
<root><id>1</id><text>Here is a very long text with
line breaks, white-spaces and many very unsual charchaters, e.g. < % & }
the text can be more then 5000 characters long 
</text></root>";
$item_name = 'text'; // tag name inserted into the pattern; appears as [1] in the output
preg_match('#<('.$item_name.').*?>(.*)</\1>#is', $string, $matches);
var_dump($matches);
/**
Output:
array(3) {
  [0]=>
  string(167) "<text>Here is a very long text with
line breaks, white-spaces and many very unsual charchaters, e.g. < % & }
the text can be more then 5000 characters long 
</text>"
  [1]=>
  string(4) "text"
  [2]=>
  string(154) "Here is a very long text with
line breaks, white-spaces and many very unsual charchaters, e.g. < % & }
the text can be more then 5000 characters long 
"
}
*/

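Note: if you cannot get the working example above to... work... then could you provide (by editing your question or via a link) a sample case that does not work?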
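When narrowing down a failing case, it helps to distinguish "no match" from "the regex engine gave up". `preg_match()` returns 0 for the former and false for the latter, and `preg_last_error()` reports the reason. Whether a PCRE limit is what actually caused the failure in the question is only a guess; the helper `describePregResult()` and the generated sample data below are hypothetical.

```php
<?php
// Hypothetical helper: turn the return value of preg_match() into a
// readable status, using preg_last_error() when the engine failed.
function describePregResult($result): string
{
    if ($result === false) {
        switch (preg_last_error()) {
            case PREG_BACKTRACK_LIMIT_ERROR:
                return 'error: backtrack limit reached (see pcre.backtrack_limit)';
            case PREG_JIT_STACKLIMIT_ERROR:
                return 'error: JIT stack limit reached';
            case PREG_BAD_UTF8_ERROR:
                return 'error: malformed UTF-8 (only with the u modifier)';
            default:
                return 'error: code ' . preg_last_error();
        }
    }
    return $result === 1 ? 'match' : 'no match';
}

// Generated sample: 1000 lines of awkward characters inside <text>.
$xml = '<root><text>' . str_repeat("x < y & z }\n", 1000) . '</text></root>';

$result = preg_match('#<text[^>]*>([\s\S]*?)</text>#', $xml, $matches);
$status = describePregResult($result);

echo $status, "\n";
if ($result === 1) {
    echo strlen($matches[1]), " bytes captured\n";
}
```

A check like `$found != false` treats 0 and false the same, so an engine error looks exactly like "tag not found" unless the two are separated as above.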

PHP and RegEx: getting raw data between XML tags, even if the XML as a whole appears to be invalid