How to replace outside <pre> tags, not inside, with PHP

I have a string, for example:

This is text outside \r \n of pre tags 
<pre class="myclass"> Text inside \r \n pre tags</pre> 
This is text \r \n  \r\n outside of pre tags

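Can anyone help me replace and remove \r \n, but only outside the <pre> tags (the content of <pre class="myclass"></pre> must not be changed)? How can I do this with a PHP regular expression and preg_replace(), or in some other way?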

The text is in a variable: $text = 'text<pre class="myclass">text</pre>text';

Thanks for your help.

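Update: thanks for all the replies, they helped. I will look into DOM. I have also tried preg_split(), and it seems to do what I need. Maybe this will help someone: it removes \r\n outside the <pre class="myclass"></pre> tags: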

 function ReplaceOutsidePreTags($text) {
     // Split on whole <pre class="myclass">...</pre> blocks and keep them in the result.
     $parts = preg_split('/(<pre class="myclass">.+?<\/pre>)/s', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
     $text_new = '';
     foreach ($parts as $value) {
         if (preg_match('[<pre class="myclass">|</pre>]', $value)) {
             // A captured <pre> block: copy it unchanged.
             $text_new .= $value;
         } else {
             // Outside <pre>: remove the literal backslash sequences \r\n, \n and \r.
             $text_new .= str_replace(array("\\r\\n", "\\n", "\\r"), "", $value);
         }
     }
     return $text_new;
 }
 // Single-quoted, so \r and \n below are literal two-character sequences, not control characters.
 $text = 'this is text\r\n\r\r\n\n outside pre tag\r\n 
      <pre class="myclass">graphics,\r\n\r\nprogramming </pre>
      this is text outside\r\n pre tag\r\n  
      <pre class="myclass">graphics,\r\n\r\nprogramming </pre>
      this is text outside\r\n pre tag\r\n 
      <pre class="myclass">graphics,\r\n\r\nprogramming </pre>
      this is text outside pre tag\r\n';

 // Called with $this-> because the function is a method of a class in my code.
 $text_new = $this->ReplaceOutsidePreTags($text);
 echo $text_new;
Result:

this is text outside pre tag 
     <pre class="myclass">graphics,\r\n\r\nprogramming </pre>
     this is text outside pre tag  
     <pre class="myclass">graphics,\r\n\r\nprogramming </pre>
     this is text outside pre tag 
     <pre class="myclass">graphics,\r\n\r\nprogramming </pre>
     this is text outside pre tag

A general "replace content, but not inside other content" solution:

$out = preg_replace("(<pre(?:\s+\w+(?:=(?:\w+|\"[^\"]*\"|'[^']*'))?)*\s*>.*?</pre>(*SKIP)(*FAIL)"
           ."|\r|\n)is", "", $in);

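This matches a <pre> tag (with attributes, which may be boolean, unquoted, single-quoted or double-quoted; HTML has no backslash escapes to complicate things) together with everything up to its closing </pre>, then skips it and fails. What remains for the second alternative is any newline character outside those blocks, which is replaced with an empty string.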
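To see the skip-and-fail idea in isolation, here is a minimal sketch with a deliberately simplified pattern. It assumes <pre> blocks are not nested and that no attribute value contains a ">" character; the function name strip_newlines_outside_pre is hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper: removes CR and LF everywhere except inside <pre>...</pre>.
function strip_newlines_outside_pre(string $html): string
{
    // First alternative: consume a whole <pre ...>...</pre> block; (*SKIP)(*FAIL)
    // then rejects the match and resumes scanning after the block.
    // Second alternative: a CR or LF that was not consumed by the first alternative.
    $pattern = '~<pre\b[^>]*>.*?</pre>(*SKIP)(*FAIL)|[\r\n]~is';
    return preg_replace($pattern, '', $html);
}

$in  = "a\r\nb<pre class=\"myclass\">c\r\nd</pre>e\nf";
$out = strip_newlines_outside_pre($in);
// $out is now "ab<pre class=\"myclass\">c\r\nd</pre>ef"
```

The line breaks between "a" and "b" and between "e" and "f" are gone, while the one inside the <pre> block is kept.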

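As a more general rule, however, consider a DOM parsing system such as DOMDocument. Traverse the nodes, ignore <pre> elements, and strip newlines from the remaining text nodes.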
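A rough sketch of that DOM traversal might look like the following. It assumes the input is a UTF-8 HTML fragment; the function name strip_newlines_with_dom and the wrapper id "wrap" are made up for illustration, and the XML-prolog prefix is a common workaround to make loadHTML treat the input as UTF-8. Note that the serializer may normalize markup, so the output is not guaranteed to be byte-identical to the input outside the stripped newlines.

```php
<?php
// Hypothetical helper: strips CR/LF from text nodes that are not inside a <pre>.
function strip_newlines_with_dom(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    // Wrap the fragment in a known container so it can be found again later.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Every text node that has no <pre> ancestor.
    foreach ($xpath->query('//text()[not(ancestor::pre)]') as $node) {
        $node->data = str_replace(["\r", "\n"], '', $node->data);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$dom_out = strip_newlines_with_dom("a\nb<pre class=\"myclass\">c\nd</pre>e\nf");
```

Compared with the regex, this relies on the parser to decide what is inside a <pre> element, at the cost of re-serializing the markup.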

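I actually use a regex similar to the one above to preserve whitespace where it matters and strip it everywhere else, but I use <!-- WSP_BEGIN --> ... <!-- WSP_END --> markers to sidestep the ugliness of parsing HTML. Because user-supplied content is HTML-escaped, it cannot collide with the comments, so there is no problem.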

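Edit: for reference, this is the code I use. By stripping unnecessary whitespace it saves me megabytes to gigabytes of bandwidth every day. I call it "pre-compressing whitespace":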

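$c = preg_replace_callback(
    // The s flag lets . match newlines inside the marked region.
    "(<!-- WSP_BEGIN -->(.*?)<!-- WSP_END -->|\r|\n|\t)s",
    function($m) {
        // Group 1 is only set when the marker alternative matched.
        if (isset($m[1])) return $m[1]; // effectively strips markers
        else return " "; // condense whitespace
    },
    $c
);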

You can do this in PHP without regular expressions:

// We need the string to fix and the two delimiters of the substring that must not be edited.
// Only the first $start ... $end block in the string is protected.
function get_string($string, $start, $end) {
    // Split once at the opening delimiter, e.g. '<pre class="myclass">'.
    $parts = explode($start, $string, 2);
    if (count($parts) < 2) {
        // No protected block at all: clean the whole string.
        return str_replace(array("\n", "\r"), "", $string);
    }
    // Split the remainder once at the closing delimiter, e.g. '</pre>'.
    $parts1 = explode($end, $parts[1], 2);
    $before = str_replace(array("\n", "\r"), "", $parts[0]);
    $after = isset($parts1[1]) ? $end . str_replace(array("\n", "\r"), "", $parts1[1]) : "";
    // Put the delimiters back so the <pre> element itself is kept.
    return $before . $start . $parts1[0] . $after;
}
// Double-quoted, so \r and \n are real control characters here.
$fullstring = "This is text outside \r \n of pre tags 
<pre class=\"myclass\"> Text inside \r \n pre tags</pre> 
This is text \r \n  \r\n outside of pre tags";
$replaced = get_string($fullstring, '<pre class="myclass">', '</pre>');
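
If the text can contain several protected blocks, the same no-regex idea can be extended with a strpos() loop. This is only a sketch under the assumption that blocks are not nested; the function name strip_newlines_outside is hypothetical.

```php
<?php
// Hypothetical generalisation: protects every $start ... $end block, not just the first.
function strip_newlines_outside(string $text, string $start, string $end): string
{
    $out = '';
    $pos = 0;
    while (($open = strpos($text, $start, $pos)) !== false) {
        $close = strpos($text, $end, $open + strlen($start));
        if ($close === false) {
            break; // unclosed block: the rest is treated as outside text below
        }
        $close += strlen($end);
        // Clean the text before the block, then copy the block verbatim.
        $out .= str_replace(["\r", "\n"], '', substr($text, $pos, $open - $pos));
        $out .= substr($text, $open, $close - $open);
        $pos = $close;
    }
    // Clean whatever follows the last protected block.
    return $out . str_replace(["\r", "\n"], '', substr($text, $pos));
}

$text  = "one\r\n<pre class=\"myclass\">keep\r\nthis</pre>two\n"
       . "<pre class=\"myclass\">and\nthis</pre>three\r\n";
$clean = strip_newlines_outside($text, '<pre class="myclass">', '</pre>');
// $clean is "one<pre class=\"myclass\">keep\r\nthis</pre>two<pre class=\"myclass\">and\nthis</pre>three"
```

Because the delimiters are matched as exact strings, a <pre> tag with different attributes or spacing would need its own $start value, which is where the regex or DOM approaches are more flexible.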