PHP get tags inside body and remove the text content inside of each tag

Keywords: tag text, body, get, PHP, remove | Updated: 2023-09-27

I want to grab everything inside the body.

<html>
<head><title>Test</title>
</head>
<body>
<div id="dummy">Your contents</div>
<p class="p">Paragraph</p>
<div id="example">My Content</div>
</body>
</html>

The final result I want is:

<div id="dummy"></div>
<p class="p"></p>
<div id="example"></div>

Not like this:

<div id="dummy">Your contents</div>
<p class="p">Paragraph</p>
<div id="example">My Content</div>

$content = '<html>
<head><title>Test</title>
</head>
<body>
<div id="dummy">Your contents</div>
<p class="p">Paragraph</p>
<div id="example">My Content</div>
</body>
</html>';
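// Capture everything between <body ...> and </body> (the U modifier makes the match ungreedy).
preg_match('/(?:<body[^>]*>)(.*)<\/body>/isU', $content, $matches);
$bodycontent = $matches[1];
echo htmlspecialchars($bodycontent);
// Collect every tag, dropping the text between them.
preg_match_all('/<[^>]*>/isU', $bodycontent, $matches2);
$tags = implode("", $matches2[0]);
echo htmlspecialchars($tags);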

Although this works:

if (preg_match('%<(body)[^>]*>(.*)<\s*/\1\s*>%s', $subject, $regs)) {
    $result = $regs[2];
}

I don't recommend it. PHP gives you better tools for this job. For example, use a parser such as the following:

# create and load the HTML
include('simple_html_dom.php');
$html = new simple_html_dom();
# single quotes, so the double quotes in the attributes do not end the string
$html->load('<html>
               <head><title>Test</title></head>
               <body>
                 <div id="dummy">Your contents</div>
                 <p class="p">Paragraph</p>
                 <div id="example">My Content</div>
               </body>
            </html>');

# get an element representing the body (index 0 returns a single element, not an array)
$element = $html->find('body', 0);
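The parser approach can also be sketched with PHP's built-in DOM extension instead of simple_html_dom. This is a swapped-in alternative, not the library named above, and it assumes that "removing the text content" means deleting every text node under body while keeping the tags themselves. The variable names are only illustrative.

```php
<?php
// Sketch using the built-in DOM extension (an assumption; the answer above uses simple_html_dom).
$content = '<html>
<head><title>Test</title>
</head>
<body>
<div id="dummy">Your contents</div>
<p class="p">Paragraph</p>
<div id="example">My Content</div>
</body>
</html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($content);

// Remove every text node below <body>, including whitespace between tags.
$xpath = new DOMXPath($doc);
foreach (iterator_to_array($xpath->query('//body//text()')) as $textNode) {
    $textNode->parentNode->removeChild($textNode);
}

// Serialize what is left inside <body>, one top-level element at a time.
$body = $doc->getElementsByTagName('body')->item(0);
$result = '';
foreach ($body->childNodes as $child) {
    $result .= $doc->saveHTML($child) . "\n";
}

echo $result;
```

Because only text nodes are removed, nested tags (for example a span inside a div) would stay in place, emptied of their text.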

Edit:

Since you insist...

$result = preg_replace('%(<(div)[^>]*>).*<\s*/\2\s*>%', '\1</\2>', $subject);

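This removes the contents of a div tag. You can swap div for any other tag. I really don't see where you are going with this, though, and I don't recommend it.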
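As a rough illustration of swapping in other tags, the sketch below applies a variant of the same pattern to the sample markup. The alternation (div|p), the lazy .*? quantifier, the \b boundary, and the s modifier are additions made here for the example and are not part of the line above. It is still a regex, so nested tags of the same name would not be handled correctly.

```php
<?php
// Sample body content from the question.
$subject = '<div id="dummy">Your contents</div>
<p class="p">Paragraph</p>
<div id="example">My Content</div>';

// (div|p) lists the tags to empty; \b stops "p" from matching "pre".
// .*? is lazy so two tags on one line are not merged into one match,
// and the s modifier lets the content span several lines.
$result = preg_replace('%(<(div|p)\b[^>]*>).*?<\s*/\2\s*>%s', '\1</\2>', $subject);

echo $result;
```

For the sample input this leaves the three opening and closing tags with nothing between them.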