Unlink images, remove unclosed p's and remove all styles

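I have some problems with my WordPress posts, and I'm trying to fix them with DOMDocument.

The first problem is that my images (<img>) are inside <a> tags, and I want to remove the <a> tags.

I also want to remove all unclosed <p> tags (those without a </p>) and remove the style attribute from all elements.

I could post some of the code I've tried, but I don't think it would help at all, since I haven't made any progress. So far I've only tried removing the links from the images, but nothing seems to work. I really don't know how to work with DOMDocument child elements.

Here you can see a sample of the HTML that needs fixing: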

<img width="750" height="500" src="http://fancycribs.com/wp-content/uploads/2013/05/Modern-Riverside-Apartment-–-A-Stylish-and-Elegant-Residence-6.jpg" class="attachment-large wp-post-image" alt="Modern Riverside Apartment – A Stylish and Elegant Residence (6)" />        <p>This modern seventh floor riverside apartment is placed in the luxurious and modern Montevetro Building, which is close to Battersea Square with access to Chelsea, Fulham and Kings Road by crossing Battersea Bridge, London. This residence has become one of the iconic buildings in the Battersea area.</p>
<p>It offers spectacular views over the serene tranquility of the river. This apartment offers comfort and luxury throughout its double reception room, three bedrooms, three bathrooms and large decked balcony. The design details are astonishing: mahogany wood floors, original hand painted walls, large floor to ceiling windows offering a spectacular view over the river. The apartment is spacious, the space between living room and dining room is fluid, having continuity. The hall is large and has a lot of storage spaces, having the quality to link rooms one to another. The kitchen space is large and has plenty of storage capacity. It is dressed up in mahogany wood, offering personality and contrast and access to the large balcony.</p>
<p>The master bedroom is a masterpiece of style and elegance, with nice and simple furniture, a bathroom and accompanied by two further double bedrooms, a family bathroom and a shower room. The residence overwhelms you through its luxury and the splendid view.</p>
<p style="text-align: center"><a href="http://fancycribs.com/37216-modern-riverside-apartment-a-stylish-and-elegant-residence.html/modern-riverside-apartment-a-stylish-and-elegant-residence-7" rel="attachment wp-att-39033" class="local-link"><img class="aligncenter size-medium wp-image-39033" alt="Modern Riverside Apartment – A Stylish and Elegant Residence" src="http://fancycribs.com/wp-content/uploads/2013/05/Modern-Riverside-Apartment-–-A-Stylish-and-Elegant-Residence-7-670x446.jpg" width="670" height="446" title="Modern Riverside Apartment – A Stylish and Elegant Residence" /></a></p>
<p style="text-align: center">

Later edit:

Here is what I tried. It seems to unlink the images, but only images number 1, 3, 5 and 7, while 2, 4 and 6 remain unchanged.

$html = new DOMDocument;
$html->preserveWhiteSpace = false;
$html->loadHTML('<meta http-equiv="content-type" content="text/html; charset=utf-8">'.$content);
foreach($html->getElementsByTagName('a') as $a) {
    if($a->hasChildNodes()) {
        $img = $a->getElementsByTagName('img')->item(0);
        $a->parentNode->replaceChild($img,$a);
    }
}
$text = $html->saveHTML();
echo $text;

Thanks

I've done it with DOMDocument and HTML Purifier.

Here is the code:

require_once 'library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.TidyLevel','heavy');
$config->set('AutoFormat.RemoveEmpty','true');
$config->set('AutoFormat.RemoveEmpty.RemoveNbsp','true');
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($content);
$html = new DOMDocument;
$html->preserveWhiteSpace = false;
$html->loadHTML('<meta http-equiv="content-type" content="text/html; charset=utf-8">'.$clean_html);
// getElementsByTagName() returns a live list, so walk it backwards:
// replacing a node then never shifts the items still to be visited.
$as = $html->getElementsByTagName('a');
for ($i = $as->length; $i > 0; --$i) {
    $a = $as->item($i - 1);
    if ($a->hasChildNodes()) {
        // Only unwrap links that actually contain an image.
        $img = $a->getElementsByTagName('img')->item(0);
        if ($img !== null) {
            $a->parentNode->replaceChild($img, $a);
        }
    }
}
foreach($html->getElementsByTagName('p') as $p) {
    $p->removeAttribute('style');
}
$text = $html->saveHTML();
echo $text;
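
The forward foreach in the question skipped every other image because getElementsByTagName() returns a live list: once an <a> is replaced, the remaining items shift down by one. As an alternative to counting backwards, a minimal sketch (assuming plain DOMDocument, without HTML Purifier) can copy the list into an array first and use XPath to strip style from every element rather than only from <p>. The function name unlinkImagesAndStripStyles is hypothetical and is not part of the code above.

```php
<?php
// Hypothetical helper for illustration; the name is not from the original answer.
function unlinkImagesAndStripStyles(string $content): string
{
    $doc = new DOMDocument();

    // Post fragments are rarely valid HTML; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<meta http-equiv="content-type" content="text/html; charset=utf-8">' . $content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Snapshot the live node list so replacing nodes cannot skip entries.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
        $img = $a->getElementsByTagName('img')->item(0);
        if ($img !== null) {
            // Put the image where the link used to be.
            $a->parentNode->replaceChild($img, $a);
        }
    }

    // Select every element that carries a style attribute, not only <p>.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@style]') as $node) {
        $node->removeAttribute('style');
    }

    return $doc->saveHTML();
}
```

Calling unlinkImagesAndStripStyles($content) on the post body returns a whole document string, because saveHTML() without arguments serializes the full document, including the <meta> tag that was prepended for the charset.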

Can you try running this code and see whether it works for you? It finds <a ...><img ... and replaces it with just <img ...

// $html must hold the post HTML as a string here, not a DOMDocument.
$p = '/<a\s[^>]*href=("??)([^" >]*?)\1[^>]*>(.*<img.*)<\/a>/siU';
$newHtml = preg_replace($p, '$3', $html);
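
To see what the regex approach does, here is a minimal sketch on a made-up sample string (the $sample value and variable names are assumptions for illustration). The third capture group holds everything between the opening <a ...> and the closing </a>, so replacing the match with $3 keeps only the image.

```php
<?php
// Hypothetical sample input, not taken from the post in the question.
$sample = '<p><a href="http://example.com/page" class="local-link">'
        . '<img src="a.jpg" alt="A" /></a></p>';

// Single-quoted so the backslashes reach the regex engine unchanged.
$pattern = '/<a\s[^>]*href=("??)([^" >]*?)\1[^>]*>(.*<img.*)<\/a>/siU';

// $3 is the content captured between <a ...> and </a>.
$unlinked = preg_replace($pattern, '$3', $sample);

echo $unlinked, PHP_EOL; // <p><img src="a.jpg" alt="A" /></p>
```

One caveat with this pattern: because the s flag lets . match newlines, a text-only link that appears before an image link can be pulled into the same match, which would leave a stray </a> behind. For posts that mix text links and image links, the DOM-based approach is likely the safer choice.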