Filter XML CDATA response

OK, I did a return simplexml_load_string($data, 'SimpleXMLElement', LIBXML_COMPACT | LIBXML_NOCDATA | LIBXML_NOBLANKS | LIBXML_NOEMPTYTAG ); and parsed the XML response.

The problem is that the content of [description] is really messy, and I need to pick out only the data I need.

[description] => 
    <a href="http://www.metacafe.com/watch/cb-YpE1z5IhjWrmCM62DSTU8jQ9X4IZryVR/the_dish_with_doc_willoughby/"><img src="http://s4.mcstatic.com/thumb/8000947/21507982/4/directors_cut/0/1/the_dish_with_doc_willoughby.jpg?v=8" align="right" border="0" alt="THE Dish with Doc Willoughby" vspace="4" hspace="4" width="134" height="78" /></a>
    <p>
        Doc Willoughby, guru of &quot;America's Test Kitchen,&quot; stopped by &quot;CBS The Morning: Saturday&quot; to share his ultimate dish with Rebecca Jarvis and Jeff Glor: Roast Beef Tenderloin with Dried Fruit and Nut Stuffing.                 <br>Ranked <strong>4.00</strong> / 5 | 2 views | <a href="http://www.metacafe.com/watch/cb-YpE1z5IhjWrmCM62DSTU8jQ9X4IZryVR/the_dish_with_doc_willoughby/">0 comments</a><br/>
    </p>
    <p>
            <a href="http://www.metacafe.com/watch/cb-YpE1z5IhjWrmCM62DSTU8jQ9X4IZryVR/the_dish_with_doc_willoughby/"><strong>Click here to watch the video</strong></a> (04:58)<br/>
        Submitted By:                       <a href="http://www.metacafe.com/channels/CBS/">CBS</a><br/>
        Tags:
        <a href="http://www.metacafe.com/topics/cbsepisode/">Cbsepisode</a>&nbsp;<a href="http://www.metacafe.com/topics/dish/">Dish</a>&nbsp;<a href="http://www.metacafe.com/topics/doc_willoughby/">Doc Willoughby</a>&nbsp;<a href="http://www.metacafe.com/topics/america%27s_test_kitchen/">America's Test Kitchen</a>&nbsp;<a href="http://www.metacafe.com/topics/roast_beef_tenderloin/">Roast Beef Tenderloin</a>&nbsp;<a href="http://www.metacafe.com/topics/dried_fruit/">Dried Fruit</a>&nbsp;<a href="http://www.metacafe.com/topics/nut_stuffing/">Nut Stuffing</a>&nbsp;<a href="http://www.metacafe.com/topics/cbs_this_morning/">CBS This Morning</a>&nbsp;                      <br/>
    Categories: <a href='http://www.metacafe.com/videos/news_and_events/'>News &amp; Events</a>                     </p>

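As you can see, it's really messy. I'd like to know how I can get the data from the first <p>, for example everything up to "Ranked…", and the tags as well.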

Edit:

OK, this is the PHP code I'm using:

        $dom = new DOMDocument();
        @$dom->loadHTML($result->description); // or you can use loadXML
        $dom->normalizeDocument();
        /*$dom->resolveExternals = false;
        $dom->substituteEntities = false;*/
        $xml = simplexml_import_dom($dom);
        $data['viewData']['data']['description'] = $xml;

        $paragraph = $dom->getElementsByTagName('p');  // this doesn't work
        //$xml = simplexml_import_dom($dom);
        $data['viewData']['data']['description'] = $paragraph;

This is the output:

[description] => SimpleXMLElement Object
                (
                    [body] => SimpleXMLElement Object
                        (
                            [a] => SimpleXMLElement Object
                                (
                                    [@attributes] => Array
                                        (
                                            [href] => http://www.metacafe.com/watch/cb-YpE1z5IhjWrmCM62DSTU8jQ9X4IZryVR/the_dish_with_doc_willoughby/
                                        )
                                    [img] => SimpleXMLElement Object
                                        (
                                            [@attributes] => Array
                                                (
                                                    [src] => http://s4.mcstatic.com/thumb/8000947/21507982/4/directors_cut/0/1/the_dish_with_doc_willoughby.jpg?v=8
                                                    [align] => right
                                                    [border] => 0
                                                    [alt] => THE Dish with Doc Willoughby
                                                    [vspace] => 4
                                                    [hspace] => 4
                                                    [width] => 134
                                                    [height] => 78
                                                )
                                        )
                                )
                            [p] => Array
                                (
                                    [0] => 
                    Doc Willoughby, guru of "America's Test Kitchen," stopped by "CBS The Morning: Saturday" to share his ultimate dish with Rebecca Jarvis and Jeff Glor: Roast Beef Tenderloin with Dried Fruit and Nut Stuffing.                 Ranked  / 5 | 2 views | 
                                    [1] => SimpleXMLElement Object
                                        (
                                            [a] => Array
                                                (
                                                    [0] => SimpleXMLElement Object
                                                        (
                                                            [@attributes] => Array
                                                                (
                                                                    [href] => http://www.metacafe.com/watch/cb-YpE1z5IhjWrmCM62DSTU8jQ9X4IZryVR/the_dish_with_doc_willoughby/
                                                                )
                                                            [strong] => Click here to watch the video
                                                        )
                                                    [1] => CBS
                                                    [2] => Cbsepisode
                                                    [3] => Dish
                                                    [4] => Doc Willoughby
                                                    [5] => America's Test Kitchen
                                                    [6] => Roast Beef Tenderloin
                                                    [7] => Dried Fruit
                                                    [8] => Nut Stuffing
                                                    [9] => CBS This Morning
                                                    [10] => News & Events
                                                )
                                            [br] => Array
                                                (
                                                    [0] => SimpleXMLElement Object
                                                        (
                                                        )
                                                    [1] => SimpleXMLElement Object
                                                        (
                                                        )
                                                    [2] => SimpleXMLElement Object
                                                        (
                                                        )
                                                )
                                        )
                                )

Is there any way to "make the output prettier"? I mean, better organized... I also tried using getElementsByTagName('p'), but without success.

See my previous answer. It parses the string into an XML object, so you can easily access any node.

In your case, to get the first paragraph:

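$dom = new DOMDocument();
// $description - your string from the feed.
// loadHTML() returns false on failure; $dom itself is always a truthy object,
// so the return value is what has to be checked.
if (!@$dom->loadHTML($description)) {
    die('Error loading HTML string.');
}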
$xml = simplexml_import_dom($dom);
$p = (string)$xml->body->p;
echo '<pre>'; print_r($p); echo '</pre>';
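
Since you also asked for the text up to "Ranked…" and for the tags, here is a rough sketch of one way to do it with DOMXPath instead of SimpleXML. Note that getElementsByTagName('p') does return the paragraphs, but as a DOMNodeList, which has to be iterated rather than dumped with print_r. The function name parseDescription, the sample $html, and the two rules it relies on (the summary ends at the word "Ranked", and tag links contain "/topics/" in their href) are my assumptions based on the one item shown in the question, not something the feed guarantees.

```php
<?php
// Hypothetical helper (not from the original post): extracts the summary text
// and the tag names from the feed's description HTML.
function parseDescription(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $html !== '' && $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return ['summary' => '', 'tags' => []];
    }

    $xpath = new DOMXPath($dom);

    // textContent of the first <p> includes the text of nested elements.
    $first = $xpath->query('(//p)[1]')->item(0);
    $text = $first !== null ? $first->textContent : '';
    // Assumption: the summary is everything before the word "Ranked".
    $summary = trim(preg_replace('/\s+/u', ' ', explode('Ranked', $text, 2)[0]));

    // Assumption: tag links are the ones whose href contains "/topics/".
    $tags = [];
    foreach ($xpath->query('//a[contains(@href, "/topics/")]') as $a) {
        $tags[] = trim($a->textContent);
    }

    return ['summary' => $summary, 'tags' => $tags];
}

// Shortened sample in the same shape as the description from the question.
$html = '<a href="http://www.metacafe.com/watch/x/"><img src="t.jpg" /></a>
<p>Doc Willoughby stopped by to share his ultimate dish.   <br>Ranked <strong>4.00</strong> / 5 | 2 views<br/></p>
<p>Tags: <a href="http://www.metacafe.com/topics/dish/">Dish</a>&nbsp;<a href="http://www.metacafe.com/topics/nut_stuffing/">Nut Stuffing</a><br/>
Categories: <a href="http://www.metacafe.com/videos/news_and_events/">News &amp; Events</a></p>';

$result = parseDescription($html);
print_r($result);
```

This gives you a plain array with a 'summary' string and a 'tags' list, which is easier to hand to a view than the nested SimpleXMLElement dump. If the feed ever changes its markup, the two rules above are the parts you would need to adjust.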