How to fetch only part of an HTML page



Hi, here is all of my sample code:

  <?php 
  $html = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>samplecode</title>
  </head>
  <body>
    <div id="warrper">
      <div class="box-title">This title is sample</div>
      <div class="box-maim">
        <div class="box-element-1">
           <ul>
              <li>sample 1</li>
              <li>sample 2</li>
              <li>sample 3</li>
              <li>sample 4</li>
              <li>sample 5</li>
           </ul>
        </div>
        <div class="box-element-1">
           <ul>
              <li>sample 1</li>
              <li>sample 2</li>
              <li>sample 3</li>
              <li>sample 4</li>
              <li>sample 5</li>
           </ul>   
        </div>
        <div class="box-element-1">
           <ul>
              <li>sample 1</li>
              <li>sample 2</li>
              <li>sample 3</li>
              <li>sample 4</li>
              <li>sample 5</li>
           </ul>
        </div>
      </div>
    </div>
  </body>
  </html> ';
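   // Note: the lazy (.*?) stops at the FIRST </div>, which closes the first
   // box-element-1, so this does not capture the whole box-maim content.
   preg_match( '/<div class="box-maim">(.*?)<\/div>/si' , $html , $match );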
   print_r($match);
  ?>
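Why the regex above returns too little can be seen on a shortened sample. The HTML string below is an assumption, trimmed down from the question's markup; the pattern is the one from the question.

```php
<?php
// Shortened sample (assumption): box-maim with two box-element-1 children.
$html = '<div class="box-maim"><div class="box-element-1"><ul><li>a</li></ul></div>'
      . '<div class="box-element-1"><ul><li>b</li></ul></div></div>';

preg_match('/<div class="box-maim">(.*?)<\/div>/si', $html, $match);

// The lazy quantifier stops at the first "</div>", which closes the FIRST
// box-element-1, so the capture is cut off partway through box-maim.
echo $match[1], "\n";
```

A regex cannot count nested `<div>` / `</div>` pairs, which is why a DOM parser is the usual recommendation for this kind of extraction.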

After loading the HTML from a URL, my goal is to get only the part inside a chosen class, for example the following markup:

  <div class="box-element-1">
     <ul>
        <li>sample 1</li>
        <li>sample 2</li>
        <li>sample 3</li>
        <li>sample 4</li>
        <li>sample 5</li>
     </ul>
  </div>
  <div class="box-element-1">
     <ul>
        <li>sample 1</li>
        <li>sample 2</li>
        <li>sample 3</li>
        <li>sample 4</li>
        <li>sample 5</li>
     </ul>   
  </div>
  <div class="box-element-1">
     <ul>
        <li>sample 1</li>
        <li>sample 2</li>
        <li>sample 3</li>
        <li>sample 4</li>
        <li>sample 5</li>
     </ul>
  </div>

But I don't know the right way to do this.

Following everyone's suggestion to use DOM, I tried this code:

<?php
$html = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>samplecode</title>
  </head>
  <body>
    <div id="warrper">
      <div class="box-title">This title is sample</div>
      <div class="box-maim">
        <div class="box-element-1">
           <ul>
              <li>sample 1</li>
              <li>sample 2</li>
              <li>sample 3</li>
              <li>sample 4</li>
              <li>sample 5</li>
           </ul>
        </div>
        <div class="box-element-1">
           <ul>
              <li>sample 1</li>
              <li>sample 2</li>
              <li>sample 3</li>
              <li>sample 4</li>
              <li>sample 5</li>
           </ul>   
        </div>
        <div class="box-element-1">
           <ul>
              <li>sample 1</li>
              <li>sample 2</li>
              <li>sample 3</li>
              <li>sample 4</li>
              <li>sample 5</li>
           </ul>
        </div>
      </div>
    </div>
  </body>
  </html> ';
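// Parse the HTML string into a DOM tree.
$dom = new DOMDocument();
$dom->loadHTML($html);
// Use XPath to find the div whose class attribute is exactly "box-maim".
$xpath = new DOMXPath($dom);
$div = $xpath->query('//div[@class="box-maim"]');
// Take the first match and serialize it together with all of its children.
$div = $div->item(0);
echo $dom->saveXML($div);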
?>

It works perfectly :)
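The DOM code above prints the whole box-maim div, wrapper included, while the stated goal was only the box-element-1 blocks. A minimal sketch that returns each of those blocks separately follows; `extractBoxElements()` is a hypothetical helper name, and the sample HTML is a shortened assumption based on the question's markup.

```php
<?php
// Hypothetical helper: return the outer HTML of each box-element-1 div
// that is a direct child of the box-maim div.
function extractBoxElements(string $html): array
{
    $dom = new DOMDocument();
    // Real-world pages are often not well-formed; collect libxml warnings
    // instead of printing them (note: this changes global libxml state).
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Direct div children of box-maim whose class is exactly "box-element-1".
    $nodes = $xpath->query('//div[@class="box-maim"]/div[@class="box-element-1"]');

    $parts = [];
    foreach ($nodes as $node) {
        // saveHTML() with a node argument serializes only that node.
        $parts[] = $dom->saveHTML($node);
    }
    return $parts;
}

// Shortened sample input (assumption).
$html = '<html><body><div class="box-maim">'
      . '<div class="box-element-1"><ul><li>sample 1</li></ul></div>'
      . '<div class="box-element-1"><ul><li>sample 2</li></ul></div>'
      . '</div></body></html>';

$parts = extractBoxElements($html);
echo implode("\n", $parts), "\n";

// To read a live page instead (assumes allow_url_fopen is enabled;
// the URL is a placeholder):
// $parts = extractBoxElements(file_get_contents('https://example.com/page.html'));
```

Matching `@class="box-element-1"` is an exact string comparison, so a div carrying several classes (for example `class="box-element-1 active"`) would not match; `contains(@class, "box-element-1")` is a looser alternative.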

If I understand you correctly, do it like this:

preg_match_all('/<div\s[^>]*class=\"box-element-([^\"]*)\"[^>]*>(.*)<\/div>/siU', $html, $matches, PREG_SET_ORDER);
echo '<pre>';
print_r($matches);
echo '</pre>';
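To see what the `preg_match_all()` answer captures, here it is run on a shortened sample. The input string is an assumption trimmed from the question, and the approach only holds while each box-element-1 div contains no nested `<div>`, as in the question's HTML.

```php
<?php
// Shortened sample (assumption): no nested <div> inside box-element-1.
$html = '<div class="box-maim">'
      . '<div class="box-element-1"><ul><li>sample 1</li></ul></div>'
      . '<div class="box-element-1"><ul><li>sample 2</li></ul></div>'
      . '</div>';

preg_match_all('/<div\s[^>]*class=\"box-element-([^\"]*)\"[^>]*>(.*)<\/div>/siU',
    $html, $matches, PREG_SET_ORDER);

// With the U (ungreedy) modifier, (.*) behaves like (.*?): each match ends at
// the first </div> after its opening tag. Group 1 is the class suffix,
// group 2 is the inner markup.
foreach ($matches as $m) {
    echo "suffix={$m[1]} content={$m[2]}\n";
}
```

The box-maim div itself is skipped because `[^>]*` cannot run past its `>` to reach `class="box-element-`. If a box-element-1 block ever contained its own `<div>`, the match would end at that inner `</div>`, so the DOM approach is the safer choice for real pages.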