preg_match_all starting from specific html tag and ending with it

Keywords: start, end, all, match, preg, html | Updated: 2023-09-27

I have a source page containing a table of 15 rows, each of which looks like this:

<tr class="hlRow" onclick="window.location=link11.href" onmouseover="rowOver(11)" onmouseout="rowOut(11,'#cad9ea')">
  <td class="row3">Latest news</td>
  <td class="row3" id="row_6_11"><a onclick="servOC(11,'/link-to-page.html','',ihTri11)"><img class="tog" id="ihTri11" src="up.png" title="Toggle" height="19" width="19" /></a>14.7w</td>
  <td class="row3" id="name11"><a href="/link-to-page.html" style="float: right; color: green; font-weight: bold;" title="+2 rating, 2 comments">+2<img src="star.png" alt="rating" style="margin-left: 1px;" height="12" width="12" /> 2<img src="bubble.png" alt="comments" style="margin-left: 2px;" height="10" width="10" /></a><a id="link11" href="/link-to-page.html">Got to page</a></td>
  <td class="row3" title="11 files">10 days</td>
  <td class="row3">104</td>
  <td class="row3">108</td>
</tr>

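Basically, I need to grab the rows between the <tr> and </tr> tags from the source site and display them on my own site. I tried using preg_match_all(), but my regex experience is very limited and I can't get it to work.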

preg_match_all('/<tr class="hlRow"(.*?)<\/td><\/tr>/i', $turinys, $linkai, PREG_SET_ORDER);
foreach ($linkai as $linkas) {$a1 = $linkas[1]; echo "<table><tr class=\"hlRow\"".$a1."\"></td></tr></table>";}

Even better would be to get only the content inside the <td> tags and then display that on my page.

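To answer the actual regex question:

You need the /s (DOTALL) modifier so that (.*?) can match across multiple lines. Without it, the dot stops at every newline.

There is also whitespace (a line break) between </td> and </tr>, so you need \s* there:

 preg_match_all('#<tr class="hlRow"(.*?)</td>\s*</tr>#is', $turinys, $linkai, PREG_SET_ORDER);

Note the # delimiter: it lets you use / as a literal character without escaping it.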
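
To see the effect of both changes, here is a minimal sketch. The sample markup is a shortened, hypothetical stand-in for the fetched page ($turinys in the question), and the inner <td> pattern is my own assumption about how you might pull out the cell contents, not something from the question.

```php
<?php
// Hypothetical sample standing in for the fetched page ($turinys in the question).
$turinys = <<<'HTML'
<table>
<tr class="hlRow" onclick="window.location=link11.href">
  <td class="row3">Latest news</td>
  <td class="row3">104</td>
</tr>
<tr class="hlRow" onclick="window.location=link12.href">
  <td class="row3">Older news</td>
  <td class="row3">108</td>
</tr>
</table>
HTML;

// Without /s the dot stops at newlines, so a row spread over several lines never matches.
$withoutS = preg_match_all('#<tr class="hlRow"(.*?)</td>\s*</tr>#i', $turinys, $ignored);

// With /s the dot also matches newlines, and \s* absorbs the line break
// between the last </td> and </tr>.
$withS = preg_match_all('#<tr class="hlRow"(.*?)</td>\s*</tr>#is', $turinys, $linkai, PREG_SET_ORDER);

echo "Matches without /s: $withoutS\n";
echo "Matches with /s: $withS\n";

// Assumed follow-up step: collect the text of each <td> in every matched row.
$cells = [];
foreach ($linkai as $linkas) {
    // $linkas[0] is the whole matched row, including the final </td>.
    preg_match_all('#<td[^>]*>(.*?)</td>#is', $linkas[0], $tds);
    $cells[] = array_map(function ($inner) {
        return trim(strip_tags($inner));
    }, $tds[1]);
}
print_r($cells);
```

On this sample the first call finds no rows and the second finds both. The pattern still depends on the exact spelling of the class attribute, so treat it as a quick fix rather than a robust parser.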

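The proper way to do this is, of course, to use the DOM extension rather than regex. Parsing HTML with regular expressions will drive you mad.

The DOM code might look like this...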

$dom = new DOMDocument;
$dom->loadHTMLFile('your source url');
$xpath = new DOMXPath($dom);
// Select every <tr> whose class attribute is exactly "hlRow".
$rows = $xpath->query('//tr[@class="hlRow"]');
$rowNumber = 1;
foreach ($rows as $row) {
    echo "Row number: ", $rowNumber++, "\n";
    foreach ($row->childNodes as $td) {
        // Skip whitespace text nodes between cells; only <td> elements hold content.
        if ($td->nodeName === 'td') {
            echo $td->nodeValue, "\n";
        }
    }
    echo "End of row\n\n";
}
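
If you want only the text inside the <td> tags and then want to print it as your own table, a variation along these lines may help. It is a sketch under assumptions: the $html string is made-up sample markup, loaded with loadHTML() so the example runs without a network request, and the output layout is just one possible choice.

```php
<?php
// Hypothetical markup standing in for the downloaded source page.
$html = <<<'HTML'
<table>
<tr class="hlRow"><td class="row3">Latest news</td><td class="row3"><a href="/link-to-page.html">Go to page</a></td><td class="row3">104</td></tr>
<tr class="other"><td>ignored</td></tr>
<tr class="hlRow"><td class="row3">Older news</td><td class="row3"><a href="/other.html">Other page</a></td><td class="row3">108</td></tr>
</table>
HTML;

// Real-world pages are rarely valid HTML; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$cells = [];
foreach ($xpath->query('//tr[@class="hlRow"]') as $row) {
    $texts = [];
    // "./td" is evaluated relative to the current row, so only its own cells are returned.
    foreach ($xpath->query('./td', $row) as $td) {
        $texts[] = trim($td->textContent);
    }
    $cells[] = $texts;
}

// Rebuild a plain table from the extracted text, escaping it for safe output.
$out = "<table>\n";
foreach ($cells as $texts) {
    $out .= "<tr>";
    foreach ($texts as $text) {
        $out .= "<td>" . htmlspecialchars($text) . "</td>";
    }
    $out .= "</tr>\n";
}
$out .= "</table>\n";
echo $out;
```

Working from textContent drops the inline scripts, images and styles of the source rows, which is usually what you want when showing someone else's data on your own page. To keep the original markup of a row instead, $dom->saveHTML($row) returns it as a string.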