PHP getting string between two delimiters


I am trying to get the href values from the following string:

<td valign="top" width="300"class="topborder"><a href="/path/to/somewhere" class="bigger">random1</a><br/>
<td valign="top" width="300"class="topborder"><a href="/path/to/somewhere2" class="bigger">random2</a><br/>

In this case, I should get "/path/to/somewhere" and "/path/to/somewhere2".

I tried the following, but I only get empty strings.

$htmlc = str_replace(' ', '', $htmlc);
//$htmlc contains the string I am parsing with the spaces removed
preg_match_all('/width=\"300\"class=\"topborder\"><ahref=\"([^\"class=\"bigger\"]+)/', $htmlc, $hrefvals);

At this point $hrefvals contains empty strings. What am I doing wrong with my preg_match_all?

All you need is DOM and XPath. Regular expressions were not designed for parsing HTML.

<?php
$html = <<<HTML
<td valign="top" width="300"class="topborder"><a href="/path/to/somewhere" class="bigger">random1</a><br/>
<td valign="top" width="300"class="topborder"><a href="/path/to/somewhere2" class="bigger">random2</a><br/>
HTML;
$dom = new DOMDocument;
$dom->loadHTML($html);
// replace with @$dom->loadHTMLFile('http://...') if you want to parse a URL
$xpath = new DOMXPath($dom);
$links = array_map(function ($node) {
        return $node->getAttribute('href');
    }, iterator_to_array($xpath->query("//td[@class='topborder']/a[@class='bigger']")));
var_dump($links);

This gives me the following:

array(2) {
  [0]=>
  string(18) "/path/to/somewhere"
  [1]=>
  string(19) "/path/to/somewhere2"
}
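As a variation on the answer above, here is a minimal sketch that collects libxml's parser warnings internally instead of using the @ operator, since real-world pages often trigger them. It assumes the same sample markup as the question; the simplified query //a[@class='bigger'] and the variable names are illustrative choices, not part of the original answer.

```php
<?php
// Sample markup copied from the question.
$html = <<<HTML
<td valign="top" width="300"class="topborder"><a href="/path/to/somewhere" class="bigger">random1</a><br/>
<td valign="top" width="300"class="topborder"><a href="/path/to/somewhere2" class="bigger">random2</a><br/>
HTML;

// Keep parser warnings in libxml's internal buffer instead of printing them.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);

// Discard the buffered warnings and restore the previous error mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Select every <a class="bigger"> and read its href attribute.
$xpath = new DOMXPath($dom);
$links = [];
foreach ($xpath->query("//a[@class='bigger']") as $node) {
    $links[] = $node->getAttribute('href');
}

print_r($links);
```

A plain foreach is used here instead of array_map with iterator_to_array; both produce the same list of href values, so the choice is a matter of taste.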

Try a pattern like this:

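/width=\"300\"class=\"topborder\"><ahref=\"(.*?)"/

(.*?) matches any characters, but it is "lazy". This means the group ends as soon as the first " after it is found (in this case, the end of the href attribute value).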

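To make the difference between lazy and greedy matching concrete, here is a minimal sketch that runs both forms of the capture group against the first row of the question's markup. The spaces are left in place, so the patterns are shortened to start at href; the variable names are illustrative.

```php
<?php
// One row of the sample markup from the question, spaces intact.
$html = '<td valign="top" width="300"class="topborder"><a href="/path/to/somewhere" class="bigger">random1</a><br/>';

// Greedy: .* first runs to the end of the line, then backtracks
// to the LAST double quote it can find.
preg_match('/href="(.*)"/', $html, $greedy);

// Lazy: .*? expands one character at a time and stops at the
// FIRST double quote after the group starts.
preg_match('/href="(.*?)"/', $html, $lazy);

echo $greedy[1], "\n"; // /path/to/somewhere" class="bigger
echo $lazy[1], "\n";   // /path/to/somewhere
```

The greedy version swallows the class attribute as well, which is why the lazy quantifier (or a negated class such as [^"]*) is the usual choice for capturing a quoted value.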

Or you can try:

$htmlc = '
<td valign="top" width="300"class="topborder"><a href="/path/to/somewhere" class="bigger">random1</a><br/>
<td valign="top" width="300"class="topborder"><a href="/path/to/somewhere2" class="bigger">random2</a><br/>
';
preg_match_all('~(?<=<a\shref=")[^"]*~', $htmlc, $hrefvals);
var_dump($hrefvals);

<script>
$(document).ready(function(){
  $("button").click(function(){
    alert($("#blah").attr("href"));
  });
});
</script>

And then...

<a href="http://www.blah.com" id="blah">Blah</a></p>
<button>Show href Value</button>

Is this what you mean?