PHP empty matches


I am trying to extract data from a string (the source code of an entire web page, fetched with cURL):

<tr>
    <td><a href="http://www.gpw.pl/karta_spolki/LT0000128555/">AAL</a></td>
<td><a href="http://www.gpw.pl/karta_spolki/LT0000128555/">AVIAAM LEASING AB</a></td>
</tr>
<tr class="even">
    <td><a href="http://www.gpw.pl/karta_spolki/PLTRNSU00013/">AAT</a></td>
    <td><a href="http://www.gpw.pl/karta_spolki/PLTRNSU00013/">ALTA SPÓŁKA AKCYJNA</a></td>

I want all the 3-character anchor texts collected into one array, e.g. AAL, AAT (there are more).

What I have is:

$subject = curl_exec($ch);        
$pattern = '`<td><a href="http://www\.gpw\.pl/karta_spolki/[0-9A-Za-z ]+/">[0-9A-Z]{3}</a></td>`';
preg_match_all($pattern, $subject, $matches, PREG_PATTERN_ORDER);
print_r($matches);

As a result I get

Array ( [0] => Array ( ) )

Can you give me some advice on how to solve this?

You can use a DOMDocument object to build your array, like this:

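$doc = new DOMDocument();
// Real-world HTML is rarely valid; keep libxml warnings out of the output.
libxml_use_internal_errors(true);
// $str holds the HTML source, e.g. the string returned by curl_exec().
$doc->loadHTML($str);
$matches = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    // Keep only anchors whose text is exactly three bytes long.
    $text = $a->nodeValue;
    if (strlen($text) === 3) {
        $matches[] = $text;
    }
}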

This iterates over all the anchor elements in the HTML string and builds an array like this:

Array
(
    [0] => AAL
    [1] => AAT
)
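If the real page contains other three-character links (menu items, for instance), the length check alone may pick them up too. A minimal sketch of a stricter variant, assuming the codes always sit in a `td` cell and link to a `/karta_spolki/` URL as in the sample from the question; the XPath expression and the `$codes` name are my own choices, not something the page guarantees:

```php
<?php
// Sample markup taken from the question; a real page would come from curl_exec().
$html = <<<'HTML'
<table>
<tr>
    <td><a href="http://www.gpw.pl/karta_spolki/LT0000128555/">AAL</a></td>
    <td><a href="http://www.gpw.pl/karta_spolki/LT0000128555/">AVIAAM LEASING AB</a></td>
</tr>
<tr class="even">
    <td><a href="http://www.gpw.pl/karta_spolki/PLTRNSU00013/">AAT</a></td>
    <td><a href="http://www.gpw.pl/karta_spolki/PLTRNSU00013/">ALTA SPÓŁKA AKCYJNA</a></td>
</tr>
</table>
HTML;

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Assumption: only anchors inside a table cell that point at a company card are relevant.
$xpath = new DOMXPath($doc);
$codes = [];
foreach ($xpath->query('//td/a[contains(@href, "/karta_spolki/")]') as $a) {
    $text = trim($a->textContent);
    // Accept exactly three uppercase letters or digits.
    if (preg_match('/^[0-9A-Z]{3}$/', $text)) {
        $codes[] = $text;
    }
}
print_r($codes);
```

The character-class check replaces `strlen()` so that a three-character link such as "Log" would not slip through.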

I just tried your example, and your regex works with the small sample provided:

$subject = <<<EOT
<tr>
    <td><a href="http://www.gpw.pl/karta_spolki/LT0000128555/">AAL</a></td>
<td><a href="http://www.gpw.pl/karta_spolki/LT0000128555/">AVIAAM LEASING AB</a></td>
</tr>
<tr class="even">
    <td><a href="http://www.gpw.pl/karta_spolki/PLTRNSU00013/">AAT</a></td>
    <td><a href="http://www.gpw.pl/karta_spolki/PLTRNSU00013/">ALTA SPÓŁKA AKCYJNA</a></td>
EOT;
$pattern = '`<td><a href="http://www\.gpw\.pl/karta_spolki/[0-9A-Za-z ]+/">[0-9A-Z]{3}</a></td>`';
preg_match_all($pattern, $subject, $matches, PREG_PATTERN_ORDER);
echo '<pre>';
print_r($matches);
echo '</pre>';
Result:

Array
(
    [0] => Array
        (
            [0] => AAL
            [1] => AAT
        )
)
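One detail worth keeping in mind: that pattern has no capture group, so each entry in `$matches[0]` is the whole matched `<td>` markup. The listing shows bare codes presumably because it was viewed in a browser, which renders the tags as links. A minimal sketch, assuming the same sample markup, that adds a capture group so the codes alone end up in `$matches[1]` (the `$fullMatches` and `$codes` names are only for illustration):

```php
<?php
$subject = <<<'EOT'
<tr>
    <td><a href="http://www.gpw.pl/karta_spolki/LT0000128555/">AAL</a></td>
<td><a href="http://www.gpw.pl/karta_spolki/LT0000128555/">AVIAAM LEASING AB</a></td>
</tr>
<tr class="even">
    <td><a href="http://www.gpw.pl/karta_spolki/PLTRNSU00013/">AAT</a></td>
    <td><a href="http://www.gpw.pl/karta_spolki/PLTRNSU00013/">ALTA SPÓŁKA AKCYJNA</a></td>
EOT;

// Same pattern as in the question, with parentheses around the part to keep.
$pattern = '`<td><a href="http://www\.gpw\.pl/karta_spolki/[0-9A-Za-z ]+/">([0-9A-Z]{3})</a></td>`';
preg_match_all($pattern, $subject, $matches, PREG_PATTERN_ORDER);

$fullMatches = $matches[0]; // whole matched markup, tags included
$codes       = $matches[1]; // only the text captured by the parentheses
print_r($codes);
```

With `PREG_PATTERN_ORDER`, index 0 always holds the full matches and index 1 the first parenthesized group, so no post-processing with `strip_tags()` is needed.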

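However, I dug up what I believe is the source URL of your curl request, and when I tested the pattern against it, it failed. So I adjusted the regex to: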

/(?<=>)[0-9A-Z]{3}(?=<\/a><\/td>)/is
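To make the behavior of this pattern easier to see: the lookbehind `(?<=>)` and the lookahead `(?=<\/a><\/td>)` only assert what surrounds the code, so the match itself contains just the three characters. The `i` modifier also makes `[0-9A-Z]` match lowercase letters. A small sketch on a made-up two-cell string (the lowercase `abc` entry is hypothetical, added only to show the effect of `i`):

```php
<?php
$sample = '<td><a href="http://www.gpw.pl/karta_spolki/LT0000128555/">AAL</a></td>'
        . '<td><a href="#">abc</a></td>'; // hypothetical lowercase entry

$withI    = '/(?<=>)[0-9A-Z]{3}(?=<\/a><\/td>)/is';
$withoutI = '/(?<=>)[0-9A-Z]{3}(?=<\/a><\/td>)/';

preg_match_all($withI, $sample, $caseInsensitive);
preg_match_all($withoutI, $sample, $caseSensitive);

print_r($caseInsensitive[0]); // contains both AAL and abc
print_r($caseSensitive[0]);   // contains only AAL
```

If only uppercase codes are wanted, dropping the `i` modifier is probably the safer choice.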

Now things seem to work fine with my code, which tries to recreate the curl request you are making.

// Set the URL.
$url="http://www.gpw.pl/lista_spolek_en";
// The actual curl request.
$curl_timeout = 5;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $curl_timeout);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$subject = curl_exec($ch);
curl_close($ch);
// Set the regex pattern.
$pattern = '/(?<=>)[0-9A-Z]{3}(?=<\/a><\/td>)/is';
// Run the preg match all command with the regex pattern.
preg_match_all($pattern, $subject, $matches, PREG_PATTERN_ORDER);
// Return the results.
echo '<pre>';
print_r($matches);
echo '</pre>';

On my end, the output looks fine:

Array
(
    [0] => Array
        (
            [0] => AAL
            [1] => AAT
            [2] => ABC
            [3] => ABE
            [4] => ABM
            [5] => ABS
            [6] => ACE
            [7] => ACG
            [8] => ACP
            [9] => ACS
            [10] => ACT
            [11] => ADS
            [12] => AGO
            [13] => AGT
            [14] => ALC
            [15] => ALM
            [16] => ALR
            [17] => ALT
            [18] => AMB
            [19] => AMC
            [20] => APL
            [21] => APN
            [22] => APT
            [23] => ARC
            [24] => ARR
            [25] => ASB
            [26] => ASE
            [27] => ASG
            [28] => AST
            [29] => ATC
            [30] => ATD
            [31] => ATG
            [32] => ATL
            [33] => ATM
            [34] => ATP
            [35] => ATR
            [36] => ATS
            [37] => AWB
            [38] => AWG
            [39] => EAT
            [40] => ACP
            [41] => ALR
            [42] => BZW
            [43] => EUR
            [44] => JSW
            [45] => KER
            [46] => KGH
            [47] => LPP
            [48] => LTS
            [49] => LWB
            [50] => MBK
            [51] => OPL
            [52] => PEO
            [53] => PGE
            [54] => PGN
            [55] => PKN
            [56] => PKO
            [57] => PZU
            [58] => SNS
            [59] => TPE
        )
)
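Note that the list above contains a few codes twice (ACP at indexes 8 and 40, ALR at 16 and 41), presumably because they appear in more than one place on the page. If each code should occur only once, a minimal sketch using a shortened, hand-copied excerpt of the result could look like this:

```php
<?php
// Shortened excerpt of the $matches[0] list above, including the repeated codes.
$codes = ['AAL', 'ACP', 'ALR', 'EAT', 'ACP', 'ALR', 'BZW'];

// array_unique() keeps the first occurrence of each value;
// array_values() renumbers the keys so the result is a plain list again.
$unique = array_values(array_unique($codes));
print_r($unique);
```

In the full script this would be applied to `$matches[0]` right after `preg_match_all()`.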