What's wrong with my regex not being able to store 2 values?

In some cases this works fine, and in others, like the one below, it doesn't.

$xml_url = 'http://campusdining.compass-usa.com/Hofstra/Pages/SignageXML.aspx?location=Student%20Center%20Cafe';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $xml_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.9.3a5pre) Gecko/20100526 Firefox/3.7a5pre");
$data = curl_exec($ch);
$ce = curl_error($ch);
curl_close($ch);
// this is how I was doing it prior to today and it worked before
// preg_match_all("/<MealPeriod name=\"(.+?)\">([\w\W\r\n]*?)<\/MealPeriod>/i", $data, $output_array);
// this way doesn't show all the meal periods,
// but I need to know what's in between the MealPeriod tags
// preg_match_all('/<MealPeriod name="(.*?)">(.*?)<\/MealPeriod>/i', $data, $output_array);
// shows all the meal period names,
// but I need the above to work to store what's in between the MealPeriod tags in $output_array[2]
preg_match_all('/<MealPeriod name="(.*?)">/i', $data, $output_array); 
echo '<pre> '.print_r($output_array[1],1).'</pre>';

I tried it on a couple of live regex sites; one of them returned what I need and the second did not:
http://www.phpliveregex.com/ -- works
https://regex101.com/ -- doesn't work

The expected output for $output_array[1] is as follows:

 Array
(
    [0] => Breakfast
    [1] => Every Day
    [2] => Outtakes
    [3] => Salad Bar
)

But $output_array[2] should also contain what's in between the MealPeriod tags.

Any help would be greatly appreciated.

The full script below (the one that fetches the feed with cURL) works; all I did was change the regex and the printing.

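The output on screen looks rather odd because the second (.*?), which captures everything between <MealPeriod></MealPeriod>, also captures all the XML tags. You can see this clearly if you view the page source.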

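I'd encourage you to use an XML parser to process the document. I have certainly used regex to pull out parts of an XML document before handing it to a parser to turn into an object, but a parser is far better suited than regex to handling XML (by leaps and bounds).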
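As a rough illustration of the parser route, here is a minimal sketch using PHP's SimpleXML extension. The sample document is made up: only the MealPeriod element and its name attribute come from the question, while the Menu and Item element names are assumptions, so the loop would need adjusting to the structure of the real feed.

```php
<?php
// Hypothetical sample standing in for the cURL response ($data in the question).
// <Menu> and <Item> are assumed names; the real feed may be structured differently.
$data = <<<XML
<Menu>
  <MealPeriod name="Breakfast">
    <Item>Eggs</Item>
  </MealPeriod>
  <MealPeriod name="Salad Bar">
    <Item>Greens</Item>
    <Item>Croutons</Item>
  </MealPeriod>
</Menu>
XML;

// Parse the string into a SimpleXMLElement; false means the XML was malformed.
$xml = simplexml_load_string($data);
if ($xml === false) {
    throw new RuntimeException('Could not parse the XML');
}

// Map each meal period name to the list of items found inside it.
$periods = [];
foreach ($xml->xpath('//MealPeriod') as $period) {
    $name = (string) $period['name'];   // attribute access
    $items = [];
    foreach ($period->Item as $item) {  // child elements named Item
        $items[] = (string) $item;
    }
    $periods[$name] = $items;
}

print_r($periods);
```

With a parser, the meal period names and their contents arrive as structured data, so there is no need for a second capture group or for worrying about how the markup is laid out.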

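Everything is captured, but it is not all printed to the screen inside the <pre> tags, because the browser treats the captured XML tags as markup. If you view the page source, though, it is all there.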

<?php
$xml_url = 'http://campusdining.compass-usa.com/Hofstra/Pages/SignageXML.aspx?location=Student%20Center%20Cafe';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $xml_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.9.3a5pre) Gecko/20100526 Firefox/3.7a5pre");
$data = curl_exec($ch);
$ce = curl_error($ch);
curl_close($ch);
// this is how I was doing it prior to today and it worked before
// preg_match_all("/<MealPeriod name=\"(.+?)\">([\w\W\r\n]*?)<\/MealPeriod>/i", $data, $output_array);
// this way doesn't show all the meal periods,
// but I need to know what's in between the MealPeriod tags
// preg_match_all('/<MealPeriod name="(.*?)">(.*?)<\/MealPeriod>/i', $data, $output_array);
// shows all the meal period names,
// but I need the above to work to store what's in between the MealPeriod tags in $output_array[2]
preg_match_all('/<MealPeriod name="(.*?)">(.*?)<\/MealPeriod>/i', $data, $output_array);
echo '<pre> '.print_r($output_array,1).'</pre>';
?>
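The captured markup only shows up in the page source because the browser parses it as tags. One way to make it visible on the page, sketched here with made-up captured data rather than the real feed, is to escape the print_r() output with htmlspecialchars() before echoing it:

```php
<?php
// Hypothetical result of preg_match_all(); the real arrays come from the feed.
$output_array = [
    ['<MealPeriod name="Breakfast"><Item>Eggs</Item></MealPeriod>'],
    ['Breakfast'],
    ['<Item>Eggs</Item>'],
];

// Turn <, >, & and quotes into entities so the browser shows them as text.
$escaped = htmlspecialchars(print_r($output_array, true), ENT_QUOTES, 'UTF-8');

echo '<pre>' . $escaped . '</pre>';
```

This changes only how the result is displayed; the contents of $output_array stay untouched for further processing.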

I found the answer, thanks to the following Stack Overflow post: PHP regex or | operator

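I needed to change my regex to the following, and I was finally able to get all the meal periods and their contents returned in the correct arrays.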

'/<MealPeriod name="(.*?)">(.*?)<\/?MealPeriod>/i'

The ? in <\/?Meal
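To see what the pattern from this answer captures, here is a small sketch run against made-up sample strings instead of the live feed (the Menu and Item element names are assumptions). The second half shows a separate, well-known PCRE behaviour: without the s modifier, . does not match newlines. Whether that is what affected the original script is only a guess, since the layout of the real feed is not shown here.

```php
<?php
// Hypothetical single-line sample; the real feed may look different.
$data = '<Menu>'
      . '<MealPeriod name="Breakfast"><Item>Eggs</Item></MealPeriod>'
      . '<MealPeriod name="Every Day"><Item>Coffee</Item></MealPeriod>'
      . '</Menu>';

$pattern = '/<MealPeriod name="(.*?)">(.*?)<\/?MealPeriod>/i';
preg_match_all($pattern, $data, $output_array);

print_r($output_array[1]); // names: Breakfast, Every Day
print_r($output_array[2]); // inner markup of each meal period

// Hypothetical sample whose content spans several lines.
$multiline = "<MealPeriod name=\"Outtakes\">\n<Item>Wrap</Item>\n</MealPeriod>";

// Without the s modifier, (.*?) cannot cross the newlines, so nothing matches.
$without_s = preg_match_all($pattern, $multiline, $m1);

// With the s modifier (PCRE_DOTALL), . also matches newlines.
$with_s = preg_match_all('/<MealPeriod name="(.*?)">(.*?)<\/?MealPeriod>/is', $multiline, $m2);

echo "matches without s: $without_s, with s: $with_s\n";
```

If the feed does put the contents of a meal period on several lines, adding s next to the existing i modifier would be one thing to try.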