How to exclude a range of elements from capture in a for loop using PHP?

Example:

@article{boonzaier2009development,<br/>
 author = "Boonzaier, A. and Schubach, K. and Troup, K. and Pollard, A. and Aranda, S. and  Schofield, P.",<br/>
 title = "Development of a psychoeducational intervention ",<br/>
 journal = "Journal of Psychosocial Oncology",<br/>
 volume = "27",<br/>
 number = "1",<br/>
 pages = "136-153",<br/>
 year = 2009<br/>
}<br/>
@book{bottoff2008women,<br/>
  author = "Bottoff, J. L. and Oliffe, J. L. and Halpin, M. and Phillips, M. and McLean, G. and Mroz, L.",<br/>
  title = "Women and prostate cancer support groups: {The} gender connect? {Social} {Science} & {Medicine}",<br/>
  publisher = "66",<br/>
  pages = "1217-1227",<br/>
  year = 2008<br/>
}<br/>
@article{bottorff2012gender,<br/>
 author = "Bottorff, J. L. and Oliffe, J. L. and Kelly, M.",<br/>
 title = "The gender (s) in the room",<br/>
 journal = "Qualitative Health Research",<br/>
 volume = "22",<br/>
 number = "4",<br/>
 pages = "435-440",<br/>
 year = 2012<br/>
}

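I only want to capture the strings between double quotes in the @article entries. I collect the line numbers of all @article entries so that I can loop over the range of each @article entry and read its field values (the loop runs from one @article to the next @article, and so on). The problem is this: say the first @article is on line 10 and the second one is on line 18. I loop over that range and read the values, but there is also an @book entry in between, and its fields get captured too because they fall inside the @article range. How can I exclude the @book lines from the range of the for loop?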

PHP code:

<?php
$file = file("master.bib");
$typeart = array();
$cont = array();
//count of article
$key = '@article';
foreach ($file as $l => $line) {
    if (strpos($line, $key) !== false) {
        $l++;
        $typeart[] = $l;
    }
}//end-count of article
$counttypeart = count($typeart);
for ($j = 0; $j < $counttypeart; $j++) {
    for ($i = $typeart[$j]; $i < $typeart[$j+1]; $i++) {
        if (strpos($file[$i], 'author')) {
            preg_match('/\"(.*?)\"/', $file[$i], $cont);
            $author = $cont[1];
            echo $author;
            echo "<br>";
        }
        if (strpos($file[$i], 'title')) {
            preg_match('/\"(.*?)\"/', $file[$i], $cont);
            $title = $cont[1];
            echo $title;
            echo "<br>";
        }
        if (strpos($file[$i], 'journal')) {
            preg_match('/\"(.*?)\"/', $file[$i], $cont);
            $journal = $cont[1];
            echo $journal;
            echo "<br>";
        }
        if (strpos($file[$i], 'volume')) {
            preg_match('/\"(.*?)\"/', $file[$i], $cont);
            $volume = $cont[1];
            echo $volume;
            echo "<br>";
        }
        if (strpos($file[$i], 'number')) {
            preg_match('/\"(.*?)\"/', $file[$i], $cont);
            $number = $cont[1];
            echo $number;
            echo "<br>";
        }
        if (strpos($file[$i], 'pages')) {
            preg_match('/\"(.*?)\"/', $file[$i], $cont);
            $pages = $cont[1];
            echo $pages;
            echo "<br>";
            echo "<br>";
        }
    }
}
?>

Expected output (from the example above):

Boonzaier, A. and Schubach, K. and Troup, K. and Pollard, A. and Aranda, S. and Schofield P.
Development of a psychoeducational intervention for men with prostate cancer
Journal of Psychosocial Oncology
27
1
136-153

Bottorff, J. L. and Oliffe, J. L. and Kelly, M.
The gender (s) in the room
Qualitative Health Research
22
4
435-440

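Your code seems to capture the @book element because you never record the line on which an @article element ends. When you iterate over the lines of an @article element, you start at the line where that @article begins and stop at the line where the next @article begins, so any entry in between (such as the @book) is included.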

There are two alternative ways to fix your code:

  1. When you first scan all lines of the file, record both the start line and the end line of each @article element. For example:

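    // count of article: record the first and last field line of every @article
    $key_start = '@article';
    $key_end = '}';            // an entry closes with a line starting with "}"
    $start = null;             // null while we are not inside an @article
    foreach ($file as $l => $line) {
        if (strpos($line, $key_start) !== false) {
            $start = $l + 1;   // first field line (0-based index into $file)
            continue;
        }
        if ($start !== null && strpos(ltrim($line), $key_end) === 0) {
            $typeart[] = array($start, $l - 1);   // last field line
            $start = null;     // ignore the closing "}" of @book and others
        }
    }
    $counttypeart = count($typeart);
    // end-count of article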
    

    Now you should be able to iterate over just the lines that belong to each @article element like this:

    for($j=0;$j<$counttypeart;$j++){
        list($start, $end) = $typeart[$j];
        for ($i=$start; $i<=$end; $i++) {
        …
    
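  2. Break out of the inner for loop as soon as you reach the closing marker of the current @article. That way you avoid iterating over all lines up to the next @article element, for example: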

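    // the last @article has no successor, so let it run to the end of the file
    $end = isset($typeart[$j+1]) ? $typeart[$j+1] : count($file);
    for($i=$typeart[$j];$i<$end;$i++){
        $key_end = '}';
        if (strpos(ltrim($file[$i]), $key_end) === 0) {
            break;   // closing "}" of this @article reached; skip the rest
        }
        …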
    

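However, neither solution is ideal: both lead to duplicated code that is hard to maintain, and both require you to know every attribute of an @article element in advance in order to fetch its value. Unless you have a good reason to structure your code this particular way, I would go with a different solution: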

Alternative solution:

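  1. Read the whole bibliography at once
  2. Use a regular expression to capture the contents of all @article elements
  3. Use another regular expression to capture the field names and their values within the captured content of a single @article element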

Here is a short implementation of what I described:

<?php
// Use file_get_contents() instead of file(), as it is the preferred way to
// read the contents of a file into a string. It will also use memory mapping
// techniques, if supported by your OS, to enhance performance.
$file_content = file_get_contents('master.bib');
// Capture all @article containers from the file content. We use a regular
// expression on a multi-line string to do that:
preg_match_all(
    '%@article{\w+,<br/>\s+(.*)\s+}(<br/>)?%sUu',
    $file_content,
    $articles,
    PREG_PATTERN_ORDER
);
// Initialise an empty results (plural) container which will store the data
// of all @article elements
$results = array();
// At this point $articles[0] is an array of all captured @article blocks
// and $articles[1] is an array of everything captured by the first
// parenthesized group of the above regular expression.
foreach ($articles[1] as $article) {
    // Initialise an empty result (singular) container which will store the
    // data of the current @article element
    $result = array();
    // Now we take the content of the first parenthesized group, split it into
    // individual lines and pick out the required data from those lines.
    foreach (explode("\n", $article) as $line) {
        $found = preg_match(
            '%\s*(\w+)\s*=\s*"?([^"]+)"?,?<br/>\s*%Uu',
            $line,
            $matches
        );
        // At this point $matches is populated with our desired data, unless
        // $found is 0 (no match was found) or false (an error occurred)
        if ($found) {
            $result[$matches[1]] = trim($matches[2]);
        }
    }
    // Add the current @article result to the list of all results, but skip
    // it if it is empty
    if (!empty($result)) {
        $results[] = $result;
    }
}
// Print results, one field per line and a blank line between articles
foreach ($results as $article) {
    print "{$article['author']}\n"
        . "{$article['title']}\n"
        . "{$article['journal']}\n"
        . "{$article['volume']}\n"
        . "{$article['number']}\n"
        . "{$article['pages']}\n"
        . "\n";
}
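If you would rather keep the line-by-line `file()` approach from the question, another option is to remember which entry type you are currently inside and ignore every line that belongs to a non-@article entry, so no line ranges are needed at all. The sketch below assumes the layout of the sample (one `field = value` per line, `<br/>` markers at line ends, each entry header starting with `@type{`, and each entry closed by a line starting with `}`); the function name `parseArticles` is hypothetical, not part of any library.

```php
<?php
// Sketch (assumes the sample layout): collect the field values of @article
// entries only, in a single pass over the lines returned by file().
// parseArticles() is a hypothetical helper name.
function parseArticles(array $lines): array
{
    $articles = [];
    $current = null;   // fields of the @article being read, or null if none
    foreach ($lines as $line) {
        // Drop the <br/> markers and surrounding whitespace/newlines
        $line = trim(str_replace('<br/>', '', $line));
        if (preg_match('/^@(\w+)\{/', $line, $m)) {
            // A new entry begins: only collect fields if it is an @article
            $current = (strtolower($m[1]) === 'article') ? [] : null;
            continue;
        }
        if (strpos($line, '}') === 0) {
            // End of an entry: keep it only if it was an @article
            if ($current !== null) {
                $articles[] = $current;
            }
            $current = null;
            continue;
        }
        // Inside an @article: matches `name = "value",` or `name = value`
        if ($current !== null && preg_match('/^(\w+)\s*=\s*"?(.*?)"?,?$/', $line, $m)) {
            $current[$m[1]] = trim($m[2]);
        }
    }
    return $articles;
}
```

Calling `parseArticles(file('master.bib'))` on the sample should return two arrays (one per @article, the @book being skipped), each keyed by field name (`author`, `title`, `journal`, ...), so the printing loop above can be reused as-is. The design trades the regex over the whole file for a simple state flag, which keeps the question's line-based structure but still relies on the one-field-per-line layout.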