Example:
@article{boonzaier2009development,<br/>
author = "Boonzaier, A. and Schubach, K. and Troup, K. and Pollard, A. and Aranda, S. and Schofield, P.",<br/>
title = "Development of a psychoeducational intervention ",<br/>
journal = "Journal of Psychosocial Oncology",<br/>
volume = "27",<br/>
number = "1",<br/>
pages = "136-153",<br/>
year = 2009<br/>
}<br/>
@book{bottoff2008women,<br/>
author = "Bottoff, J. L. and Oliffe, J. L. and Halpin, M. and Phillips, M. and McLean, G. and Mroz, L.",<br/>
title = "Women and prostate cancer support groups: {The} gender connect? {Social} {Science} & {Medicine}",<br/>
publisher = "66",<br/>
pages = "1217-1227",<br/>
year = 2008<br/>
}<br/>
@article{bottorff2012gender,<br/>
author = "Bottorff, J. L. and Oliffe, J. L. and Kelly, M.",<br/>
title = "The gender (s) in the room",<br/>
journal = "Qualitative Health Research",<br/>
volume = "22",<br/>
number = "4",<br/>
pages = "435-440",<br/>
year = 2012<br/>
}
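I only want to capture the strings between double quotes in the @article entries. I get the count of @article entries and the line range of each one in order to read its field values. A for loop then fetches the values of each @article (the loop runs from one @article to the next @article, and so on). The problem is that if, for example, the first @article is at line 10 and the second is at line 18, I loop over that range and fetch the values, but a @book entry also sits in between. How can I exclude the @book lines from the for loop? The loop captures the @book fields because they fall within the @article range.
PHP code: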
<?php
$file=file("master.bib");
$typeart=array();
$cont=array();
//count of article
$key = '@article';
foreach ($file as $l => $line) {
    if (strpos($line,$key) !== false) {
        $l++;
        $typeart[]= $l;
    }
}//end-count of article
$counttypeart=count($typeart);
for($j=0;$j<$counttypeart;$j++){
    for($i=$typeart[$j];$i<$typeart[$j+1];$i++){
        if(strpos($file[$i],'author')){
            preg_match('/\"(.*?)\"/',$file[$i],$cont);
            $author= $cont[1];
            echo $author;
            echo "<br>";
        }
        if(strpos($file[$i],'title')){
            preg_match('/\"(.*?)\"/',$file[$i],$cont);
            $title= $cont[1];
            echo $title;
            echo "<br>";
        }
        if(strpos($file[$i],'journal')){
            preg_match('/\"(.*?)\"/',$file[$i],$cont);
            $journal= $cont[1];
            echo $journal;
            echo "<br>";
        }
        if(strpos($file[$i],'volume')){
            preg_match('/\"(.*?)\"/',$file[$i],$cont);
            $volume= $cont[1];
            echo $volume;
            echo "<br>";
        }
        if(strpos($file[$i],'number')){
            preg_match('/\"(.*?)\"/',$file[$i],$cont);
            $number= $cont[1];
            echo $number;
            echo "<br>";
        }
        if(strpos($file[$i],'pages')){
            preg_match('/\"(.*?)\"/',$file[$i],$cont);
            $pages= $cont[1];
            echo $pages;
            echo "<br>";
            echo "<br>";
        }
    }
}
?>
Expected output (from the example above):
Boonzaier, A. and Schubach, K. and Troup, K. and Pollard, A. and Aranda, S. and Schofield P.
Development of a psychoeducational intervention for men with prostate cancer
Journal of Psychosocial Oncology
27
1
136-153
Bottorff, J. L. and Oliffe, J. L. and Kelly, M.
The gender (s) in the room
Qualitative Health Research
22
4
435-440
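Your code appears to capture the @book element because you do not record the line on which each @article element ends. So when you iterate over the lines of an @article element, you start at the line where that @article begins and only stop at the line where the next @article begins. Everything in between, including any @book entry, is treated as part of the first @article.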
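To make this concrete, here is a minimal sketch that reproduces the range computation on a shortened, inlined version of the sample. The `$lines` array is an assumption standing in for `file("master.bib")`, and only the author field is inspected.

```php
<?php
// Shortened stand-in for file("master.bib"); inlined for illustration only.
$lines = array(
    '@article{boonzaier2009development,<br/>',
    'author = "Boonzaier, A.",<br/>',
    '}<br/>',
    '@book{bottoff2008women,<br/>',
    'author = "Bottoff, J. L.",<br/>',
    '}<br/>',
    '@article{bottorff2012gender,<br/>',
    'author = "Bottorff, J. L.",<br/>',
    '}',
);

// Same scan as in the question: remember the line after each @article.
$typeart = array();
foreach ($lines as $l => $line) {
    if (strpos($line, '@article') !== false) {
        $typeart[] = $l + 1;
    }
}

// Collect every author line between the first and the second @article.
$seen = array();
for ($i = $typeart[0]; $i < $typeart[1]; $i++) {
    if (strpos($lines[$i], 'author') !== false
        && preg_match('/"(.*?)"/', $lines[$i], $m)) {
        $seen[] = $m[1];
    }
}

print_r($seen);
```

The range for the first @article runs up to the second @article, so the author of the @book entry ends up in `$seen` as well.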
There are two alternative ways to fix the code:
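- When you first scan all the lines in the file, record both the start line and the end line of each @article element. For example:
// count of article
$key_start = '@article';
$key_end = '}';
$start = null;
foreach ($file as $l => $line) {
    if (strpos($line, $key_start) !== false) {
        $start = $l + 1; // first field line of this entry
        continue;
    }
    // Only close an entry that was opened by @article, so @book is skipped.
    // The closing brace is matched at the start of the line, because titles
    // may contain braces too.
    if ($start !== null && strpos(ltrim($line), $key_end) === 0) {
        $typeart[] = array($start, $l - 1);
        $start = null;
    }
}
// end-count of article
Now you should be able to iterate over the lines belonging to an @article element simply by doing:
for ($j = 0; $j < $counttypeart; $j++) {
    list($start, $end) = $typeart[$j];
    for ($i = $start; $i <= $end; $i++) {
        …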
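- Break out of the second for loop early, as soon as you reach the closing brace of the @article. That avoids iterating over every line up to the following @article element, for example:
for ($i = $typeart[$j]; $i < $typeart[$j+1]; $i++) {
    // stop at the line that closes the current entry
    if (strpos(ltrim($file[$i]), '}') === 0) {
        break;
    }
    …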
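However, neither of these solutions is ideal, because both lead to duplicated code that is hard to maintain. They also depend on your knowing every attribute of the @article element in advance in order to extract its value. Unless you have a very good reason to structure your code in this particular way, I would choose a different solution…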
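Alternative solution:
- Read the entire bibliography in one go
- Use a regular expression to capture the contents of every @article element
- Use another regular expression to capture the parameter names and their values within the captured contents of a single @article element
Here is a brief implementation of what I mean: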
<?php
// Use file_get_contents() instead of file() as it is the preferred way to
// read the contents of a file into a string. It will also use memory mapping
// techniques if supported by your OS to enhance performance.
$file_content = file_get_contents('master.bib');
// Capture all article containers from the file content. We use a regular
// expression on a multi-line string to do that:
preg_match_all(
    '%@article\{\w+,<br/>\s+(.*)\s+\}(<br/>)?%sUu',
    $file_content,
    $articles,
    PREG_PATTERN_ORDER
);
// Initialise empty results (plural) container which will store results data
// for all @article elements
$results = array();
// At this point $articles[0] is an array of all captured @article blocks
// and $articles[1] is an array of all captured first parentheses within
// the above regular expression.
foreach ($articles[1] as $article) {
    // Initialise empty result (singular) container which will store results
    // for the current @article element
    $result = array();
    // Now we will take the content of the first parenthesis, split it into
    // individual lines and pick out the required data from those lines.
    foreach (explode("\n", $article) as $line) {
        $found = preg_match(
            '%\s*(\w+)\s*=\s*"?([^"]+)"?,?<br/>\s*%Uu',
            $line,
            $matches
        );
        // At this point $matches is populated with our desired data, unless
        // $found is 0 (no matches were found) or false (an error occurred)
        if ($found != false and $found > 0) {
            $result[$matches[1]] = trim($matches[2]);
        }
    }
    // Add current @article results to the list of all results, but avoid
    // doing so if current results are empty
    if (!empty($result)) {
        $results[] = $result;
    }
}
// Print results
foreach ($results as $article) {
    print "{$article['author']}\n"
        . "{$article['title']}\n"
        . "{$article['journal']}\n"
        . "{$article['volume']}\n"
        . "{$article['number']}\n"
        . "{$article['pages']}\n"
        . "\n\n";
}
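One caveat with the print loop above: it assumes every @article has all six fields, and PHP raises an undefined-index notice (a warning in PHP 8) for any that is absent. A minimal sketch of a more tolerant formatter follows. The helper name `format_article` and the `$sample` entry are hypothetical; the entry has the same shape as one element of `$results`.

```php
<?php
// Hypothetical helper: formats one parsed entry, one field per line,
// silently skipping fields the entry does not have.
function format_article(array $article, array $fields)
{
    $out = array();
    foreach ($fields as $field) {
        if (isset($article[$field])) {
            $out[] = $article[$field];
        }
    }
    return implode("\n", $out);
}

// Fields to print, in output order.
$fields = array('author', 'title', 'journal', 'volume', 'number', 'pages');

// Illustrative entry with journal, volume and number deliberately missing.
$sample = array(
    'author' => 'Bottorff, J. L. and Oliffe, J. L. and Kelly, M.',
    'title'  => 'The gender (s) in the room',
    'pages'  => '435-440',
    'year'   => '2012',
);

$text = format_article($sample, $fields);
echo $text, "\n";
```

Keeping the field list in one array also means a new field can be printed by adding a single element rather than another concatenated line.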