Truncate content before search keyword in text

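I am using the code below to truncate my content before and after the first occurrence of the search keyword in my text (this is for my search page). Everything works, except that the code cuts a word in half at the start of the excerpt. It does not cut words at the end.

Example: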

lients at the centre of the relationship and to offer a first class service to them, which includes tax planning, investment management and estate planning. We believe that our customer focused and...

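(Edit: sometimes more than one character is missing from the word.)

As you can see, it has chopped the 'c' off 'clients'. This only happens at the start of the excerpt, not at the end. How can I fix this? I believe I am halfway there. The current code: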

function neatest_trim($content, $chars, $searchquery, $characters_before, $characters_after) {
    if (strlen($content) > $chars) {
        $pos = strpos($content, $searchquery);
        $start = $characters_before < $pos ? $pos - $characters_before : 0;
        $len = $pos + strlen($searchquery) + $characters_after - $start;
        $content = str_replace('&nbsp;', ' ', $content);
        $content = str_replace("\n", '', $content);
        $content = strip_tags(trim($content));
        $content = preg_replace('/\s+?(\S+)?$/', '', mb_substr($content, $start, $len));
        $content = trim($content) . '...';
        $content = strip_tags($content);
        $content = str_ireplace($searchquery, '<span class="highlight" style="background: #E6E6E6;">' . $searchquery . '</span>', $content);
    }
    return $content;
}

$results[] = array(
    'text' => neatest_trim($row->content, 200, $searchquery, 120, 80)
);

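The 120 characters you keep at the start are never checked: the code does not look at whether the character at that offset is a space or a letter, it simply cuts the string there. The end of the excerpt is fine because the preg_replace call strips a trailing partial word, but nothing equivalent is done for the start.

I would make the following change, which searches for the nearest space after the position where we start.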

$start = $characters_before < $pos ? $pos - $characters_before : 0;
// add this line:
$start = strpos($content, ' ', $start);
$len = $pos + strlen($searchquery) + $characters_after - $start;

This way $start is the position of a space rather than of a letter inside a word.

Your function then becomes:

function neatest_trim($content, $chars, $searchquery, $characters_before, $characters_after) {
    if (strlen($content) > $chars) {
        $pos = strpos($content, $searchquery);
        $start = $characters_before < $pos ? $pos - $characters_before : 0;
        $start = strpos($content, " ", $start);
        $len = $pos + strlen($searchquery) + $characters_after - $start;
        $content = str_replace('&nbsp;', ' ', $content);
        $content = str_replace("\n", '', $content);
        $content = strip_tags(trim($content));
        $content = preg_replace('/\s+?(\S+)?$/', '', mb_substr($content, $start, $len));
        $content = trim($content) . '...';
        $content = strip_tags($content);
        $content = str_ireplace($searchquery, '<span class="highlight" style="background: #E6E6E6;">' . $searchquery . '</span>', $content);
    }
    return $content;
}
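
A somewhat more defensive variant could look like the sketch below. It is not the original poster's code: the name excerpt_around and its parameters are hypothetical, it assumes the mbstring extension is available, and it assumes a case-insensitive match is wanted. It strips tags before locating the keyword so that the offsets refer to the text that is actually cut, it only moves the start forward when the cut really falls inside a word (so the first word is not dropped when the excerpt begins at position 0), and it returns the text unchanged when the keyword is not found.

```php
<?php
// Hypothetical helper; a sketch under the assumptions stated above.
function excerpt_around(string $content, string $query, int $before, int $after): string
{
    // Normalise first, so offsets refer to the text we actually cut.
    $text = trim(strip_tags(str_replace(['&nbsp;', "\n"], ' ', $content)));

    $pos = mb_stripos($text, $query);
    if ($pos === false) {
        return $text; // keyword absent: nothing to centre the excerpt on
    }

    $start = max(0, $pos - $before);
    if ($start > 0 && mb_substr($text, $start - 1, 1) !== ' ') {
        // The cut lands inside a word: skip forward to the next word.
        $space = mb_strpos($text, ' ', $start);
        if ($space !== false && $space < $pos) {
            $start = $space + 1;
        }
    }

    $keywordEnd = $pos + mb_strlen($query);
    $end = min(mb_strlen($text), $keywordEnd + $after);
    if ($end < mb_strlen($text) && mb_substr($text, $end, 1) !== ' ') {
        // The cut lands inside a word: step back to the previous space.
        $space = mb_strrpos(mb_substr($text, 0, $end), ' ');
        if ($space !== false && $space >= $keywordEnd) {
            $end = $space;
        }
    }

    return mb_substr($text, $start, $end - $start) . '...';
}
```

With the sample sentence from the question, excerpt_around($text, 'centre', 12, 10) starts at "at the centre" instead of at a fragment of "clients". Highlighting with str_ireplace can then be applied to the returned string as before.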

Why not use a regular-expression replace?

$result = preg_replace('/.*(.{10}\bword\b.{10}).*/s', '$1', $subject);

This trims the text to the 10 characters before and the 10 characters after the keyword 'word'.

Explanation:

# .*(.{10}\bword\b.{10}).*
# 
# Options: dot matches newline
# 
# Match any single character «.*»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the regular expression below and capture its match into backreference number 1 «(.{10}\bword\b.{10})»
#    Match any single character «.{10}»
#       Exactly 10 times «{10}»
#    Assert position at a word boundary «\b»
#    Match the characters “word” literally «word»
#    Assert position at a word boundary «\b»
#    Match any single character «.{10}»
#       Exactly 10 times «{10}»
# Match any single character «.*»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»

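So what this regex does is find the word you specify (and only that word, since it is wrapped in \b word boundaries), and capture the 10 characters before the word, the word itself, and the 10 characters after it. You can build the regex yourself from variables for the number of characters before and after and, of course, for the keyword. The regex also matches everything else in the subject, but the replacement uses only the backreference $1, which is the output you want.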
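
As a rough sketch of building that pattern from variables: the helper name regex_excerpt and its parameters are made up for illustration, and the keyword is assumed to be a plain word so that \b behaves as expected. preg_quote() makes special characters in the keyword literal. The sketch departs from the pattern above in two deliberate ways. It uses {0,N} instead of an exact count, because with {10} the pattern does not match at all (and preg_replace() returns the subject unchanged) when the keyword sits fewer than 10 characters from the start or end. It also uses a lazy .*? at the front, since a greedy .* combined with {0,N} would swallow all the leading context. The u modifier is an assumption that the input is valid UTF-8. Like the pattern above, this counts characters, so the excerpt can still begin or end in the middle of a word.

```php
<?php
// Hypothetical helper; illustrates assembling the pattern from variables.
function regex_excerpt(string $subject, string $keyword, int $before, int $after): string
{
    $pattern = sprintf(
        '/.*?(.{0,%d}\b%s\b.{0,%d}).*/su',
        $before,
        preg_quote($keyword, '/'), // escape regex metacharacters and the delimiter
        $after
    );

    // preg_replace() returns null on error (for example invalid UTF-8 with /u);
    // fall back to the untouched subject in that case.
    return preg_replace($pattern, '$1', $subject) ?? $subject;
}

echo regex_excerpt('The quick brown fox jumps over the lazy dog', 'fox', 6, 6), "\n";
```

When the keyword does not occur, the pattern does not match and the subject comes back unchanged, so the caller may want to check for that case separately.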