Don't include bbCode in reading time and word/character counters

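I use several functions in PHP to calculate the word count, character count, and reading time of a text. They all share one small "flaw": they count everything, including the bbCode (and smileys). I don't want that!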

function calculate_readingtime($string) {
    $word = str_word_count(strip_tags($string));
    $m = floor($word / 200);
    $s = floor($word % 200 / (200 / 60));
    $minutes = ($m != 0 ? $m.' min.' : '');
    $seconds = (($m != 0 AND $s != 0) ? ' ' : '') . $s.' sec.';
    return $minutes . $seconds;
}
$content = 'This is some text with [b]bbCode[/b]! Oh, so pretty :D And here\'s is a link too: [url="https://example.com/"]das linkish[/url]. What about an image? That\'s pretty to, you know. [img src="https://example.com/image.jpg" size="128" height="128" width="128"] And another one: [img src="https://example.com/image.jpg" height="128"]';
$reading_time = calculate_readingtime($content);
$count_words = str_word_count($content, 1, 'àáãâçêéíîóõôúÀÁÃÂÇÊÉÍÎÓÕÔÚÅåÄäÖö');
$count_chars_with_spaces = mb_strlen($content);
echo 'Reading time: '.$reading_time.'<br>';
echo 'Words: '.count($count_words).'<br>';
echo 'Characters with spaces: '.$count_chars_with_spaces;
# OUTPUT
Reading time: 16 sec.
Words: 55
Characters with spaces: 326

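I'd like the counters (including the reading time) to be more accurate: they should exclude the bbCode tags themselves but still count the text inside them (for example, count the word bbCode in [b]bbCode[/b]).

How can I accomplish this?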

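Stripping BBCode from a string with preg_replace is actually fairly easy, especially in a language like PHP that supports the PCRE library. Making a few assumptions about your BBCode syntax (tag names consist of word characters and attribute values are double-quoted), this is the shortest way to do it: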

preg_replace('@\[(?:\w+(?:="(?>.*?"))?(?: \w+="(?>.*?"))*|/\w+)]@s', '', $content);

Demo on Regex101
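
To show where the stripping step fits, here is a minimal sketch that removes the tags first and then runs the counters on the result. The helper name strip_bbcode_tags and the shortened sample text are assumptions made for illustration; the reading-time arithmetic is copied from the question.

```php
<?php
// Hypothetical helper (the name is an assumption): remove BBCode tags, keep the text between them.
function strip_bbcode_tags(string $text): string
{
    $pattern = '@\[(?:\w+(?:="(?>.*?"))?(?: \w+="(?>.*?"))*|/\w+)]@s';
    // preg_replace() returns null on a PCRE error; fall back to the input in that case.
    return preg_replace($pattern, '', $text) ?? $text;
}

// Same arithmetic as the question's calculate_readingtime(): 200 words per minute.
function calculate_readingtime(string $string): string
{
    $word = str_word_count(strip_tags($string));
    $m = floor($word / 200);
    $s = floor($word % 200 / (200 / 60));
    $minutes = ($m != 0 ? $m . ' min.' : '');
    $seconds = (($m != 0 && $s != 0) ? ' ' : '') . $s . ' sec.';
    return $minutes . $seconds;
}

// Shortened sample text, made up for this sketch.
$content = 'Some [b]bold[/b] text and a [url="https://example.com/"]link[/url].';
$plain = strip_bbcode_tags($content);

echo $plain, "\n";                                           // Some bold text and a link.
echo 'Reading time: ', calculate_readingtime($plain), "\n";  // Reading time: 1 sec.
echo 'Words: ', str_word_count($plain), "\n";                // Words: 6
echo 'Characters with spaces: ', mb_strlen($plain), "\n";    // Characters with spaces: 26
```

Note that the pattern only targets bracketed tags, so a smiley such as :D passes through unchanged and would need its own handling.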

Or a better approach that pairs opening tags with their closing tags and handles nesting:

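function parse($str) {
    // Match an opening tag (with optional attributes) and, if present, its content and closing tag
    return preg_replace_callback('@\[(\w+)(?:="(?>.*?"))?(?: \w+="(?>.*?"))*](?:(.*?)\[/\1])?@s',
        // Group 2 is unset for tags without a closing tag, such as [img ...]
        function($matches) { return isset($matches[2]) ? parse($matches[2]) : ''; },
        $str
    );
}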

Demo on Ideone
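
As a quick check of the recursive approach, the sketch below runs it on nested tags and on a tag without a closing counterpart. The name parse_bbcode and the sample strings are assumptions for illustration; the function mirrors parse above, with an isset guard because the second capture group is unset when no closing tag is found.

```php
<?php
// Hypothetical name; mirrors the recursive approach for a self-contained check.
function parse_bbcode(string $str): string
{
    return preg_replace_callback(
        '@\[(\w+)(?:="(?>.*?"))?(?: \w+="(?>.*?"))*](?:(.*?)\[/\1])?@s',
        function (array $matches): string {
            // Group 2 is unset when no matching closing tag was found.
            return isset($matches[2]) ? parse_bbcode($matches[2]) : '';
        },
        $str
    ) ?? $str;
}

$nested = '[b]bold [i]both[/i][/b] and an image[img src="https://example.com/image.jpg" height="128"].';
echo parse_bbcode($nested), "\n"; // bold both and an image.

// A closing tag without an opener is not matched by this pattern and is kept.
echo parse_bbcode('stray [/b] tag'), "\n"; // stray [/b] tag
```

Unlike the shorter pattern, this version leaves a stray closing tag in place, which may or may not be what you want for the counters.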