How to split a multibyte string into words in PHP?

How can I split a multibyte string into words in PHP? This is what I have so far, but I'd like to improve the code:

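   mb_internal_encoding('UTF-8');
   mb_regex_encoding('UTF-8');
   // Split on whitespace, square brackets, parentheses and common punctuation.
   // "-" goes last so it is literal; ":-_" would be a range that also covers A-Z.
   $arr = mb_split('[\s\[\]().,;:_-]', $str);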
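A minimal sketch of the same delimiter-based idea, assuming you are open to swapping mb_split() for PCRE's preg_split() (a different function, not what the question uses). The /u modifier makes the pattern work on UTF-8 code points, and PREG_SPLIT_NO_EMPTY drops the empty strings that mb_split() leaves behind when two delimiters are adjacent. The function name splitOnDelimiters is hypothetical.

```php
<?php
// Hypothetical helper: split a UTF-8 string on whitespace, brackets,
// parentheses and punctuation, discarding empty pieces.
function splitOnDelimiters(string $str): array
{
    // "+" collapses runs of delimiters such as ", (" into one split point.
    // The hyphen is last in the class so it is a literal "-", not a range.
    return preg_split('/[\s\[\]().,;:_-]+/u', $str, -1, PREG_SPLIT_NO_EMPTY);
}

echo implode(' | ', splitOnDelimiters('Grüße, (Welt); foo_bar-baz')), PHP_EOL;
// Grüße | Welt | foo | bar | baz
```

This still defines words by what separates them, so any delimiter you forget to list ends up inside a "word".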

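Is there a way to say that a word is a sequence of "alpha" characters, without using the a-z notation (because I want to include non-Latin characters)?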

Try this:

// [\p{L}\p{M}]+ matches runs of Unicode letters and combining marks; /u enables UTF-8 mode.
preg_match_all('/[\p{L}\p{M}]+/u', $subject, $result, PREG_PATTERN_ORDER);
foreach ($result[0] as $word) {
    // $word is one matched word
}
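A sketch that wraps the preg_match_all() call above in a reusable function; the name splitWords is hypothetical. It also handles the case where preg_match_all() returns false, which happens for example when the subject is not valid UTF-8 and the /u modifier is set.

```php
<?php
// Hypothetical helper: return every run of letters/combining marks in $subject.
function splitWords(string $subject): array
{
    // [\p{L}\p{M}]+ : one or more letters (any script) or combining marks.
    // /u            : treat both pattern and subject as UTF-8.
    if (preg_match_all('/[\p{L}\p{M}]+/u', $subject, $matches) === false) {
        // An error occurred (e.g. invalid UTF-8); this sketch returns no words.
        return [];
    }
    // With the default PREG_PATTERN_ORDER, $matches[0] holds all full matches.
    return $matches[0];
}

echo implode(' | ', splitWords('Hello, wörld! Привет, 世界.')), PHP_EOL;
// Hello | wörld | Привет | 世界
```

Digits are not letters, so "abc123" yields only "abc"; add \p{N} to the class if numbers should count as part of a word.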

It treats any run of letters from any script, together with their accents and other combining marks, as a word:

    [\p{L}\p{M}]   # Match a single character present in the list below:
                   #   \p{L}: a character with the Unicode property "letter" (any kind of letter from any language)
                   #   \p{M}: a character with the Unicode property "mark" (a character intended to be combined with another character, e.g. accents, umlauts, enclosing boxes)
    +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
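To see why \p{M} is in the class, consider text where an accented letter is stored decomposed: a base letter followed by a combining mark. Below, the first "é" is written as "e" plus U+0301 COMBINING ACUTE ACCENT, which has the "mark" property rather than "letter"; the sample string is made up for illustration.

```php
<?php
// "café crème" with the first é decomposed into e + U+0301 (a \p{M} character).
$decomposed = "cafe\u{0301} cr\u{00E8}me";

preg_match_all('/\p{L}+/u', $decomposed, $lettersOnly);
preg_match_all('/[\p{L}\p{M}]+/u', $decomposed, $lettersAndMarks);

// Letters only: the combining accent is not matched, so "cafe" loses it.
echo implode(' | ', $lettersOnly[0]), PHP_EOL;      // cafe | crème
// Letters and marks: the accent stays attached to its word.
echo implode(' | ', $lettersAndMarks[0]), PHP_EOL;  // café | crème
```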

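Many languages, such as Chinese, don't separate words with spaces. What should the function return in that case, the whole string? Also, explode() in PHP is binary-safe, so if you only need a single delimiter, it may be faster to use that.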
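A short sketch of both points, with made-up sample strings. explode() is safe on UTF-8 text when the delimiter is an ASCII byte, because every byte of a multibyte UTF-8 character is 0x80 or higher and so can never be confused with, for example, the space (0x20). And for unspaced Chinese text, the letter-run pattern from the answer finds no boundaries and returns the whole run as a single match.

```php
<?php
// 1) Splitting on a single ASCII delimiter with explode() keeps UTF-8 intact.
$words = explode(' ', 'naïve café Привет');
echo implode(' | ', $words), PHP_EOL;   // naïve | café | Привет

// 2) Han characters are all \p{L}, so a sentence without spaces is one match.
preg_match_all('/[\p{L}\p{M}]+/u', '我爱北京', $m);
echo count($m[0]), PHP_EOL;             // 1
```

Real word segmentation for such languages needs more than a character-class regex.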

Maybe you should use \w?
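A sketch comparing the two approaches on a made-up sample. Assuming a standard PHP build, where the /u modifier also makes \d, \s and \w Unicode-aware, \w matches non-Latin letters too; the difference is that \w additionally matches digits and the underscore, so the choice depends on whether "user_42" should be one word or just "user".

```php
<?php
$text = 'naïve user_42 wrote this';

// \w with /u: letters, digits and "_" (Unicode-aware in PHP's UTF-8 mode).
preg_match_all('/\w+/u', $text, $word);
// Letters and combining marks only.
preg_match_all('/[\p{L}\p{M}]+/u', $text, $letters);

echo implode(' | ', $word[0]), PHP_EOL;     // naïve | user_42 | wrote | this
echo implode(' | ', $letters[0]), PHP_EOL;  // naïve | user | wrote | this
```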