PHP: UTF-8 multi-byte string splitter?

Will this split a multi-byte string into chunks of ten characters each?

$string = 'Star Wars Episode Seven Sucked';    
mb_split('.', $string, 10);
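As far as I can tell from the manual, mb_split() treats its first argument as a regular expression whose matches are the delimiters, and its third argument caps the number of returned elements rather than setting a chunk width. A quick check of that reading, assuming the mbstring extension is built with multibyte regex support:

```php
<?php
// Sketch only: assumes ext/mbstring with mb_regex support is loaded.
mb_regex_encoding('UTF-8');

$string = 'Star Wars Episode Seven Sucked';

// '.' is a regex matching any character, so every character acts as a delimiter.
// 10 limits how many elements come back; it is not a chunk length.
$pieces = mb_split('.', $string, 10);

// What a ten-character split of this string would look like.
$wanted = ['Star Wars ', 'Episode Se', 'ven Sucked'];

var_dump(count($pieces) <= 10);
var_dump($pieces === $wanted);
```

If the second var_dump() reports false, the call is not producing ten-character chunks.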

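The PHP manual says str_split() operates on bytes, not on the characters of a multi-byte string. That makes mb_split() look like the natural "overloaded" replacement, but the two functions (str_split() and mb_split()) have different signatures, so they are arguably not "overload partners". Then I thought, how about this?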

mb_internal_encoding("UTF-8");
$string = 'Star Wars Episode Seven Sucked';  
$tokens = [];
for($i = 0, $length = mb_strlen($string); $i < $length; $i += 10)
{
    $tokens[] = mb_substr($string, $i, 10, 'UTF-8');
}
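print_r($tokens);

// Split $string into chunks of $interval characters, converting it to UTF-8 first if needed.
function mb_utf8_split($string, $interval)
{
     // A non-positive interval would make the loop below run forever.
     if($interval < 1)
     {
          throw new InvalidArgumentException("Interval must be at least 1.");
     }
     $tokens = [];
     // Strict detection, trying UTF-8 before ISO-8859-1.
     $stringEncoding = mb_detect_encoding($string, 'UTF-8, ISO-8859-1', true);
     if(!$stringEncoding)
     {
          throw new RuntimeException("Unable to identify character encoding.");
     }
     if($stringEncoding !== 'UTF-8')
     {
          $string = mb_convert_encoding($string, 'UTF-8', $stringEncoding);
     }
     // Pass the encoding explicitly instead of changing the global internal encoding.
     for($i = 0, $length = mb_strlen($string, 'UTF-8'); $i < $length; $i += $interval)
     {
         $tokens[] = mb_substr($string, $i, $interval, 'UTF-8');
     }
     return $tokens;
}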
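
For comparison, PHP 7.4 and later ship mb_str_split(), which appears to do this directly. Below is a minimal sketch that uses it when available and falls back to the mb_substr() loop otherwise. The name split_chars() is hypothetical, and the input is assumed to be UTF-8 already:

```php
<?php
// Hypothetical helper: split a UTF-8 string into chunks of $interval characters.
function split_chars(string $string, int $interval): array
{
    if ($interval < 1) {
        throw new InvalidArgumentException('Interval must be at least 1.');
    }

    // mb_str_split() exists from PHP 7.4 onward.
    if (function_exists('mb_str_split')) {
        return mb_str_split($string, $interval, 'UTF-8');
    }

    // Fallback for older versions: walk the string by character offset.
    $tokens = [];
    $length = mb_strlen($string, 'UTF-8');
    for ($i = 0; $i < $length; $i += $interval) {
        $tokens[] = mb_substr($string, $i, $interval, 'UTF-8');
    }
    return $tokens;
}

print_r(split_chars('Star Wars Episode Seven Sucked', 10));
print_r(split_chars('日本語テキスト', 3));
```

The first call should give three ten-character chunks. The second gives three-character chunks with one leftover character, regardless of how many bytes each character occupies.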