Multibyte-safe count distinct characters in a string

I'm looking for a clever and efficient way to count how many distinct alpha characters a string contains. Examples:

$str = "APPLE";
echo char_count($str); // should return 4, because APPLE has 4 distinct chars: 'A', 'P', 'L' and 'E'
$str = "BOB AND BOB"; // should return 5 ('B', 'O', 'A', 'N', 'D')
$str = 'PLÁTANO'; // should return 7 ('P', 'L', 'Á', 'T', 'A', 'N', 'O')

It should support UTF-8 strings!
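One way to meet exactly this spec is to match only letters with the Unicode property `\p{L}` and deduplicate the matches. The sketch below is an illustration, not taken from any answer in this thread; it assumes valid UTF-8 input in precomposed (NFC) form and treats upper and lower case as different characters.

```php
<?php
// Sketch: count distinct letters in a UTF-8 string.
// Assumes valid UTF-8 in precomposed (NFC) form; the comparison is case-sensitive.
function char_count(string $str): int
{
    // \p{L} matches any Unicode letter; the /u modifier makes PCRE read the
    // subject as UTF-8 code points instead of bytes.
    if (preg_match_all('/\p{L}/u', $str, $matches) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // $matches[0] holds every letter in order; keep one copy of each.
    return count(array_unique($matches[0]));
}

echo char_count("APPLE"), "\n";       // 4
echo char_count("BOB AND BOB"), "\n"; // 5 (spaces are not letters)
echo char_count('PLÁTANO'), "\n";     // 7
```

Because only letters are matched, the spaces in "BOB AND BOB" are skipped without a separate cleanup step.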

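If you're dealing with UTF-8 (and you really should be, IMHO), none of the posted solutions (using strlen, str_split or count_chars) will work, because they all treat one byte as one character, which is clearly not true for UTF-8.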
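A quick check makes the byte problem concrete. This sketch assumes the PHP source file itself is saved as UTF-8, so that 'Á' (U+00C1) is stored as the two bytes 0xC3 0x81.

```php
<?php
// Demonstrates the byte-versus-character mismatch on a UTF-8 string.
// 'Á' (U+00C1) is encoded in UTF-8 as two bytes, 0xC3 0x81.
$str = 'PLÁTANO';

// Byte-oriented functions see 8 units.
$bytes = strlen($str);                                    // 8
$byteParts = count(str_split($str));                      // 8
$distinctBytes = count(array_filter(count_chars($str)));  // 8: 0xC3 and 0x81 counted separately

// Splitting with the /u modifier yields code points instead.
$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
$distinctChars = count(array_unique($chars));             // 7

printf("%d %d %d %d\n", $bytes, $byteParts, $distinctBytes, $distinctChars);
```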

<?php
$treat_spaces_as_chars = true;
// contains h, ä, l, ö, w, r, d and a space: 8 distinct characters (7 without the space)
$string = "hällö wörld";
// remove whitespace if we don't want to count it
if (!$treat_spaces_as_chars) {
  $string = preg_replace('/\s+/u', '', $string);
}
// split into characters (not bytes, as str_split() would)
$characters = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
// throw out the duplicates
$unique_characters = array_unique($characters);
// count what's left
$number_of_characters = count($unique_characters);
echo $number_of_characters; // 8
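One gap in the snippet above: `preg_split()` returns `false` when the subject is not valid UTF-8, and `array_unique(false)` then throws a `TypeError` on PHP 8. A minimal sketch that wraps the same steps in a function (the name `count_distinct_utf8_chars` is hypothetical) and turns bad input into an explicit exception:

```php
<?php
// Sketch: the snippet above as a reusable function (hypothetical name).
// Invalid UTF-8 makes the /u functions fail, so that case raises an exception.
function count_distinct_utf8_chars(string $string, bool $treatSpacesAsChars = true): int
{
    if (!$treatSpacesAsChars) {
        // preg_replace() returns null on failure, e.g. invalid UTF-8 with /u.
        $string = preg_replace('/\s+/u', '', $string);
        if ($string === null) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }
    }
    // preg_split() returns false on failure instead of an array.
    $characters = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    if ($characters === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return count(array_unique($characters));
}

echo count_distinct_utf8_chars('hällö wörld'), "\n";        // 8
echo count_distinct_utf8_chars('hällö wörld', false), "\n"; // 7
```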

If you want to strip out all non-word characters as well:

<?php
$ignore_non_word_characters = true;
// contains h, ä, l, ö, w, r, d and π, which counts as a word character (Greek letter)
$string = "h,ä*+l•π‘°’lö wörld";
// remove non-word characters (punctuation, symbols, spaces) if we don't want to count them
if ($ignore_non_word_characters) {
  $string = preg_replace('/\W+/u', '', $string);
}
// split into characters (not bytes, as str_split() would)
$characters = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
// throw out the duplicates
$unique_characters = array_unique($characters);
// count what's left
$number_of_characters = count($unique_characters);
var_dump($characters, $unique_characters, $number_of_characters);
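Splitting with `//u` yields code points, which is not always what a reader sees as one character: "Á" can also be written as "A" followed by U+0301 COMBINING ACUTE ACCENT. A sketch that counts extended grapheme clusters with PCRE's `\X` escape instead (the function name is hypothetical):

```php
<?php
// Sketch: count distinct user-perceived characters (extended grapheme clusters)
// rather than code points. \X matches one grapheme cluster; /u enables UTF-8.
function count_distinct_graphemes(string $string): int
{
    if (preg_match_all('/\X/u', $string, $matches) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return count(array_unique($matches[0]));
}

// "A" followed by U+0301 COMBINING ACUTE ACCENT renders as a single "Á".
$decomposed = "A\u{0301}";
$codePoints = preg_split('//u', $decomposed, -1, PREG_SPLIT_NO_EMPTY);

echo count(array_unique($codePoints)), "\n";      // 2 (code points)
echo count_distinct_graphemes($decomposed), "\n"; // 1 (grapheme clusters)
```

Note that a precomposed "Á" and a decomposed "Á" are still different strings, so they would count as two distinct clusters; normalizing first (for example with `Normalizer::normalize()` from the intl extension) would be needed to merge them.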

Using count_chars:

echo count(array_filter(count_chars($str)));

The array returned by count_chars() also tells you how many times each character (strictly, each byte) occurs in the string.
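`count_chars()` also takes a mode argument. As a sketch for single-byte (e.g. ASCII) input only: mode 1 keeps just the bytes that occur, keyed by byte value, and mode 3 returns a string of each distinct byte once, so its length is the distinct count.

```php
<?php
// count_chars() works on bytes, so this is only correct for single-byte text.
$str = "APPLE";

// Mode 1: only byte values that occur, as [byte value => frequency].
$frequencies = count_chars($str, 1);
echo $frequencies[ord('P')], "\n"; // 2

// Mode 3: a string containing each distinct byte once, in byte-value order.
$unique = count_chars($str, 3);
echo $unique, "\n";                // AELP
echo strlen($unique), "\n";        // 4
```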

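count_chars returns a map over all 256 byte values (not just ASCII), telling you how many times each one occurs in the string. Here is a starting point for rolling your own: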

function countchars($str, $ignoreSpaces = false) {
  $map = array();
  $len = strlen($str);
  // walks the string byte by byte, so multibyte characters get split up
  for ($i = 0; $i < $len; $i++) {
    if (!isset($map[$str[$i]])) {
      $map[$str[$i]] = 1;
    } else {
      $map[$str[$i]]++;
    }
  }
  if ($ignoreSpaces) {
    unset($map[' ']);
  }
  return $map;
}
print_r(countchars('Hello World'));
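The same frequency map can be made multibyte-safe by iterating over UTF-8 code points instead of byte offsets. A sketch, with the hypothetical name `countchars_utf8`, keeping the original function's shape and its `$ignoreSpaces` flag:

```php
<?php
// Sketch: countchars() rebuilt on UTF-8 code points (hypothetical name).
// Keys are characters, values are how often each one occurs.
function countchars_utf8(string $str, bool $ignoreSpaces = false): array
{
    $characters = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($characters === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $map = [];
    foreach ($characters as $c) {
        // ?? supplies 0 the first time a character is seen.
        $map[$c] = ($map[$c] ?? 0) + 1;
    }
    if ($ignoreSpaces) {
        unset($map[' ']);
    }
    return $map;
}

print_r(countchars_utf8('hällö wörld', true));
```

Then `count()` of the returned map gives the number of distinct characters.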

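Here's a function that uses the magic of associative arrays: keys are unique, so each distinct character ends up as exactly one key. It runs in linear time, O(n).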

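function uniques($string) {
  $arr = array();
  // str_split() splits into bytes, so this is only correct for single-byte text
  $parts = str_split($string);
  foreach ($parts as $part) {
    $arr[$part] = true; // a repeated character overwrites its existing key
  }
  return count($arr);
}
$str = "APPLE";
echo uniques($str);  // outputs 4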
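The keys-are-unique idea carries over to UTF-8 if the string is split into code points first. A sketch (the name `uniques_utf8` is hypothetical) that uses `array_flip()` to turn every character into a key in one call:

```php
<?php
// Sketch: uniques() applied to UTF-8 code points (hypothetical name).
// array_flip() makes each character a key, so repeats collapse into one entry.
function uniques_utf8(string $string): int
{
    $parts = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    if ($parts === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return count(array_flip($parts));
}

echo uniques_utf8("APPLE"), "\n";   // 4
echo uniques_utf8('PLÁTANO'), "\n"; // 7
```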

My take:

$chars = array_count_values(str_split($input));

This gives you an associative array with the unique letters as keys and their number of occurrences as values.

If you aren't interested in the number of occurrences:

$chars = array_unique(str_split($input));
$numChars = count($chars);