Why does the following code behave differently for different multibyte strings?
echo preg_replace('@(?=\pL)@u', '*', 'م'); // prints: '*م' ✓
echo preg_replace('@(?=\pL)@u', '*', 'ض'); // prints: '*ض' ✓
echo preg_replace('@(?=\pL)@u', '*', 'غ'); // prints: '*�*�' ✗
echo preg_replace('@(?=\pL)@u', '*', 'ص'); // prints: '*�*�' ✗
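All four inputs are two-byte UTF-8 characters, so byte length alone does not separate the ✓ rows from the ✗ rows. A minimal probe of what does differ, assuming PHP 7+ for the `\u{…}` escapes (`utf8_2byte()` and `second_byte_is_letter()` are hypothetical helpers written only for this sketch): it dumps each character's bytes and checks whether the second (continuation) byte, read as a code point on its own, would count as a letter under `\pL`. This is an illustrative observation, not the explanation the answer gives.

```php
<?php
// Hypothetical helper: encode a code point below U+0800 as UTF-8
// (same arithmetic as uchar_2() in the answer, plus the ASCII case).
function utf8_2byte(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: take the second (continuation) byte of a two-byte
// UTF-8 character, treat its value as a code point (U+0080..U+00BF),
// and report whether \pL classifies that code point as a letter.
function second_byte_is_letter(string $char): bool
{
    $cp = ord($char[1]);
    return preg_match('@^\pL$@u', utf8_2byte($cp)) === 1;
}

$chars = ["\u{0645}", "\u{0636}", "\u{063A}", "\u{0635}"]; // م ض غ ص
foreach ($chars as $c) {
    printf(
        "%s  bytes=%s  second byte as U+00%02X is a letter: %s\n",
        $c,
        bin2hex($c),
        ord($c[1]),
        second_byte_is_letter($c) ? 'yes' : 'no'
    );
}
```

For these inputs the second bytes are 0x85, 0xB6, 0xBA and 0xB5; of the matching code points only U+00BA (º) and U+00B5 (µ) are letters, which lines up with the two ✗ rows. Whether the regex engine actually resumes matching at that byte depends on the PHP/PCRE version and is assumed here, not demonstrated.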
See: http://3v4l.org/fvab1
You also need to include modifier letters (Lm). See the following script, which iterates over the entire Arabic Unicode block:
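<?php
// Encode a code point in the range U+0080..U+07FF as two-byte UTF-8.
function uchar_2($dec)
{
    $utf = chr(192 + (($dec - ($dec % 64)) / 64)); // lead byte: 110xxxxx
    $utf .= chr(128 + ($dec % 64));                 // continuation byte: 10xxxxxx
    return $utf;
}
$issues = 0;
$count = 0;
// Walk the Arabic block, U+0600 (1536) to U+06FF (1791).
for ($dec = 1536; $dec <= 1791; $dec++) {
    $char = uchar_2($dec);
    // Any change made by preg_replace() is counted as an issue.
    if (preg_replace('@^(?=\pLm)$@u', '*', $char) !== $char) {
        printf("Issue with %s (%s)\n", $dec, $char);
        $issues++;
    }
    $count++;
}
printf("Found %d issues in %d rows\n", $issues, $count);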
Without Lm, this fails for roughly half of the characters.
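On the syntax for the Lm category itself: in PCRE, a bare `\p` takes only a single-letter property name, so a two-letter general category such as Lm has to be written in braces as `\p{Lm}`, while `\pLm` is read as `\pL` followed by a literal `m`. A minimal sketch of the difference, using U+0640 ARABIC TATWEEL (a modifier letter inside the same Arabic block) and U+0645 ARABIC LETTER MEEM as assumed test characters:

```php
<?php
$tatweel = "\u{0640}"; // ARABIC TATWEEL, general category Lm
$meem    = "\u{0645}"; // ARABIC LETTER MEEM, general category Lo

// \p{Lm}: exactly the modifier-letter category.
var_dump(preg_match('@^\p{Lm}$@u', $tatweel)); // int(1)
var_dump(preg_match('@^\p{Lm}$@u', $meem));    // int(0)

// \pLm: any letter (\pL) followed by a literal "m".
var_dump(preg_match('@^\pLm$@u', $tatweel));       // int(0)
var_dump(preg_match('@^\pLm$@u', $tatweel . 'm')); // int(1)
var_dump(preg_match('@^\pLm$@u', $meem . 'm'));    // int(1)
```

The last line shows that `\pLm` also accepts a non-modifier letter, as long as an `m` follows it.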