preg_replace with :alnum: and UTF-8

Keywords: alnum, UTF-8, preg_replace | Updated: 2023-09-27

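I've found that the u modifier sometimes helps when working with UTF-8 strings. However, on my Linux server the following code replaces the umlauts with -, while my Windows server keeps them.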

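<?php
// mb_internal_encoding() only configures mbstring; preg_* functions ignore it.
mb_internal_encoding('UTF-8');
function clean($string) {
    // Replace every character outside the POSIX [:alnum:] class with '-'.
    return preg_replace('/[^[:alnum:]]/ui', '-', $string);
}
echo clean("Test: föG");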

Linux: Test--f-G

Windows (as it should be): Test--föG
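One way to compare the two servers is to ask each PHP installation directly. The sketch below assumes PHP 7.0+ (for the `\u{...}` string escape and scalar return types), and `posixAlnumMatchesUmlaut` is a hypothetical helper name. It prints the PCRE library version PHP was built against and whether `[[:alnum:]]` matches 'ö' under the u modifier. A likely, though unconfirmed for this case, cause of the difference is the PCRE build: newer PHP versions enable PCRE's Unicode-property (UCP) mode together with u when the library supports it, and in that mode POSIX classes also match non-ASCII letters.

```php
<?php
// Hypothetical helper: does this PHP/PCRE build let [[:alnum:]]
// match a non-ASCII letter when the u modifier is used?
function posixAlnumMatchesUmlaut(): bool
{
    // "\u{00F6}" is 'ö' written as an escape (PHP 7.0+), so the result
    // does not depend on the encoding this file is saved in.
    return preg_match('/^[[:alnum:]]$/u', "\u{00F6}") === 1;
}

// PCRE_VERSION is the version of the PCRE library PHP is using.
echo 'PCRE ', PCRE_VERSION, ': [:alnum:] ',
    posixAlnumMatchesUmlaut() ? 'matches' : 'does not match', " 'ö'\n";
```

Running this on both machines should show whether the Linux server is using an older or differently configured PCRE.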

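From the PHP documentation of the PCRE module:

In UTF-8 mode, characters with values greater than 128 do not match any of the POSIX character classes.

This is presumably for efficiency reasons, since there are a great many Unicode characters. Instead of POSIX character classes, you can write the regular expression with Unicode character properties. This will be slightly slower.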

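<?php
mb_internal_encoding('UTF-8');
function clean($string) {
    // \p{L} = any Unicode letter, \p{N} = any Unicode number.
    // Everything else is replaced with '-'; u makes PCRE treat the pattern and subject as UTF-8.
    return preg_replace('/[^\p{L}\p{N}]/ui', '-', $string);
}
echo clean("Test: föG");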
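To check the corrected pattern without depending on the encoding the source file is saved in, here is a self-contained sketch assuming PHP 7.0+. The non-ASCII test characters are written as `\u{...}` escapes; the name `cleanUnicode`, the null check, and the extra sample strings are illustrative additions, not part of the original answer.

```php
<?php
// Hypothetical name; same pattern as clean() above, without the i modifier,
// which is not needed here. [^\p{L}\p{N}] matches any code point that is
// neither a Unicode letter nor a Unicode number.
function cleanUnicode(string $string): string
{
    $result = preg_replace('/[^\p{L}\p{N}]/u', '-', $string);
    if ($result === null) {
        // preg_replace() returns null on error, e.g. when $string is not valid UTF-8.
        throw new InvalidArgumentException('preg_replace failed; is the input valid UTF-8?');
    }
    return $result;
}

echo cleanUnicode("Test: f\u{00F6}G"), "\n";        // Test--föG
echo cleanUnicode("Stra\u{00DF}e 42"), "\n";        // Straße-42
echo cleanUnicode("\u{00C9}t\u{00E9} 2023!"), "\n"; // Été-2023-
```

One caveat: \p{L} covers precomposed letters such as U+00F6, but a decomposed ö (o followed by the combining diaeresis U+0308, which is in \p{M}) comes out as o-. Adding \p{M} to the character class keeps such combining marks.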