Why doesn't this test case work?
<?php
// cards with Cyrillic indices and suits in UTF-8 encoding
$a = array('7♠', 'Д♠', 'К♠', '8♦', 'В♦', 'Д♦', '10♣', '10♥', 'В♥', 'Т♥');
foreach ($a as $card) {
$suit = substr($card, -1);
$card = preg_replace('/(\d+)♥/', '<span class="red">$1♥</span>', $card);
$card = preg_replace('/(\d+)♦/', '<span class="red">$1♦</span>', $card);
$card = preg_replace('/(\d+)♠/', '<span class="black">$1♠</span>', $card);
$card = preg_replace('/(\d+)♣/', '<span class="black">$1♣</span>', $card);
printf("suit: %s, html: %s\n", $suit, $card);
}
?>
Output:
suit: ▒, html: <span class="black">7♠</span>
suit: ▒, html: Д♠
suit: ▒, html: К♠
suit: ▒, html: <span class="red">8♦</span>
suit: ▒, html: В♦
suit: ▒, html: Д♦
suit: ▒, html: <span class="black">10♣</span>
suit: ▒, html: <span class="red">10♥</span>
suit: ▒, html: В♥
suit: ▒, html: Т♥
So I have 2 problems in my PHP script:
- Why isn't the last UTF-8 character extracted correctly?
- Why is only the first card of each suit replaced by preg_replace?
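I'm using PHP 5.3.3 and PostgreSQL 8.4.12 on CentOS 6.2; the database holds UTF-8 JSON (with Russian text and card suits).
If problem 1 is a bug in PHP 5.3.3, is there a good workaround? (I don't want to upgrade the stock packages.)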
Update:
<?php
$a = array('7♠', 'Д♠', 'К♠', '8♦', 'В♦', 'Д♦', '10♣', '10♥', 'В♥', 'Т♥');
foreach ($a as $card) {
$suit = mb_substr($card, -1, 1, 'UTF-8');
$card = preg_replace('/(\d+)♥/u', '<span class="red">$1♥</span>', $card);
$card = preg_replace('/(\d+)♦/u', '<span class="red">$1♦</span>', $card);
$card = preg_replace('/(\d+)♠/u', '<span class="black">$1♠</span>', $card);
$card = preg_replace('/(\d+)♣/u', '<span class="black">$1♣</span>', $card);
printf("suit: %s, html: %s\n", $suit, $card);
}
?>
New output:
suit: ♠, html: <span class="black">7♠</span>
suit: ♠, html: Д♠
suit: ♠, html: К♠
suit: ♦, html: <span class="red">8♦</span>
suit: ♦, html: В♦
suit: ♦, html: Д♦
suit: ♣, html: <span class="black">10♣</span>
suit: ♥, html: <span class="red">10♥</span>
suit: ♥, html: В♥
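substr is one of PHP's naive core functions: it assumes 1 byte = 1 character. substr(..., -1) extracts the last byte of the string, but "♠" is longer than one byte. You should use mb_substr($card, -1, 1, 'UTF-8') instead.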
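To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes the file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext/mbstring is available.
$card = '7♠';

// strlen() and substr() work on bytes; "♠" (U+2660) is 3 bytes in UTF-8.
$byteLength = strlen($card);                  // counts bytes
$lastByte   = substr($card, -1);              // a single trailing byte, not a full character

// mb_strlen() and mb_substr() work on characters when given the encoding.
$charLength = mb_strlen($card, 'UTF-8');      // counts characters
$lastChar   = mb_substr($card, -1, 1, 'UTF-8');

printf("bytes: %d, chars: %d\n", $byteLength, $charLength);
printf("last byte (hex): %s, last char: %s\n", bin2hex($lastByte), $lastChar);
```

The lone trailing byte is not valid UTF-8 on its own, which is why the terminal shows a placeholder glyph for it.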
You need to add the u (PCRE_UTF8) modifier to the regular expression so that it handles UTF-8 encoded expressions and strings correctly:
preg_replace('/(\d+)♥/u', ...
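Note that \d matches only digits, so cards with letter indices such as Д or К still pass through unchanged even with the u modifier. One possible way to cover them is sketched below. It is not from the original post: colorize_card is a hypothetical helper, and the pattern assumes PCRE was built with Unicode property support so that \p{L} is available.

```php
<?php
// Hypothetical helper, not part of the original script.
// Assumption: PCRE supports Unicode properties (\p{L}) and the file is UTF-8.
function colorize_card($card)
{
    // \d+ matches numeric indices (7, 10); \p{L} matches one letter index (Д, К, В, Т).
    // The u modifier makes PCRE treat the pattern and subject as UTF-8,
    // which is also what lets the multibyte suits work inside [...].
    $card = preg_replace('/^(\d+|\p{L})([♥♦])$/u', '<span class="red">$1$2</span>', $card);
    // The ^...$ anchors keep this second pass from touching already wrapped cards.
    $card = preg_replace('/^(\d+|\p{L})([♠♣])$/u', '<span class="black">$1$2</span>', $card);
    return $card;
}

foreach (array('7♠', 'Д♦', '10♥', 'К♣') as $card) {
    printf("%s\n", colorize_card($card));
}
```

Grouping the suits by color reduces the four calls to two; whether that reads better than one call per suit is a matter of taste.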