Preg split by character and each newline

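I am using preg_split with the "u" modifier to split a string into characters in PHP. My problem is that a newline does not end up in an entry of its own. With this line:

preg_split('//u', "a töxt\n{{image}}", -1, PREG_SPLIT_NO_EMPTY);

I get, for example, the following result: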

Array
(
    [0] => a
    [1] =>  
    [2] => t
    [3] => ö
    [4] => x
    [5] => t
    [6] =>  { // this entry is really a line break followed by "{", not a space
    [7] => {
    [8] => i
    [9] => m
    [10] => a
    [11] => g
    [12] => e
    [13] => }
    [14] => }
)

If I encode the string beforehand to check for valid characters, I get:

Array
(
    [data] => töxt
{{image}}
    [chars] => {t}{�}{�}{x}{t}{
}{{}{{}{i}{m}{a}{g}{e}{}}{}}
    [hex] => {74}{C3}{B6}{78}{74}{0A}{7B}{7B}{69}{6D}{61}{67}{65}{7D}{7D}
    [mb_chars] => {t}{ö}{x}{t}{
}{{}{{}{i}{m}{a}{g}{e}{}}{}}
    [mb_hex] => {74}{F6}{78}{74}{0A}{7B}{7B}{69}{6D}{61}{67}{65}{7D}{7D}
)

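So, any ideas on how to get the desired result? It is not only about the line break, but that is actually the most important part.

Multibyte characters also need to be handled.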

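Use the str_split function to split the string into an array of characters (note that str_split works byte by byte, so it only suits single-byte strings):

$str = "A\nBC";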
$chrArray = str_split($str);
print_r($chrArray);

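Option 2:

// "s" lets "." match the newline too; without it the line break is skipped
preg_match_all('/./us', "a töxt\n{{image}}", $m);

Output of the str_split example: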

Array
(
    [0] => A
    [1] => 
    [2] => B
    [3] => C
)
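
The preg_match_all option above can be sketched as a complete script. This is only a minimal sketch: it assumes the sample string from the question, a PCRE build with UTF-8 support, and the "s" modifier so that "." also matches the line break. The variable names $text and $chars are illustrative and not from the original posts.

```php
<?php
// Sample input from the question: a multibyte character plus a line feed.
$text = "a töxt\n{{image}}";

// "u" treats pattern and subject as UTF-8; "s" lets "." match "\n" as well.
preg_match_all('/./us', $text, $matches);

// $matches[0] holds every full match, i.e. one entry per character.
$chars = $matches[0];

print_r($chars);
```

With this pattern the line feed gets an entry of its own instead of being merged with the following "{".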

Update: after trying this in PHP 5.2.5, I got this:

Warning: preg_split(): Compilation failed: this version of PCRE is not compiled with PCRE_UTF8 support at offset 0 on line 4

I believe you need to use another method to break a Unicode string into an array of characters.
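
One possible alternative that avoids PCRE's UTF-8 mode is to walk the string with the mbstring functions. The sketch below assumes the mbstring extension is available and the input is UTF-8; utf8_chars is a hypothetical helper name, not something from the original posts. On PHP 7.4 or later, the built-in mb_str_split should do the same job.

```php
<?php
// Hypothetical helper: split a UTF-8 string into characters using mbstring
// only, so it does not depend on PCRE being compiled with UTF-8 support.
function utf8_chars($text)
{
    $chars = array();
    $length = mb_strlen($text, 'UTF-8'); // length in characters, not bytes

    for ($i = 0; $i < $length; $i++) {
        // Take exactly one character at character offset $i.
        $chars[] = mb_substr($text, $i, 1, 'UTF-8');
    }

    return $chars;
}

print_r(utf8_chars("a töxt\n{{image}}"));
```

Since every character is extracted individually, the line feed naturally ends up in an entry of its own.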

I have now found a solution to the problem myself:

$arr_content = preg_split("/(.|\n)/u", $html_cont, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

Thanks to everyone who helped track down the trouble ;)
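
For reference, here is a small sketch of how that preg_split call behaves on the sample string from the question. It assumes the pattern is meant to be (.|\n), i.e. "any character, or a line feed"; the sample value assigned to $html_cont is only an illustration.

```php
<?php
// Sample input from the question, used here in place of real HTML content.
$html_cont = "a töxt\n{{image}}";

// Every character (including "\n") is a delimiter. PREG_SPLIT_DELIM_CAPTURE
// returns the captured delimiters, and PREG_SPLIT_NO_EMPTY drops the empty
// pieces between them, leaving one entry per character.
$arr_content = preg_split(
    "/(.|\n)/u",
    $html_cont,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($arr_content);
```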