Text being processed:
This is some text
which I am working on.
This text has whitespace before the new line but after this word
Another line.
I am using preg_split to split on Unicode whitespace and all special characters except newlines, like so:
preg_split("/\p{Z}|[^\S\n]/u", $data, -1, PREG_SPLIT_OFFSET_CAPTURE);
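The flag is there because I absolutely need to keep the position of each string.
I would like preg_split to keep each newline together with the word before it. For example, the newline can appear at the beginning of the next word, or even on its own.
Expected output when working: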
This
is
some
text\n
which
I
am
working
on.\n
This
text
has
whitespace
before
the
new
line
but
after
this
word\n
Another
line.
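Can anyone explain how to do this? Thanks.
Use a lookbehind to match the zero-width boundary that exists right after a newline. The second alternative, \p{Z}+(?!\n), splits on runs of Unicode separators unless a newline follows immediately.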
<?php
$str = <<<EOT
This is some text
which I am working on.
This text has whitespace before the new line but after this word
Another line.
EOT;
$splits = preg_split("~(?<=\n)|\p{Z}+(?!\n)~", $str);
print_r($splits);
?>
Output:
Array
(
[0] => This
[1] => is
[2] => some
[3] => text
[4] => which
[5] => I
[6] => am
[7] => working
[8] => on.
[9] => This
[10] => text
[11] => has
[12] => whitespace
[13] => before
[14] => the
[15] => new
[16] => line
[17] => but
[18] => after
[19] => this
[20] => word
[21] => Another
[22] => line.
)
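Since the question needs the position of every piece, here is a minimal sketch of the same lookbehind idea combined with PREG_SPLIT_OFFSET_CAPTURE. The u modifier and PREG_SPLIT_NO_EMPTY are my additions, not part of the pattern above, and the input string is rebuilt with an explicit space before the third newline as an assumption about the original data. json_encode is used only to make the trailing newline visible when printing.

```php
<?php
// Sample input; the space before the third "\n" is an assumption about the data.
$str = "This is some text\nwhich I am working on.\n"
     . "This text has whitespace before the new line but after this word \n"
     . "Another line.";

// Split at the zero-width position right after a newline, or on a run of
// Unicode separators that is not immediately followed by a newline.
// Single quotes pass \n and \p{Z} through to PCRE unchanged.
$pattern = '~(?<=\n)|\p{Z}+(?!\n)~u';

// Each element is [token, offset]; empty pieces are dropped.
$splits = preg_split($pattern, $str, -1, PREG_SPLIT_OFFSET_CAPTURE | PREG_SPLIT_NO_EMPTY);

foreach ($splits as [$token, $offset]) {
    // json_encode shows the trailing "\n" instead of printing a blank line.
    echo $offset, "\t", json_encode($token), "\n";
}
```

With this input, "text\n" is reported at offset 13 and the last word of the third line comes back as "word \n", because the space directly before the newline is not treated as a split point. The offsets are byte offsets, even with the u modifier, so multibyte characters earlier in the string shift them relative to character counts.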