This is what I have:
$str = 'Just a <span class="green">little</span> -text åäö width 123#';
This is what I need:
The result may also contain newlines in addition to spans and spaces.
$result = '<span></span><span></span><span></span><span></span> <span></span> <span class="green"><span></span><span></span><span></span><span></span><span></span><span></span></span> <span></span><span></span><span></span><span></span><span></span> <span></span><span></span><span></span> <span></span><span></span><span></span><span></span><span></span> <span></span><span></span><span></span><span></span>';
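You may wonder what I need this for. I want to build something in which every character is represented by a block. It will look a bit like the disk defragmenter on Windows XP.
Problems
- Replace every character with <span></span>
- Don't touch the HTML spans that already exist in the string (might be hard?). There can be more than one HTML element.
- Don't touch spaces and newlines.
- Should this be done with a regexp? Or with XPath?
What I've done so far
I found some articles about regexp, but none that replace every character (except spaces and newlines).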
$result = preg_replace("/???/", "<span></span>", $str);
print_r($result);
You can use preg_replace_callback():
$str = 'Just a <span class="green">little</span> -text åäö width 123#';
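function replacement($matches) {
    // A one-byte match is a single non-whitespace character
    if (strlen($matches[0]) == 1)
    {
        return "<span></span>";
    }
    else
    {
        // Otherwise an existing span element was matched; keep it as-is
        return $matches[0];
    }
}
$result = preg_replace_callback("~<span.*?<\s*/\s*span>|\S~", "replacement", $str);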
print_r($result);
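This simply computes the replacement string from the match. If the match has length 1 (a non-whitespace character was found), it is replaced with a span tag; otherwise an existing span element was matched, and it is reinserted unchanged.
No need for a hacky regex solution. A simple for loop with a state machine should do fine: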
define('STATE_READING', 1);
define('STATE_TAG', 2);
$str = 'Just a <span class="green">little</span> -text åäö width 123#';
$result = '';
$state = STATE_READING;
for($i = 0, $len = strlen($str); $i < $len; $i++) {
    $chr = $str[$i];
    if($chr == '<') {
        $state = STATE_TAG;
        $result .= $chr;
    } else if($chr == '>') {
        $state = STATE_READING;
        $result .= $chr;
    } else if($state == STATE_TAG || strlen(trim($chr)) === 0) {
        $result .= $chr;
    } else {
        $result .= '<span></span>';
    }
}
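This loop simply tracks whether we are reading a tag or a character. If we are inside a tag (or the character is whitespace), the actual character is appended; otherwise <span></span> is appended. Note that it works byte by byte, so each of å, ä and ö yields two spans in UTF-8.
Result: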
<span></span><span></span><span></span><span></span> <span></span> <span class="green"><span></span><span></span><span></span><span></span><span></span><span></span></span> <span></span><span></span><span></span><span></span><span></span> <span></span><span></span><span></span><span></span><span></span><span></span> <span></span><span></span><span></span><span></span><span></span> <span></span><span></span><span></span><span></span>
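The loop above walks the string byte by byte, which is why åäö produces six spans in the result. As a rough sketch of a character-aware variant (not part of the original answer), the string can be split into whole UTF-8 characters first. The function name blocks_utf8 is hypothetical, and the code assumes the input is valid UTF-8, since preg_split() with the u flag fails on invalid input.

```php
<?php
// Hypothetical helper: same state machine, but iterating over UTF-8 characters.
// Assumes $str is valid UTF-8 (preg_split returns false otherwise).
function blocks_utf8(string $str): string
{
    // Split into whole characters instead of bytes
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    $result = '';
    $inTag = false;
    foreach ($chars as $chr) {
        if ($chr === '<') {
            $inTag = true;
            $result .= $chr;
        } elseif ($chr === '>') {
            $inTag = false;
            $result .= $chr;
        } elseif ($inTag || trim($chr) === '') {
            // Inside a tag, or ASCII whitespace: copy the character through
            $result .= $chr;
        } else {
            $result .= '<span></span>';
        }
    }
    return $result;
}

// "\u{...}" escapes (PHP 7+) keep the sample UTF-8 regardless of file encoding
$out = blocks_utf8("a <b>\u{00E5}\u{00E4}\u{00F6}</b>\nz");
echo $out, "\n";
```

With this input, the three accented characters inside the b element should each become a single span, and the newline is passed through untouched.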
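Is using a single regular expression a requirement?
If not, you can replace the substrings that need to be kept safe with some unique character, run the regexp replacement, and then put the substrings back in place of that unique character.
Like this: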
$str2 = str_replace('<span class="green">little</span>', '$', $str);
$str3 = preg_replace("/([^\s\n\$])/", "<span></span>", $str2);
$result = str_replace('$', '<span class="green">little</span>', $str3);
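See the live demo: http://codepad.viper-7.com/7wu9fd
UPD:
Maybe that was only a hint. My suggestion is to store the substrings that need to be preserved, replace everything that needs replacing, and then put the stored values back into the string.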
$str = 'Just a <span class="green">little</span> -text åäö width 123#';
preg_match_all('/<[^>]+>/', $str, $matches);
$storage=array();
for($i=0, $n=count($matches[0]); $i<$n; $i++)
{
    // Each tag gets its own placeholder: '$', '$$', '$$$', ...
    $key=str_repeat('$', $i+1);
    $value=$matches[0][$i];
    $storage[$key]=$value;
    $str=str_replace($value, $key, $str);
}
// Restore the longest placeholders first
$storage=array_reverse($storage);
$str = preg_replace("/([^\s\n\$])/", "<span></span>", $str);
foreach($storage as $k=>$v)
{
    $str=str_replace($k, $v, $str);
}
echo htmlspecialchars($str);
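Working demo here: http://codepad.viper-7.com/L4YZOz
While this is possible with a regular expression, I would use a loop. The sample code below works for single-byte character sets, but it can be adapted for multibyte (such as UTF-16) or variable-width (such as UTF-8) character sets.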
$input = 'Just a <span class="green">little</span> -text åäö width 123#';
$output = '';
$length = strlen($input);
$i = 0;
$matches = array(); // preg_match variable
// A while loop gives finer control over the index
while($i < $length) {
    // Check for start of span tag, check for < character first for speed-up
    if($input[$i] == "<" && preg_match("#^<span[^>]*>.*</span>#siU", substr($input, $i), $matches) == 1) {
        // Copy the whole span element through unchanged
        $i = $i + strlen($matches[0]);
        $output .= $matches[0];
    } elseif(trim($input[$i]) === '') {
        // Leave spaces and newlines untouched
        $output .= $input[$i];
        $i++;
    } else {
        $output .= "<span></span>";
        $i++;
    }
}
Working example
Here is what I came up with using preg_replace_callback():
$str = 'Just a <span class="green">little</span>-text åäö width 123#<span>aaa</span> lol';
// This requires PHP 5.3+
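$output = preg_replace_callback('#.*?(<span[^>]*>.*?</span>)|.*#is', function($m){
    // No span captured: replace every non-whitespace character in the match
    if(!isset($m[1])){return preg_replace('/\S/', '<span></span>', $m[0]);}
    // Otherwise replace only the text around the span and keep the span itself
    $array = explode($m[1], $m[0]);
    $array = preg_replace('/\S/', '<span></span>', $array);
    return(implode($m[1], $array));
}, $str);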
echo($output);
Output:
<span></span><span></span><span></span><span></span> <span></span> <span class="green">little</span><span></span><span></span><span></span><span></span><span></span> <span></span><span></span><span></span><span></span><span></span><span></span> <span></span><span></span><span></span><span></span><span></span> <span></span><span></span><span></span><span></span><span>aaa</span> <span></span><span></span><span></span>
A bit of a hack, but try this:
$str="Just a <span class=\"green\">little</span> -text åäö\n width 123#";
// get all span tags (lazy match, so each span is captured separately)
if(preg_match_all("/(<span.*?<\/span>)/", $str, $matches))
{
    // replace spans with #
    $str=preg_replace("/(<span.*?<\/span>)/", "#", $str);
    //print_r($matches);
}
// replace all non spaces, CR and #
// (a literal # in the input, as in "123#", is therefore left as-is)
$str=preg_replace("/[^\s\n#]/", "<span></span>", $str);
// replenish the matched spans, one placeholder at a time
foreach($matches[0] as $value)
{
    $str=preg_replace('/#/', $value, $str, 1);
}
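This is not a naive regex approach. It is a solid, concise, one-line, single-function-call solution that avoids running a set of conditions on every character in the string, preserves tags, and handles multibyte characters.
alexn's solution does not preserve the visible character length of åäö. It prints 6 opening and closing span tags on screen instead of just 3, because no mb_ functions are used. On this topic, be wary of any approach on this page that uses string functions without the mb_ prefix.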
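To make the byte-versus-character difference concrete, here is a small sketch that is not part of the original answer. It assumes the mbstring extension is available and that the text is UTF-8; the variable names are only illustrative.

```php
<?php
// "\u{...}" escapes (PHP 7+) guarantee UTF-8 bytes regardless of file encoding
$s = "\u{00E5}\u{00E4}\u{00F6}"; // the characters from the sample string

$bytes = strlen($s);             // counts bytes
$chars = mb_strlen($s, 'UTF-8'); // counts characters (needs the mbstring extension)

echo $bytes, ' bytes, ', $chars, " characters\n";
```

Each of these three characters occupies two bytes in UTF-8, which is why a byte-oriented loop emits twice as many spans for them.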
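My suggested solution uses the (*SKIP)(*FAIL) technique to ignore/disqualify every tag it encounters, and then matches only the non-whitespace characters in the string.
Code: (Demo)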
$str = 'Just a <span class="green">little</span> -text åäö width 123#';
var_export(preg_replace('/<[^>]*>(*SKIP)(*FAIL)|\S/','<span></span>',$str)); // no "u" flag means åäö will be span x6
echo "\n";
var_export(preg_replace('/<[^>]*>(*SKIP)(*FAIL)|\S/u','<span></span>',$str)); // "u" flag means åäö will be span x3
Output: (scroll right to see the effect of the unicode flag on the pattern)
'<span></span><span></span><span></span><span></span> <span></span> <span class="green"><span></span><span></span><span></span><span></span><span></span><span></span></span> <span></span><span></span><span></span><span></span><span></span> <span></span><span></span><span></span><span></span><span></span><span></span> <span></span><span></span><span></span><span></span><span></span> <span></span><span></span><span></span><span></span>'
// notice the number of replacements for åäö ->-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------111111111111122222222222223333333333333444444444444455555555555556666666666666
'<span></span><span></span><span></span><span></span> <span></span> <span class="green"><span></span><span></span><span></span><span></span><span></span><span></span></span> <span></span><span></span><span></span><span></span><span></span> <span></span><span></span><span></span> <span></span><span></span><span></span><span></span><span></span> <span></span><span></span><span></span><span></span>'
// notice the number of replacements for åäö ->-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------111111111111122222222222223333333333333
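Instead of scrolling through the output, the difference can also be checked numerically. The following is only a sketch: it relies on the fifth, by-reference argument of preg_replace(), which receives the number of replacements made, and it assumes UTF-8 input and the default "C" locale. The variable names $byteCount and $charCount are illustrative.

```php
<?php
// Same sample string; "\u{...}" escapes (PHP 7+) keep it UTF-8
$str = "Just a <span class=\"green\">little</span> -text \u{00E5}\u{00E4}\u{00F6} width 123#";
$pattern = '/<[^>]*>(*SKIP)(*FAIL)|\S/';

// The fifth argument is filled with the number of replacements performed
preg_replace($pattern, '<span></span>', $str, -1, $byteCount);       // byte mode
preg_replace($pattern . 'u', '<span></span>', $str, -1, $charCount); // UTF-8 mode

echo $byteCount, ' vs ', $charCount, "\n";
```

Consistent with the two outputs above, the byte-mode count should come out three higher than the UTF-8 count (31 versus 28 for this string), one extra replacement for each of the three two-byte characters.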