Add spaces around semi-colon, unless it's part of an HTML entity

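I want to preg_replace every ; that is not part of an HTML entity, adding a space before it if there isn't one and a space after it if there isn't one.

E.g., in the line below, a ; marked < needs a space added before it, a ; marked > needs a space added after it, and a ; marked = needs spaces on both sides. (Those marked x must be left alone.)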

Hello; Hello ;Hello ; Hello;Hello &egrave;Hello &egrave; Hello &egrave;;Hello&egrave; Hello &#45;Hello &#45;; Hello&#45; Hello
     <       >      x      =             x             x              x=            x           x          x<          x      

becomes

Hello ; Hello ; Hello ; Hello ; Hello &egrave;Hello &egrave; Hello &egrave; ; Hello&egrave; Hello &#45;Hello &#45; ; Hello&#45; Hello

I tried using this (see the comments): Insert a space after a semicolon, unless it's part of an HTML entity, but it works differently.

Test: https://regex101.com/r/xV4zA2/1

Thanks!

[Slightly changed to account for some edge cases]

Unless there are some obscure edge cases that this doesn't cover, I think this should do it:

preg_replace("/(\G|\b(?<!&|&#)\w+|[\W_]) ?; ?/", "$1 ; ", $input_lines);

It converts the first line below into the second:

Hello; Hello ;Hello ; Hello;Hello &egrave;Hello &egrave; Hello &egrave;;Hello&egrave; Hello &#45;Hello &#45;; Hello&#45; Hello

Hello ; Hello ; Hello ; Hello ; Hello &egrave;Hello &egrave; Hello &egrave; ; Hello&egrave; Hello &#45;Hello &#45; ; Hello&#45; Hello

…and now also handles some edge cases…
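For anyone who wants to try this locally, here is a minimal sketch that wraps the call in a function and runs it on the sample line from the question. The function name `space_semicolons` is made up for illustration; the pattern and replacement are the ones given above, written in single quotes so PHP passes the backslashes and `$1` through untouched.

```php
<?php
// Hypothetical helper name; pattern and replacement come from the answer above.
function space_semicolons(string $text): string
{
    // preg_replace() returns null on a PCRE error; fall back to the input then.
    return preg_replace('/(\G|\b(?<!&|&#)\w+|[\W_]) ?; ?/', '$1 ; ', $text) ?? $text;
}

// Sample line from the question.
$input = 'Hello; Hello ;Hello ; Hello;Hello &egrave;Hello &egrave; Hello &egrave;;'
       . 'Hello&egrave; Hello &#45;Hello &#45;; Hello&#45; Hello';

echo space_semicolons($input), PHP_EOL;
```

On the sample line this should print the converted line shown above.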

Broken down, the regex is as follows:

(               # begin capture group #1 and match:
    \G          #     [assert the position where the previous match ended, or the start of the string]
|               # OR match:
    \b          #     [assert a word boundary]
    (?<!        #     look behind (the word boundary) and assert that there is not:
        &|&#    #         an ampersand, or an ampersand and a pound sign
    )           #     end look-behind assertion
    \w+         #     one or more of any word character ([0-9a-zA-Z_])
|               # OR match:
    [\W_]       #     a non-word character or underscore
)               # end capture group #1
 ?              # optional single space
;               # semicolon
 ?              # optional single space

This is then replaced with the contents of capture group #1 ($1), a space, a semicolon, and a space:

$1 ; 
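If you would rather keep that breakdown inside the code itself, PCRE's `x` modifier lets you write the pattern with whitespace and comments. The sketch below is my own transcription of the same pattern, not something from the original answer, and the variable name `$verbosePattern` is arbitrary. Two things change under `x`: a literal space has to be written as `[ ]`, and the `#` in `&#` has to be escaped, since an unescaped `#` starts a comment.

```php
<?php
// Assumed equivalent of the compact pattern, rewritten for the x modifier.
$verbosePattern = '/
    (                   # capture group 1: what comes right before the semicolon
        \G              #   position where the previous match ended
      |
        \b (?<!&|&\#)   #   word start not preceded by an ampersand or ampersand-hash
        \w+             #   one or more word characters
      |
        [\W_]           #   one non-word character or underscore
    )
    [ ]?                # optional single space
    ;                   # the semicolon itself
    [ ]?                # optional single space
/x';

echo preg_replace($verbosePattern, '$1 ; ', 'a;b &amp;c'), PHP_EOL;
```

Avoid `/` and single quotes in the pattern comments, since they would end the regex delimiter or the PHP string early.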

I think this is what you're looking for.

Check this pattern: (?<=o|\s|;);

See the demo here: https://regex101.com/r/uJ0vD4/13