Invalid HTML - Quoting Attributes

I have the following HTML:

<td width=140 style='width:105.0pt;padding:0cm 0cm 0cm 0cm'>
    <p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif";
       mso-fareast-font-family:"Times New Roman";color:#666666'>OCCUPANCY
       TAX:</span></p>
</td>

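Some of the HTML attributes are not quoted, for example: width=140 and class=MsoNormal

Is there a PHP function that handles this kind of thing? If not, what would be a smart way to clean this up in the HTML?

Thanks.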

I think you could use a regexp to do this:

/\s([\w]{1,}=)((?!")[\w]{1,}(?!"))/g

\s match any white space character [\r\n\t\f ]
1st Capturing group ([\w]{1,}=)
    [\w]{1,} match a single character present in the list below
        Quantifier: {1,} Between 1 and unlimited times, as many times as possible, giving back as needed [greedy]
    \w match any word character [a-zA-Z0-9_]
    = matches the character = literally
2nd Capturing group ((?!")[\w]{1,}(?!"))
    (?!") Negative Lookahead - Assert that it is impossible to match the regex below
    " matches the characters " literally
    [\w]{1,} match a single character present in the list below
        Quantifier: {1,} Between 1 and unlimited times, as many times as possible, giving back as needed [greedy]
    \w match any word character [a-zA-Z0-9_]
    (?!") Negative Lookahead - Assert that it is impossible to match the regex below
    " matches the characters " literally
g modifier: global. All matches (don't return on first match)
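
To see what the two capturing groups actually pick up, here is a minimal sketch that runs the same pattern with preg_match_all over a shortened copy of the HTML from the question. The variable names are only for illustration. PHP's preg functions do not accept the g modifier; preg_match_all is already global.

```php
<?php
// Shortened sample from the question: two unquoted attributes, one quoted.
$str = "<td width=140 style='width:105.0pt;padding:0cm 0cm 0cm 0cm'>\n"
     . "    <p class=MsoNormal>";

// Same pattern as above, without the /g flag (not valid in PHP).
$pattern = '/\s([\w]{1,}=)((?!")[\w]{1,}(?!"))/';

// PREG_SET_ORDER gives one array per match: [full match, group 1, group 2].
preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] is the attribute name plus "=", $m[2] is the unquoted value.
    echo $m[1], ' -> ', $m[2], "\n";
}
```

On this sample it should list width= -> 140 and class= -> MsoNormal. The style attribute is skipped because its value starts with a quote, which \w cannot match.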

It can be implemented like this:

echo preg_replace_callback('/\s([\w]{1,}=)((?!")[\w]{1,}(?!"))/', function($matches){
    return ' '.$matches[1].'"'.$matches[2].'"';
}, $str);

This produces:

 <td width="140" style='width:105.0pt;padding:0cm 0cm 0cm 0cm'>
   <p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Arial","sans-serif";
     mso-fareast-font-family:"Times New Roman";color:#666666'>OCCUPANCY
      TAX:</span></p>
 </td>

Eval.in example

Note that this is a quick-and-dirty example and could certainly be cleaned up.
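
One possible cleanup, offered as a sketch rather than a definitive fix: drop the callback in favour of preg_replace with backreferences, keep the original whitespace, and allow hyphens in attribute names and non-word characters such as % in values. The function name quote_bare_attributes is hypothetical, and the pattern is my own variation, not the one from the answer above.

```php
<?php
// Hypothetical helper: wraps unquoted attribute values in double quotes.
function quote_bare_attributes(string $html): string
{
    // Group 1: whitespace, attribute name (word chars or hyphens), then "=".
    // Group 2: a value that contains no whitespace, quote, or ">".
    // A value starting with ' or " cannot match, so quoted attributes are left alone.
    return preg_replace('/(\s[\w-]+=)([^\s"\'>]+)/', '$1"$2"', $html);
}

$str = "<td width=140 style='width:105.0pt;padding:0cm 0cm 0cm 0cm'>\n"
     . "    <p class=MsoNormal>OCCUPANCY TAX:</p>\n"
     . "</td>";

echo quote_bare_attributes($str);
```

This is still a regex working on raw text, so it will also rewrite anything that merely looks like name=value inside a quoted value or in the text between tags. For HTML that is not as regular as the sample, a real parser is likely the safer route.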