How to strip HTML tags, classes, and IDs and preserve only the form and its elements with PHP?

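I have been looking for something that would let me reliably strip a block of HTML down to a bare form and its elements. I need a solution that removes all non-form elements, including all content, classes, and IDs. I can use either JavaScript or PHP.

Can anyone point me in the right direction and/or provide a small code sample and some advice to get me started?

To give you some background...

In most cases, autoresponder service providers offer various types of embeddable code. For reasons I will never understand, they never provide clean form code... there is always some ugly junk around it that then has to be cleaned up.

Here is a sample of embeddable code from an autoresponder service: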

<style>
  ._form {
  position:relative;
  background:#fff;
  width:400px;/*F*/
  padding:0!important;
  text-align:left;
  }
  ._form em {
  color:#9a9a9a;
  }
  ._form a {
  margin-left:3px;
  }
  ._form ._field,
  ._form ._field ._label,
  ._form ._type_radio,
  ._form ._type_checkbox,
  ._form ._type_captcha,
  ._form ._field table {
  background:none;
  }
  ._form ._field  {
  position:relative;
  width:100%;
  cursor:move;
  font-style:normal;
  margin:1.2em 0;
  padding:0;
  overflow:hidden;
  }
  ._form ._field input[type="text"] {
  width:100%;
  padding:8px;
  font-size:16px;
  border:1px solid #b6b6b6;
  -webkit-box-sizing: border-box;
  -moz-box-sizing: border-box;
  box-sizing: border-box;
  }
  ._form ._field ._label {
  display:block;
  margin:0 0 0.5em;
  padding:0!important;
  font-size:15px;
  }
  ._form ._field ._option input[type="checkbox"],
  ._form ._field ._option input[type="radio"] {
  position:relative;
  width:13px;
  height:13px;
  margin:-4px 0 0 1px;
  cursor:pointer;
  vertical-align:middle;
  }
  ._form ._field ._option input[type="submit"],
  ._form ._field ._option input[type="button"] {
  margin:0;
  cursor:pointer;
  height:35px;
  width:auto;
  font-size:15px;
  }
  ._form ._field ._option select {
  display:block;
  margin:0;
  padding:0;
  width:auto;
  font-size:15px;
  border:1px solid #b6b6b6;
  }
  ._form ._type_radio ._option,
  ._form ._type_checkbox ._option {
  font-size:13px;
  font-weight:normal;
  line-height:1.8;
  }
  ._form ._type_date ._option input[type="text"] {
  float:left;
  width:100px;
  -webkit-box-sizing: border-box;
  -moz-box-sizing: border-box;
  box-sizing: border-box;
  }
  ._form ._type_date ._option input[type="button"] {
  width:37px;
  height:36px;
  margin-left:5px;
  padding:20px;
  border:none;
  outline:none;
  text-indent:-9999px;
  }
  ._form ._type_captcha img {
  float:left;
  margin:0 6px 0 0;
  width:70px;
  height:33px;
  border:1px solid #b6b6b6;
  }
  ._form ._type_captcha input[type="text"] {
  margin:-14px 0 0 0!important;
  width:25%;
  -webkit-box-sizing: border-box;
  -moz-box-sizing: border-box;
  box-sizing: border-box;
  }
  ._form ._field table  {
  width:100%!important;
  }
  ._form ._field table tbody tr td  {
  width:50%!important;
  font-size:15px;
  }
  ._form {
  width:265px;/*F*/
  background:#fff;
  color:#2c2c2c;
  font-weight:normal;
  }
  ._form #notice {
  margin:10px 0 0 -3px!important;
  padding:0;
  color:#acacac;
  font-size:11px;
  font-family:helvetica,arial,sans-serif;
  }
  ._form #notice a:link, ._form #notice a:visited {
  color:#acacac;
  text-decoration:underline;
  }
  ._form ._field  {
  position:relative;
  width:100%;
  cursor:default;
  font-style:normal;
  margin:0 0 16px;
  padding:0;
  overflow:hidden;
  }
  ._form ._field input[type="text"],
  ._form ._field input[type="email"] {
  width:100%;
  padding:4px;
  font-size:14px;
  background:#fafafa;
  border:1px solid #c7c7c7;
  border-top:1px solid #b6b6b6;
  -webkit-border-radius:3px;
  -moz-border-radius:3px;
  border-radius:3px;
  -webkit-box-sizing: border-box;
  -moz-box-sizing: border-box;
  box-sizing: border-box;
  }
  ._form ._field ._label {
  margin:0 0 4px;
  color:#2c2c2c;
  font-size:13px;
  font-family:helvetica,arial,sans-serif;
  font-weight:700;
  }
  ._form ._field ._option {
  margin:0;
  padding:0;
  color:#2c2c2c;
  font-size:13px;
  font-family:helvetica,arial,sans-serif;
  font-weight:normal;
  line-height:20px;
  }
  ._form ._type_header ._label {
  width:100%;
  font-style:normal;
  font-size:16px!important;
  line-height:20px;
  color:#005698;
  margin:0 0 5px!important;
  padding:0 0 10px!important;
  overflow:hidden;
  border-bottom:1px solid #e0e0e0;
  }
  ._form ._type_input ._option  textarea{
  width:97%!important;
  background:#fafafa;
  border:1px solid #c7c7c7;
  border-top:1px solid #b6b6b6;
  -webkit-border-radius:3px;
  -moz-border-radius:3px;
  border-radius:3px;
  }
  ._form ._type_input ._option input[type="submit"],
  ._form ._type_input ._option input[type="button"] {
  width:auto;
  margin:10px 0 0!important;
  padding:2px 15px!important;
  cursor:pointer;
  font-family:verdana,arial,sans-serif;
  font-weight:700;
  font-size:12px;
  color:#3f3f3f;
  background:#f7f7f7;
  border:1px solid #999999;
  border-bottom:1px solid #888888;
  text-align:center;
  }
  ._form ._type_input ._option input[type="submit"]:hover,
  ._form ._type_input ._option input[type="button"]:hover {
  border:1px solid #afafaf;
  border-bottom:1px solid #a5a5a5;
  background:#f7f7f7;
  color:#525252;
  }
  ._form ._type_date ._option input[type="text"] {
  float:left;
  width:100px;
  }
  ._form ._type_radio ._option label {
  display:inline;
  font-size:14px;
  font-weight:normal;
  line-height:18px;
  }
  ._form ._type_radio ._option label input[type="radio"] {
  position:relative;
  width:13px;
  height:13px;
  margin:-4px 0 0 1px;
  cursor:pointer;
  vertical-align:middle;
  line-height:20px;
  }
  ._form ._type_date ._option input[type="button"] {
  width:24px;
  height:24px;
  margin:2px 0 0 5px;
  padding:0;
  border:none;
  outline:none;
  text-indent:-9999px;
  }
  ._form ._field ._option select {
  display:block;
  margin:0;
  padding:0;
  width:auto;
  font-size:14px;
  border:1px solid #b6b6b6;
  }
  ._form ._type_captcha img {
  float:left;
  width:42px;
  height:24px;
  margin:0 6px 0 0;
  border:1px solid #b6b6b6;
  }
  ._form ._type_captcha input[type="text"] {
  float:left;
  margin:0!important;
  width:40%;
  font-size:14px;
  -webkit-box-sizing: border-box;
  -moz-box-sizing: border-box;
  box-sizing: border-box;
  }
  ._form ._field table {
  margin:0;
  padding:0;
  border-collapse:collapse;
  width:100%!important;
  table-layout:fixed;
  margin-bottom:18px;
  font-size:13px!important;
  border-collapse:collapse;
  border-spacing:0;
  }
  ._form ._field table td {
  padding:0 10px 0 0!important;
  line-height:18px;
  text-align:left;
  font-size:13px!important;
  color:#606060;
  }
  ._form ._type_input ._option  table tbody#_forward_rcpt input {margin:0 0 4px 0; width:96%!important;}
  ._form ._type_input ._option  table tbody#_forward_rcpt img.image_addrcpt {cursor:pointer;}
  .form_errors{
  text-align:center;
  font-size:15px;
  margin:10px;
  color:#900;
  font-family:Arial, Helvetica, sans-serif;
  font-weight:bold;
  margin-bottom:20px;
  }
</style>
<form action='//something.com/proc.php' method='post' id='_form_37' accept-charset='utf-8' enctype='multipart/form-data'>
  <input type='hidden' name='f' value='37'>
  <input type='hidden' name='s' value=''>
  <input type='hidden' name='c' value='0'>
  <input type='hidden' name='m' value='0'>
  <input type='hidden' name='act' value='sub'>
  <input type='hidden' name='nlbox[]' value='6'>
  <div class='_form'>
    <div class='formwrapper'>
      <div id='_field284'>
        <div id='compile284' class='_field _type_input'>
          <div class='_label '>
            First Name
          </div>
          <div class='_option'>
            <input type='text' name='field[6]' value=''>
          </div>
        </div>
      </div>
      <div id='_field272'>
        <div id='compile272' class='_field _type_input'>
          <div class='_label '>
            Email *
          </div>
          <div class='_option'>
            <input type='email' name='email' >
          </div>
        </div>
      </div>
      <div id='_field273'>
        <div id='compile273' class='_field _type_input'>
          <div class='_option'>
            <input type='submit' value="Subscribe">
          </div>
        </div>
      </div>
      <div id='_field280'>
        <div id='compile280' class='_field _type_hidden'>
          <div class='_option'>
            <input type='hidden' name='field[4]' value=''>
          </div>
        </div>
      </div>
      <div id='_field281'>
        <div id='compile281' class='_field _type_hidden'>
          <div class='_option'>
            <input type='hidden' name='field[5]' value=''>
          </div>
        </div>
      </div>
      <div id='_field282'>
        <div id='compile282' class='_field _type_hidden'>
          <div class='_option'>
            <input type='hidden' name='field[3]' value=''>
          </div>
        </div>
      </div>
    </div>
  </div>
</form>

This is what I would like to end up with, without cleaning it up by hand:

<form action='//something.com/proc.php' method='post' accept-charset='utf-8' enctype='multipart/form-data'>
    <input type='hidden' name='f' value='37'>
    <input type='hidden' name='s' value=''>
    <input type='hidden' name='c' value='0'>
    <input type='hidden' name='m' value='0'>
    <input type='hidden' name='act' value='sub'>
    <input type='hidden' name='nlbox[]' value='6'>
    First Name
    <input type='text' name='field[6]' value=''>
    Email *
    <input type='email' name='email' >
    <input type='submit' value="Subscribe">
    <input type='hidden' name='field[4]' value=''>
    <input type='hidden' name='field[5]' value=''>
    <input type='hidden' name='field[3]' value=''>
</form>

A simple use of strip_tags seems OK, but it does not remove the CSS that sits inside the style tag.

I saved the sample embeddable code to string.txt:

$file = file_get_contents('string.txt', true);
echo '<textarea  rows="50" cols="50">' . $file . '</textarea>';
$file = strip_tags($file, '<form><input>');
echo '<textarea  rows="50" cols="80">' . $file . '</textarea>';
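
The reason the CSS survives is that strip_tags() removes only the tags themselves and keeps whatever text sits between them, so the contents of the style element are left behind as plain text. A minimal sketch of that behaviour, using a made-up miniature of the embed code (the markup below is an assumption for illustration, not the provider's real output):

```php
<?php
// Hypothetical, shortened stand-in for the provider's embed code.
$html = "<style>._form { color: red; }</style>"
      . "<form action='/proc.php'><div class='_form'>"
      . "<input type='text' name='email'></div></form>";

// Only <form> and <input> tags are allowed through. The <style> and <div>
// tags are removed, but the text between <style> and </style> is kept.
$stripped = strip_tags($html, '<form><input>');

echo $stripped, PHP_EOL;
```

The CSS rule is still present in the output, which is why the style block has to be removed as a whole element before or instead of calling strip_tags().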

In the end I made some progress with this, but it is not perfect... the ID is still on the form element, and I foresee more problems.

$file = file_get_contents('string.txt', true);

function strip_html_tags( $text )
{
    $text = preg_replace(
        array(
            // Remove invisible content, including everything between the tags
            '@<head[^>]*?>.*?</head>@siu',
            '@<style[^>]*?>.*?</style>@siu',
            '@<script[^>]*?.*?</script>@siu',
            '@<object[^>]*?.*?</object>@siu',
            '@<embed[^>]*?.*?</embed>@siu',
            '@<applet[^>]*?.*?</applet>@siu',
            '@<noframes[^>]*?.*?</noframes>@siu',
            '@<noscript[^>]*?.*?</noscript>@siu',
            '@<noembed[^>]*?.*?</noembed>@siu',
        ),
        // A single replacement string is applied to every pattern above
        ' ',
        $text );
    // Keep only the form and input tags; attributes such as id are untouched
    return strip_tags( $text, '<form><input>' );
}

$newText = strip_html_tags($file);
echo '<textarea  rows="50" cols="80">' . $newText . '</textarea>';

So you want the output to look like this:

<form action="//something.com/proc.php" method="post" id="_form_37" accept-charset="utf-8" enctype="multipart/form-data">
  <input type="hidden" name="f" value="37"><input type="hidden" name="s" value=""><input type="hidden" name="c" value="0"><input type="hidden" name="m" value="0"><input type="hidden" name="act" value="sub"><input type="hidden" name="nlbox[]" value="6">
            First Name
            <input type="text" name="field[6]" value="">
            Email *
            <input type="email" name="email">
            <input type="submit" value="Subscribe">
            <input type="hidden" name="field[4]" value="">
            <input type="hidden" name="field[5]" value="">
            <input type="hidden" name="field[3]" value="">
</form>

If that is correct, I think this would do it:

$string = '<style>
  ._form {
  position:relative;
  background:#fff;
  width:400px;/*F*/
  padding:0!important;
  text-align:left;
  }
  ._form em {
  color:#9a9a9a;
  }
  ._form a {
  margin-left:3px;
  }
  ._form ._field,
  ._form ._field ._label,
  ._form ._type_radio,
  ._form ._type_checkbox,
  ._form ._type_captcha,
  ._form ._field table {
  background:none;
  }
  ._form ._field  {
  position:relative;
  width:100%;
  cursor:move;
  font-style:normal;
  margin:1.2em 0;
  padding:0;
  overflow:hidden;
  }
  ._form ._field input[type="text"] {
  width:100%;
  padding:8px;
  font-size:16px;
  border:1px solid #b6b6b6;
  -webkit-box-sizing: border-box;
  -moz-box-sizing: border-box;
  box-sizing: border-box;
  }
  ._form ._field ._label {
  display:block;
  margin:0 0 0.5em;
  padding:0!important;
  font-size:15px;
  }
  ._form ._field ._option input[type="checkbox"],
  ._form ._field ._option input[type="radio"] {
  position:relative;
  width:13px;
  height:13px;
  margin:-4px 0 0 1px;
  cursor:pointer;
  vertical-align:middle;
  }
  ._form ._field ._option input[type="submit"],
  ._form ._field ._option input[type="button"] {
  margin:0;
  cursor:pointer;
  height:35px;
  width:auto;
  font-size:15px;
  }
  ._form ._field ._option select {
  display:block;
  margin:0;
  padding:0;
  width:auto;
  font-size:15px;
  border:1px solid #b6b6b6;
  }
  ._form ._type_radio ._option,
  ._form ._type_checkbox ._option {
  font-size:13px;
  font-weight:normal;
  line-height:1.8;
  }
  ._form ._type_date ._option input[type="text"] {
  float:left;
  width:100px;
  -webkit-box-sizing: border-box;
  -moz-box-sizing: border-box;
  box-sizing: border-box;
  }
  ._form ._type_date ._option input[type="button"] {
  width:37px;
  height:36px;
  margin-left:5px;
  padding:20px;
  border:none;
  outline:none;
  text-indent:-9999px;
  }
  ._form ._type_captcha img {
  float:left;
  margin:0 6px 0 0;
  width:70px;
  height:33px;
  border:1px solid #b6b6b6;
  }
  ._form ._type_captcha input[type="text"] {
  margin:-14px 0 0 0!important;
  width:25%;
  -webkit-box-sizing: border-box;
  -moz-box-sizing: border-box;
  box-sizing: border-box;
  }
  ._form ._field table  {
  width:100%!important;
  }
  ._form ._field table tbody tr td  {
  width:50%!important;
  font-size:15px;
  }
  ._form {
  width:265px;/*F*/
  background:#fff;
  color:#2c2c2c;
  font-weight:normal;
  }
  ._form #notice {
  margin:10px 0 0 -3px!important;
  padding:0;
  color:#acacac;
  font-size:11px;
  font-family:helvetica,arial,sans-serif;
  }
  ._form #notice a:link, ._form #notice a:visited {
  color:#acacac;
  text-decoration:underline;
  }
  ._form ._field  {
  position:relative;
  width:100%;
  cursor:default;
  font-style:normal;
  margin:0 0 16px;
  padding:0;
  overflow:hidden;
  }
  ._form ._field input[type="text"],
  ._form ._field input[type="email"] {
  width:100%;
  padding:4px;
  font-size:14px;
  background:#fafafa;
  border:1px solid #c7c7c7;
  border-top:1px solid #b6b6b6;
  -webkit-border-radius:3px;
  -moz-border-radius:3px;
  border-radius:3px;
  -webkit-box-sizing: border-box;
  -moz-box-sizing: border-box;
  box-sizing: border-box;
  }
  ._form ._field ._label {
  margin:0 0 4px;
  color:#2c2c2c;
  font-size:13px;
  font-family:helvetica,arial,sans-serif;
  font-weight:700;
  }
  ._form ._field ._option {
  margin:0;
  padding:0;
  color:#2c2c2c;
  font-size:13px;
  font-family:helvetica,arial,sans-serif;
  font-weight:normal;
  line-height:20px;
  }
  ._form ._type_header ._label {
  width:100%;
  font-style:normal;
  font-size:16px!important;
  line-height:20px;
  color:#005698;
  margin:0 0 5px!important;
  padding:0 0 10px!important;
  overflow:hidden;
  border-bottom:1px solid #e0e0e0;
  }
  ._form ._type_input ._option  textarea{
  width:97%!important;
  background:#fafafa;
  border:1px solid #c7c7c7;
  border-top:1px solid #b6b6b6;
  -webkit-border-radius:3px;
  -moz-border-radius:3px;
  border-radius:3px;
  }
  ._form ._type_input ._option input[type="submit"],
  ._form ._type_input ._option input[type="button"] {
  width:auto;
  margin:10px 0 0!important;
  padding:2px 15px!important;
  cursor:pointer;
  font-family:verdana,arial,sans-serif;
  font-weight:700;
  font-size:12px;
  color:#3f3f3f;
  background:#f7f7f7;
  border:1px solid #999999;
  border-bottom:1px solid #888888;
  text-align:center;
  }
  ._form ._type_input ._option input[type="submit"]:hover,
  ._form ._type_input ._option input[type="button"]:hover {
  border:1px solid #afafaf;
  border-bottom:1px solid #a5a5a5;
  background:#f7f7f7;
  color:#525252;
  }
  ._form ._type_date ._option input[type="text"] {
  float:left;
  width:100px;
  }
  ._form ._type_radio ._option label {
  display:inline;
  font-size:14px;
  font-weight:normal;
  line-height:18px;
  }
  ._form ._type_radio ._option label input[type="radio"] {
  position:relative;
  width:13px;
  height:13px;
  margin:-4px 0 0 1px;
  cursor:pointer;
  vertical-align:middle;
  line-height:20px;
  }
  ._form ._type_date ._option input[type="button"] {
  width:24px;
  height:24px;
  margin:2px 0 0 5px;
  padding:0;
  border:none;
  outline:none;
  text-indent:-9999px;
  }
  ._form ._field ._option select {
  display:block;
  margin:0;
  padding:0;
  width:auto;
  font-size:14px;
  border:1px solid #b6b6b6;
  }
  ._form ._type_captcha img {
  float:left;
  width:42px;
  height:24px;
  margin:0 6px 0 0;
  border:1px solid #b6b6b6;
  }
  ._form ._type_captcha input[type="text"] {
  float:left;
  margin:0!important;
  width:40%;
  font-size:14px;
  -webkit-box-sizing: border-box;
  -moz-box-sizing: border-box;
  box-sizing: border-box;
  }
  ._form ._field table {
  margin:0;
  padding:0;
  border-collapse:collapse;
  width:100%!important;
  table-layout:fixed;
  margin-bottom:18px;
  font-size:13px!important;
  border-collapse:collapse;
  border-spacing:0;
  }
  ._form ._field table td {
  padding:0 10px 0 0!important;
  line-height:18px;
  text-align:left;
  font-size:13px!important;
  color:#606060;
  }
  ._form ._type_input ._option  table tbody#_forward_rcpt input {margin:0 0 4px 0; width:96%!important;}
  ._form ._type_input ._option  table tbody#_forward_rcpt img.image_addrcpt {cursor:pointer;}
  .form_errors{
  text-align:center;
  font-size:15px;
  margin:10px;
  color:#900;
  font-family:Arial, Helvetica, sans-serif;
  font-weight:bold;
  margin-bottom:20px;
  }
</style>
<form action="//something.com/proc.php" method="post" id="_form_37" accept-charset="utf-8" enctype="multipart/form-data">
  <input type="hidden" name="f" value="37">
  <input type="hidden" name="s" value="">
  <input type="hidden" name="c" value="0">
  <input type="hidden" name="m" value="0">
  <input type="hidden" name="act" value="sub">
  <input type="hidden" name="nlbox[]" value="6">
  <div class="_form">
    <div class="formwrapper">
      <div id="_field284">
        <div id="compile284" class="_field _type_input">
          <div class="_label ">
            First Name
          </div>
          <div class="_option">
            <input type="text" name="field[6]" value="">
          </div>
        </div>
      </div>
      <div id="_field272">
        <div id="compile272" class="_field _type_input">
          <div class="_label ">
            Email *
          </div>
          <div class="_option">
            <input type="email" name="email" >
          </div>
        </div>
      </div>
      <div id="_field273">
        <div id="compile273" class="_field _type_input">
          <div class="_option">
            <input type="submit" value="Subscribe">
          </div>
        </div>
      </div>
      <div id="_field280">
        <div id="compile280" class="_field _type_hidden">
          <div class="_option">
            <input type="hidden" name="field[4]" value="">
          </div>
        </div>
      </div>
      <div id="_field281">
        <div id="compile281" class="_field _type_hidden">
          <div class="_option">
            <input type="hidden" name="field[5]" value="">
          </div>
        </div>
      </div>
      <div id="_field282">
        <div id="compile282" class="_field _type_hidden">
          <div class="_option">
            <input type="hidden" name="field[3]" value="">
          </div>
        </div>
      </div>
    </div>
  </div>
</form>';
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($string);
libxml_use_internal_errors(false);
$forms = $doc->getElementsByTagName('form');
foreach($forms as $form) {
    // Strip every tag except form and input, then drop whitespace-only lines
    echo preg_replace('~^\s+$~m', "", strip_tags($doc->saveHTML($form), '<form><input>'));
}

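It is best to avoid running regexes over HTML/XML unless there is a consistent pattern (and even then it is usually best avoided).

Update, to also remove the id and class attributes: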

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($string);
libxml_use_internal_errors(false);
$forms = $doc->getElementsByTagName('form');
foreach($forms as $form) {
    $form->removeAttribute('id');
    $form->removeAttribute('class');
    foreach($form->getElementsByTagName('input') as $input) {
        $input->removeAttribute('class');
        $input->removeAttribute('id');
    }
    // Strip every tag except form and input, then drop whitespace-only lines
    echo preg_replace('~^\s+$~m', "", strip_tags($doc->saveHTML($form), '<form><input>'));
}
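
If the embed code ever puts id or class on elements other than the form and its inputs, the same idea can be generalised with an XPath query that selects those attributes on the form and all of its descendants. The following is only a sketch under that assumption: the function name clean_form_html() and the sample markup are hypothetical, and it still relies on strip_tags() to drop the wrapper tags, as in the code above.

```php
<?php
// Hypothetical helper: returns the bare markup of every form in $html.
function clean_form_html(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy or HTML5 markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $xpath = new DOMXPath($doc);
    $out = array();

    foreach ($doc->getElementsByTagName('form') as $form) {
        // Select id and class attributes on the form and all descendants.
        $attrs = $xpath->query(
            'descendant-or-self::*/@id | descendant-or-self::*/@class',
            $form
        );
        foreach ($attrs as $attr) {
            $attr->ownerElement->removeAttributeNode($attr);
        }

        // Serialise just this form, then drop every tag except form/input.
        $markup = strip_tags($doc->saveHTML($form), '<form><input>');

        // Trim each line and discard the ones that end up empty.
        $lines = array_map('trim', explode("\n", $markup));
        $lines = array_filter($lines, static function ($line) {
            return $line !== '';
        });
        $out[] = implode("\n", $lines);
    }

    return implode("\n", $out);
}

// Made-up sample input, much shorter than the real embed code.
$sample = "<style>._form{color:red}</style>\n"
        . "<form action='/proc.php' method='post' id='_form_37'>\n"
        . "<div class='_form'>\n"
        . "<div class='_label'>Email *</div>\n"
        . "<input type='email' name='email' id='e1' class='x'>\n"
        . "</div>\n"
        . "</form>";

$clean = clean_form_html($sample);
echo $clean, PHP_EOL;
```

Because the style block lives outside the form, serialising only the form node leaves the CSS behind without any regex. If other form controls such as select or textarea need to survive, they would have to be added to the strip_tags() allow-list.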