Parsing an HTML page with the HTML DOM or jQuery

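I need to parse HTML content so that each heading is paired with the content of the <li> tags that follow it. Any suggestions would be appreciated. My generated HTML is as follows: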

<div id="word_content">
<br>Testing Time: 2015-10-29 17:57:11<br>
        Total Age: 19<br>
        Total Friend: 9<br>
        Total Family: 10<br>
        <br>
    Here are the suggestions  - Him_530037_: <a href="www.mytarget.com" target="_blank">93358546</a>
    <h3>Overview</h3><br>
    <ul>
        <li>(The overlap provided is not good)</li>
    </ul>
    <h3>Structure</h3><br>
    <h4>Target:</h4><br>
    <ul>
        <li>Audience.</li>
        <li>Lookalike</li>
        <li>Overlap of Audience</li>            
    </ul>
</div>

The result should be:

Overview:The Overlap provided is not good
Structure:
Target: Audience, Lookalike, Overlap of audience

I was thinking along these lines, but could not get any further:

        nodes = document.getElementById("word_content");
        var $result = new Array();
        for (i=0; i < nodes.childNodes.length; i++) 
        { 
            if (nodes.childNodes[i].nodeValue !=null) 
                {
                    $result[i]= nodes.childNodes[i].nodeValue;
                }
        }

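You can use the jQuery code below as a starting point. Note that it depends on the exact structure of your HTML: it assumes each list is the second sibling after its heading (heading, <br>, then <ul>). If you change the HTML, you will have to change the jQuery code accordingly.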

$(document).ready(function(){
  
  var headTags = $("div#word_content").find("*").filter(function(){
    // Match only h1..h6, not every tag that starts with "h" (e.g. <hr>).
    return /^h[1-6]$/i.test(this.nodeName);
  });
  
  var output = {};
  
  $(headTags).each(function(){
    var currentHead = $(this);
     
    var nextNextElem = currentHead.next().next();
    var innerText = [];
    if(nextNextElem.prop("tagName") == "UL")
      {
         nextNextElem.find("li").each(function(){
           innerText.push($(this).text());
         });  
        
      }
    
    output["'" + currentHead.text() + "'"] = innerText;
  });  
  
  alert(JSON.stringify(output));
  console.log(output);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
<div id="word_content">
<h3>Overview</h3><br>
<ul>
<li>(The overlap provided is not good)</li>
</ul>
<h3>Structure</h3><br>
<h4>Target:</h4><br>
        <ul>
            <li> Audience.</li>
            <li>Lookalike</li>
            <li>Overlap of Audience </li>           
        </ul></div>
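
If the markup is likely to change (for example, the <br> after a heading disappears), a variant that does not count siblings may be easier to maintain. The following is only a sketch in plain DOM terms, not part of the code above: the names `collectSections` and `formatSections` are made up for illustration, and it assumes the headings and lists are direct children of `#word_content`, as in the question's HTML. It walks the children in order, starts a new section at each heading, and attaches the <li> texts of any <ul> that appears before the next heading.

```javascript
// Heading tag names to treat as section titles (H1..H6).
const HEADING = /^H[1-6]$/;

// Walk the direct children of `root` in document order. Each heading opens a
// new section; every <ul> seen before the next heading contributes its <li>
// texts to that section. <br>, <a>, and other elements are skipped.
function collectSections(root) {
  const sections = [];
  let current = null;
  for (const el of Array.from(root.children)) {
    const tag = el.tagName.toUpperCase();
    if (HEADING.test(tag)) {
      // Drop a trailing colon so "Target:" and "Overview" are treated alike.
      current = { title: el.textContent.trim().replace(/:$/, ""), items: [] };
      sections.push(current);
    } else if (tag === "UL" && current) {
      for (const li of Array.from(el.children)) {
        if (li.tagName.toUpperCase() === "LI") {
          current.items.push(li.textContent.trim());
        }
      }
    }
  }
  return sections;
}

// Render one "Title: item, item" line per section.
function formatSections(sections) {
  return sections
    .map((s) => (s.title + ": " + s.items.join(", ")).trim())
    .join("\n");
}

// In a browser, with the question's HTML on the page:
// console.log(formatSections(collectSections(document.getElementById("word_content"))));
```

For the HTML in the question this should give three lines: `Overview: (The overlap provided is not good)`, `Structure:`, and `Target: Audience., Lookalike, Overlap of Audience`. The item texts are kept as they are, so removing the parentheses or the trailing period in "Audience." would need an extra cleanup step.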