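I need to parse some HTML so that each heading is paired with the content of the <li> tags that follow it.
Any suggestions would be appreciated. My generated HTML is as follows: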
<div id="word_content">
<br>Testing Time: 2015-10-29 17:57:11<br>
Total Age: 19<br>
Total Friend: 9<br>
Total Family: 10<br>
<br>
Here are the suggestions - Him_530037_: <a href="www.mytarget.com" target="_blank">93358546</a>
<h3>Overview</h3><br>
<ul>
<li>(The overlap provided is not good)</li>
</ul>
<h3>Structure</h3><br>
<h4>Target:</h4><br>
<ul>
<li>Audience.</li>
<li>Lookalike</li>
<li>Overlap of Audience</li>
</ul>
</div>
The result should be:
Overview:The Overlap provided is not good
Structure:
Target: Audience, Lookalike, Overlap of audience
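I have been trying something along these lines, but I can't get any further:
var nodes = document.getElementById("word_content");
var $result = new Array();
for (var i = 0; i < nodes.childNodes.length; i++)
{
    // nodeValue is null for element nodes, so only text nodes are kept here
    if (nodes.childNodes[i].nodeValue != null)
    {
        $result[i] = nodes.childNodes[i].nodeValue;
    }
}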
You can refer to the jQuery code below. It depends on your HTML, though: if you change the HTML, you will have to change the jQuery code accordingly.
$(document).ready(function(){
    // Select every heading element (h1-h6) inside the container
    var headTags = $("div#word_content").find("*").filter(function(){
        return /^h[1-6]$/i.test(this.nodeName);
    });
    var output = {};
    headTags.each(function(){
        var currentHead = $(this);
        // Skip the <br> that follows each heading in this markup
        var nextNextElem = currentHead.next().next();
        var innerText = [];
        if(nextNextElem.prop("tagName") == "UL")
        {
            nextNextElem.find("li").each(function(){
                innerText.push($(this).text());
            });
        }
        output["'" + currentHead.text() + "'"] = innerText;
    });
    alert(JSON.stringify(output));
    console.log(output);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
<div id="word_content">
<h3>Overview</h3><br>
<ul>
<li>(The overlap provided is not good)</li>
</ul>
<h3>Structure</h3><br>
<h4>Target:</h4><br>
<ul>
<li> Audience.</li>
<li>Lookalike</li>
<li>Overlap of Audience </li>
</ul></div>
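If you would rather not depend on the exact position of the <br> tags, one possible alternative is to walk the direct children of the container and attach each <ul> to the most recent heading. The following is only a sketch without jQuery: the function names collectByHeading and formatSections are made up for illustration, and it assumes the headings and lists are direct children of #word_content, as in the markup above. It does not strip the parentheses or the trailing period from the list items.

```javascript
// Group the text of each <li> under the nearest preceding heading (h1-h6).
// Only .children, .tagName and .textContent are used, so <br> elements and
// text nodes between a heading and its list are simply skipped.
// Assumption: headings and lists are direct children of the container.
function collectByHeading(container) {
  var result = [];     // ordered list of { heading, items }
  var current = null;  // entry for the most recent heading seen
  Array.from(container.children).forEach(function (el) {
    if (/^H[1-6]$/.test(el.tagName)) {
      // Drop a trailing colon so "Target:" and "Overview" are formatted alike
      current = { heading: el.textContent.trim().replace(/:$/, ""), items: [] };
      result.push(current);
    } else if (el.tagName === "UL" && current) {
      Array.from(el.children).forEach(function (li) {
        if (li.tagName === "LI") {
          current.items.push(li.textContent.trim());
        }
      });
    }
  });
  return result;
}

// Render one line per heading, with the list items joined by commas.
function formatSections(sections) {
  return sections.map(function (s) {
    return s.items.length ? s.heading + ": " + s.items.join(", ") : s.heading + ":";
  }).join("\n");
}

// In a browser, run against the page (guarded so the functions load anywhere).
if (typeof document !== "undefined") {
  var container = document.getElementById("word_content");
  if (container) {
    console.log(formatSections(collectByHeading(container)));
  }
}
```

Because a heading with no list directly under it (such as Structure) still gets its own entry, the output keeps the same three-line shape as the result you asked for.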