如何在PHP中删除字符串中的JavaScript残留物


How to Remove JavaScript Remnants from String in PHP

我正在运行以下步骤,试图清理使用$query_text_lower=file_get_contents(websiteURL)获得的字符串。

我想要的是,只返回文字。没有javascript,没有随机数,没有CSS或任何其他类型的脚本。

//remove javascript
$query_text_lower = preg_replace("/<script[^>]*>.*?< *script[^>]*>/i", "", $new_text); 
//remove html tags
$query_text_lower2 = strip_tags($query_text_lower);
//removes any text containing links (may not be best, as some sites link useful words within the text. Does tend to remove a lot of ads though
$query_text_lower3 = preg_replace('/<a's.*?>.*?<'/a>/s', '', $query_text_lower2);
//removes linebreaks
$query_text_lower4 = trim(preg_replace('/'s+/', ' ', $query_text_lower3));
echo $query_text_lower4;
die();

下面是我目前输出的一个例子:

developing a cafe: 13 steps - wikihow /**/ /**/ messages log in log in via log in remember me forgot? create an account explore community dashboardrandom articleabout uscategoriesrecent changes help us communication an articlerequest a new articleanswer a requestmore ideas... edit edit this article home » categories » investment and business » business » buying & forming a business » hospitality businesses articleeditdiscuss wh.mergelang({ 'navlist_collapse': '- collapse','navlist_expand': '+ expand','usernameoremail': 'username or email','password': 'password' }); edit articlehow to start a cafe edited by harri, maluniu, annie, afc8871 and 1 other if you have always dreamt of management a business, then learning how to start a cafe may be the answer. with the right planning beforehand, opening a cafe can become highly profitable. your cafe can easily become a place where staff come to relax, enjoy schedule with friends or family, grab a quick bite to eat, or to work on their latest project. start a cafe business by following the steps below. ad google_ad_customer = "ca-pub-9543332082073187"; /* iframe unit - intro */ if(abtype == 2) google_ad_slot = '6354743772'; else //a or normal google_ad_slot = '8579663774'; if(abtype == 2 || abtype == 3 || abtype == 4 || abtype == 5 || abtype == 6) { google_ad_width = 671; google_ad_height = 120; google_max_num_ads = 2; } else if(abtype >= 7) { google_ad_width = 645; google_ad_height = 60; google_max_num_ads = 1; } else { google_ad_width = 671; google_ad_height = 60; google_max_num_ads = 1; } google_ad_results = 'html'; google_override_format = true; google_ad_channel = "0206790666+7733764704+1640266093+6709519645+8052511407+6822404019+7122150828" + gchans + xchannels; if( fromsearch ) { document.communication(''); } //--> edit steps 1communication your business and marketing plans. these are very important aspects of any business, as they will show your course of action for both management and marketing the business. refer to these documents often to make sure you stay on track. without these documents, you may not be able to secure funding. ad google_ad_customer = "ca-pub-9543332082073187"; /* iframe unit - first step */ if(abtype == 2) google_ad_slot = '4878010577'; else //a or normal google_ad_slot = '5205564977'; if(abtype == 2 || abtype == 3 || abtype == 4 || abtype == 5 || abtype == 6) { google_ad_width = 629; google_ad_height = 120; google_max_num_ads = 2; } else if(abtype >= 7) { google_ad_width = 600; google_ad_height = 60; google_max_num_ads = 1; } else { google_ad_width = 629; google_ad_height = 60; google_max_num_ads = 1; } google_ad_results = 'html'; google_override_format = true; google_ad_channel = "2748203808+7733764704+1640266093+6709519645+8052511407+2490795108+6822404019+7122150828" + gchans + xchannels; document.communication(''); //--> 2follow all legalities for starting a cafe business of this nature in your area. make sure you get all the necessary licenses, permits, and insurance required on federal, state, and local levels. 3secure funding for your business. in your business plan, you determined how much funding you need to start a cafe business. contact investors, apply for loans, and use whatever capital you have on hand to start the business. 

您的javascript正则表达式已关闭

你有:

$query_text_lower = preg_replace("/<script[^>]*>.*?< *script[^>]*>/i", "", $new_text); 

您没有检测到<脚本>在返回的文档中,所以它不是从页面中删除javascript代码本身,而是在调用striptags时删除标记,这样它们就不会出现在最终输出中。然而,我看不到你要从哪个网站上删除这个,所以我不能100%关注那个网站。

如果这有道理,请告诉我。基本上,在我看来,您的第一个正则表达式实际上并没有匹配任何内容。

您无法使用正则表达式解析这些内容。建议使用现有的DOM工具解析HTML是正确的做法。