Converting text to UTF-8 within PHP script

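I'm exporting data from a dynamic HTML table to CSV.

However, this causes some problems, because the data sometimes contains control characters and the like.

If possible, I need to strip all of these out or make them "friendly".

I'm not sure how to do this, so can anyone help?

Here is my script: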

<textarea name="siteurl" rows="10" cols="50">
<?php //Check if the form has already been submitted and if this is the case, display the submitted content. If not, display 'http://'.
echo (isset($_GET['siteurl']))?htmlspecialchars($_GET['siteurl']):"http://";?>
</textarea><br>
<input type="submit" value="Submit">
</form>
</div>
<div id="nofloat"></div>
<table class="metadata" id="metatable_1">
<?php
error_reporting(E_ALL);
//ini_set( "display_errors", 0);
function parseUrl($url){
    //Trim whitespace of the url to ensure proper checking.
    $url = trim($url);
    //Check if a protocol is specified at the beginning of the url. If it's not, prepend 'http://'.
    if (!preg_match("~^(?:f|ht)tps?://~i", $url)) {
            $url = "http://" . $url;
    }
    //Check if '/' is present at the end of the url. If not, append '/'.
    if (substr($url, -1)!=="/"){
            $url .= "/";
    }
    //Return the processed url.
    return $url;
}
//If the form was submitted
if(isset($_GET['siteurl'])){
    //Put every new line as a new entry in the array
    $urls = explode("\n",trim($_GET["siteurl"]));
    //Iterate through urls
    foreach ($urls as $url) {
            //Parse the url to add 'http://' at the beginning or '/' at the end if not already there, to avoid errors with the get_meta_tags function
            $url = parseUrl($url);
            //Get the meta data for each url
            $tags = get_meta_tags($url);
            //Check to see if the description tag was present and adjust output accordingly
            if ($tags && isset($tags['description'])) {
                echo "<tr><td>Description($url)</td><td>" . $tags['description'] . "</td></tr>";
            } else {
                echo "<tr><td>Description($url)</td><td>No Meta Description</td></tr>";
            }
    }
}
?>
</table>
<script type="text/javascript">
        var exportTable1=new ExportHTMLTable('metatable_1');
    </script>
<div>
        <input type="button" onclick="exportTable1.exportToCSV()" value="Export to CSV"/>
        <input type="button" onclick="exportTable1.exportToXML()" value="Export to XML"/>
    </div>
</body>

I guess you want something like this: echo "<tr><td>Description($url)</td><td>" . utf8_encode($tags['description']) . "</td></tr>";

Please specify which text is displayed incorrectly. Is it $tags['description']?

Here are the manual pages for the functions you may need: mb_convert_encoding, utf8_encode.
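
As a rough sketch of how the functions mentioned above might be applied to the description before it is echoed: the helper name `toUtf8` and the assumption that any non-UTF-8 input is ISO-8859-1 are illustrative, not taken from the question. `utf8_encode()` only converts from ISO-8859-1 and is deprecated as of PHP 8.2, so this sketch uses `mb_convert_encoding()` instead, which requires the mbstring extension.

```php
<?php
// Hypothetical helper: return $text as UTF-8.
// Assumption: anything that is not already valid UTF-8 is ISO-8859-1.
function toUtf8(string $text): string
{
    // Leave valid UTF-8 untouched to avoid double-encoding it.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

// "caf\xE9" is "café" encoded as ISO-8859-1.
echo toUtf8("caf\xE9"), "\n";
```

In the script from the question, this would mean echoing `toUtf8($tags['description'])` in place of `$tags['description']`. If the source pages use some other encoding, the third argument would need to change accordingly.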

Not sure if I've understood the question correctly, but if all you want is a UTF-8 encoded CSV, you can use utf8_encode() on the data you're writing to the file.

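Alternatively, if you want to omit the control characters, you can check each line for control characters with ctype_cntrl() before writing it to the file... then either use a regular expression to get rid of them, or refuse to write the line altogether.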
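
A minimal sketch of the regular-expression route, assuming the text is already valid UTF-8 and that replacing each run of control characters with a single space is acceptable; the function name `stripControlChars` is hypothetical. Note that `ctype_cntrl()` returns true only when every character in the string is a control character, so it is better suited to testing single characters than whole lines.

```php
<?php
// Hypothetical helper: replace control characters with spaces.
// Assumption: $text is valid UTF-8 (preg_replace with /u returns null otherwise).
function stripControlChars(string $text): string
{
    // \p{Cc} matches Unicode control characters (U+0000-U+001F, U+007F-U+009F),
    // which includes tabs and line breaks.
    $clean = preg_replace('/\p{Cc}+/u', ' ', $text);
    // Fall back to the original text if the regex failed on invalid UTF-8.
    return trim($clean ?? $text);
}

echo stripControlChars("Line one\r\nLine\ttwo\x07"), "\n";
```

Running the cleanup on each cell value before it goes into the table keeps stray line breaks from splitting a CSV row. If tabs or newlines should survive, the character class would need to be narrowed.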