PHP character validation

I have an HTML form that sends email directly from my page to my email address. The code for the form and for sending the email is shown below.

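I would like the form to also accept the characters Č, č, ž, Š, š, Ć, ć... I know where in the code the allowed characters are defined, but since I have very little experience with PHP, I don't know how to add further characters to the existing ones.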

Also, it seems to me that the code only checks the fields "name", "email" and "url", but not "comments" and "subject". Am I right?

<form action="<?php echo basename(__FILE__); ?>" method="post" id="signup">
                <noscript>
                        <p><input type="hidden" name="nojs" id="nojs" /></p>
                </noscript>

                <div class="headerObrazca"> 
                <br/>               
                    <h3>Povpraševanje ali naročanje</h3>         
                    <p>Vsa polja so obvezna.</p>                    
                </div>
                <div class="sep"></div>
                <div class="inputs">
                    <center>
                        <input type="text" name="name"  placeholder="Ime in Priimek" autofocus value="<?php get_data("name"); ?>" /><br />

                        <input type="text" name="email" id="email"  placeholder="E-pošta" value="<?php get_data("email"); ?>" /><br />

                        <input type="text" name="subject" id="subject"  placeholder="Zadeva" value="<?php get_data("subject"); ?>" /><br />
                    </center>   

                        <textarea name="comments" id="comments" rows="5" cols="70" placeholder="Vaše vprašanje ali naročilo" ><?php get_data("comments"); ?></textarea><br />
                    <p>
                        <input type="submit" name="submit" id="submit" value="Pošlji!" <?php if (isset($disable) && $disable === true) echo ' disabled="disabled"'; ?> />
                    </p>
                </div>
    </form>


$yourEmail = "example@something.si"; // the email address you wish to receive these mails through
$yourWebsite = "XXX"; // the name of your website
$thanksPage = 'ponudbaHvala.php'; // URL to 'thanks for sending mail' page; leave empty to keep message on the same page 
$maxPoints = 4; // max points a person can hit before it refuses to submit - recommend 4
$requiredFields = "name,email,comments,subject"; // names of the fields you'd like to be required as a minimum, separate each field with a comma

// DO NOT EDIT BELOW HERE
$error_msg = array();
$result = null;
$requiredFields = explode(",", $requiredFields);
function clean($data) {
    $data = trim(stripslashes(strip_tags($data)));
    return $data;
}
function isBot() {
    $bots = array("Indy", "Blaiz", "Java", "libwww-perl", "Python", "OutfoxBot", "User-Agent", "PycURL", "AlphaServer", "T8Abot", "Syntryx", "WinHttp", "WebBandit", "nicebot", "Teoma", "alexa", "froogle", "inktomi", "looksmart", "URL_Spider_SQL", "Firefly", "NationalDirectory", "Ask Jeeves", "TECNOSEEK", "InfoSeek", "WebFindBot", "girafabot", "crawler", "www.galaxy.com", "Googlebot", "Scooter", "Slurp", "appie", "FAST", "WebBug", "Spade", "ZyBorg", "rabaz");
    foreach ($bots as $bot)
        if (stripos($_SERVER['HTTP_USER_AGENT'], $bot) !== false)
            return true;
    if (empty($_SERVER['HTTP_USER_AGENT']) || $_SERVER['HTTP_USER_AGENT'] == " ")
        return true;
    return false;
}
if ($_SERVER['REQUEST_METHOD'] == "POST") {
    if (isBot() !== false)
        $error_msg[] = "No bots please! UA reported as: ".$_SERVER['HTTP_USER_AGENT'];
    // lets check a few things - not enough to trigger an error on their own, but worth assigning a spam score.. 
    // score quickly adds up therefore allowing genuine users with 'accidental' score through but cutting out real spam :)
    $points = (int)0;
    $badwords = array("adult", "beastial", "bestial", "blowjob", "clit", "cum", "cunilingus", "cunillingus", "cunnilingus", "cunt", "ejaculate", "fag", "felatio", "fellatio", "fuck", "fuk", "fuks", "gangbang", "gangbanged", "gangbangs", "hotsex", "hardcode", "jism", "jiz", "orgasim", "orgasims", "orgasm", "orgasms", "phonesex", "phuk", "phuq", "pussies", "pussy", "spunk", "xxx", "viagra", "phentermine", "tramadol", "adipex", "advai", "alprazolam", "ambien", "ambian", "amoxicillin", "antivert", "blackjack", "backgammon", "texas", "holdem", "poker", "carisoprodol", "ciara", "ciprofloxacin", "debt", "dating", "porn", "link=", "voyeur", "content-type", "bcc:", "cc:", "document.cookie", "onclick", "onload", "javascript");
    foreach ($badwords as $word)
        if (
            strpos(strtolower($_POST['comments']), $word) !== false || 
            strpos(strtolower($_POST['name']), $word) !== false
        )
            $points += 2;
    if (strpos($_POST['comments'], "http://") !== false || strpos($_POST['comments'], "www.") !== false)
        $points += 2;
    if (isset($_POST['nojs']))
        $points += 1;
    if (preg_match("/(<.*>)/i", $_POST['comments']))
        $points += 2;
    if (strlen($_POST['name']) < 3)
        $points += 1;
    if (strlen($_POST['comments']) < 15 || strlen($_POST['comments']) > 1500)
        $points += 2;
    if (preg_match("/[bcdfghjklmnpqrstvwxyz]{7,}/i", $_POST['comments']))
        $points += 1;
    // end score assignments
    foreach($requiredFields as $field) {
        trim($_POST[$field]);
        if (!isset($_POST[$field]) || empty($_POST[$field]) && array_pop($error_msg) != "Prosim, izpolnite vsa polja in ponovno pošljite.\r\n")
            $error_msg[] = "Prosim, izpolnite vsa polja in ponovno pošljite.";
    }
    if (!empty($_POST['name']) && !preg_match("/^[a-zA-Z-'\s]*$/", stripslashes($_POST['name'])))
        $error_msg[] = "Obrazec ne sprejema posebnih znakov.\r\n";
    if (!empty($_POST['email']) && !preg_match('/^([a-z0-9])(([-a-z0-9._])*([a-z0-9]))*\@([a-z0-9])(([a-z0-9-])*([a-z0-9]))+' . '(\.([a-z0-9])([-a-z0-9_-])?([a-z0-9])+)+$/i', strtolower($_POST['email'])))
        $error_msg[] = "Vpisali ste napačno obliko E-pošte.\r\n";
    if (!empty($_POST['url']) && !preg_match('/^(http|https):\/\/(([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)+)(:(\d+))?\/?/i', $_POST['url']))
        $error_msg[] = "Invalid website url.\r\n";
    if ($error_msg == NULL && $points <= $maxPoints) {
        $subject = stripslashes(strip_tags( $_POST['subject'] ));
        $message = "Nekdo je izpolnil obrazec v povpraševanju: \n\n";
        foreach ($_POST as $key => $val) {
            if (is_array($val)) {
                foreach ($val as $subval) {
                    $message .= ucwords($key) . ": " . clean($subval) . "\r\n";
                }
            } else {
                $message .= ucwords($key) . ": " . clean($val) . "\r\n";
            }
        }
        $message .= "\r\n\n\n";
        $message .= 'IP: '.$_SERVER['REMOTE_ADDR']."\r\n";
        $message .= 'Browser: '.$_SERVER['HTTP_USER_AGENT']."\r\n";
        $message .= 'Points: '.$points;
        if (strstr($_SERVER['SERVER_SOFTWARE'], "Win")) {
            $headers   = "From: $yourEmail\n";
            $headers  .= "Reply-To: {$_POST['email']}";
        } else {
            $headers   = "From: $yourWebsite <$yourEmail>\n";
            $headers  .= "Reply-To: {$_POST['email']}";
        }
        if (mail($yourEmail,$subject,$message,$headers)) {
            if (!empty($thanksPage)) {
                header("Location: $thanksPage");
                exit;
            } else {
                $result = 'Your mail was successfully sent.';
                $disable = true;
            }
        } else {
            $error_msg[] = 'Vaše sporočilo trenutno ne mora biti poslano. ['.$points.']';
        }
    } else {
        if (empty($error_msg))
            $error_msg[] = 'Vaše sporočilo izgleda kot vsiljena pošta. Poskusite ponovno. ['.$points.']';
    }
}
function get_data($var) {
    if (isset($_POST[$var]))
        echo htmlspecialchars($_POST[$var]);
}
?>

First, make sure your page uses UTF-8 as its encoding.

The input is validated with regular expressions (PCRE). For the name, this is:

!preg_match("/^[a-zA-Z-'\s]*$/", ...)

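This expression is not in UTF-8 mode; it only matches bytes. In UTF-8, a character like Č consists of multiple bytes. The u modifier activates UTF-8 mode. In addition, the - should be the first or last element of the character class, because between two other characters it defines a range (as in a-z).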

!preg_match("/^[a-zA-Z\'\s-]*$/u", ...)
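
As a minimal sketch of both points, the snippet below compares byte mode with UTF-8 mode and shows how the position of the hyphen changes a character class. The variable names are purely illustrative, and "Č" is written as its two UTF-8 bytes so the result does not depend on how the file is saved.

```php
<?php
// "Č" (U+010C) written as its two UTF-8 bytes, so the demo does not depend
// on the encoding of this file.
$char = "\xC4\x8C";

$byteLength   = strlen($char);                 // counts bytes: 2
$matchBytes   = preg_match('/^.$/', $char);    // 0: "." consumes a single byte
$matchUnicode = preg_match('/^.$/u', $char);   // 1: "." consumes the whole character

// Hyphen between two characters: a range. Hyphen at the end: a literal "-".
$inRange       = preg_match('/^[a-c]$/', 'b'); // 1: "b" lies in the range a..c
$notInClass    = preg_match('/^[ac-]$/', 'b'); // 0: the class is only "a", "c", "-"
$literalHyphen = preg_match('/^[ac-]$/', '-'); // 1

var_dump($byteLength, $matchBytes, $matchUnicode, $inRange, $notInClass, $literalHyphen);
```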

In this mode you can add the special characters to the character class. You have to make sure that your editor/IDE saves the PHP file as UTF-8.

!preg_match("/^[a-zA-Z\'\sČ-]*$/u", ...)
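
To illustrate why the file encoding matters, here is a small sketch that simulates the same pattern saved in two encodings. It assumes, for illustration only, that the non-UTF-8 file would be ISO-8859-2, where Č is the single byte 0xC8; the variable names are made up for the demo.

```php
<?php
// Assumption for illustration: the file was saved as ISO-8859-2, where "Č"
// is the single byte 0xC8. On its own that byte is not valid UTF-8.
$patternLatin2 = "/^[a-zA-Z\xC8-]*$/u";
// The same pattern as it looks when the file is saved as UTF-8.
$patternUtf8   = "/^[a-zA-Z\xC4\x8C-]*$/u";

$subject = "\xC4\x8C"; // "Č" arriving from the form as UTF-8

// @ hides the warning PHP emits when the pattern cannot be compiled.
$broken  = @preg_match($patternLatin2, $subject); // false: the pattern is rejected
$working = preg_match($patternUtf8, $subject);    // 1: "Č" is part of the class

var_dump($broken, $working);
```

Note that a failed match returns 0 while an error returns false, so comparing the result with === helps to tell the two cases apart.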

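Adding characters to the class one by one quickly inflates the pattern, and it is easy to forget one. A better solution is Unicode properties. "\\pL" is shorthand for all letters (including Cyrillic, Korean, ...). This is the validation I would suggest.

!preg_match("/^[\\pL\'\s-]*$/u", ...)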

But you can restrict it to a more specific group, such as the "Latin" script.

!preg_match("/^[\\p{Latin}\'\s-]*$/u", ...)

Example:

// latin letters, valid: int(1)
var_dump(
  preg_match('(^\p{Latin}+$)u', 'aäČ')
);
// latin letters, invalid: int(0)
var_dump(
  preg_match('(^\p{Latin}+$)u', 'Русский')
);
// all letters, valid: int(1)
var_dump(
  preg_match('(^\pL+$)u', 'Русский')
);
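
Applied to the name field of the form, the check could be wrapped in a small helper along these lines. This is only a sketch: isValidName is a hypothetical function that is not part of the original script, the sample names are made up, and the type declarations assume PHP 7 or later.

```php
<?php
/**
 * Hypothetical helper (not part of the original script): returns true if
 * $name consists only of Latin letters, apostrophes, whitespace and hyphens.
 * Like the original pattern, it also accepts an empty string.
 */
function isValidName(string $name): bool
{
    // preg_match() returns 1 on a match, 0 on no match and false on error,
    // for example when $name is not valid UTF-8 and the u modifier is set.
    return preg_match('/^[\p{Latin}\'\s-]*$/u', $name) === 1;
}

var_dump(isValidName('Žiga Černe-Šuštar')); // bool(true)
var_dump(isValidName('Русский'));           // bool(false): Cyrillic, not Latin
var_dump(isValidName("\xC4"));              // bool(false): invalid UTF-8 input
```

In the script from the question, the condition on the name field could then become !isValidName(stripslashes($_POST['name'])).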