Extracting City and Zipcode from String in PHP

I need a quick, generic way to extract the city and zip code (if available) from an input string.

The string can take any of the following forms:

  1. $input_str = "123 Main Street, New Haven, CT";
  2. $input_str = "123 Main Street, New Haven, CT 06510";
  3. $input_str = "New Haven, CT, United States";
  4. $input_str = "New Haven, CT 06510";

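I was thinking that, at least for (1) and (3), I could explode the input string on "," and then loop through the array to find a 2-character STATE and ignore it. But I'm stuck beyond that point.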

$search_values = explode(',' ,$input_str);
foreach($search_values as $search)
{
    $trim_search = trim($search);   // Remove leading and trailing whitespace
    // If the 2-character state is provided without a zipcode, ignore it
    if (strlen($trim_search) == 2)
    {
        //echo 'Ignoring State Without Zipcode: ' . $search . '<br>';
        continue;
    }
    ...

I'm not the greatest at regex, but here's a shot at finding a 2-character state with or without a zip code.

Regex: (([A-Z]{2})|[0-9]{5})+
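
A minimal sketch of how this pattern might be applied with preg_match_all, using two of the sample strings from the question. The helper name findStateOrZipTokens is hypothetical, not part of the answer. Because the pattern is unanchored, it would also pick up any other pair of adjacent capitals or any five-digit run elsewhere in the string.

```php
<?php
// Hypothetical helper: returns every token matched by the answer's pattern.
function findStateOrZipTokens($input)
{
    // Pattern from the answer: a 2-letter uppercase token or a 5-digit run.
    $pattern = '/(([A-Z]{2})|[0-9]{5})+/';
    preg_match_all($pattern, $input, $matches);
    // $matches[0] holds each full match, in the order found.
    return $matches[0];
}

echo implode(' | ', findStateOrZipTokens("123 Main Street, New Haven, CT")) . "\n";
// prints: CT
echo implode(' | ', findStateOrZipTokens("123 Main Street, New Haven, CT 06510")) . "\n";
// prints: CT | 06510
```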

However, if you only want to match when the state and zip code appear together, look at this: Regex: (([A-Z]{2})(\s*[0-9]{5}))+
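
As a rough sketch, the state and zip can be pulled out of that combined pattern through its capture groups. The function name extractStateZip and the returned array shape are assumptions made for illustration only.

```php
<?php
// Hypothetical helper: returns array('state' => ..., 'zip' => ...) or null.
function extractStateZip($input)
{
    // Pattern from the answer: group 2 is the state, group 3 the zip.
    // Group 3 can include leading whitespace, hence the trim() below.
    $pattern = '/(([A-Z]{2})(\s*[0-9]{5}))+/';

    if (preg_match($pattern, $input, $m) !== 1) {
        return null; // no state directly followed by a five-digit zip
    }

    return array('state' => $m[2], 'zip' => trim($m[3]));
}

print_r(extractStateZip("123 Main Street, New Haven, CT 06510"));
var_dump(extractStateZip("123 Main Street, New Haven, CT")); // NULL
```

Returning null when the zip is missing keeps the "state without zip" case from the question easy to detect in the caller.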

class Extract  {
    
    private $_string;
    
    private $_sections = array();
    
    private $_output = array();
        
    private $_found = array();
    
    private $_original_string;
    
    private $_countries = array (
        'United States',
        'Canada',
        'Mexico',
        'France',
        'Belgium',
        'United Kingdom',
        'Sweden',
        'Denmark',
        'Spain',
        'Australia',
        'Austria',
        'Italy',
        'Netherlands'
    );
    
    private $_zipcon = array();
    
    private $ZIPREG = array(
        "United States"=>"^\d{5}([\-]?\d{4})?$",
        "United Kingdom"=>"^(GIR|[A-Z]\d[A-Z\d]??|[A-Z]{2}\d[A-Z\d]??)[ ]??(\d[A-Z]{2})$",
        "Germany"=>"\b((?:0[1-46-9]\d{3})|(?:[1-357-9]\d{4})|(?:[4][0-24-9]\d{3})|(?:[6][013-9]\d{3}))\b",
        "Canada"=>"^([ABCEGHJKLMNPRSTVXY]\d[ABCEGHJKLMNPRSTVWXYZ])\s*(\d[ABCEGHJKLMNPRSTVWXYZ]\d)$",
        "France"=>"^(F-)?((2[A|B])|[0-9]{2})[0-9]{3}$",
        "Italy"=>"^(V-|I-)?[0-9]{5}$",
        "Australia"=>"^(0[289][0-9]{2})|([1345689][0-9]{3})|(2[0-8][0-9]{2})|(290[0-9])|(291[0-4])|(7[0-4][0-9]{2})|(7[8-9][0-9]{2})$",
        "Netherlands"=>"^[1-9][0-9]{3}\s?([a-zA-Z]{2})?$",
        "Spain"=>"^([1-9]{2}|[0-9][1-9]|[1-9][0-9])[0-9]{3}$",
        "Denmark"=>"^([D-d][K-k])?( |-)?[1-9]{1}[0-9]{3}$",
        "Sweden"=>"^(s-|S-){0,1}[0-9]{3}\s?[0-9]{2}$",
        "Belgium"=>"^[1-9]{1}[0-9]{3}$"
    ); // thanks to http://www.pixelenvision.com/1708/zip-postal-code-validation-regex-php-code-for-12-countries/
    
    public function __construct($string) {
        $this->_output = array (
        
            "state" => "",
            "city" => "",
            "country" => "",
            "zip" => "",
            "street" =>"",
            "number" => ""
        );
        $this->_original_string = $string;
        $this->_string = $this->normalize(trim($string));
        
        
        // create an array of patterns in order to extract zip code using the country list we already have
        foreach($this->ZIPREG as $country => $pattern) {
            // strip the ^ and $ anchors so the pattern can match inside a longer string
            $this->_zipcon[] = $pattern = preg_replace(array('/\^/', '/\$/'), array("", ""), $pattern);
        }
    
        $this->init();
    }
    
    protected function init() {
        
        $this->getData(); // get data that can be found without breaking up the string.
        
        $this->_sections = array_filter(explode(',', trim($this->_string)));  // split each section
        if(!empty($this->_sections)) {
            foreach($this->_sections as $i => $d) {
                $d = preg_replace(array("/\s+/", "/\s([?.!])/"),  array(" ","$1"), $d ); 
                $this->_sections[$i] = trim($this->normalize($d));  // normalize string to have one space between each word
            }
        } else {
            $this->_sections[] = $this->_string;    
        }       
        
        // try to match what's missing against what has already been found
        $notFound = $this->getNotFound();
        if(count($notFound)==1 && count($this->_found)>1) {
            $found = $this->getFound();
            foreach($found as $string) {
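                // quote the found text so regex metacharacters in it are matched literally
                $notFound[0] = preg_replace("/" . preg_quote($string, "/") . "/i", "", $notFound[0]);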
            }
            $this->_output["city"] = $notFound[0];
            $this->_found[] = $this->_output["city"];
            $this->remove($this->_output["city"]);
        }   
    }
    
    public function getSections() {
        return $this->_sections;
    }   
    
    protected function normalize($string) {
        $string = preg_replace(array("/\s+/", "/\s([?.!])/"),  array(" ","$1"), trim($string));
        return $string;
    }
    
    protected function country_from_zip($zip) {
        $found = "";
        foreach($this->ZIPREG as $country => $pattern) {
            if(preg_match ("/".$pattern."/", $zip)) {
                $found = $country;
                break;
            }
        }
        return $found;
    }
    
    protected function getData() {
        $container = array();
        // extract zip code only when present beside state, or else five digits are meaningless
        
        if(preg_match ("/[A-Z]{2,}\s*(".implode('|', $this->_zipcon).")/", $this->_string) ){
            preg_match ("/[A-Z]{2,}\s*(".implode('|', $this->_zipcon).")/", $this->_string, $container["state_zip"]);
            $this->_output["state"] = $container["state_zip"][0];
            $this->_output["zip"] = $container["state_zip"][1];
            $this->_found[] = $this->_output["state"] . " ". $this->_output["zip"];
            // remove from string once found
            $this->remove($this->_output["zip"]);   
            $this->remove($this->_output["state"]);
            
            // check to see if we can find the country just by inputting zip code
            if($this->_output["zip"]!="" ) {
                $country = $this->country_from_zip($this->_output["zip"]);
                $this->_output["country"] = $country;
                $this->_found[] = $this->_output["country"];
                $this->remove($this->_output["country"]);
            }
        } 
        
        if(preg_match ("/\b([A-Z]{2,})\b/", $this->_string)) {
            preg_match ("/\b([A-Z]{2,})\b/", $this->_string, $container["state"]);  
            $this->_output["state"] = $container["state"][0];
            $this->_found[] = $this->_output['state'];
            $this->remove($this->_output["state"]);
        }
        // if we weren't able to find a country based on the zip code, use the one provided (if provided)
        if($this->_output["country"] == "" && preg_match("/(". implode('|',$this->_countries)  . ")/i", $this->_string) ){
            preg_match ("/(". implode('|',$this->_countries)  . ")/i", $this->_string, $container["country"]);
            $this->_output["country"] = $container["country"][0];
            $this->_found[] = $this->_output['country'];
            $this->remove($this->_output["country"]);
        }   
            
        if(preg_match ("/([0-9]{1,})\s+([.'\-a-zA-Z\s*]{1,})/", $this->_string) ){
            preg_match ("/([0-9]{1,})\s+([.'\-a-zA-Z\s*]{1,})/", $this->_string, $container["address"]);
            $this->_output["number"] = $container["address"][1];
            $this->_output["street"] = $container["address"][2];
            $this->_found[] = $this->_output["number"] . " ". $this->_output["street"];
            $this->remove($this->_output["number"]);
            $this->remove($this->_output["street"]);
        }       
        
        
        //echo $this->_string;
    }
    
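    /* remove a found value from the string to make it easier to find the missing pieces */
    protected function remove($string, $case_sensitive = false) {
        $s = ($case_sensitive==false ? "i" : "");
        // preg_quote() keeps characters such as "." or "*" in the value from being read as regex syntax
        $this->_string = preg_replace("/".preg_quote($string, "/")."/$s", "", $this->_string);
    }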
    public function getNotFound() {
        return array_values(array_filter(array_diff($this->_sections, $this->_found)));
    }
    
    public function getFound() {
        return $this->_found;   
    }
    /* outputs a readable string with all items found */
    public function toString() {
        $output = $this->getOutput();
        $string = "Original string: [ ".$this->_original_string.' ] ---- New string: [ '. $this->_string. ' ]<br>';
        foreach($output as $type => $data) {
            $string .= "-".$type . ": " . $data. '<br>';    
        }   
        return $string;
    }
    
    /* return the final output as an array */
    public function getOutput() {
        return $this->_output;  
    }   
    
}

$array = array();
$array[0] = "123 Main Street, New Haven, CT 06518";
$array[1] = "123 Main Street, New Haven, CT";
$array[2] = "123 Main Street, New Haven,                            CT 06511";
$array[3] = "New Haven,CT 66554, United States";
$array[4] = "New Haven, CT06513";
$array[5] = "06513";
$array[6] = "123 Main    Street, New Haven CT 06518, united states";
$array[7] = "1253 McGill College, Montreal, QC H3B 2Y5"; // google Montreal  / Canada
$array[8] = "1600 Amphitheatre Parkway, Mountain View, CA 94043"; // google CA  / US
$array[9] = "20 West Kinzie St., Chicago, IL 60654"; // google IL / US
$array[10] = "405 Rue Sainte-Catherine Est, Montreal, QC"; // Montreal address shows hyphened street names
$array[11] = "48 Pirrama Road, Pyrmont, NSW 2009"; // google Australia

foreach($array as $string) {
    $a = new Extract($string);
    echo $a->toString().'<br>'; 
}

Using the examples in the code above, it should output:

Original string: [ 123 Main Street, New Haven, CT 06518 ] ---- New string: [ , , ]
-state: CT
-city: New Haven
-country: United States
-zip: 06518
-street: Main Street
-number: 123
Original string: [ 123 Main Street, New Haven, CT ] ---- New string: [ , , ]
-state: CT
-city: New Haven
-country: 
-zip: 
-street: Main Street
-number: 123
Original string: [ 123 Main Street, New Haven, CT 06511 ] ---- New string: [ , , ]
-state: CT
-city: New Haven
-country: United States
-zip: 06511
-street: Main Street
-number: 123
Original string: [ New Haven,CT 66554, United States ] ---- New string: [ , , ]
-state: CT
-city: New Haven
-country: United States
-zip: 66554
-street: 
-number: 
Original string: [ New Haven, CT06513 ] ---- New string: [ , ]
-state: CT
-city: New Haven
-country: United States
-zip: 06513
-street: 
-number: 
Original string: [ 06513 ] ---- New string: [ 06513 ]
-state: 
-city: 
-country: 
-zip: 
-street: 
-number: 
Original string: [ 123 Main Street, New Haven CT 06518, united states ] ---- New string: [ , , ]
-state: CT
-city: New Haven
-country: United States
-zip: 06518
-street: Main Street
-number: 123
Original string: [ 1253 McGill College, Montreal, QC H3B 2Y5 ] ---- New string: [ , , ]
-state: QC
-city: Montreal
-country: Canada
-zip: H3B 2Y5
-street: McGill College
-number: 1253
Original string: [ 1600 Amphitheatre Parkway, Mountain View, CA 94043 ] ---- New string: [ , , ]
-state: CA
-city: Mountain View
-country: United States
-zip: 94043
-street: Amphitheatre Parkway
-number: 1600
Original string: [ 20 West Kinzie St., Chicago, IL 60654 ] ---- New string: [ , , ]
-state: IL
-city: Chicago
-country: United States
-zip: 60654
-street: West Kinzie St.
-number: 20
Original string: [ 405 Rue Sainte-Catherine Est, Montreal, QC ] ---- New string: [ , , ]
-state: QC
-city: Montreal
-country: 
-zip: 
-street: Rue Sainte-Catherine Est
-number: 405
Original string: [ 48 Pirrama Road, Pyrmont, NSW 2009 ] ---- New string: [ , , ]
-state: NSW
-city: Pyrmont
-country: Australia
-zip: 2009
-street: Pirrama Road
-number: 48

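If you want to extract the actual stored values so that you can use them, you need to call getOutput(). This returns an array containing all the necessary values. If we take the first address in the list and output its values with this method, it should output: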

Array
(
    [state] => CT
    [city] => New Haven
    [country] => United States
    [zip] => 06518
    [street] => Main Street
    [number] => 123
)

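Note that this class can be greatly optimized and improved. I came up with it in about an hour, so I can't guarantee it works for all kinds of input. Essentially, you have to make sure the user at least makes an effort to use commas to separate the parts of the address. You also need to make sure an uppercase state and a valid five-digit zip code are provided.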

Some rules

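  1. In order to extract the zip code, a valid 2-character state must be provided with a valid zip code next to it. Example: CT 06510. Without a state, simply entering five digits is meaningless, because a street number can also have five digits. (There is no way to tell the two apart.)

  2. The street and number can only be extracted when a number and words are provided in that order. Example: 123 Main Street. The street must also be followed by a comma, or the match will capture all the words after the number. For example, for 123 Main Street New Haven, CT 06518, the code will take the street and number to be 123 Main Street New Haven rather than 123 Main Street.

  3. Simply entering a five-digit zip code will not work.

  4. If no country is given, it will guess the country, provided a valid zip code is present (see the zip code list above and their respective countries).

  5. It assumes no hyphens will be provided (especially for city names). This can be modified later. (The regex needs to be modified to accommodate hyphens in city and street names.) (Fixed)

  6. Most importantly, you can do much more if you take the time to change the regular expressions and customize them accordingly.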
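
The comma rule for streets can be seen in isolation with a small sketch. It assumes a pattern equivalent to the street regex in getData() and runs it on the raw input rather than on the string the class has already trimmed down. The helper name extractStreet is hypothetical.

```php
<?php
// Hypothetical helper: returns array('number' => ..., 'street' => ...) or null.
function extractStreet($input)
{
    // A number, whitespace, then a run of letters, dots, apostrophes,
    // hyphens, whitespace or asterisks. A comma is not in the character
    // class, so it ends the street part of the match.
    $pattern = "/([0-9]{1,})\s+([.'\-a-zA-Z\s*]{1,})/";

    if (preg_match($pattern, $input, $m) !== 1) {
        return null;
    }

    return array('number' => $m[1], 'street' => trim($m[2]));
}

print_r(extractStreet("123 Main Street, New Haven, CT 06518"));
// street is "Main Street"
print_r(extractStreet("123 Main Street New Haven, CT 06518"));
// street is "Main Street New Haven": no comma stops the match
var_dump(extractStreet("06513")); // NULL, a bare zip has no street part
```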

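I strongly suggest you use a form (if you don't already) so that the address parts can be captured easily from the input. It would probably make your life easier.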

Quick usage

$Extract = new Extract("123 Main Street, New Haven, CT 06518");
$foundValues = $Extract->getOutput();