Problems with regexes when porting from Ruby to PHP

I have two pieces of code that appear to be correct translations of each other. Unfortunately, they return different values.

The Ruby code:

def separate(text,boundary = nil)
    # returns array of strings and arrays containing all of the parts of the email
    textList = []
    if !boundary #look in the email for "boundary= X"
        text.scan(/(?<=boundary=).*/) do |bound|
            textList = recursiveSplit(text,bound)
        end
    end
    if boundary 
        textList = recursiveSplit(text,boundary)
    end
    puts textList.count
    return textList
end

def recursiveSplit(chunk,boundary)
    if chunk.is_a? String
        searchString = "--" + boundary
        ar = chunk.split(searchString)
        return ar
    elsif chunk.is_a? Array
        chunk.each do |bit|
            recursiveSplit(bit,boundary);
        end
    end
end

The PHP code:

function separate($text, $boundary="none"){
    #returns array of strings and arrays containing all the parts of the email
    $textBlock = [];
    if ($boundary == "none") {
        preg_match_all('/(?<=boundary=).*/', $text, $matches);
        $matches = $matches[0];
        foreach ($matches as $match) {
            $textList = recursiveSplit($text,$match);
        }
    }else {
        $textList = recursiveSplit($text,$boundary);
    }
    var_dump($textList);
    return $textList;
}
function recursiveSplit($chunk,$boundary){
    if (is_string($chunk)) {
        $ar = preg_split("/--".$boundary."/", $chunk);
        //$ar = explode($searchString, $chunk);
        return $ar;
    }
    elseif (is_array($chunk)) {
        foreach ($chunk as $bit) {
            recursiveSplit($bit,$boundary);
        }
    }
}

var_dump($textList) shows an array of length 3, whereas textList.count => 4. What is going on?

Anonymized $text example:

MIME-Version: 1.0
Received: by 10.112.170.40 with HTTP; Fri, 3 May 2013 05:08:21 -0700 (PDT)
Date: Fri, 3 May 2013 08:08:21 -0400
Delivered-To: me@gmail.com
Message-ID: <CADPp44E47syuXvP1K-aemhcU7vdSijZkfKLu-74QPWs9U9551Q@mail.gmail.com>
Subject: MiB 5/3/13 7:43AM (EST)
From: Me <me@gmail.com>
To: Someone <someone@aol.com>
Content-Type: multipart/mixed; boundary=BNDRY1
--BNDRY1
Content-Type: multipart/alternative; boundary=BNDRY2
--BNDRY2
Content-Type: text/plain; charset=ISO-8859-1
-TEXT STUFF HERE. SAYING THINGS
ABOUT CERTAIN THINGS
--BNDRY2
Content-Type: text/html; charset=ISO-8859-1
<div dir="ltr">-changed signature methods to conform more to working clinic header methods(please test/not testable in simulator)<div style>-confirmed that signature image is showing up in simulator. Awaiting further tests</div>
<div style>-Modified findings spacing/buffer. See if you like it</div></div>
--BNDRY2--
--BNDRY1
Content-Type: application/zip; name="Make it Brief.ipa.zip"
Content-Disposition: attachment; filename="Make it Brief.ipa.zip"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_hg9biuno0
<<FILE DATA>>
--BNDRY1--

To reproduce the error, run separate(text) on the example above or on any Gmail "Show original" email.

BINGO ZINGO, figured it out!

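Apparently, in PHP, foreach works on a copy of each element, so to change the elements of an array from inside the loop that iterates over it, you have to prefix the loop variable with '&'.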
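
To make the difference concrete, here is a minimal sketch using a made-up array rather than the email data. The variable names are hypothetical; the only point is that assignments to the loop variable are lost unless it is declared with '&'.

```php
<?php
// Hypothetical data for illustration; not taken from the email above.
$byValue = ["a", "b", "c"];
foreach ($byValue as $bit) {
    // $bit is a copy here, so this assignment does not touch $byValue.
    $bit = strtoupper($bit);
}

$byReference = ["a", "b", "c"];
foreach ($byReference as &$bit) {
    // $bit is now a reference to the element, so the array itself is updated.
    $bit = strtoupper($bit);
}
// Break the reference so later writes to $bit cannot change the last element.
unset($bit);
```

After this runs, $byValue is unchanged while every element of $byReference is upper-cased. The unset($bit) after a by-reference loop is a common precaution, since the variable otherwise stays bound to the last element.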

I added the '&', fixed some general recursion errors, and now it runs smoothly.
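
The fixed code itself is not shown, so the following is only a guess at what it might look like, not the actual fix. It assumes the array branch is meant to replace each element with its split result, and it uses explode() (as in the commented-out line of the question) so the boundary is treated literally. The name recursiveSplitSketch and the sample strings are made up for illustration.

```php
<?php
// Sketch only: one possible shape for the fixed function, not the author's code.
function recursiveSplitSketch($chunk, $boundary) {
    if (is_string($chunk)) {
        // explode() matches the boundary literally, so no regex escaping is needed.
        return explode("--" . $boundary, $chunk);
    }
    if (is_array($chunk)) {
        foreach ($chunk as &$bit) {
            // Store the result through the reference; the question's version discarded it.
            $bit = recursiveSplitSketch($bit, $boundary);
        }
        unset($bit);
    }
    // Return the (possibly modified) chunk so every branch yields a value.
    return $chunk;
}

// Hypothetical inputs, not the email from the question.
$parts = recursiveSplitSketch("head--B1 one--B1 two", "B1");
$nested = recursiveSplitSketch(["x--B1 y", "z"], "B1");
```

Because $chunk is passed by value, the loop modifies a local copy of the array, which is why the function has to return it instead of relying on side effects.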