Extract all email headers, including the body part, from mail in PHP

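I want to extract the body part from the following chain mail using regex in PHP. The chain mail file is saved in txt format. While extracting, any HTML tags present in the body should be kept as they are.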

    $content = <<<HEREDOC
    From: Matrimony <matrimony@mangalsutrabandhan.in>
    Sent: Fri, 12 Aug 2011 16:17:40
    To: "matrimony@mangalsutrabandhan.com" <matrimony@mangalsutrabandhan.in>
    Subject: Re: bride search

    From: brides <sales@mangalsutrabandhan.com>
    Sent: Fri, 12 Aug 2011 15:49:52
    To: "Matrimony " <matrimony@mangalsutrabandhan.in>
    Cc: "groom" <brides@mangalsutrabandhan.com>
    Subject: Re: bride search
    PFA
    Regds.,
    sales

    From: shaadi <kundaali@mangalsutrabandhan.in>
    Sent: Tue, 22 Feb 2011 16:40:24
    To: <vivaah@mangalsutrabandhan.com>, <bandhan@mangalsutrabandhan.com>
    Cc: "'lagna '" <lagna@mangalsutrabandhan.in>, <movies@mangalsutrabandhan.in>, <manishv@mangalsutrabandhan.com>, "'beta data'" <channel@mangalsutrabandhan.com>, "'test S'" <city@mangalsutrabandhan.com>
    Subject: Re:data transfer would be made live for 145 test
    This is to inform you that we are going to test today.

    Activity Timing: 9:00 PM onwards

    Thanks and Regards,
    free matrimony
    shaadi Operations

     P  Please do not print this e-mail unless it is absolutely necessary
    From: shaadi [nikaah:kundaali@mangalsutrabandhan.in]
    Sent: 21 February 2011 23:09
    To: vivaah@mangalsutrabandhan.com; bandhan@mangalsutrabandhan.com
    Cc: 'lagna '; movies@mangalsutrabandhan.in; manishv@mangalsutrabandhan.com; 
    Subject: data transfer would be made live for 145 test

    Hi,
    gtsdhsdbh
    anbdsmbsa
    sda the data test .
    Would request you to send in your feedback.

    Thanks and Regards,

    beta data
    assa xyz

     P  Please do not print this e-mail unless it is absolutely necessary

    HEREDOC;

Output:

Array
(
    [0] => Array
        (
            [0] => Re: bride search

            [1] => Re: bride search
PFA
Regds.,
sales

            [2] => Re:data transfer would be made live for 145 test
This is to inform you that we are going to test today.

Activity Timing: 9:00 PM onwards

Thanks and Regards,
free matrimony
shaadi Operations

 P  Please do not print this e-mail unless it is absolutely necessary

        )
    [1] => Array
        (
            [0] => Re: bride search

            [1] => Re: bride search
PFA
Regds.,
sales

            [2] => Re:data transfer would be made live for 145 test
This is to inform you that we are going to test today.

Activity Timing: 9:00 PM onwards

Thanks and Regards,
free matrimony
shaadi Operations

 P  Please do not print this e-mail unless it is absolutely necessary

        )
)

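The output above comes from this regex:

    preg_match_all('/(?<=Subject: )(.*?[\n][\s]*?)(?=From:)/is', $content, $rest);

However, it does not return the last message, because there is no following 'From:' to mark where that message ends. I hope that is clear. Please let me know if there is another way to do this.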
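One small change that may address the missing last message is to let the lookahead also accept the end of the string. The following is only a sketch, assuming every message begins with a From: line at the start of a line; the shortened sample data and the variable names are made up for illustration.

```php
<?php
// Shortened, hypothetical sample in the same shape as the chain mail above.
$content = <<<'MAIL'
From: A <a@example.com>
Subject: Re: first
Body one

From: B <b@example.com>
Subject: second
Body two, <b>HTML kept</b>
MAIL;

// Same idea as the pattern in the question, but the lookahead also accepts
// the end of the string (\z), so the final message is captured as well.
// The m flag makes ^ match at the start of every line; s lets . span newlines.
preg_match_all('/(?<=Subject: )(.*?)(?=^From:|\z)/ms', $content, $rest);

// Each chunk is the subject line followed by the body; trim stray whitespace.
$chunks = array_map('trim', $rest[1]);
print_r($chunks);
```

A body line that itself starts with From: would still be treated as a boundary, so this only helps when the dump is as regular as the sample.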

    preg_match_all('/(?m:^From:\x20(?<From>[^\n]*)\n^Sent:\x20(?<Sent>[^\n]*)\n^To:\x20(?<To>[^\n]*)\n(?:^Cc:\x20(?<Cc>[^\n]*)\n)?^Subject:\x20(?<Subject>[^\n]*)\n)(?<Body>.*?(?=(?:\nFrom:)|$))/s',$content,$matches);
    echo "<pre>".print_r($matches,true);

This gives almost the correct output. The text file: http://www.mangalsutrabandhan.com

You will need much smarter parsing to make sense of this. Whatever produced that file is altering the structure of the emails:

    Subject: Re: bride search
    PFA

There should be at least one blank line between the mail headers and the body.
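For comparison, when that blank line is present, separating headers from the body is straightforward. A minimal sketch, assuming a single well-formed message with unfolded, one-line headers; the sample message and variable names are hypothetical:

```php
<?php
// Hypothetical well-formed message: headers, one blank line, then the body.
$raw = "From: A <a@example.com>\nSubject: Re: bride search\n\nPFA\n\nRegds.,\nsales";

// Split once at the first blank line; \R matches any newline sequence.
$parts = preg_split('/\R\R/', $raw, 2);
$headerBlock = $parts[0];
$body = $parts[1] ?? '';

// Turn "Name: value" lines into an associative array.
// Folded (multi-line) headers are not handled in this sketch.
$headers = [];
foreach (preg_split('/\R/', $headerBlock) as $line) {
    if (preg_match('/^([A-Za-z-]+):\s*(.*)$/', $line, $m)) {
        $headers[$m[1]] = $m[2];
    }
}

print_r($headers);
echo $body, "\n";
```

In the file from the question this split would fail for the second message, because PFA follows the Subject line directly.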

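Then you have the problem of top posting (you cannot rely on the timestamps in the headers without knowing the time zone), incomplete headers, and no quoting.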

So even if you build a heuristic to parse this, there will be too many cases it cannot cope with.
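If a heuristic is still good enough for files shaped like the sample, one possible approach is to walk the text line by line instead of using a single regex. The sketch below is an illustration only: splitChainMail is a hypothetical helper, and it assumes that every message starts with a From: line and that only the From, Sent, To, Cc and Subject headers occur. It shares the limitations described above.

```php
<?php
// Hypothetical helper (not from the original post): split a chain-mail dump
// into messages by treating every line that starts with "From:" as a boundary.
function splitChainMail(string $content): array
{
    $messages = [];
    $current = null;      // message currently being built
    $inHeaders = false;   // true while we are still reading header lines

    foreach (preg_split('/\R/', $content) as $line) {
        $probe = ltrim($line); // tolerate indented dumps when testing for headers

        if (preg_match('/^From:\s*(.*)$/', $probe, $m)) {
            if ($current !== null) {
                $messages[] = $current;
            }
            $current = ['headers' => ['From' => $m[1]], 'body' => ''];
            $inHeaders = true;
        } elseif ($current === null) {
            continue; // skip anything before the first "From:" line
        } elseif ($inHeaders && preg_match('/^(Sent|To|Cc|Subject):\s*(.*)$/', $probe, $m)) {
            $current['headers'][$m[1]] = $m[2];
        } else {
            // The first non-header line ends the header block, blank line or not.
            $inHeaders = false;
            $current['body'] .= $line . "\n";
        }
    }
    if ($current !== null) {
        $messages[] = $current;
    }

    // Trim the collected bodies; HTML tags inside them are left untouched.
    foreach ($messages as &$message) {
        $message['body'] = trim($message['body']);
    }
    unset($message);

    return $messages;
}

$sample = "From: A <a@example.com>\nSubject: Re: bride search\n\n"
        . "From: B <b@example.com>\nCc: C <c@example.com>\nSubject: Re: bride search\nPFA\nRegds.,\nsales";
$messages = splitChainMail($sample);
echo count($messages), "\n";
```

Because the header block ends at the first line that is not a known header, a body that follows the Subject line without a blank line is still picked up. A body line that begins with From: will be misread as a new message.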