Extract Via header branch token from SIP message with regex

I am trying to extract branch=z9hG4bKlmrltg10b801lgkf0681.1 from the Via: header of a SIP message. Here is the PHP code I have tried:

preg_match('/.branch=.* + From:/', $msg, $result)

Here is the value of $msg:

"INVITE sip:3310094@mediastream.voip.cabletel.net:5060 SIP/2.0
Via: SIP/2.0/UDP 192.168.50.240:5060;branch=z9hG4bKlmrltg10b801lgkf0681.1
From: DEATON JEANETTE<sip:9123840782@mediastream.voip.cabletel.net:5060>;tag=SDg7j0c01-959bf958-d8f0f4ea-13c4-50029-140b-4d106390-140b"

How can I correct the regex so that it works?

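Please, parse the SIP message properly. I find it unlikely that you only want the branch ID; you almost certainly want other information about the transaction besides a pseudo call ID. SIP messages follow a standardised message format used by several other protocols (including HTTP ;-) ) and there are several libraries designed to parse this message format.

To demonstrate how simple and powerful this can be, let's first look at the RFC822 message parser classes I wrote a while ago (although they have recently been improved and updated). They can be used for parsing emails, and I also have some simple HTTP message parser classes that extend them: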

<?php
/**
 * Class representing the basic RFC822 message format
 *
 * @author  Chris Wright
 * @version 1.1
 */
class RFC822Message
{
    /**
     * @var array Collection of headers from the message
     */
    protected $headers = array();
    /**
     * @var string The message body
     */
    protected $body;
    /**
     * Constructor
     *
     * @param array  $headers Collection of headers from the message
     * @param string $body    The message body
     */
    public function __construct($headers, $body)
    {
        $this->headers = $headers;
        $this->body    = $body;
    }
    /**
     * Get the value of a header from the message
     *
     * @param string $name The name of the header
     *
     * @return array The value(s) of the header from the request
     */
    public function getHeader($name)
    {
        $name = strtolower(trim($name));
        return isset($this->headers[$name]) ? $this->headers[$name] : null;
    }
    /**
     * Get the message body
     *
     * @return string The message body
     */
    public function getBody()
    {
        return $this->body;
    }
}
/**
 * Factory which makes RFC822 message objects
 *
 * @author  Chris Wright
 * @version 1.1
 */
class RFC822MessageFactory
{
    /**
     * Create a new RFC822 message object
     *
     * @param array  $headers The request headers
     * @param string $body    The request body
     */
    public function create($headers, $body)
    {
        return new RFC822Message($headers, $body);
    }
}

/**
 * Parser which creates RFC822 message objects from strings
 *
 * @author  Chris Wright
 * @version 1.2
 */
class RFC822MessageParser
{
    /**
     * @var RFC822MessageFactory Factory which makes RFC822 message objects
     */
    protected $messageFactory;
    /**
     * Constructor
     *
     * @param RFC822MessageFactory $messageFactory Factory which makes RFC822 message objects
     */
    public function __construct(RFC822MessageFactory $messageFactory)
    {
        $this->messageFactory  = $messageFactory;
    }
    /**
     * Split a message into head and body sections
     *
     * @param string $message The message string
     *
     * @return array Head at index 0, body at index 1
     */
    protected function splitHeadFromBody($message)
    {
        $parts = preg_split('/\r?\n\r?\n/', ltrim($message), 2);
        return array(
            $parts[0],
            isset($parts[1]) ? $parts[1] : null
        );
    }
    /**
     * Parse the header section into a normalized array
     *
     * @param string $head The message head section
     *
     * @return array The parsed headers
     */
    protected function parseHeaders($head)
    {
        $expr =
        '!
          ^
          ([^()<>@,;:\\"/[\]?={} \t]+)          # Header name
          [ \t]*:[ \t]*
          (
            (?:
              (?:                               # First line of value
                (?:"(?:[^"\\\\]|\\\\.)*"|\S+)   # Quoted string or unquoted token
                [ \t]*                          # LWS
              )*
              (?:                               # Folded lines
                \r?\n
                [ \t]+                          # ...must begin with LWS
                (?:
                  (?:"(?:[^"\\\\]|\\\\.)*"|\S+) # ...followed by quoted string or unquoted tokens
                  [ \t]*                        # ...and maybe some more LWS
                )*
              )*
            )?
          )
          \r?$
        !smx';
        preg_match_all($expr, $head, $matches);
        $headers = array();
        for ($i = 0; isset($matches[0][$i]); $i++) {
            $name = strtolower($matches[1][$i]);
            if (!isset($headers[$name])) {
                $headers[$name] = array();
            }
            $value = preg_replace('/\s+("(?:[^"\\\\]|\\\\.)*"|\S+)/s', ' $1', $matches[2][$i]);
            $headers[$name][] = $value;
        }
        return $headers;
    }
    /**
     * Create a message object from a string
     *
     * @param string $message The message string
     *
     * @return RFC822Message The parsed message object
     */
    public function parseMessage($message)
    {
        list($head, $body) = $this->splitHeadFromBody($message);
        $headers = $this->parseHeaders($head);
        return $this->messageFactory->create($headers, $body);
    }
}

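If you ignore the horrible regex for parsing the headers, there is nothing particularly scary in there :-P But seriously, these classes can be used unmodified to parse the header section of emails, which is the basis of RFC822-format messages.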
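
As a rough illustration of what that header regex has to cope with, the sketch below unfolds continuation lines and collects the headers into a lower-cased name => values array, the same shape that getHeader() reads from. The function name parseHeaderBlock and the sample headers are made up for this example, and it deliberately ignores quoted strings, so treat it as a simplification rather than a replacement for the parser above.

```php
<?php
// Hypothetical helper, not part of the classes above: a simplified
// header parser that handles folding but not quoted strings.
function parseHeaderBlock($head)
{
    // Unfold: a line break followed by space/tab continues the previous line.
    $unfolded = preg_replace('/\r?\n[ \t]+/', ' ', $head);

    $headers = array();
    foreach (preg_split('/\r?\n/', $unfolded) as $line) {
        // Split on the first colon only; values may contain colons.
        $pair = explode(':', $line, 2);
        if (count($pair) !== 2) {
            continue; // not a header line, skip it
        }
        $name = strtolower(trim($pair[0]));
        $headers[$name][] = trim($pair[1]);
    }
    return $headers;
}

// Made-up sample: the Via header is folded onto a second line.
$head = "Via: SIP/2.0/UDP 192.168.50.240:5060;\r\n"
      . "  branch=z9hG4bKlmrltg10b801lgkf0681.1\r\n"
      . "Subject: test";

$parsed = parseHeaderBlock($head);
echo $parsed['via'][0], "\n";
```

Each header name maps to an array because the same header may legitimately appear more than once, which matters for Via in particular.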

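SIP models itself on HTTP, so with a couple of fairly simple modifications to the HTTP message parsing classes we can adapt them to SIP quite easily. Let's take a look at those. In these classes I have (more or less) searched for HTTP and replaced it with SIP: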

<?php
/**
 * Abstract class representing a SIP message
 *
 * @author  Chris Wright
 * @version 1.0
 */
abstract class SIPMessage extends RFC822Message
{
    /**
     * @var string The message protocol version
     */
    protected $version;
    /**
     * Constructor
     *
     * @param array  $headers Collection of headers from the message
     * @param string $body    The message body
     * @param string $version The message protocol version
     */
    public function __construct($headers, $body, $version)
    {
        parent::__construct($headers, $body);
        $this->version = $version;
    }
    /**
     * Get the message protocol version
     *
     * @return string The message protocol version
     */
    public function getVersion()
    {
        return $this->version;
    }
}
/**
 * Class representing a SIP request message
 *
 * @author  Chris Wright
 * @version 1.0
 */
class SIPRequest extends SIPMessage
{
    /**
     * @var string The request method
     */
    private $method;
    /**
     * @var string The request URI
     */
    private $uri;
    /**
     * Constructor
     *
     * @param array  $headers The request headers
     * @param string $body    The request body
     * @param string $version The request protocol version
     * @param string $method  The request method
     * @param string $uri     The request URI
     */
    public function __construct($headers, $body, $version, $method, $uri)
    {
        parent::__construct($headers, $body, $version);
        $this->method  = $method;
        $this->uri     = $uri;
    }
    /**
     * Get the request method
     *
     * @return string The request method
     */
    public function getMethod()
    {
        return $this->method;
    }
    /**
     * Get the request URI
     *
     * @return string The request URI
     */
    public function getURI()
    {
        return $this->uri;
    }
}
/**
 * Class representing a SIP response message
 *
 * @author  Chris Wright
 * @version 1.0
 */
class SIPResponse extends SIPMessage
{
    /**
     * @var int The response code
     */
    private $code;
    /**
     * @var string The response message
     */
    private $message;
    /**
     * Constructor
     *
     * @param array  $headers The request headers
     * @param string $body    The request body
     * @param string $version The request protocol version
     * @param int    $code    The response code
     * @param string $message The response message
     */
    public function __construct($headers, $body, $version, $code, $message)
    {
        parent::__construct($headers, $body, $version);
        $this->code    = $code;
        $this->message = $message;
    }
    /**
     * Get the response code
     *
     * @return int The response code
     */
    public function getCode()
    {
        return $this->code;
    }
    /**
     * Get the response message
     *
     * @return string The response message
     */
    public function getMessage()
    {
        return $this->message;
    }
}
/**
 * Factory which makes SIP request objects
 *
 * @author  Chris Wright
 * @version 1.0
 */
class SIPRequestFactory extends RFC822MessageFactory
{
    /**
     * Create a new SIP request object
     *
     * The last 3 arguments of this method are only optional  to prevent PHP from triggering
     * an E_STRICT at compile time. IMO this particular error is itself an error on the part
     * of the PHP designers,  and I don't feel bad about this workaround,  even if it
     * does mean the signature is technically wrong. It is the lesser of two evils.
     *
     * @param array  $headers The request headers
     * @param string $body    The request body
     * @param string $version The request protocol version
     * @param string $method  The request method
     * @param string $uri     The request URI
     */
    public function create($headers, $body, $version = null, $method = null, $uri = null)
    {
        return new SIPRequest($headers, $body, $version, $method, $uri);
    }
}
/**
 * Factory which makes SIP response objects
 *
 * @author  Chris Wright
 * @version 1.0
 */
class SIPResponseFactory extends RFC822MessageFactory
{
    /**
     * Create a new SIP response object
     *
     * The last 3 arguments of this method are only optional  to prevent PHP from triggering
     * an E_STRICT at compile time. IMO this particular error is itself an error on the part
     * of the PHP designers,  and I don't feel bad about this workaround,  even if it
     * does mean the signature is technically wrong. It is the lesser of two evils.
     *
     * @param array  $headers The response headers
     * @param string $body    The response body
     * @param string $version The response protocol version
     * @param int    $code    The response code
     * @param string $message The response message
     */
    public function create($headers, $body, $version = null, $code = null, $message = null)
    {
        return new SIPResponse($headers, $body, $version, $code, $message);
    }
}
/**
 * Parser which creates SIP message objects from strings
 *
 * @author  Chris Wright
 * @version 1.0
 */
class SIPMessageParser extends RFC822MessageParser
{
    /**
     * @var SIPRequestFactory Factory which makes SIP request objects
     */
    private $requestFactory;
    /**
     * @var SIPResponseFactory Factory which makes SIP response objects
     */
    private $responseFactory;
    /**
     * Constructor
     *
     * @param SIPRequestFactory  $requestFactory  Factory which makes SIP request objects
     * @param SIPResponseFactory $responseFactory Factory which makes SIP response objects
     */
    public function __construct(SIPRequestFactory $requestFactory, SIPResponseFactory $responseFactory)
    {
        $this->requestFactory  = $requestFactory;
        $this->responseFactory = $responseFactory;
    }
    /**
     * Remove the request line from the message and parse into tokens
     *
     * @param string $head The message head section
     *
     * @return array The parsed request line at index 0, the remainder of the message at index 1
     *
     * @throws \DomainException When the request line of the message is invalid
     */
    private function removeAndParseRequestLine($head)
    {
        // Note: this method  forgives a couple of minor standards violations, mostly for benefit
        // of some older  Polycom phones and for Voispeed,  who seem to make  stuff up as they go
        // along.  It also  treats the  whole line as  case-insensitive  even though  methods are
        // officially case-sensitive,  because having two different casings of the same verb mean
        // different things makes no sense semantically or implementationally.
        // Side note, from RFC3261:
        // > The SIP-Version string is case-insensitive, but implementations MUST send upper-case
        // Wat. Go home Rosenberg, et. al., you're drunk.
        $parts = preg_split('/\r?\n/', $head, 2);
        $expr =
          '@^
            (?:
              ([^\r\n \t]+) [ \t]+ ([^\r\n \t]+) [ \t]+ SIP/(\d+\.\d+) # request
             |
              SIP/(\d+\.\d+) [ \t]+ (\d+) [ \t]+ ([^\r\n]+)            # response
            )
           $@ix';
        if (!preg_match($expr, $parts[0], $match)) {
            throw new \DomainException('Request-Line of the message is invalid');
        }
        if (empty($match[4])) { // request
            $requestLine = array(
                'method'  => strtoupper($match[1]),
                'uri'     => $match[2],
                'version' => $match[3]
            );
        } else { // response
            $requestLine = array(
                'version' => $match[4],
                'code'    => (int) $match[5],
                'message' => $match[6]
            );
        }
        return array(
            $requestLine,
            isset($parts[1]) ? $parts[1] : ''
        );
    }
    /**
     * Create the appropriate message object from a string
     *
     * @param string $message The message string
     *
     * @return SIPRequest|SIPResponse The parsed message object
     *
     * @throws \DomainException When the message string is not valid SIP message
     */
    public function parseMessage($message)
    {
        list($head, $body) = $this->splitHeadFromBody($message);
        list($requestLine, $head) = $this->removeAndParseRequestLine($head);
        $headers = $this->parseHeaders($head);
        if (isset($requestLine['uri'])) {
            return $this->requestFactory->create(
                $headers,
                $body,
                $requestLine['version'],
                $requestLine['method'],
                $requestLine['uri']
            );
        } else {
            return $this->responseFactory->create(
                $headers,
                $body,
                $requestLine['version'],
                $requestLine['code'],
                $requestLine['message']
            );
        }
    }
}

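Seems like an awful lot of code just to extract one header value, doesn't it? Well, it is. But that's not all it does. It parses the whole message into a data structure that gives easy access to any number of pieces of information, allowing for (more or less) anything the standards can throw at you.

So, let's look at how you would actually use it: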

// First we create a parser object
$messageParser = new SIPMessageParser(
  new SIPRequestFactory,
  new SIPResponseFactory
);
// Parse the message into an object
try {
  $message = $messageParser->parseMessage($msg);
} catch (Exception $e) {
  // The message parsing failed, handle the error here
}
// Get the value of the Via: header
$via = $message->getHeader('Via');
// SIP is irritatingly non-specific about the format of branch IDs. This
// expression matches either a quoted string or an unquoted token, which is
// about all that you can say for sure about arbitrary implementations.
$expr = '/branch=(?:"((?:[^"\\\\]|\\\\.)*)"|(.+?)(?:\s|;|$))/i';
// NB: this assumes the message has a single Via: header and a single branch ID.
// In reality this is rarely the case for messages that are received, although
// it is usually the case for messages before they are sent.
if (!preg_match($expr, $via[0], $matches)) {
  // The Via: header does not contain a branch ID, handle this error
}
$branchId = !empty($matches[2]) ? $matches[2] : $matches[1];
var_dump($branchId);

See it working

This answer is undoubtedly massive overkill for the problem at hand. However, I consider it the right way to approach this problem.
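
The usage code above assumes a single Via: header, which its own comment says is rarely true for received messages. As a standalone sketch of one way to handle several, the function below collects the branch parameter of every Via value in a raw message. The name extractViaBranches, the proxy.example.com host and the first branch value are invented for illustration, and it assumes unquoted branch tokens and unfolded header lines.

```php
<?php
// Hypothetical helper: collect the branch parameter of every Via value.
// Assumes unquoted branch tokens and no folded header lines.
function extractViaBranches($message)
{
    // Only look at the head; the body starts after the first blank line.
    $sections = preg_split('/\r?\n\r?\n/', $message, 2);

    $branches = array();
    foreach (preg_split('/\r?\n/', $sections[0]) as $line) {
        // "v" is the compact form of the Via header name.
        if (!preg_match('/^(?:Via|v)[ \t]*:(.*)$/i', $line, $header)) {
            continue;
        }
        // One header line may carry several comma-separated Via values.
        foreach (explode(',', $header[1]) as $value) {
            if (preg_match('/;[ \t]*branch=([^;,\s]+)/i', $value, $match)) {
                $branches[] = $match[1];
            }
        }
    }
    return $branches;
}

// Made-up response that has passed through one proxy.
$msg = "SIP/2.0 200 OK\r\n"
     . "Via: SIP/2.0/UDP proxy.example.com:5060;branch=z9hG4bKaaa111\r\n"
     . "Via: SIP/2.0/UDP 192.168.50.240:5060;branch=z9hG4bKlmrltg10b801lgkf0681.1\r\n"
     . "\r\n";

print_r(extractViaBranches($msg));
```

The branches come back in header order, so the first element belongs to the topmost Via.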

preg_match('/branch=.*/i', $msg, $result);
print_r($result);

will produce a result like this:

Array
(
    [0] => branch=z9hG4bKlmrltg10b801lgkf0681.1
)
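
One caveat: .* runs to the end of the line, so any parameter that follows branch is captured as well. A possible variation, sketched here with a made-up Via line that carries an extra rport parameter, stops at the next semicolon, comma or whitespace instead.

```php
<?php
// Made-up sample: the branch parameter is followed by another parameter.
$via = 'Via: SIP/2.0/UDP 192.168.50.240:5060;branch=z9hG4bKlmrltg10b801lgkf0681.1;rport';

// Greedy version: captures everything up to the end of the line.
preg_match('/branch=.*/i', $via, $greedy);

// Bounded version: stop at ";", "," or whitespace; group 1 is the token alone.
preg_match('/branch=([^;,\s]+)/i', $via, $bounded);

echo $greedy[0], "\n";   // includes the trailing ";rport"
echo $bounded[1], "\n";  // just the branch token
```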

Try this:

$str = "INVITE sip:3310094@mediastream.voip.cabletel.net:5060 SIP/2.0
Via: SIP/2.0/UDP 192.168.50.240:5060;branch=z9hG4bKlmrltg10b801lgkf0681.1
From: DEATON JEANETTE<sip:9123840782@mediastream.voip.cabletel.net:5060>;tag=SDg7j0c01-959bf958-d8f0f4ea-13c4-50029-140b-4d106390-140b";
// The "s" flag lets "." cross the line break; the lazy group stops before From:
preg_match('/branch=(.*?)\s+From:/is', $str, $output);
print_r( $output );

Try this regex. It checks whether the branch code is followed by a space or a line break. The result you want is always stored in $output[0]:

$str = "INVITE sip:3310094@mediastream.voip.cabletel.net:5060 SIP/2.0
Via: SIP/2.0/UDP 192.168.50.240:5060;branch=z9hG4bKlmrltg10b801lgkf0681.1 From: DEATON JEANETTE<sip:9123840782@mediastream.voip.cabletel.net:5060>;tag=SDg7j0c01-959bf958-d8f0f4ea-13c4-50029-140b-4d106390-140b";
preg_match('/branch=.*?(?= |\r?\n|$)/', $str, $output);
print_r( $output ); // $output[0] is what you need

Example: http://codepad.viper-7.com/Gj0lWD

You can use a lookahead assertion like this:

preg_match_all('/.branch=(.*?)(?=^\S|\Z)/sm', $msg, $matches);

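Here, (?=^\S|\Z) asserts either the start of a new line that begins with a non-whitespace character (that is, a line that is not a folded continuation) or the end of the subject. That is where the match should end.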
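
To see what that pattern captures, here is a small sketch run against a made-up message whose Via header is folded. The hosts, the branch value and the received parameter are invented for the example. Note that the capture runs to the end of the whole header value, continuation line included, so it needs trimming afterwards.

```php
<?php
// Made-up message: the Via header continues on a second, indented line.
$msg = "INVITE sip:user@example.com SIP/2.0\n"
     . "Via: SIP/2.0/UDP 192.168.50.240:5060;branch=z9hG4bKabc123;\n"
     . " received=192.168.50.1\n"
     . "From: <sip:caller@example.com>;tag=1\n";

preg_match_all('/.branch=(.*?)(?=^\S|\Z)/sm', $msg, $matches);

// The raw capture still contains the line break and the folding whitespace,
// so collapse runs of whitespace into single spaces.
$value = preg_replace('/\s+/', ' ', trim($matches[1][0]));

echo $value, "\n";
```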

Or just match branch= up to the end of the line:

preg_match_all('/.branch=(.*)/m', $msg, $matches);

This works for headers that are not folded.

See also: Ground rules of HTTP headers