How to capture uploaded file data in a PHP extension

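I'm currently writing a PHP extension in C/C++. A user uploads a file (via POST or PUT, but I can limit it to POST only). I need to capture the file data as it is uploaded, without it being written to disk on the server. I need to process the data and (maybe, depending on the situation) send it somewhere else or save it to disk. I know, of course, that I can process the file after it has been uploaded (saved to disk on the server), but I would like to avoid that. I also need to do the opposite: generate a file "on the fly" and send it to the user. All metadata of the generated file is known beforehand (e.g. size, name).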

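I've been searching for a while but couldn't find anything close to a solution. Are there any examples or existing PHP extensions that do something like this (or at least something similar)?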

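I can't comment on hooking into the upload process, but for the download part you need:

  1. A PHP script that handles the download request and sends the HTTP headers;
    according to RFC 2183, care must be taken with the filename, since only US-ASCII is actually allowed.
  2. A function/method in the PHP extension that streams the data to the browser.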

PHP script

Below is a complete PHP script that also checks whether only a range of the file was requested:

<?php
// sanity checks ...

// script must not timeout
set_time_limit(0);
// user abortion is checked in extension while streaming the data
ignore_user_abort(true);

$filename = $_GET['filename'];
// TODO determine filesize
$filesize = 0;
$offset = 0;
$range_len = -1;
$have_valid_range = false;
if (isset($_SERVER['HTTP_RANGE']))
{
    // split 'bytes=n-m'
    list($range_type, $range) = explode('=', $_SERVER['HTTP_RANGE']);
    // split 'n-m' or 'n-'
    $range = explode('-', $range);
    // range type can only be 'bytes', check it anyway
    $have_valid_range = ($range_type == 'bytes') && is_array($range);
    if (!$have_valid_range)
    {
        header('HTTP/1.1 416 Requested Range Not Satisfiable', true, 416);
        exit;
    }
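    // clamp both ends to the last valid byte position ($filesize - 1)
    $last_byte = max(0, $filesize - 1);
    $range[0] = min((int) $range[0], $last_byte);
    if (!isset($range[1]) || ($range[1] === '') ||
        ((int) $range[1] > $last_byte))
    {
        $range[1] = $last_byte;
    }
    $range[1] = (int) $range[1];
    $offset = $range[0];
    $range_len = $range[1]-$range[0]+1;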
}
$attachment_filename = 'xyz';

// send metadata
header('Accept-Ranges: bytes');
if ($have_valid_range)
{
    header('HTTP/1.1 206 Partial Content', true, 206);
    header('Content-Length: ' . $range_len);
    header('Content-Range: bytes ' . $range[0] . '-' . $range[1] . ($filesize > 0 ? ('/' . $filesize) : ''));
}
else if ($filesize > 0)
{
    header('Content-Length: ' . $filesize);
}
// a note about the suggested filename for saving the attachment:
// It's not as easy as one might think!
// We deal (in our php scripts) with utf-8 and the filename is either the export profile's name or a term 
// entered by the user in the download form. Now the big problem is:
// According to the rfc for the Content-Disposition header only us-ascii characters are allowed! 
// (see http://greenbytes.de/tech/webdav/rfc2183.html, section "the filename parameter")
// However, all major browsers accept the filename to be encoded in iso-8859-1 (at least).
// There are other forms like: filename*="utf-8''<urlencoded filename>" but not 
// all browsers support this (most notably IE, only firefox and opera at the moment);
// (see http://greenbytes.de/tech/tc2231/ for testcases)
// 
// Additionally, IE doesn't like so much the '.' and ';' because it treats them as the beginning of the file extension,  
// and then thinks that it deals with a .*&%$§ file instead of a .zip file.
// The double quote '"' is already used as a delimiter for the filename parameter and it's unclear to me 
// how browsers would handle it.
// 
// Hence the procedure to produce a safe suggested filename as the least common denominator is as follows:
// Replace characters to be known as problematic with an underscore and encode the filename in iso-8859-1;
// Note that '?' (they can also result from utf8_decode()), '*', '<', '>', '|', ';', ':', '.', ''' are replaced by 
// firefox and IE with '_' anyway, additionally '#' by IE - meaning that they offer a filename with the mentioned 
// characters replaced by the underscore, i.e.: abc äöü +~*?ß=}'!§$%&/()´`<>|,-_:__@?'_{[]#.zip  -->  abc äöü +~__ß=}'!§$%&_()´`___,-____@___{[]#.zip 
$safe_attachment_fname = utf8_decode(str_replace(array('.', ';', '"'), '_', $attachment_filename)) . '.zip';
$filename_param = 'filename="' . $safe_attachment_fname . '"';
header('Content-Transfer-Encoding: binary');
header('Content-Type: application/zip');
header('Content-Disposition: attachment; ' . $filename_param);
// file can be cached forever by clients and proxies
header('Cache-Control: public');

// disable output buffering, stream directly to the browser;
// in fact, this is a must, otherwise php might crash
while (ob_get_level())
    ob_end_flush();

// stream data
ext_downstreamdata($filename, $offset, $range_len);
?>
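
If the range were validated inside the extension rather than in the script, the same resolution step could look roughly like the C++ sketch below. The names ByteRange, parse_number and resolve_range are hypothetical and not part of any PHP API. The sketch assumes the file size is known and handles only the 'bytes=n-m' and 'bytes=n-' forms the script splits; suffix ranges ('bytes=-n') and multiple ranges are simply rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result type: first byte offset and number of bytes to send.
struct ByteRange {
    std::uint64_t offset;
    std::uint64_t length;
};

// Parses a non-negative decimal number; rejects anything else.
// The length limit keeps the value well inside 64 bits.
static bool parse_number(const std::string& s, std::uint64_t& value) {
    if (s.empty() || s.size() > 18) return false;
    value = 0;
    for (char c : s) {
        if (c < '0' || c > '9') return false;
        value = value * 10 + static_cast<std::uint64_t>(c - '0');
    }
    return true;
}

// Resolves "bytes=n-m" or "bytes=n-" against a known file size.
// Returns false when the header is malformed or cannot be satisfied.
bool resolve_range(const std::string& header, std::uint64_t filesize, ByteRange& out) {
    const std::string prefix = "bytes=";
    if (header.compare(0, prefix.size(), prefix) != 0) return false;
    const std::string spec = header.substr(prefix.size());
    const std::size_t dash = spec.find('-');
    if (dash == std::string::npos) return false;

    std::uint64_t first = 0;
    if (!parse_number(spec.substr(0, dash), first) || first >= filesize) return false;

    // The last valid byte position is filesize - 1, not filesize.
    std::uint64_t last = filesize - 1;
    const std::string last_text = spec.substr(dash + 1);
    if (!last_text.empty()) {
        std::uint64_t parsed = 0;
        if (!parse_number(last_text, parsed) || parsed < first) return false;
        if (parsed < last) last = parsed;
    }
    assert(last >= first);  // guaranteed by the checks above
    out.offset = first;
    out.length = last - first + 1;
    return true;
}
```

A false result would correspond to the 416 response in the script; on success, offset and length map to $offset and $range_len.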

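Streaming from C/C++

Now for the C++ part: the function ext_downstreamdata() mentioned in the PHP script above is entirely implementation-specific, but the data streaming itself can be generalized.

For example, my task was to stream file data in a multi-tier application directly from the application server to the browser.

Below is a function that acts as a callback for the streaming function in the C++ code; it receives a pointer to the data and its length, and returns a Windows error code: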

unsigned long stream2browser(const void* pdata, size_t nLen)
{
    if (nLen)
    {
        // fetch zend's tls stuff
        TSRMLS_FETCH();
        // send data via the zend engine to the browser;
        // this method uses whatever output buffer mechanism (compression, ...) is in use;
        // It's a really good idea to turn off all output buffer levels in the php script because of 
        // strange crashes somewhere within the zend engine (or in one of the extensions?)
        // I did some debugging and the browser really crashes and it's not our fault, turning off the output 
        // buffering solves all problems; you turn it off like this in the script:
        //  <code>
        //  while (ob_get_level())
        //      ob_end_flush();
        //  </code>
        // I'm thinking to use an unbuffered output function (e.g. php_ub_body_write) but don't know for sure how to use it, so 
        // still stay away from it and rely on the script having disabled all output buffers
        // note: php_write returns int but this value is the bytes sent to the browser (which is nLen)
        size_t nSent = php_write((void*) pdata, uint(nLen) TSRMLS_CC);
        if (nSent < nLen)
        {
            if (PG(connection_status) & PHP_CONNECTION_ABORTED)
                return ERROR_CANCELLED;
            else
                return ERROR_NOT_CAPABLE;
        }
    }
    return ERROR_SUCCESS;
}
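
How the callback gets invoked is the implementation-specific part, but the general shape is a loop that hands out one chunk at a time and stops at the first non-zero return code. The following is only a minimal sketch under assumptions: stream_in_chunks and ChunkCallback are hypothetical names, and an in-memory buffer stands in for whatever source the real ext_downstreamdata() reads from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Same shape as stream2browser(): returns 0 on success, non-zero on error.
using ChunkCallback = std::function<unsigned long(const void*, std::size_t)>;

// Hypothetical driver: sends up to `length` bytes of `data`, starting at
// `offset`, in pieces of at most `chunk_size` bytes. Stops at the first
// error reported by the callback and returns that error code.
unsigned long stream_in_chunks(const std::vector<char>& data,
                               std::size_t offset,
                               std::size_t length,
                               std::size_t chunk_size,
                               const ChunkCallback& callback)
{
    assert(chunk_size > 0);
    if (offset > data.size())
        offset = data.size();
    // never read past the end of the buffer
    std::size_t remaining = std::min(length, data.size() - offset);
    while (remaining > 0)
    {
        const std::size_t n = std::min(remaining, chunk_size);
        const unsigned long rc = callback(data.data() + offset, n);
        if (rc != 0)
            return rc;  // e.g. the client aborted the connection
        offset += n;
        remaining -= n;
    }
    return 0;
}
```

Inside the extension, stream2browser could be passed directly as the callback, so an aborted connection ends the loop after the current chunk instead of streaming the rest of the file.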