Multi-threading a for statement with PHP

I am using the function below to check whether images exist at their location. Every time the script runs it loads about 40-50 URLs, so the page takes a long time to load. I would like to use threading for the "for statement" (at the end of the script), but I can't find many examples of how to do that. I'm not very familiar with multi-threading in PHP, but I found an example here using popen.

My script:

function get_image_dim($sURL) {
  try {
    $hSock = @ fopen($sURL, 'rb');
    if ($hSock) {
      // Read the first 300 bytes; that is enough to find the dimensions.
      $vData = '';
      while(!feof($hSock)) {
        $vData = fread($hSock, 300);
        break;
      }
      fclose($hSock);
      if (strpos(' ' . $vData, 'JFIF')>0) {
        // JPEG: locate the SOF0/SOF2 marker (ffc0/ffc2) and read the
        // 16-bit height/width that follow it.
        $vData = substr($vData, 0, 300);
        $asResult = unpack('H*',$vData);
        $sBytes = $asResult[1];
        $width = 0;
        $height = 0;
        $hex_width = '';
        $hex_height = '';
        if (strstr($sBytes, 'ffc2')) {
          $hex_height = substr($sBytes, strpos($sBytes, 'ffc2') + 10, 4);
          $hex_width = substr($sBytes, strpos($sBytes, 'ffc2') + 14, 4);
        } else {
          $hex_height = substr($sBytes, strpos($sBytes, 'ffc0') + 10, 4);
          $hex_width = substr($sBytes, strpos($sBytes, 'ffc0') + 14, 4);
        }
        $width = hexdec($hex_width);
        $height = hexdec($hex_height);
        return array('width' => $width, 'height' => $height);
      } elseif (strpos(' ' . $vData, 'GIF')>0) {
        // GIF: width and height are little-endian 16-bit values in the header.
        $vData = substr($vData, 0, 300);
        $asResult = unpack('h*',$vData);
        $sBytes = $asResult[1];
        $sBytesH = substr($sBytes, 16, 4);
        $height = hexdec(strrev($sBytesH));
        $sBytesW = substr($sBytes, 12, 4);
        $width = hexdec(strrev($sBytesW));
        return array('width' => $width, 'height' => $height);
      } elseif (strpos(' ' . $vData, 'PNG')>0) {
        // PNG: width and height come from the IHDR chunk (big-endian).
        $vDataH = substr($vData, 22, 4);
        $asResult = unpack('n',$vDataH);
        $height = $asResult[1];
        $vDataW = substr($vData, 18, 4);
        $asResult = unpack('n',$vDataW);
        $width = $asResult[1];
        return array('width' => $width, 'height' => $height);
      }
    }
  } catch (Exception $e) {}
  return FALSE;
}
for ($y = 0; $y < $image_count; $y++) {
    $dim = get_image_dim($images[$y]);
    if (empty($dim)) {
        echo $images[$y];
        unset($images[$y]);
    }
}
$images = array_values($images);

The popen example I found was:

for ($i=0; $i<10; $i++) {
    // open ten processes
    for ($j=0; $j<10; $j++) {
        $pipe[$j] = popen('script.php', 'w');
    }
    // wait for them to finish
    for ($j=0; $j<10; ++$j) {
        pclose($pipe[$j]);
    }
}

I'm not sure which part of my code would have to go into script.php? I tried moving the whole script there, but that didn't work.

Any ideas on how to implement this, or whether there is a better way to multi-thread it? Thanks!

PHP does not have multi-threading built in. You can do it with pthreads, but having a little experience with it, I can say with certainty that it is more than you need here.
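
For reference, the pthreads route would look something like the untested sketch below. It needs a thread-safe (ZTS) build of PHP with the pthreads extension installed, you would still want to cap how many threads you start at once, and the ImageCheck class is purely illustrative:

// A bare-bones pthreads sketch (untested): each worker checks one URL.
// Requires a ZTS PHP build with the pthreads extension.
class ImageCheck extends Thread {
    public $url;
    public $exists = false;

    public function __construct($url) {
        $this->url = $url;
    }

    public function run() {
        // A cheap existence check; the dimension parsing could go here too.
        $headers = @get_headers($this->url);
        $this->exists = $headers && strpos($headers[0], '200') !== false;
    }
}

$threads = array();
foreach ($images as $i => $url) {
    $threads[$i] = new ImageCheck($url);
    $threads[$i]->start();
}
foreach ($threads as $i => $thread) {
    $thread->join();
    if (!$thread->exists) {
        unset($images[$i]);
    }
}
$images = array_values($images);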

You are better off using curl; you can fire off multiple requests at once with curl_multi_init. Based on the example on PHP.net, the following may work for your needs:

function curl_multi_callback(Array $urls, $callback, $cache_dir = NULL, $age = 600) {
    $return = array();
    $conn = array();
    $max_age = time()-intval($age);
    $mh = curl_multi_init();
    if(is_dir($cache_dir)) {
        foreach($urls as $i => $url) {
            $cache_path = $cache_dir.DIRECTORY_SEPARATOR.sha1($url).'.ser';
            if(file_exists($cache_path)) {
                $stat = stat($cache_path);
                if($stat['atime'] > $max_age) {
                    $return[$i] = unserialize(file_get_contents($cache_path));
                    unset($urls[$i]);
                } else {
                    unlink($cache_path);
                }
            }
        }
    }
    foreach ($urls as $i => $url) {
        $conn[$i] = curl_init($url);
        curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1);
        curl_multi_add_handle($mh, $conn[$i]);
    }
    do {
        $status = curl_multi_exec($mh, $active);
        // Keep attempting to get info so long as we get info
        while (($info = curl_multi_info_read($mh)) !== FALSE) {
            // We received information from Multi
            if (false !== $info) {
                //  The connection was successful
                $handle = $info['handle'];
                // Find the index of the connection in `conn`
                $i = array_search($handle, $conn);
                if($info['result'] === CURLE_OK) {
                    // If we found an index and that index is set in the `urls` array
                    if(false !== $i && isset($urls[$i])) {
                        $content = curl_multi_getcontent($handle);
                        $return[$i] = $data = array(
                            'url'     => $urls[$i],
                            'content' => $content,
                            'parsed'  => call_user_func($callback, $content, $urls[$i]),
                        );
                        if(is_dir($cache_dir)) {
                            file_put_contents($cache_dir.DIRECTORY_SEPARATOR.sha1($urls[$i]).'.ser', serialize($data));
                        }
                    }
                } else {
                    // Handle failures how you will
                }
                // Close, even if a failure
                curl_multi_remove_handle($mh, $handle);
                unset($conn[$i]);
            }
        }
    } while ($status === CURLM_CALL_MULTI_PERFORM || $active);

    // Cleanup and resolve any remaining connections (unlikely)
    if(!empty($conn)) {
        foreach ($conn as $i => $handle) {
            if(isset($urls[$i])) {
                $content = curl_multi_getcontent($handle);
                $return[$i] = $data = array(
                    'url'     => $urls[$i],
                    'content' => $content,
                    'parsed'  => call_user_func($callback, $content, $urls[$i]),
                );
                if(is_dir($cache_dir)) {
                    file_put_contents($cache_dir.DIRECTORY_SEPARATOR.sha1($urls[$i]).'.ser', serialize($data));
                }
            }
            curl_multi_remove_handle($mh, $handle);
            unset($conn[$i]);
        }
    }
    curl_multi_close($mh);
    return $return;
}
$return = curl_multi_callback($urls, function($data, $url) {
    echo "got $url\n";
    return array('some stuff');
}, '/tmp', 30);
//print_r($return);
/*
Each element of $return looks like:
array(
    'url'     => 'http://www......',
    'content' => the raw response body,
    'parsed'  => whatever the callback returned (e.g. get_image_dim),
)
*/

Just restructure your original get_image_dim function to take the raw data and output whatever you want.
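
For example, a rough (untested) sketch of that restructuring could look like this; get_image_dim_from_data is just a name I'm using here, and the parsing inside it is your existing JPEG/GIF/PNG logic with the fopen()/fread() removed, since curl already fetched the bytes:

// Same sniffing as the original get_image_dim, but on raw bytes.
function get_image_dim_from_data($vData) {
    $vData = substr($vData, 0, 300);
    if (strpos(' ' . $vData, 'JFIF') > 0) {
        $asResult = unpack('H*', $vData);
        $sBytes = $asResult[1];
        $marker = strstr($sBytes, 'ffc2') ? 'ffc2' : 'ffc0';
        $hex_height = substr($sBytes, strpos($sBytes, $marker) + 10, 4);
        $hex_width  = substr($sBytes, strpos($sBytes, $marker) + 14, 4);
        return array('width' => hexdec($hex_width), 'height' => hexdec($hex_height));
    } elseif (strpos(' ' . $vData, 'GIF') > 0) {
        $asResult = unpack('h*', $vData);
        $sBytes = $asResult[1];
        $height = hexdec(strrev(substr($sBytes, 16, 4)));
        $width  = hexdec(strrev(substr($sBytes, 12, 4)));
        return array('width' => $width, 'height' => $height);
    } elseif (strpos(' ' . $vData, 'PNG') > 0) {
        $height = unpack('n', substr($vData, 22, 4));
        $width  = unpack('n', substr($vData, 18, 4));
        return array('width' => $width[1], 'height' => $height[1]);
    }
    return FALSE;
}

// Pass it as the callback, then drop the URLs that did not parse,
// mirroring your original for loop:
$return = curl_multi_callback($images, function ($data, $url) {
    return get_image_dim_from_data($data);
}, '/tmp', 30);

foreach ($images as $i => $url) {
    if (!isset($return[$i]) || empty($return[$i]['parsed'])) {
        echo $url;
        unset($images[$i]);
    }
}
$images = array_values($images);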

This is not a complete function; there may be errors or quirks to work out, but it should be a good starting point.

Updated to include caching. This took a test I was running on 18 URLs from 1 second down to .007 seconds (with cache hits).

Note: you probably don't want to cache the full request content, as I did, only the URL and the parsed data.
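
For example, the cache write inside curl_multi_callback could be changed to something like this (untested sketch) so that only the URL and the parsed result are stored:

// Replace the cache write so it serializes only the url and parsed result,
// not the raw response body.
if (is_dir($cache_dir)) {
    file_put_contents(
        $cache_dir . DIRECTORY_SEPARATOR . sha1($urls[$i]) . '.ser',
        serialize(array('url' => $urls[$i], 'parsed' => $data['parsed']))
    );
}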