我使用下面的函数来检查图像是否存在于它们的位置。每次运行脚本时,它都会加载大约40-50个url,因此加载页面需要很长时间。我想在"for语句"(在脚本的末尾)中使用线程,但找不到很多关于如何做到这一点的例子。我不太熟悉php的多线程,但我在这里找到了一个使用popen的例子。
我的脚本:
function get_image_dim($sURL) {
try {
$hSock = @ fopen($sURL, 'rb');
if ($hSock) {
while(!feof($hSock)) {
$vData = fread($hSock, 300);
break;
}
fclose($hSock);
if (strpos(' ' . $vData, 'JFIF')>0) {
$vData = substr($vData, 0, 300);
$asResult = unpack('H*',$vData);
$sBytes = $asResult[1];
$width = 0;
$height = 0;
$hex_width = '';
$hex_height = '';
if (strstr($sBytes, 'ffc2')) {
$hex_height = substr($sBytes, strpos($sBytes, 'ffc2') + 10, 4);
$hex_width = substr($sBytes, strpos($sBytes, 'ffc2') + 14, 4);
} else {
$hex_height = substr($sBytes, strpos($sBytes, 'ffc0') + 10, 4);
$hex_width = substr($sBytes, strpos($sBytes, 'ffc0') + 14, 4);
}
$width = hexdec($hex_width);
$height = hexdec($hex_height);
return array('width' => $width, 'height' => $height);
} elseif (strpos(' ' . $vData, 'GIF')>0) {
$vData = substr($vData, 0, 300);
$asResult = unpack('h*',$vData);
$sBytes = $asResult[1];
$sBytesH = substr($sBytes, 16, 4);
$height = hexdec(strrev($sBytesH));
$sBytesW = substr($sBytes, 12, 4);
$width = hexdec(strrev($sBytesW));
return array('width' => $width, 'height' => $height);
} elseif (strpos(' ' . $vData, 'PNG')>0) {
$vDataH = substr($vData, 22, 4);
$asResult = unpack('n',$vDataH);
$height = $asResult[1];
$vDataW = substr($vData, 18, 4);
$asResult = unpack('n',$vDataW);
$width = $asResult[1];
return array('width' => $width, 'height' => $height);
}
}
} catch (Exception $e) {}
return FALSE;
}
for($y=0;$y<= ($image_count-1);$y++){
$dim = get_image_dim($images[$y]);
if (empty($dim)) {
echo $images[$y];
unset($images[$y]);
}
}
$images = array_values($images);
我发现的流行例子是:
for ($i=0; $i<10; $i++) {
// open ten processes
for ($j=0; $j<10; $j++) {
$pipe[$j] = popen('script.php', 'w');
}
// wait for them to finish
for ($j=0; $j<10; ++$j) {
pclose($pipe[$j]);
}
}
我不确定我的代码的哪一部分必须放在script.php中?我试着移动整个剧本,但没有成功?
关于如何实现这一点,或者是否有更好的多线程方法,有什么想法吗?谢谢
PHP本身没有多线程。你可以用pthreads
来做,但有一点经验,我可以肯定地说,这对你的需求来说太多了。
您最好使用curl
,您可以使用curl_multi_init
发起多个请求。根据PHP.net上的示例,以下内容可能适用于您的需求:
function curl_multi_callback(Array $urls, $callback, $cache_dir = NULL, $age = 600) {
$return = array();
$conn = array();
$max_age = time()-intval($age);
$mh = curl_multi_init();
if(is_dir($cache_dir)) {
foreach($urls as $i => $url) {
$cache_path = $cache_dir.DIRECTORY_SEPARATOR.sha1($url).'.ser';
if(file_exists($cache_path)) {
$stat = stat($cache_path);
if($stat['atime'] > $max_age) {
$return[$i] = unserialize(file_get_contents($cache_path));
unset($urls[$i]);
} else {
unlink($cache_path);
}
}
}
}
foreach ($urls as $i => $url) {
$conn[$i] = curl_init($url);
curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1);
curl_multi_add_handle($mh, $conn[$i]);
}
do {
$status = curl_multi_exec($mh, $active);
// Keep attempting to get info so long as we get info
while (($info = curl_multi_info_read($mh)) !== FALSE) {
// We received information from Multi
if (false !== $info) {
// The connection was successful
$handle = $info['handle'];
// Find the index of the connection in `conn`
$i = array_search($handle, $conn);
if($info['result'] === CURLE_OK) {
// If we found an index and that index is set in the `urls` array
if(false !== $i && isset($urls[$i])) {
$content = curl_multi_getcontent($handle);
$return[$i] = $data = array(
'url' => $urls[$i],
'content' => $content,
'parsed' => call_user_func($callback, $content, $urls[$i]),
);
if(is_dir($cache_dir)) {
file_put_contents($cache_dir.DIRECTORY_SEPARATOR.sha1($urls[$i]).'.ser', serialize($data));
}
}
} else {
// Handle failures how you will
}
// Close, even if a failure
curl_multi_remove_handle($mh, $handle);
unset($conn[$i]);
}
}
} while ($status === CURLM_CALL_MULTI_PERFORM || $active);
// Cleanup and resolve any remaining connections (unlikely)
if(!empty($conn)) {
foreach ($conn as $i => $handle) {
if(isset($urls[$i])) {
$content = curl_multi_getcontent($handle);
$return[$i] = $data = array(
'url' => $urls[$i],
'content' => $content,
'parsed' => call_user_func($callback, $content, $urls[$i]),
);
if(is_dir($cache_dir)) {
file_put_contents($cache_dir.DIRECTORY_SEPARATOR.sha1($urls[$i]).'.ser', serialize($data));
}
}
curl_multi_remove_handle($mh, $handle);
unset($conn[$i]);
}
}
curl_multi_close($mh);
return $return;
}
$return = curl_multi_callback($urls, function($data, $url) {
echo "got $url'n";
return array('some stuff');
}, '/tmp', 30);
//print_r($return);
/*
$url_dims = array(
'url' => 'http://www......',
'content' => raw content
'parsed' => return of get_image_dim
)
*/
只需重新构造原始函数get_image_dim
,即可使用原始数据并输出您想要的任何内容。
这不是一个完整的功能,可能有错误或特质需要解决,但它应该是一个很好的起点。
已更新以包含缓存。这将我在18个URLS上运行的测试从1秒更改为.007秒(缓存命中)。
注意:您可能不想像我那样缓存完整的请求内容,只想缓存url和解析的数据。