File parsing with Beanstalkd Queue

Keywords: file, Beanstalkd, queue usage | Updated: 2023-11-01

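I am currently rewriting a file upload program. The existing parsing scripts for the different data types are Perl scripts; the program itself is written in PHP. At the moment it only allows a single file to be uploaded, and once the file is on the server it calls the Perl script for that file's data type. We have more than 20 data types.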

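What I have done so far is write a new system that allows multiple file uploads. It first lets you validate your attributes before uploading, compresses the files with zipjs, uploads the compressed file, decompresses it on the server, and then calls the matching parser for each file.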

What I want to do now is, for each file, put the parser call into a queue. I cannot run multiple parsers at the same time. A sketch is below.

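// $files is assumed to hold the paths of the decompressed files
foreach ($files as $file) {
    // the job payload is the parser command for this file
    $job = 'location/to/file/parser.pl ' . escapeshellarg($file);
    // using the Pheanstalk library
    $this->pheanstalk->useTube('testtube')->put($job);
}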

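Depending on the file, parsing can take anywhere from 2 to 20 minutes. When I put the jobs on the queue, I need to make sure that the parser for file 2 starts only after the parser for file 1 has finished. How can I achieve this? Thx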

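Beanstalk has no notion of dependencies between jobs. You seem to have two jobs:

Job A: parse file 1
Job B: parse file 2

If you need job B to run only after job A, the most straightforward approach is for job A to create job B as its last action.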
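
The chaining idea above can be sketched roughly as follows. This is only an illustration under assumptions: FakeTube is a hypothetical in-memory stand-in for a Beanstalkd tube (a real worker would call put and reserve on the Pheanstalk client instead), handleJob is an invented helper name, and the JSON payload listing the remaining files is one possible way to carry the chain.

```php
<?php
// Hypothetical in-memory stand-in for a Beanstalkd tube, for illustration only.
final class FakeTube
{
    private array $jobs = [];

    public function put(string $payload): void
    {
        $this->jobs[] = $payload;
    }

    // Returns the oldest payload, or null when the tube is empty.
    public function reserve(): ?string
    {
        return array_shift($this->jobs);
    }
}

// Handle one job: parse the first file, then enqueue the rest as the last action.
function handleJob(FakeTube $tube, string $payload, callable $parse): void
{
    $files = json_decode($payload, true);
    $current = array_shift($files);
    $parse($current); // returns only once the parser for this file is done
    if ($files !== []) {
        $tube->put(json_encode($files)); // job A creates job B
    }
}

$tube = new FakeTube();
$tube->put(json_encode(['file1', 'file2', 'file3']));

$parsed = [];
while (($payload = $tube->reserve()) !== null) {
    handleJob($tube, $payload, function (string $file) use (&$parsed): void {
        // a real worker would run the Perl parser for $file here
        $parsed[] = $file;
    });
}
```

Because the next job only exists once the previous parser has returned, the parser for file 2 cannot start early, however many workers are watching the tube.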

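I have implemented what I wanted, which is to request more time if the parser takes longer than a minute. The worker is a PHP script, and when I run the "exec" command for the parser executable I can get the process id. I am currently using the snippet below in my worker.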

$job = $pheanstalk->watch( $tubeName )->reserve();
// do some more stuff here ... then 
// while the parser is running on the server
while( file_exists( "/proc/$pid" ) )
{
    // make sure the job is still reserved on the queue server
    if( $job )  {
        // get the time left on the queue server for the job
        $jobStats = $pheanstalk->statsJob( $job );
        // when there is not enough time, request more
        if( $jobStats['time-left'] < 5 ){
            echo "requested more time for the job at ".$jobStats['time-left']." secs left \n";
            $pheanstalk->touch( $job );
        }
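    }
    // pause between polls so the loop does not spin (the 1 second interval is an assumption)
    sleep( 1 );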
}
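
As a possible variation on the loop above, the worker could start the parser with proc_open and poll proc_get_status instead of checking /proc/$pid. This is a sketch and not the original poster's method: runWithKeepAlive and $keepAlive are hypothetical names, the callback is where a worker might check the time left and call $pheanstalk->touch($job), and a short PHP one-liner stands in for the Perl parser.

```php
<?php
// Run a command and call $keepAlive on every poll while it is still running.
// $keepAlive is a hypothetical hook; a worker might wrap $pheanstalk->touch($job) in it.
function runWithKeepAlive(array $command, callable $keepAlive, int $pollMicros = 200000): int
{
    $process = proc_open($command, [], $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('could not start the command');
    }
    while (true) {
        $status = proc_get_status($process);
        if (!$status['running']) {
            // read the exit code on the first status call after the process ended
            $exitCode = $status['exitcode'];
            proc_close($process);
            return $exitCode;
        }
        $keepAlive();
        usleep($pollMicros);
    }
}

// Stand-in for the parser: a child PHP process that sleeps briefly and exits with 3.
$ticks = 0;
$exitCode = runWithKeepAlive(
    [PHP_BINARY, '-r', 'usleep(500000); exit(3);'],
    function () use (&$ticks): void {
        $ticks++;
    }
);
```

Passing the command as an array needs PHP 7.4 or later. The approach also gives the worker the parser's exit code, which the /proc check does not.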