如何在本地缩短文件


How to ocassionally shorten files?

这是我使用的代码:

if(mt_rand(0,20000)==0)
{
    $lines = file($fileName);
    if (count($lines)>50000)
    {
        $lines=array_slice($lines, -50000, 50000, true);
    }
    $result=implode("'n",lines);
    file_put_contents($fileName, $result . "'n",FILE_APPEND);
}

我经常遇到这样的错误:

[25-Nov-2013 23:20:40 UTC] PHP Fatal error:  Allowed memory size of 33554432 bytes exhausted (tried to allocate 33 bytes) in /home/datetrng/public_html/checkblocked.php on line 40
[26-Nov-2013 02:41:54 UTC] PHP Fatal error:  Allowed memory size of 33554432 bytes exhausted (tried to allocate 27 bytes) in /home/datetrng/public_html/checkblocked.php on line 40
[26-Nov-2013 09:56:49 UTC] PHP Fatal error:  Allowed memory size of 33554432 bytes exhausted (tried to allocate 72 bytes) in /home/datetrng/public_html/checkblocked.php on line 40
[26-Nov-2013 12:44:32 UTC] PHP Fatal error:  Allowed memory size of 33554432 bytes exhausted (tried to allocate 2097152 bytes) in /home/datetrng/public_html/checkblocked.php on line 40
[26-Nov-2013 13:53:31 UTC] PHP Fatal error:  Allowed memory size of 33554432 bytes exhausted (tried to allocate 2097152 bytes) in /home/datetrng/public_html/checkblocked.php on line 40

如果我们只想通过擦除文件的开头来缩短文件,我想阅读整个文件可能不是一个好主意。

知道其他选择吗?

fopen fwrite fseek可能会派上用场

我认为您只需要文件中的最后50000行。

if(mt_rand(0,20000)==0)
{
    $tmp_file = $fileName . '.tmp';
    $cmd = "tail -n 50000  $fileName > $tmp_file";
    exec($cmd);
    rename($tmp_file, $fileName);
}

纯php的更新

我制作了一个大约100000行的文件:

<?php
$file_name = 'tmp.dat';
$f = fopen($file_name, 'w');
for ($i = 0; $i < 1000000; $i++)
{
    fwrite($f, str_pad($i, 100, 'x') . "'n");
}
fclose($f);

这个文件大约有970M

[huqiu@localhost home]$ ll -h  tmp.dat
-rw-rw-r-- 1 huqiu huqiu 97M Nov 27 06:08 tmp.dat

读取最后50000行

<?php
$file_name = 'tmp.dat';
$remain_count = 50000;
$begin_time = microtime(true);
$temp_file_name = $file_name . '.tmp';
$fp = fopen($file_name, 'r');
$total_count = 0;
while(fgets($fp))
{
    $total_count++;
}
echo 'total count: ' . $total_count . "'n";
if ($total_count > $remain_count)
{
    $start = $total_count - $remain_count;
    echo 'start: ' . $start . "'n";
    $temp_fp = fopen($temp_file_name, 'w');
    $index = 0;
    rewind($fp);
    while($line = fgets($fp))
    {
        $index++;
        if ($index > $start)
        {
            fwrite($temp_fp, $line);
        }
    }
    fclose($temp_fp);
}
fclose($fp);
echo 'time: ' . (microtime(true) - $begin_time), "'n";
rename($temp_file_name, $file_name);

经过时间:0.63908791542053

total count: 1000000
start: 950000
time: 0.63908791542053

结果:

[huqiu@localhost home]$ ll -h tmp.dat  
-rw-rw-r-- 1 huqiu huqiu 4.9M Nov 27 06:23 tmp.dat

为什么不将fseek指针指向要消除的点之后的位置?使用fpassthru来节省一些内存可能也会更好。