RegExp trouble with a LONG pattern

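Usually when you run regular expression code, the string is long and the pattern is short, but this time it is the other way around. I have a short text of about 500 characters. In that text I want to find names that match a database of about 47,000 unique names, and add a link to each specific name. What is the best way to do this? I have split the array of names into 64 partitions, because a single array is too large to process as one pattern.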

function implode_r ($glue, $pieces){
    $out = "";
    foreach ($pieces as $piece){
        if (is_array ($piece)){
            $out .= implode_r ($glue, $piece); // recurse
        }
        else{
            if(strlen($piece)>1){
                $piece = str_replace("(", "\\(", $piece);
                $piece = str_replace(")", "\\)", $piece);
                $piece = str_replace("[", "\\[", $piece);
                $piece = str_replace("]", "\\]", $piece);
                $piece = str_replace(":", "\\:", $piece);
                $piece = str_replace(".", "\\.", $piece);
                $piece = str_replace(",", "\\,", $piece);
                $piece = str_replace("'", "\\'", $piece);
                $piece = str_replace("&", "\\&", $piece);
                $piece = str_replace("?", "\\?", $piece);
                $piece = str_replace("!", "\\!", $piece);
                $piece = str_replace("<", "\\<", $piece);
                $piece = str_replace(">", "\\>", $piece);
                $piece = str_replace("{", "\\{", $piece);
                $piece = str_replace("}", "\\}", $piece);
                $out .= $glue.$piece;
            }
        }
    }
    return $out;
}
function partition( $list, $p ) {
    $listlen = count( $list );
    $partlen = floor( $listlen / $p );
    $partrem = $listlen % $p;
    $partition = array();
    $mark = 0;
    for ($px = 0; $px < $p; $px++) {
        $incr = ($px < $partrem) ? $partlen + 1 : $partlen;
        $partition[$px] = array_slice( $list, $mark, $incr );
        $mark += $incr;
    }
    return $partition;
}
add_filter( 'the_content', 'find_names_in_text');
add_filter( 'get_the_content', 'find_names_in_text');
function find_names_in_text($content){
    global $wpdb;
    $thenames = $wpdb->get_results("SELECT post_title FROM $wpdb->posts WHERE post_type='dogs' GROUP BY post_title", ARRAY_N);
    $namesparts = partition($thenames, 64);
    foreach($namesparts as $part){
        $pattern = implode_r("|", $part);
        $content = preg_replace("(".$pattern.")", "<a href='$1'>$1</a>", $content);
    }
    return $content;
}

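If your text is only 500 characters, I would approach it the other way around. Split the text into the parts that could be names (presumably these are words; I assume no name starts or ends in the middle of a word).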
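A minimal sketch of that idea, under some assumptions that are mine and not from the question: every name is a single word made of letters, the text is plain text rather than HTML, and links go to a made-up `/dogs/` base URL. The function name `link_known_names` is hypothetical. The names are flipped into a hash map once, and each word of the text is then checked with a single `isset()` lookup instead of being tested against a huge alternation.

```php
<?php
// Hypothetical helper: wrap every word of $content that is a known name in a link.
// Assumptions: single-word names, plain-text $content, illustrative $baseUrl.
function link_known_names(string $content, array $names, string $baseUrl = '/dogs/'): string
{
    // Flip the list so the names become array keys; isset() on a key is a hash lookup.
    $lookup = array_flip($names);

    // Visit each run of Unicode letters exactly once.
    return preg_replace_callback(
        '/\p{L}+/u',
        function (array $m) use ($lookup, $baseUrl): string {
            $word = $m[0];
            if (!isset($lookup[$word])) {
                return $word; // not a known name, leave it untouched
            }
            // rawurlencode() keeps the href safe inside the single-quoted attribute.
            return "<a href='" . $baseUrl . rawurlencode($word) . "'>" . $word . "</a>";
        },
        $content
    );
}

// Usage with a tiny stand-in for the 47,000 names:
echo link_known_names("Rex met Bella and Max.", ["Rex", "Max"]);
// prints: <a href='/dogs/Rex'>Rex</a> met Bella and <a href='/dogs/Max'>Max</a>.
```

With roughly 500 characters of text this does on the order of a hundred lookups, no matter how many names are in the database. Names made of several words would need an extra step, for example checking pairs of adjacent words as well.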

So now you have