Find non-consecutive combinations of words in a string from array of words using PHP

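I'm looking for a way to return the start position and the matched text for any (possibly non-consecutive) combination of $length words from $search that occurs in a string. In other words, the words can come from anywhere in $search, in any order and with repeats.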

In my case, I'm looking for phone numbers whose digits are written out as words.

$subject = "hello my name is inigo montoya you killed my father please call me at eight zero zero five five five one to three four prepare to die";
$search = array("zero", "one", "two", "to", "too", "three", "four", "five", "six", "seven", "eight", "nine");
$length = 10;
$result = jedi_find_trick($subject,$search,$length);

I'd like $result to be an array like this:

$result[0]["start"] = 70
$result[0]["match"] = "eight zero zero five five five one to three four"
$result[1] ... 

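Generating every possible combination of $search is the direction I'm heading, but I suspect there is a more elegant solution. Thanks for any suggestions.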
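As an alternative to generating every combination, one minimal sketch is to tokenize the subject once and count runs of consecutive tokens that belong to $search. It assumes words are separated by whitespace and carry no attached punctuation ("four." would not count). The function name find_number_word_runs is hypothetical and does not come from the question or the answers. Like the regex answers, it reports non-overlapping runs of exactly $length words.

```php
<?php
// Hypothetical helper (not from the question or answers): reports every
// non-overlapping run of exactly $length consecutive tokens found in $search.
// Assumes whitespace-separated words with no attached punctuation.
function find_number_word_runs(string $subject, array $search, int $length): array
{
    // Turn the word list into a set for constant-time lookups.
    $allowed = array_flip($search);
    // Every non-whitespace token together with its byte offset.
    preg_match_all('/\S+/', $subject, $tokens, PREG_OFFSET_CAPTURE);

    $result = [];
    $run = []; // tokens of the current run, each as [text, offset]
    foreach ($tokens[0] as $token) {
        if (!isset($allowed[$token[0]])) {
            $run = []; // a word outside $search breaks the run
            continue;
        }
        $run[] = $token;
        if (count($run) === $length) {
            $start = $run[0][1];
            $last = $run[$length - 1];
            $end = $last[1] + strlen($last[0]);
            $result[] = ['start' => $start, 'match' => substr($subject, $start, $end - $start)];
            $run = []; // begin a fresh, non-overlapping run
        }
    }
    return $result;
}

$subject = 'hello my name is inigo montoya you killed my father please call me at '
         . 'eight zero zero five five five one to three four prepare to die';
$search = ['zero', 'one', 'two', 'to', 'too', 'three', 'four', 'five', 'six',
           'seven', 'eight', 'nine'];
print_r(find_number_word_runs($subject, $search, 10));
```

Because it only looks at whole tokens, the order of "to" and "too" in $search does not matter here.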


Following @chris85's suggestion, this looks like a good starting point:

$subject = 'hello my name is inigo montoya you killed my father please call me at eight zero zero five five five one to three four or too oh five seven seven seven five one one one prepare to die';
$search = array('zero','oh','one','two','too','to','three','four','five','six','seven','eight','nine','hundred','thousand');
$replace = array('0','0','1','2','2','2','3','4','5','6','7','8','9','00','000');
$length = 10;
$result = jedi_find_trick($subject,$search,$replace,$length);
print_r($result);
function jedi_find_trick($subject,$search,$replace,$length) {
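    // Match $length number words in a row, each optionally surrounded by horizontal whitespace.
    preg_match_all('/(\h*(' . implode('|', $search) . ')\h*){' . (int) $length . '}/', $subject, $numbers);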
    foreach($numbers[0] as $match) {
        $number = str_replace($search,$replace,$match);
        $number = str_replace(' ', '', $number);
        $number = ' ' . $number . ' ';
        $subject = str_replace($match,$number,$subject);
    }
    return $subject;
}

Returns:

hello my name is inigo montoya you killed my father please call me at 8005551234 or 2057775111 prepare to die

With str_replace(), "too" has to come before "to" in $search, otherwise you'll get "2o". Adding word boundaries and switching to preg_replace() should clean that up.
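The word-boundary idea can be sketched with preg_replace_callback(), assuming PHP 7.4+ (arrow functions) and $length >= 1. The helper name words_to_digits and the word-to-digit map are illustrative, not part of the answer. With \b on both sides of each word, "to" cannot match the start of "too" or "tomorrow", so the order of the alternatives no longer matters.

```php
<?php
// Hypothetical helper: replaces each run of exactly $length number words
// with its digits, using \b so partial words such as "tomorrow" are left alone.
// Assumes $length >= 1 and single-word keys in $map.
function words_to_digits(string $subject, array $map, int $length): string
{
    // Alternation of the quoted number words, e.g. (?:zero|oh|one|...).
    $alt = '(?:' . implode('|', array_map(fn($w) => preg_quote($w, '~'), array_keys($map))) . ')';
    // One whole number word followed by $length - 1 more, separated by horizontal whitespace.
    $run = sprintf('~\b%1$s\b(?:\h+\b%1$s\b){%2$d}~', $alt, $length - 1);

    return preg_replace_callback($run, function (array $m) use ($map): string {
        // Split the matched run into words and concatenate their digit strings.
        $words = preg_split('~\h+~', $m[0]);
        return implode('', array_map(fn($w) => $map[$w], $words));
    }, $subject);
}

$map = ['zero' => '0', 'oh' => '0', 'one' => '1', 'two' => '2', 'too' => '2', 'to' => '2',
        'three' => '3', 'four' => '4', 'five' => '5', 'six' => '6', 'seven' => '7',
        'eight' => '8', 'nine' => '9', 'hundred' => '00', 'thousand' => '000'];
$subject = 'hello my name is inigo montoya you killed my father please call me at eight zero zero '
         . 'five five five one to three four or too oh five seven seven seven five one one one prepare to die';
echo words_to_digits($subject, $map, 10), PHP_EOL;
```

Only the matched words are replaced, so the surrounding spaces stay as they were and no extra padding is needed.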

Something like this:

$subject = 'hello my name is inigo montoya you killed my father please call me '
         . 'at eight zero zero five five five one to three four prepare to die';
$search = ['zero', 'one', 'two', 'to', 'too', 'three', 'four', 'five', 'six',
           'seven', 'eight', 'nine'];
$length = 10;
function jedi_find_trick($search, $subject, $length, $sep = ' ', $septype = 0) {
    // quote special characters in the search list
    $search = array_map(function ($i) { return preg_quote($i, '~'); }, $search);
    // quote the separator when it is a literal string
    if ($septype === 0) $sep = preg_quote($sep, '~');
    // build the pattern
    $altern = '(?:' . implode('|', $search) . ')';
    $format = '~(?:%1$s|\A)(%2$s'
            . ($length<2 ? '': '(?:%1$s%2$s){%3$d}')
            . ')(?=%1$s|\z)~';
    $pattern = sprintf($format, $sep, $altern, $length - 1);
    if (preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE))
        return $matches[1];
    // return an empty array if there is no match
    return [];
}
print_r(jedi_find_trick($search, $subject, $length));
print_r(jedi_find_trick($search, $subject, 8, '\h+', 1));

By default the separator is a single space. When $septype is not 0, $sep is treated as a subpattern, so its special characters are not escaped.
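The function above returns [text, offset] pairs, while the question asked for "start" and "match" keys. Below is a small sketch of reshaping that output. It also shows a separator subpattern, [\h,-]+ (an assumed example), that lets commas and hyphens sit between the number words, which is what a non-zero $septype allows. The adapter name to_start_match is hypothetical, and the pattern mirrors the answer's format string with $length fixed at 10 (so {9} repetitions).

```php
<?php
// Hypothetical adapter: reshapes the [text, offset] pairs produced by
// preg_match_all(..., PREG_OFFSET_CAPTURE) into the start/match layout from the question.
function to_start_match(array $pairs): array
{
    return array_map(
        fn(array $p): array => ['start' => $p[1], 'match' => $p[0]],
        $pairs
    );
}

// Assumed example: the separator is a subpattern, so it is not passed through preg_quote().
$subject = 'call eight zero zero, five five five - one two three four now';
$alt = '(?:zero|one|two|three|four|five|six|seven|eight|nine)';
$sep = '[\h,-]+';
// Same shape as the answer's pattern: separator or start, 10 words, then separator or end.
$pattern = sprintf('~(?:%1$s|\A)(%2$s(?:%1$s%2$s){%3$d})(?=%1$s|\z)~', $sep, $alt, 9);
preg_match_all($pattern, $subject, $m, PREG_OFFSET_CAPTURE);
print_r(to_start_match($m[1]));
```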