How to remove all alphanumeric words from the text?

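I'm trying to write a regular expression in PHP that removes only the alphanumeric words (words that mix letters and digits), but not plain numbers or numbers that come with punctuation and similar special characters (prices, phone numbers, etc.).

Words that should be removed:

1st, H20, 2nd, O2, 3rd, NUMB3RS, Rüthen1, Wrocław2

Words that should not be removed:

0, 5.5, 10, $100, £65, +44, (20), 123, ext:124, 4.4-BSD

This is the code so far: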

$text = 'To remove: 1st H20; 2nd O2; 3rd NUMB3RS; To leave: Digits: -2 0 5.5 10, Prices: $100 or £65, Phone: +44 (20) 123 ext:124, 4.4-BSD';
$pattern = '/\b\w*\d\w*\b-?/';
echo $text, preg_replace($pattern, " ", $text);

However, it removes every word that contains a digit, including the plain numbers, the prices and the phone number.
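To see why this happens, it can help to list what the pattern actually matches. The snippet below is only a small sketch that runs preg_match_all on the sample text from the question; $matches is just an illustrative variable name. Since \w*\d\w* asks for nothing more than one digit somewhere in the word, a token made purely of digits such as 100 or 123 satisfies it too.

```php
<?php
// Sample text from the question.
$text = 'To remove: 1st H20; 2nd O2; 3rd NUMB3RS; To leave: Digits: -2 0 5.5 10, '
      . 'Prices: $100 or £65, Phone: +44 (20) 123 ext:124, 4.4-BSD';

// Collect every substring the original pattern matches.
preg_match_all('/\b\w*\d\w*\b-?/', $text, $matches);

// The list contains the intended words (1st, H20, ...) but also
// digit-only tokens such as 100 and 123, which is the unwanted part.
print_r($matches[0]);
```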

So far I have also tried the following patterns:

/(\\s+\\w{1,2}(?=\\W+))|(\\s+[a-zA-Z0-9_-]+\\d+)/ # Removes digits, etc.
/[^(\w|\d|\'|\"|\.|\!|\?|;|,|\\|\/|\-|:|\&|@)]+/ # Doesn't work.
/(\\s+\\w{1,2}(?=\\W+))|(\\s+[a-zA-Z0-9_-]+\\d+)/ # Removes too much.
/[^\p{L}\p{N}-]+/u                       # It removes only special characters.
/(^[\D]+\s|\s[\D]+\s|\s[\D]+$|^[\D]+$)+/ # Removes words.
/ ?\b[^ ]*[0-9][^ ]*\b/i                 # Almost, but removes digits, price, phone.
/\s+[\w-]*\d[\w-]*|[\w-]*\d[\w-]*\s*/    # Almost, but removes digits, price, phone.
/\b\w*\d\w*\b-?/                         # Almost, but removes digits, price, phone.
/[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*/       # Almost, but removes too much.

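I found these patterns on SO and other sites (most of them are too specific). They are supposed to remove words with digits, but they don't do what I need.

How can I write a simple regex that removes these words without touching anything else?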

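Sample text:

To remove: 1st H20; 2nd O2; 3rd NUMB3RS; To leave: Digits: -2 0 5.5 10, Prices: $100 or £65, Phone: +44 (20) 123 ext:124, 4.4-BSD

Expected output:

To remove: ; ; ; To leave: Digits: -2 0 5.5 10, Prices: $100 or £65, Phone: +44 (20) 123 ext:124, 4.4-BSD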

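How about replacing \b(?=[a-z]+\d|[a-z]*\d+[a-z]+)\w*\b\s* with nothing? The lookahead only accepts a word that starts with letters followed by a digit, or with digits followed by letters, so digit-only tokens are skipped.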

Demo: https://regex101.com/r/jA2fW3/1

The pattern in PHP:

$pattern = '/\b(?=[a-z]+\d|[a-z]*\d+[a-z]+)\w*\b\s*/i';
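For completeness, here is a minimal sketch of that pattern applied to the sample text from the question. The variable name $result is only for illustration.

```php
<?php
// Sample text from the question.
$text = 'To remove: 1st H20; 2nd O2; 3rd NUMB3RS; To leave: Digits: -2 0 5.5 10, '
      . 'Prices: $100 or £65, Phone: +44 (20) 123 ext:124, 4.4-BSD';

// The lookahead demands letters+digit or digits+letters at the start of the word;
// \w*\b then consumes the whole word and \s* the whitespace that follows it.
$pattern = '/\b(?=[a-z]+\d|[a-z]*\d+[a-z]+)\w*\b\s*/i';

$result = preg_replace($pattern, '', $text);
echo $result, PHP_EOL;
// To remove: ; ; ; To leave: Digits: -2 0 5.5 10, Prices: $100 or £65, Phone: +44 (20) 123 ext:124, 4.4-BSD
```

The numbers, prices and phone parts stay because none of them starts with a letter run followed by a digit or a digit run followed by a letter.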

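To match alphanumeric words that contain foreign/accented letters, use the following pattern (note the u modifier, which makes PCRE treat the pattern and the subject as UTF-8, so multibyte letters such as ü and ł are matched as single characters):

$pattern = '/\b(?=[\pL]+\d|[\pL]*\d+[\pL]+)[\pL\w]*\b\s*/iu';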

Demo: https://regex101.com/r/jA2fW3/3
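A short sketch of the Unicode-aware variant on the accented words from the question. It assumes the source file and the input are UTF-8 encoded and that the u modifier is set on the pattern; the shortened input string and the name $result are made up for the example.

```php
<?php
// This file must be saved as UTF-8: with the u modifier, preg_replace
// returns null when the subject is not valid UTF-8.
$text = 'Rüthen1 Wrocław2 1st 5.5 £65';

// \pL matches any Unicode letter; the u modifier enables UTF-8 handling.
$pattern = '/\b(?=[\pL]+\d|[\pL]*\d+[\pL]+)[\pL\w]*\b\s*/iu';

$result = preg_replace($pattern, '', $text);
echo $result, PHP_EOL;
// 5.5 £65
```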

You can modify your regex as follows to get the desired output.

$text = preg_replace('/\b(?:[a-z]+\d+[a-z]*|\d+[a-z]+)\b/i', '', $text);

To match any kind of letter from any language, use the Unicode property \p{L}:

$text = preg_replace('/\b(?:\pL+\d+\pL*|\d+\pL+)\b/u', '', $text);
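One thing to be aware of with these two patterns: they have no trailing \s*, so the space that followed each removed word stays in the text. The sketch below shows the effect on a shortened version of the sample text. The second preg_replace that collapses repeated spaces is an optional extra step of my own, not something the patterns above require, and $stripped and $tidy are illustrative names.

```php
<?php
// Shortened sample text, for illustration only.
$text = 'To remove: 1st H20; 2nd O2; 3rd NUMB3RS; To leave: 5.5 $100 4.4-BSD';

// Remove the alphanumeric words; surrounding spaces are left untouched.
$stripped = preg_replace('/\b(?:[a-z]+\d+[a-z]*|\d+[a-z]+)\b/i', '', $text);
echo $stripped, PHP_EOL;
// To remove:  ;  ;  ; To leave: 5.5 $100 4.4-BSD

// Optional cleanup: collapse runs of two or more spaces into one.
$tidy = preg_replace('/ {2,}/', ' ', $stripped);
echo $tidy, PHP_EOL;
// To remove: ; ; ; To leave: 5.5 $100 4.4-BSD
```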