Trying to filter log-file for urls like "/fcfc/fcf/fc.php"

Keywords: fcfc fcf php fc log filter file | Updated: 2023-09-27

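In my Apache access logs I am seeing a lot of invalid requests, probably from bots.

All of the invalid URLs follow the same pattern, and I would like to filter them out with a regular expression.

Here are some examples: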

/oaoa/oao/oa.php
/fcfc/fcf/fc.php 
/mcmc/mcm/mc.php 
/rxrx/rxr/rx.php 
/wlwl/wlw/wl.php 
/nini/nin/ni.php 
/gigi/gig/gi.php 
/jojo/joj/jo.php 
/okok/oko/ok.php

I can see the pattern, but I don't know how to build a (PHP) regex that matches this pattern without also matching something like this:

/help/one/xy.php
/some/oth/er.php

I hope one of you knows a solution, if one is possible.

If this is your exact input, the following regex should do the trick:

/\/(.)(.)\1\2\/\1\2\1\/\1\2\.php/

https://regex101.com/r/rU2sE6/2
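To check the idea against the URLs from the question, here is a minimal sketch. It uses Python's `re` module as a stand-in for PHP, which is an assumption on my part; the backreference syntax is the same, and Python does not need the slashes escaped. The name `is_bot_url` is made up for illustration.

```python
import re

# The same pattern without PHP delimiters: two captured characters,
# then the repetitions expressed through the backreferences \1 and \2.
BOT_URL = re.compile(r"/(.)(.)\1\2/\1\2\1/\1\2\.php")

bot_urls = ["/oaoa/oao/oa.php", "/fcfc/fcf/fc.php", "/okok/oko/ok.php"]
normal_urls = ["/help/one/xy.php", "/some/oth/er.php"]


def is_bot_url(url: str) -> bool:
    """Return True if the whole URL follows the /abab/aba/ab.php shape."""
    return BOT_URL.fullmatch(url) is not None


for url in bot_urls + normal_urls:
    print(url, is_bot_url(url))
```

`fullmatch` is used here so that a longer URL which merely contains the pattern is not flagged; use `search` instead if partial matches should count.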

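Note: Interesting question, although you should have shown us what you have tried. That is why I made this answer a community wiki, so as not to earn any reputation from it.

So the trick is to capture characters in a group and then assert that they are present in the next chunk. It is a bit cryptic, I guess, but here is the regex: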

^                 # Assert begin of line
(?:               # Non-capturing group
   (              # Capturing group 1
      /           # Match a forward slash
      [^/]+       # Match anything not a forward slash one or more times
   )              # End of capturing group 1
   [^/]           # Match anything not a forward slash one time
   (?=\1)         # Assert that what we've matched in group 1 is ahead of us
                  # (ie: a forward slash + the characters - the last character)
)+                # End of non-capturing group, repeat this one or more times
\1\.php           # Match what we've matched in group 1 followed by a dot and "php"
$                 # Assert end of line

Don't forget to use the m modifier and the x modifier.

Online demo
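For anyone who wants to try the commented pattern outside regex101, here is a rough sketch in Python (my choice of language, not the answer's), where `re.VERBOSE` and `re.MULTILINE` play the role of the x and m modifiers. The sample text is built from the URLs in the question.

```python
import re

# The commented pattern, compiled with the equivalents of the x and m modifiers.
GENERAL_BOT_URL = re.compile(
    r"""
    ^              # begin of line
    (?:
       (/[^/]+)    # group 1: a slash plus one or more non-slash characters
       [^/]        # one more non-slash character
       (?=\1)      # group 1 must come next
    )+
    \1\.php        # group 1 once more, then ".php"
    $              # end of line
    """,
    re.VERBOSE | re.MULTILINE,
)

urls = """/fcfc/fcf/fc.php
/help/one/xy.php
/nini/nin/ni.php
/some/oth/er.php"""

# Collect every line that matches the pattern as a whole.
matches = [m.group(0) for m in GENERAL_BOT_URL.finditer(urls)]
print(matches)
```

One thing to be aware of: `[^/]` also matches a newline, so on multi-line input it is probably safer to test each line on its own if your data is less regular than these samples.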

For the very specific cases you listed, here is a simple regex that matches them:

/([a-z])([a-z])\1\2/\1\2\1/\1\2\.php

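\1 and \2 are backreferences to the first and second capture groups. You may need to escape the forward slashes if you use / as the regex delimiter. In essence this says: match one character, then another, then the first character again, then the second again, then a slash, and so on.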
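Since the goal is to filter an access log rather than bare URLs, here is one way it could look, again sketched in Python rather than PHP. The log lines are invented samples in the common log format (documentation IP addresses, made-up timestamps), and `drop_bot_requests` is a hypothetical helper name. The pattern is anchored inside the quoted request field so that it only looks at the request path.

```python
import re

# The bot pattern embedded in the quoted request field,
# e.g. "GET /fcfc/fcf/fc.php HTTP/1.1".
BOT_REQUEST = re.compile(
    r'"[A-Z]+ /([a-z])([a-z])\1\2/\1\2\1/\1\2\.php HTTP/[\d.]+"'
)


def drop_bot_requests(lines):
    """Yield only the log lines whose request does not match the bot pattern."""
    for line in lines:
        if not BOT_REQUEST.search(line):
            yield line


# Invented sample lines, not taken from a real log.
sample_log = [
    '192.0.2.10 - - [27/Sep/2023:10:00:00 +0000] "GET /fcfc/fcf/fc.php HTTP/1.1" 404 196',
    '192.0.2.11 - - [27/Sep/2023:10:00:01 +0000] "GET /help/one/xy.php HTTP/1.1" 200 512',
    '192.0.2.12 - - [27/Sep/2023:10:00:02 +0000] "POST /wlwl/wlw/wl.php HTTP/1.1" 404 196',
]

kept = list(drop_bot_requests(sample_log))
for line in kept:
    print(line)
```

If the bots also append query strings, the pattern would need an optional part after `\.php`; that is left out here because the question does not show any.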