robots.txt pattern matching not working

I need a pattern-matching rule that produces this result:

allow /dir/path_name.htm/something
disallow /dir/path_name/something
and disallow /dir/path_name.htm

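In fact, the two disallowed paths are typos that have accumulated over time. Those pages never existed. How do I stop Google from ever crawling them again?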

I tested the following at http://www.frobee.com/robots-txt-check/ but nothing seems to work.

Allow: /dir/*.htm/?*
Disallow: /dir/*

What is wrong? Thanks a lot.

According to the specification:

http://www.robotstxt.org/norobots-rfc.txt

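Wildcards (*) are not allowed. Paths are matched literally, character by character, as a prefix of the URL path. My guess is that you are using some form of URL rewriting and don't want multiple URLs with the same content to show up. In that case, this might be a better solution: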

http://googlewebmastercentral.blogspot.de/2009/02/specify-your-canonical.html
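
To illustrate the literal matching described in that specification, here is a minimal Python sketch. It is not a real robots.txt parser: the function name `is_allowed`, the rule tuples, and the "first matching rule wins, default is allow" behaviour are assumptions made for the illustration, and the literal rule set is only one possible way to write the three cases from the question.

```python
# Minimal sketch of literal prefix matching (no wildcard support).
# A rule is (allowed, path_prefix); the first matching rule wins,
# and a path that matches no rule is treated as allowed.

def is_allowed(rules, path):
    for allowed, prefix in rules:
        # '*' and '?' are compared as ordinary characters here.
        if path.startswith(prefix):
            return allowed
    return True


# The rules from the question, read literally.
wildcard_rules = [
    (True, "/dir/*.htm/?*"),
    (False, "/dir/*"),
]

# A hypothetical literal rule set for the three cases in the question.
literal_rules = [
    (True, "/dir/path_name.htm/"),
    (False, "/dir/path_name.htm"),
    (False, "/dir/path_name/"),
]

# No real URL starts with the literal text "/dir/*", so nothing is blocked.
print(is_allowed(wildcard_rules, "/dir/path_name/something"))

for p in ("/dir/path_name.htm/something",
          "/dir/path_name.htm",
          "/dir/path_name/something"):
    print(p, is_allowed(literal_rules, p))
```

Under these assumptions the wildcard rules never match, which is consistent with "nothing seems to work" in a checker that follows the original specification. Crawlers that implement wildcard extensions may behave differently.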