regex/preg_replace to extract the part number (substring)

I am not very comfortable with regular expressions.

Use case

I use three variables, namely $url, $pattern and $replacement, and intend to use them as follows:

$url = $node->attr("href");
$resource = ExtractResourceWithoutHtmlExtension($url); // This just abstracts stripping the prepended path and cutting the `.html` (see Edit 2 & 3 below).
$pattern =  ...
$replacement = '${1}'; // Not very sure of this value
$partno = preg_replace($pattern, $replacement, $resource);
echo '"'.$partno.'";"'.$node->attr("title").'";"'.$url.'"'."\n";

Part number and resource scheme mapping (strings)

Most of the time

35000-0295 => designation-of-the-products-as-slug-35000-0295

27021-0012 => designation-of-the-products-as-slug-27021-0012

Or rarely

38811 => designation-of-the-products-as-slug-38811

Last but not least (edge case => nothing to extract):
if the part number is not available, the resource substring will just be

designation-of-the-products-as-slug

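I still prefer a regex solution, because the number of digits within the segments that make up the part number may vary.

Question

What should I assign to $pattern and $replacement?

Edit 1 (for reference)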

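The substring designation-of-the-products-as-slug is variable~~, and the path /to/ can be of arbitrary depth~~.

Edit 2 (for reference)

On second thought, I realised there is no need to apply a regex to the whole URL path: http://path/to/ can be stripped using parse_url, explode and array_pop. I edited my post accordingly.

Edit 3 (for reference)

Complexity can also be reduced by cutting off the invariable trailing substring .html. See @bloodyKnuckles' comment below. Post edited accordingly.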
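The body of ExtractResourceWithoutHtmlExtension is not shown in the post. A minimal sketch of what it might look like, assuming only the steps named in Edit 2 and Edit 3 (parse_url, explode, array_pop, then cutting a trailing .html), could be the following. The implementation and the sample URL are illustrative assumptions, not the original code.

```php
<?php
// Hypothetical implementation of the helper named in the question;
// the original post does not show its body.
function ExtractResourceWithoutHtmlExtension(string $url): string
{
    // Keep only the path component, e.g. "/path/to/some-slug-35000-0295.html".
    $path = (string) parse_url($url, PHP_URL_PATH);

    // Split on "/" and take the last segment (the resource name).
    $segments = explode('/', $path);
    $resource = (string) array_pop($segments);

    // Cut the invariable trailing ".html", if present.
    return (string) preg_replace('/\.html$/', '', $resource);
}

echo ExtractResourceWithoutHtmlExtension(
    'http://example.com/path/to/designation-of-the-products-as-slug-35000-0295.html'
), "\n";
// prints: designation-of-the-products-as-slug-35000-0295
```

Because only the last path segment is kept, the depth of the path no longer matters.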

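To start, I would use a combination of parse_url and pathinfo to strip the superfluous bits from the string, then use preg_filter with a regex such as /.*?(\d+[\d-]*)$/ to grab the last block of digits together with any optional trailing hyphens and digits.

Example: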

$urls = [
    "http://example.com/path/to/designation-of-the-products-as-slug-35000-0295.extension",
    "http://example.com/path/to/designation-of-the-products-as-slug-35000.html",
    "http://example.com/path/to/designation-of-the-products-as-slug.ext?foo=bar.baz"
];
// Lazily skip the slug, then capture the trailing digits (with optional hyphens).
$regex = '/.*?(\d+[\d-]*)$/';
foreach ($urls as $url) {
    // Drop scheme, host, directories, query string and extension.
    $resource = pathinfo(parse_url($url, PHP_URL_PATH), PATHINFO_FILENAME);
    // preg_filter returns null when nothing matches.
    echo preg_filter($regex, '$1', $resource), "\n";
}

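Output (the third URL has no part number, so preg_filter returns null and only an empty line is printed for it):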

35000-0295
35000
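If the no-match edge case should be handled explicitly, one option is preg_match instead of preg_filter. The sketch below is only an illustration: the function name extractPartNumber is hypothetical, and the pattern is a slightly stricter variant (digit groups joined by single hyphens) rather than the one used above.

```php
<?php
// Hypothetical helper: returns the trailing part number, or null if there is none.
function extractPartNumber(string $resource): ?string
{
    // One or more digit groups joined by single hyphens, anchored at the end.
    if (preg_match('/(\d+(?:-\d+)*)$/', $resource, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

var_dump(extractPartNumber('designation-of-the-products-as-slug-35000-0295')); // string(10) "35000-0295"
var_dump(extractPartNumber('designation-of-the-products-as-slug-38811'));      // string(5) "38811"
var_dump(extractPartNumber('designation-of-the-products-as-slug'));            // NULL
```

This matters for the original preg_replace approach: when the pattern does not match, preg_replace returns the subject unchanged, so the whole slug would end up in $partno, whereas preg_filter and the sketch above return null.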