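I need to get the hashtags from a long text and count them. I know I could do this with a regex, but I haven't managed to. I'd be grateful if you could help. Here is my sample text: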
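#paris #love #spring #outdoor #life #istanbul #par #sacrecoeur #paris #france #latex #dog Thats what the world is, paris after all, an endless battle of contrasting memories. I can see clearly now the rain is gone. #music I can see all obstacles in my way. #paris #queenstreet #foreveronvocationNever felt more glamorous. #ski #music #skiing #skier #terrainpark #paris #snowboard #snowboarding #snowboarder #longboard #longboarding #longboarder #skateboard #skateboarder #skateboarding #winter #just my voice and my good friend Danny Marin will dj for our auditory exploration. #stack #over #flow to be or not to be #poem #music #paris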
I only need to get tags like "#paris", count each one, and finally sort the tags by how often they occur. For example:
#paris (5)
#music (3)
#... (2)
#... (2)
#... (1)
#... (1)
#... (1)
preg_match_all('/(#\w+)/', $string, $array);   // every "#word" in $string
$array = array_count_values($array[1]);         // tag => number of occurrences
arsort($array);                                 // highest count first
foreach ($array as $key => $value) {
    echo "$key ($value)<br>\n";
}
That should give you what you need.
Edit: sorry, I forgot the array index.
Working example:
http://sandbox.onlinephpfunctions.com/code/d1fe24cbc8deedd24f7825ea4e48eaa691b8d401
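A minimal sketch, assuming PHP 7 or later, that wraps the same #\w+ pattern in a function and returns lines in the "#tag (count)" format from the question. The function name hashtagReport is hypothetical, and the alphabetical tie-break is an added choice, made with array_multisort so that tags with equal counts come out in a predictable order:

```php
<?php
// Hypothetical helper: count hashtags with the #\w+ pattern and return
// report lines such as "#paris (5)", most frequent first.
function hashtagReport(string $text): array
{
    // Every "#word"; \w matches letters, digits and underscore.
    preg_match_all('/#\w+/', $text, $matches);
    $counts = array_count_values($matches[0]); // tag => occurrences
    if ($counts === []) {
        return [];
    }

    // Sort counts (descending) and tag names (ascending) together,
    // so tags with the same count come out in alphabetical order.
    $tags = array_keys($counts);
    $totals = array_values($counts);
    array_multisort($totals, SORT_DESC, SORT_NUMERIC, $tags, SORT_ASC, SORT_STRING);

    $lines = [];
    foreach ($tags as $i => $tag) {
        $lines[] = "$tag ({$totals[$i]})";
    }
    return $lines;
}

print_r(hashtagReport('#paris #love rain #music #paris. #music #paris'));
```

For the string in the usage line this yields #paris (3), #music (2) and #love (1); the dot in "#paris." is not a word character, so it is not captured.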
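1. Split the string into an array on '#'.
2. Split each element of that array on ' ' (a space) and keep only the first word.
3. Count each token and store the counts in a parallel array.
4. Sort using the parallel arrays.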
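The four steps above could be sketched like this, assuming PHP 7.1 or later (for the [$a, $b] list syntax). countTagsParallel is a hypothetical name, and splitting on any whitespace rather than only on a space is a small deviation from step 2:

```php
<?php
// Hypothetical helper following the four steps: $tags[$i] is counted by $counts[$i].
function countTagsParallel(string $text): array
{
    $tags = [];
    $counts = [];

    // Step 1: split on '#'; piece 0 is whatever precedes the first '#', so skip it.
    foreach (array_slice(explode('#', $text), 1) as $piece) {
        // Step 2: keep only the first word (split on any whitespace).
        $token = preg_split('/\s+/', trim($piece), 2)[0];
        if ($token === '') {
            continue; // a '#' followed by nothing or only whitespace
        }

        // Step 3: find the token in $tags and bump the counter at the same index.
        $i = array_search($token, $tags, true);
        if ($i === false) {
            $tags[] = $token;
            $counts[] = 1;
        } else {
            $counts[$i]++;
        }
    }

    // Step 4: reorder both arrays together: highest count first, ties alphabetically.
    if ($tags !== []) {
        array_multisort($counts, SORT_DESC, SORT_NUMERIC, $tags, SORT_ASC, SORT_STRING);
    }

    return [$tags, $counts];
}

[$tags, $counts] = countTagsParallel('#music I can see #paris #music #paris #paris');
foreach ($tags as $i => $tag) {
    echo "#$tag ({$counts[$i]})\n";
}
```

Keeping the tags in a plain list rather than as array keys means a numeric-looking tag such as "2014" stays a string instead of becoming an integer key; array_multisort then reorders both lists in step.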
You can use array_count_values; here is an example:
<?php
$html = <<< EOF
#paris #love #spring #outdoor #life #istanbul #par #sacrecoeur #paris #france #latex #dog Thats what the world is, paris after all, an endless battle of contrasting memories. I can see clearly now the rain is gone. #music I can see all obstacles in my way. #paris #queenstreet #foreveronvocationNever felt more glamorous. #ski #music #skiing #skier #terrainpark #paris #snowboard #snowboarding #snowboarder #longboard #longboarding #longboarder #skateboard #skateboarder #skateboarding #winter #just my voice and my good friend Danny Marin will dj for our auditory exploration. #stack #over #flow to be or not to be #poem #music #paris
EOF;
preg_match_all('/(#.*?\S+)/im', $html, $hTags, PREG_PATTERN_ORDER);
print_r(array_count_values($hTags[1]));
Output:
Array
(
[#paris] => 5
[#love] => 1
[#spring] => 1
[#outdoor] => 1
[#life] => 1
[#istanbul] => 1
[#par] => 1
[#sacrecoeur] => 1
[#france] => 1
[#latex] => 1
[#dog] => 1
[#music] => 3
[#queenstreet] => 1
[#foreveronvocationNever] => 1
[#ski] => 1
[#skiing] => 1
[#skier] => 1
[#terrainpark] => 1
[#snowboard] => 1
[#snowboarding] => 1
[#snowboarder] => 1
[#longboard] => 1
[#longboarding] => 1
[#longboarder] => 1
[#skateboard] => 1
[#skateboarder] => 1
[#skateboarding] => 1
[#winter] => 1
[#just] => 1
[#stack] => 1
[#over] => 1
[#flow] => 1
[#poem] => 1
)
Regex explanation:
(#.*?\S+)
Match the regex below and capture its match into backreference number 1 «(#.*?\S+)»
Match the character “#” literally «#»
Match any single character that is NOT a line break character (line feed) «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match a single character that is NOT a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «\S+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Live demo
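To make the explanation concrete, here is a small comparison of what this pattern captures versus the #\w+ pattern from the first answer when a tag is followed by punctuation. The sample string is made up for illustration:

```php
<?php
// Made-up sample: tags followed by punctuation.
$text = '#paris. #love, #music';

preg_match_all('/(#.*?\S+)/', $text, $lazy); // '#', then everything up to whitespace
preg_match_all('/(#\w+)/', $text, $word);    // '#', then word characters only

print_r($lazy[1]); // punctuation stays attached to the tag
print_r($word[1]); // punctuation is dropped
```

Because \S+ only stops at whitespace, "#paris." and "#love," would be counted separately from "#paris" and "#love"; \w+ stops at the first non-word character instead.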
If you would rather avoid regular expressions, plain PHP string functions can do the trick:
$tagString = "#paris #love #spring #outdoor #life #istanbul #par #sacrecoeur #paris #france #latex #dog Thats what the world is, paris after all, an endless battle of contrasting memories. I can see clearly now the rain is gone. #music I can see all obstacles in my way. #paris #queenstreet #foreveronvocationNever felt more glamorous. #ski #music #skiing #skier #terrainpark #paris #snowboard #snowboarding #snowboarder #longboard #longboarding #longboarder #skateboard #skateboarder #skateboarding #winter #just my voice and my good friend Danny Marin will dj for our auditory exploration. #stack #over #flow to be or not to be #poem #music #paris";
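$countArray = array();
// Split on '#' (after trimming the leading '#'); each piece runs up to the next '#',
// so any text after a tag stays attached to it, as the output below shows.
foreach (explode("#", trim($tagString, '#')) as $tag) {
    $tag = trim($tag);
    if (array_key_exists($tag, $countArray)) {
        $countArray[$tag]++;
    } else {
        $countArray[$tag] = 1;
    }
}
arsort($countArray); // sort by count, highest first
var_dump($countArray);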
This gives:
array(34) {
["paris"]=>
int(5)
["music"]=>
int(2)
["skateboard"]=>
int(1)
["snowboarding"]=>
int(1)
["snowboarder"]=>
int(1)
["longboard"]=>
int(1)
["longboarding"]=>
int(1)
["longboarder"]=>
int(1)
["skateboarder"]=>
int(1)
["terrainpark"]=>
int(1)
["skateboarding"]=>
int(1)
["winter"]=>
int(1)
["just my voice and my good friend Danny Marin will dj for our auditory exploration."]=>
int(1)
["stack"]=>
int(1)
["over"]=>
int(1)
["flow to be or not to be"]=>
int(1)
["snowboard"]=>
int(1)
["skier"]=>
int(1)
["love"]=>
int(1)
["skiing"]=>
int(1)
["ski"]=>
int(1)
["foreveronvocationNever felt more glamorous."]=>
int(1)
["queenstreet"]=>
int(1)
["music I can see all obstacles in my way."]=>
int(1)
["dog Thats what the world is, paris after all, an endless battle of contrasting memories. I can see clearly now the rain is gone."]=>
int(1)
["latex"]=>
int(1)
["france"]=>
int(1)
["sacrecoeur"]=>
int(1)
["par"]=>
int(1)
["istanbul"]=>
int(1)
["life"]=>
int(1)
["outdoor"]=>
int(1)
["spring"]=>
int(1)
["poem"]=>
int(1)
}
You can test it online here: http://sandbox.onlinephpfunctions.com/code/3058b887590845e33685b25e14e21df9959e94e7