Filter an array for valid content

I have an array like this:

$content_array = [
    " ",
    "<p>",
    " ",
    "</p>",
    "<p>",
    "</p>",
    'mycontent',
    '<img src="some-image.jpg">',
    "",
    '&nbsp;',
    'some other content',
    '<div class="1">',
    '<div class="child">',
    ' ',
    '<b>',
    'content text',
    'my other content',
    '     ',
    ' '
];

I need to return an array like this:

$content_array = [
    "<p></p><p></p>mycontent",
    '<img src="some-image.jpg">some other content',
    '<div class="1"><div class="child"><b>content text',
    'my other content',
];

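So my plan is to get rid of the "non-printable" content and merge the HTML tags with the text that follows them; once text is found, it should move on to the next key. Here is my code: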

$content_array = [
    "&nbsp;",
    "<p>",
    " ",
    "</p>",
    "<p>",
    "</p>",
    'mycontent',
    '<img src="some-image.jpg">',
    "",
    '&nbsp;',
    'some other content',
    '<div class="1">',
    '<div class="child">',
    ' ',
    '<b>',
    'content text',
    'my other content',
    '     ',
    ' '
];
$the_fixed_array = fix_content($content_array);
function fix_content($content_array)
{
    $fixed_content_array = array();
    foreach ($content_array as $this_content) {
        $clean_content = preg_replace('@([^[:print:]]|&nbsp;|\s+)@', '', $this_content);
        if (!$clean_content) continue;
        $has_text = strip_tags($clean_content);
        if ($has_text) {
            $fixed_content_array[] = $this_content;
            continue;
        }
        $where_to = count($fixed_content_array) ? count($fixed_content_array) - 1 : 0;
        if (!$where_to) {
            $fixed_content_array[$where_to] = $this_content;
            continue;
        }
        $fixed_content_array[$where_to] = $fixed_content_array[$where_to] . $this_content;
        $fixed_content_array = fix_content($fixed_content_array);
    }
    return $fixed_content_array;
}
print_r($the_fixed_array);

But it fails, and I get this:

(
    [0] => </p>
    [1] => mycontent<img src="some-image.jpg">
    [2] => some other content<div class="1"><div class="child"><b>
    [3] => content text
    [4] => my other content
)

I bet there is a simple way to do this. Can someone help me?

Solution

function fix_content($content_array)
{
    $fixed_content_array = array();
    foreach ($content_array as $key => $this_content) {
        $where_to = count($fixed_content_array) ? count($fixed_content_array) - 1 : 0;
        // Use '' when the result array is still empty, so no undefined-index notice is raised.
        $previous_element_has_content = strip_tags($fixed_content_array[$where_to] ?? '');
        $clean_content = preg_replace('@([^[:print:]]|&nbsp;|\s+)@', '', $this_content);
        if (!$clean_content) continue;
        $has_text = strip_tags($clean_content);
        if ($has_text) {
            //Check if previous element has just html tags:
            if ($where_to || $where_to === 0) {
                if (!$previous_element_has_content) {
                    $fixed_content_array[$where_to] = ($fixed_content_array[$where_to] ?? '') . $this_content;
                    continue;
                }
            }
            $fixed_content_array[] = $this_content;
            continue;
        }
        if ($previous_element_has_content) {
            $fixed_content_array[] = $this_content;
            continue;
        }
        $fixed_content_array[$where_to] = ($fixed_content_array[$where_to] ?? '') . $this_content;
    }
    return $fixed_content_array;
}
print_r($the_fixed_array);

But I like @Oli's answer. Looking at the result, though, I noticed that I actually need something like this:

[0] => <p></p><p></p>mycontent<imgsrc="some-image.jpg"><span>
[1] => someothercontent<divclass="1"><divclass="child"><b>
[2] => contenttext
[3] => myothercontent

It should move on to the next key only when text is found. Thanks for your help!
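A minimal sketch of that "advance only when text is found" rule, in case it helps. The function name group_by_text is hypothetical, and the sketch assumes that an item counts as text when strip_tags() leaves something behind after &nbsp; and whitespace are removed. Unlike the listing above, it keeps the original spacing inside each item.

```php
<?php
// Hypothetical helper: start a new element only when a text item arrives
// and the current element already contains text; otherwise append.
function group_by_text(array $items): array
{
    $result = [];
    foreach ($items as $item) {
        // Drop items that are only whitespace or &nbsp;.
        $clean = (string) preg_replace('/&nbsp;|\s+/', '', $item);
        if ($clean === '') {
            continue;
        }
        $is_text = strip_tags($clean) !== '';
        $last = count($result) - 1;
        if ($last < 0 || ($is_text && strip_tags($result[$last]) !== '')) {
            $result[] = $item;        // open a new element
        } else {
            $result[$last] .= $item;  // tags, or the first text after leading tags
        }
    }
    return $result;
}

$sample = [
    '&nbsp;', '<p>', ' ', '</p>', 'mycontent', '<img src="some-image.jpg">',
    '', 'some other content', '<div class="1">', '<b>',
    'content text', 'my other content', ' ',
];
print_r(group_by_text($sample));
```

With this shortened sample, the leading tags and the image end up in the first element together with mycontent, and each later text item opens a new element that then collects the tags after it.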

This approach is similar to yours. I keep an accumulator string, $tmpString, that collects tags as they are found:

$result_array =[];
$tmpString = '';
foreach($content_array as $value) {
    $value = preg_replace('/(\s+?|(&nbsp;))/', '', $value);
    if(!empty($value)) {
        preg_match('/(<.+?>)/', $value, $matches);
        if(isset($matches[1])) {
             $tmpString .= $matches[1];
        } else {
            $result_array[] = $tmpString. $value;
            $tmpString = '';
        }
    }
}
var_dump($result_array);
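One thing to watch with the loop above: tags that come after the last text item stay in $tmpString and never reach the result. A small sketch of the same idea with a final flush follows. The wrapper name merge_tags_with_text is hypothetical, and whether trailing tags should be kept at all is an assumption about what you want.

```php
<?php
// Hypothetical wrapper around the accumulator loop, with a final flush.
function merge_tags_with_text(array $content_array): array
{
    $result_array = [];
    $tmpString = '';
    foreach ($content_array as $value) {
        // Strip whitespace and &nbsp; (this also removes spaces inside text).
        $value = (string) preg_replace('/\s+|&nbsp;/', '', $value);
        if ($value === '') {
            continue;
        }
        if (preg_match('/(<.+?>)/', $value, $matches)) {
            $tmpString .= $matches[1];               // collect the tag
        } else {
            $result_array[] = $tmpString . $value;   // emit tags + text
            $tmpString = '';
        }
    }
    // Keep tags left over after the last text item.
    if ($tmpString !== '') {
        $result_array[] = $tmpString;
    }
    return $result_array;
}

var_dump(merge_tags_with_text(['<p>', 'text', ' ', '<b>']));
```

For that input the result has two elements, '<p>text' and '<b>'. Without the flush, the trailing '<b>' would be lost.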

Create another array holding the allowed values, then use in_array() to check whether each element of your array is allowed.

A simple example:

$final_array = array();
$allowed_chars = array('<p>', '</p>');
foreach ($content_array as $v)
{
  if (in_array($v, $allowed_chars))
  {
    array_push($final_array, $v);
  }
}
print_r($final_array);