Limited content breaks the HTML layout in PHP

I ran into a problem when trying to limit the length of the description content. This is what I tried:

<?php 
$intDescLt = 400;
$content   = $arrContentList[$arr->nid]['description'];
$excerpt   = substr($content, 0, $intDescLt);
?>
<div class="three16 DetailsDiv">
    <?php echo $excerpt; ?>
</div>

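If the description field contains plain text without HTML tags, this works fine. But if the content contains HTML tags and the limit is reached before a closing tag, the style of the unclosed tag is applied to everything that follows.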

So I need to know how to fix this.

Example of the problem:

$string = "<p><b>Lorem Ipsum</b> is simply dummy text of the printing and typesetting industry.</p>";
echo substr($string, 0, 15);

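HTML output in the console: <p><b>Lorem Ips (the unclosed tags are now applied to the rest of the content on the page).

Expected output in the console: <p><b>Lorem Ips</b></p>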

You can't just use PHP's binary string functions on an HTML string and expect things to work.

$string = "<p><b>Lorem Ipsum</b> is simply dummy text of the printing and typesetting industry.</p>";

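First of all, you need to decide what kind of excerpt you want to create in an HTML context. Let's take an example that is concerned with the actual text length in characters, that is, not counting the size of the HTML tags. In addition, tags should stay closed.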
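To make the difference between "string length" and "text length" concrete, here is a minimal sketch that compares the two for the example string. It uses strip_tags() only to measure the visible text, not as a way to build the excerpt; the variable names $rawLength and $textLength are just illustrative.

```php
<?php
$string = "<p><b>Lorem Ipsum</b> is simply dummy text of the printing and typesetting industry.</p>";

// Length of the whole string, markup included. This is what substr() works on.
$rawLength = strlen($string);

// Length of the visible text only, with all tags stripped.
$textLength = strlen(strip_tags($string));

// The difference is the number of bytes taken up by <p>, <b>, </b> and </p>.
echo "raw: $rawLength, text: $textLength, markup: ", $rawLength - $textLength, "\n";
```

A limit that is meant in terms of visible characters therefore has to be applied to the text nodes, not to the raw string.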

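Start by creating a DOMDocument so that you can operate on the HTML fragment you have. The loaded $string ends up as child nodes of the <body> tag, so the code also fetches that element for reference: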

$doc    = new DOMDocument();
$result = $doc->loadHTML($string);
if (!$result) {
    throw new InvalidArgumentException('String could not be parsed as HTML fragment');
}
$body = $doc->getElementsByTagName('body')->item(0);

Next, you need to operate on all nodes inside it in document order. Iterating over these nodes is easy with the help of an XPath query:

$xp    = new DOMXPath($doc);
$nodes = $xp->query('./descendant::node()', $body);
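To see what "document order" means here, the following sketch repeats the setup from above and collects the name of each node the query returns. The $names array is only there for illustration.

```php
<?php
$string = "<p><b>Lorem Ipsum</b> is simply dummy text of the printing and typesetting industry.</p>";

$doc = new DOMDocument();
$doc->loadHTML($string);
$body = $doc->getElementsByTagName('body')->item(0);

$xp    = new DOMXPath($doc);
$nodes = $xp->query('./descendant::node()', $body);

// Collect node names in the order the query yields them.
// Elements report their tag name, text nodes report "#text".
$names = [];
foreach ($nodes as $node) {
    $names[] = $node->nodeName;
}
echo implode(', ', $names), "\n"; // p, b, #text, #text
```

The elements come first as they are opened, and each text node appears at the position where its text occurs in the markup.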

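Then the logic for creating the excerpt needs to be implemented. That is, all text nodes are taken over until their length exceeds the number of characters that are left. If it does, the text node is split; if no characters are left at all, the text node is removed from its parent: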

$length = 0;
foreach ($nodes as $node) {
    if (!$node instanceof DOMText) {
        continue;
    }
    $left = max(0, 15 - $length);
    if ($left) {
        if ($node->length > $left) {
            $node->splitText($left);
            $node->nextSibling->parentNode->removeChild($node->nextSibling);
        }
        $length += $node->length;
    } else {
        $node->parentNode->removeChild($node);
    }
}

Finally, you need to turn the inner HTML of the body tag into a string to obtain the result:

$buffer = '';
foreach ($body->childNodes as $node) {
    $buffer .= $doc->saveHTML($node);
}
echo $buffer;

This gives you the following result:

<p><b>Lorem Ipsum</b> is </p>

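Because only text nodes were changed and not element nodes, the elements are all still intact. Only the text has been shortened. The Document Object Model lets you do the traversal, the string operations and the node removal as needed.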
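If you need this in more than one place, the steps above can be wrapped into a function. This is only a sketch: the name htmlExcerpt() and its parameters are made up for illustration, and it selects text nodes directly with descendant::text() instead of filtering with instanceof.

```php
<?php
/**
 * Hypothetical helper: shorten the text of an HTML fragment to $limit
 * characters (tags not counted) while keeping all elements closed.
 */
function htmlExcerpt(string $html, int $limit): string
{
    $doc = new DOMDocument();
    if (!$doc->loadHTML($html)) {
        throw new InvalidArgumentException('String could not be parsed as HTML fragment');
    }
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $xp     = new DOMXPath($doc);
    $length = 0;
    foreach ($xp->query('./descendant::text()', $body) as $node) {
        $left = max(0, $limit - $length);
        if ($left === 0) {
            // Limit already reached: drop this text node entirely.
            $node->parentNode->removeChild($node);
            continue;
        }
        if ($node->length > $left) {
            // Keep the first $left characters, remove the remainder.
            $rest = $node->splitText($left);
            $rest->parentNode->removeChild($rest);
        }
        $length += $node->length;
    }

    // Serialize the inner HTML of <body>.
    $buffer = '';
    foreach ($body->childNodes as $child) {
        $buffer .= $doc->saveHTML($child);
    }
    return $buffer;
}

$string = "<p><b>Lorem Ipsum</b> is simply dummy text of the printing and typesetting industry.</p>";
echo htmlExcerpt($string, 15), "\n";
```

In the question's code this would replace the substr() call, for example htmlExcerpt($content, $intDescLt).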

As you can imagine, a more simplistic string function like substr() is not capable of handling HTML in the same way.

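In reality there may be more to do: the HTML in the string might be invalid (check the Tidy extension), you might want to drop HTML attributes and tags (images, scripts, iframes), and you might also want to take the size of the tags into account. The DOM allows you to do all of that.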
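As a rough sketch of the "drop attributes and tags" part, the following removes script, iframe and img elements and then strips every attribute from what is left. The sample markup in $html is invented for illustration, and Tidy is left out because the extension is not always installed.

```php
<?php
$html = '<p>Hello <img src="a.png" alt="x"> <b class="x">world</b></p><script>alert(1)</script>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$body = $doc->getElementsByTagName('body')->item(0);
$xp   = new DOMXPath($doc);

// Remove whole elements that should not appear in an excerpt.
foreach ($xp->query('.//script | .//iframe | .//img', $body) as $element) {
    $element->parentNode->removeChild($element);
}

// Strip all attributes from the remaining elements.
// The attribute map is live, so keep removing the first entry until it is empty.
foreach ($xp->query('.//*[@*]', $body) as $element) {
    while ($element->attributes->length > 0) {
        $element->removeAttributeNode($element->attributes->item(0));
    }
}

// Serialize the inner HTML of <body>.
$buffer = '';
foreach ($body->childNodes as $child) {
    $buffer .= $doc->saveHTML($child);
}
echo $buffer, "\n";
```

Running this cleanup before the excerpt logic means the character limit is only spent on text that is actually meant to be displayed.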

Full example:

<?php
/**
 * Limited content break the HTML layout in php
 *
 * @link http://stackoverflow.com/a/29323396/367456
 * @author hakre <http://hakre.wordpress.com>
 */
$string = "<p><b>Lorem Ipsum</b> is simply dummy text of the printing and typesetting industry.</p>";
echo substr($string, 0, 15), "\n";
$doc    = new DOMDocument();
$result = $doc->loadHTML($string);
if (!$result) {
    throw new InvalidArgumentException('String could not be parsed as HTML fragment');
}
$body = $doc->getElementsByTagName('body')->item(0);
$xp    = new DOMXPath($doc);
$nodes = $xp->query('./descendant::node()', $body);
$length = 0;
foreach ($nodes as $node) {
    if (!$node instanceof DOMText) {
        continue;
    }
    $left = max(0, 15 - $length);
    if ($left) {
        if ($node->length > $left) {
            $node->splitText($left);
            $node->nextSibling->parentNode->removeChild($node->nextSibling);
        }
        $length += $node->length;
    } else {
        $node->parentNode->removeChild($node);
    }
}
$buffer = '';
foreach ($body->childNodes as $node) {
    $buffer .= $doc->saveHTML($node);
}
echo $buffer;

OK, given the example you provided:

$string = "<p><b>Lorem Ipsum</b> is simply dummy text of the printing and typesetting industry.</p>";
$substring = substr($string, 0, 15);

If you want to close all unclosed tags, one possible solution is to use the DOMDocument class:

$doc = new DOMDocument();
$doc->loadHTML($substring);
$yourText = $doc->saveHTML($doc->getElementsByTagName('*')->item(2));
//item(0) = html
//item(1) = body
echo htmlspecialchars($yourText);
//<p><b>Lorem Ips</b></p>