PHP: XML import with SimpleXML

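I want to set up a script that imports XML files into my DB. My problem is that I don't know how to write the import in a smart way, so that the PHP script recognizes each piece of child information. Can somebody help me?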

<books>
<book attribute="123" attribute2="12345">
    <basic_information>
        <name addition="fooobar">fooobar</name>
        <book_genre>
            <genre>Action</genre>
            <genre>Thriller</genre>
        </book_genre>
        <languages>
            <language>Deutsch</language>
            <language>Englisch</language>
            <language>Polnisch</language>
            <language>Russisch</language>
        </languages>
    </basic_information>
    <author_information>
        <name addition="fooabr">Mr_Ed</name>
    </author_information>
</book>
<book attribute="123" attribute2="12345">
    <basic_information>
        <name addition="fooobar">fooobar</name>
        <genres>
            <genre>Action</genre>
            <genre>Thriller</genre>
        </genres>
        <languages>
            <language>Deutsch</language>
            <language>Englisch</language>
            <language>Polnisch</language>
            <language>Russisch</language>
        </languages>
    </basic_information>
    <author_information>
        <name addition="fooabr">Mr_Ed</name>
    </author_information>
</book>
</books>

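Even though each XML file can represent a database on its own, there are usually two fundamental differences between XML and a relational SQL database.

The most obvious one is the schema. The XML you present in your question has no schema at all. An SQL database has a schema by definition.

Not only does your XML have no schema, you also don't share anything about its meaning. So the smartest thing to do here is to ignore any schema altogether.

So here is an example of how the XML from your question can be turned into a database table. You could create a database table that consists of two columns: path and value. Then you could decide to put all attributes and leaf text nodes in there: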
+-------------------------------------------------------------+--------+
|path                                                         |value   |
+-------------------------------------------------------------+--------+
|/books/book[1]/@attribute                                    |123     |
+-------------------------------------------------------------+--------+
|/books/book[1]/@attribute2                                   |12345   |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/name/@addition              |fooobar |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/name/text()                 |fooobar |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/book_genre/genre[1]/text()  |Action  |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/book_genre/genre[2]/text()  |Thriller|
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/languages/language[1]/text()|Deutsch |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/languages/language[2]/text()|Englisch|
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/languages/language[3]/text()|Polnisch|
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/languages/language[4]/text()|Russisch|
+-------------------------------------------------------------+--------+
|/books/book[1]/author_information/name/@addition             |fooabr  |
+-------------------------------------------------------------+--------+
|/books/book[1]/author_information/name/text()                |Mr_Ed   |
+-------------------------------------------------------------+--------+
|/books/book[2]/@attribute                                    |123     |
+-------------------------------------------------------------+--------+
|/books/book[2]/@attribute2                                   |12345   |
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/name/@addition              |fooobar |
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/name/text()                 |fooobar |
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/genres/genre[1]/text()      |Action  |
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/genres/genre[2]/text()      |Thriller|
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/languages/language[1]/text()|Deutsch |
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/languages/language[2]/text()|Englisch|
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/languages/language[3]/text()|Polnisch|
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/languages/language[4]/text()|Russisch|
+-------------------------------------------------------------+--------+
|/books/book[2]/author_information/name/@addition             |fooabr  |
+-------------------------------------------------------------+--------+
|/books/book[2]/author_information/name/text()                |Mr_Ed   |
+-------------------------------------------------------------+--------+ 

Creating such a transformation is pretty straightforward with an XML parser that supports XPath queries (like the DOM extension in PHP):

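// $buffer is expected to hold the XML document as a string.
$doc    = new DOMDocument();
$result = $doc->loadXML($buffer);
if (!$result) {
    throw new UnexpectedValueException('Could not load XML');
}
$xpath = new DOMXPath($doc);
// Select every attribute plus the text of every element without child elements.
$nodes = $xpath->query('(//@*|(.|.//*)[not(*)]/text())');
$table = [['path', 'value']];
foreach ($nodes as $node) {
    /** @var DOMNode $node */
    $path    = $node->getNodePath();
    $value   = $node->nodeValue;
    $table[] = [$path, $value];
}
// TextTable is a helper class that is not shown here; it prints the rows as a text table.
echo new TextTable($table);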
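
Since the question is about getting the data into a DB, here is a minimal sketch of storing those path/value rows with PDO. The in-memory SQLite database, the table name xml_values and the shortened sample XML are assumptions for illustration only; any PDO driver should work along the same lines.

```php
<?php
// Hypothetical, shortened sample input (stands in for $buffer above).
$buffer = <<<XML
<books>
  <book attribute="123">
    <basic_information>
      <name addition="fooobar">fooobar</name>
    </basic_information>
  </book>
</books>
XML;

$doc = new DOMDocument();
if (!$doc->loadXML($buffer)) {
    throw new UnexpectedValueException('Could not load XML');
}
$xpath = new DOMXPath($doc);

// Assumption: an in-memory SQLite database stands in for the real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE xml_values (path TEXT PRIMARY KEY, value TEXT)');

// One prepared statement, executed once per attribute or leaf text node.
$insert = $pdo->prepare('INSERT INTO xml_values (path, value) VALUES (?, ?)');
$pdo->beginTransaction();
foreach ($xpath->query('(//@*|(.|.//*)[not(*)]/text())') as $node) {
    $insert->execute([$node->getNodePath(), $node->nodeValue]);
}
$pdo->commit();

// Read the rows back as path => value pairs.
$rows = $pdo->query('SELECT path, value FROM xml_values ORDER BY path')
            ->fetchAll(PDO::FETCH_KEY_PAIR);
print_r($rows);
```

The node path works as a primary key here because each path identifies exactly one node in the document.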

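But data like this is hardly normalized. There are clearly duplicate values, and they look like the first candidates for more normalization. For example, with a store that keeps track of value identities: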

$values = new IdentityStore('value');
$table = [['path', $values->getKey()]];
foreach ($nodes as $node) {
    /** @var DOMNode $node */
    $path  = $node->getNodePath();
    $value = $values->add($node->nodeValue);
    $table[] = [$path, $value];
}
echo new TextTable($table);
echo new TextTable($values);
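
The IdentityStore class itself is not shown in this answer. A rough sketch of what such a store might look like follows; the method names add() and getKey() are taken from the usage above, everything else (including making it iterable so that it can be printed as a table) is an assumption.

```php
<?php
/**
 * Hypothetical stand-in for the IdentityStore used above: it hands out one
 * numeric id per distinct value and returns the same id for repeats.
 */
final class IdentityStore implements IteratorAggregate
{
    /** @var string base name, e.g. "value" */
    private $name;

    /** @var array value => id */
    private $ids = [];

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    /** Column name for the id, e.g. "value_id". */
    public function getKey(): string
    {
        return $this->name . '_id';
    }

    /** Returns the existing id for $value, or assigns the next free one. */
    public function add(string $value): int
    {
        if (!isset($this->ids[$value])) {
            $id = count($this->ids) + 1;
            $this->ids[$value] = $id;
        }
        return $this->ids[$value];
    }

    /** Yields a header row followed by [id, value] rows. */
    public function getIterator(): Generator
    {
        yield [$this->getKey(), $this->name];
        foreach ($this->ids as $value => $id) {
            // Numeric-looking keys come back as integers, so cast them.
            yield [$id, (string) $value];
        }
    }
}

$values = new IdentityStore('value');
$ids = [];
foreach (['123', '12345', 'fooobar', 'fooobar', 'Action'] as $value) {
    $ids[] = $values->add($value);
}
print_r($ids);
```

The repeated "fooobar" gets the same id both times, which is what removes the duplicates from the path table.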

This turns the values into their IDs:

+-------------------------------------------------------------+--------+
|path                                                         |value_id|
+-------------------------------------------------------------+--------+
|/books/book[1]/@attribute                                    |1       |
+-------------------------------------------------------------+--------+
|/books/book[1]/@attribute2                                   |2       |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/name/@addition              |3       |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/name/text()                 |3       |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/book_genre/genre[1]/text()  |4       |
+-------------------------------------------------------------+--------+
...

and gives the values a table of their own:

+--------+--------+
|value_id|value   |
+--------+--------+
|1       |123     |
+--------+--------+
|2       |12345   |
+--------+--------+
|3       |fooobar |
+--------+--------+
|4       |Action  |
+--------+--------+
|5       |Thriller|
+--------+--------+
|6       |Deutsch |
+--------+--------+
|7       |Englisch|
+--------+--------+
|8       |Polnisch|
+--------+--------+
|9       |Russisch|
+--------+--------+
|10      |fooabr  |
+--------+--------+
|11      |Mr_Ed   |
+--------+--------+

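On its own this does not look very helpful. Even though the values are normalized now, it is probably more interesting to look at how the paths could be mapped instead of the values.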

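The paths contain the table names in encoded form. Each pair of square brackets denotes a record in a table, and the table is named by the part of the path in front of the brackets. If that table sits inside a record of another table further up the path, a relation between the two is created.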
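
To make the bracket rule concrete, here is a small sketch that splits a node path into the records it passes through and the remaining column path. The function name splitNodePath() is made up for illustration and is not part of the PathTables class used in this answer; it also assumes paths as produced by getNodePath(), where predicates are plain numbers.

```php
<?php
/**
 * Splits a node path into the record sets it passes through and the
 * remaining column path. Hypothetical helper for illustration.
 *
 * @return array ['records' => [['table' => string, 'id' => int], ...], 'column' => string]
 */
function splitNodePath(string $path): array
{
    $records = [];
    $prefix  = []; // steps collected since the last bracketed step
    foreach (explode('/', ltrim($path, '/')) as $step) {
        if (preg_match('/^(.+)\[(\d+)\]$/', $step, $m)) {
            // A bracketed step closes a table name: the prefix steps plus this element.
            $prefix[]  = $m[1];
            $records[] = ['table' => implode('_', $prefix), 'id' => (int) $m[2]];
            $prefix    = [];
        } else {
            $prefix[] = $step;
        }
    }
    // Whatever follows the last bracket is the column inside that record.
    return ['records' => $records, 'column' => implode('/', $prefix)];
}

$split = splitNodePath('/books/book[2]/basic_information/genres/genre[1]/text()');
print_r($split);
```

For this path the sketch yields record 2 in a table books_book, record 1 in a table basic_information_genres_genre, and the column text().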

That makes for an interesting approach, too:

$tables = new PathTables();
foreach ($nodes as $node) {
    /** @var DOMNode $node */
    $path = $node->getNodePath();
    $tables->add($path, $node->nodeValue);
}
echo $tables;

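However, the values are no longer normalized here, and the schema has no way of knowing whether it is grouping values or not. Watch for the comma-separated values to see the shortcomings: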

===  books_book  ===
+-------+----------+-----------+--------------------------------+-----------------------------+-------------------------------------------+------------------------------------------------+---------------------------------+------------------------------+---------------------------------------+
|book_id|@attribute|@attribute2|basic_information/name/@addition|basic_information/name/text()|basic_information_book_genre_genre.genre_id|basic_information_languages_language.language_id|author_information/name/@addition|author_information/name/text()|basic_information_genres_genre.genre_id|
+-------+----------+-----------+--------------------------------+-----------------------------+-------------------------------------------+------------------------------------------------+---------------------------------+------------------------------+---------------------------------------+
|1      |123       |12345      |fooobar                         |fooobar                      |1,2                                        |1,2,3,4                                         |fooabr                           |Mr_Ed                         |                                       |
+-------+----------+-----------+--------------------------------+-----------------------------+-------------------------------------------+------------------------------------------------+---------------------------------+------------------------------+---------------------------------------+
|2      |123       |12345      |fooobar                         |fooobar                      |                                           |1,2,3,4                                         |fooabr                           |Mr_Ed                         |1,2                                    |
+-------+----------+-----------+--------------------------------+-----------------------------+-------------------------------------------+------------------------------------------------+---------------------------------+------------------------------+---------------------------------------+
===  basic_information_book_genre_genre  ===
+--------+--------+
|genre_id|text()  |
+--------+--------+
|1       |Action  |
+--------+--------+
|2       |Thriller|
+--------+--------+
===  basic_information_languages_language  ===
+-----------+-----------------+
|language_id|text()           |
+-----------+-----------------+
|1          |Deutsch,Deutsch  |
+-----------+-----------------+
|2          |Englisch,Englisch|
+-----------+-----------------+
|3          |Polnisch,Polnisch|
+-----------+-----------------+
|4          |Russisch,Russisch|
+-----------+-----------------+
===  basic_information_genres_genre  ===
+--------+--------+
|genre_id|text()  |
+--------+--------+
|1       |Action  |
+--------+--------+
|2       |Thriller|
+--------+--------+

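So in any case you run into the problem of the missing schema. With a schema for both the XML document and the SQL database, you can easily map between the two with XPath expressions that define the mapping.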

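But without one, it gets overly complicated. Changes in the XML will change the schema of the SQL. Transformation errors may go unnoticed, so the only straightforward way is to map XPath paths to values.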
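
If the structure is known after all, the XPath-based mapping mentioned above could look roughly like the following sketch. The column names and the choice of expressions are assumptions; the element names follow the XML in the question, shortened to one book.

```php
<?php
// Hypothetical sample; the element names follow the XML in the question.
$buffer = <<<XML
<books>
  <book attribute="123" attribute2="12345">
    <basic_information>
      <name addition="fooobar">fooobar</name>
      <genres><genre>Action</genre><genre>Thriller</genre></genres>
    </basic_information>
    <author_information><name addition="fooabr">Mr_Ed</name></author_information>
  </book>
</books>
XML;

// Assumed mapping: column name => XPath expression relative to each <book>.
$columns = [
    'attribute' => 'string(@attribute)',
    'title'     => 'string(basic_information/name)',
    'author'    => 'string(author_information/name)',
];

$doc = new DOMDocument();
if (!$doc->loadXML($buffer)) {
    throw new UnexpectedValueException('Could not load XML');
}
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query('/books/book') as $book) {
    $row = [];
    foreach ($columns as $column => $expression) {
        // evaluate() returns a string here because of the string() function.
        $row[$column] = $xpath->evaluate($expression, $book);
    }
    // One-to-many data would go into a child table; it is collected as a list here.
    // The * step matches both <book_genre> and <genres> from the question.
    $row['genres'] = [];
    foreach ($xpath->query('basic_information/*/genre', $book) as $genre) {
        $row['genres'][] = $genre->textContent;
    }
    $rows[] = $row;
}
print_r($rows);
```

Each entry in $rows then corresponds to one row of a books table, and the mapping array is the only place that has to change when a column is added.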

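It would certainly be interesting to work out how to normalize further in a useful way, but I'd say that is more suited to a computer science class than to a Q&A site. I found two further references, one focusing on the database technique and the other on mapping XML to an SQL structure while streaming:

  • Storing XML in Relational Databases (June 2001; by Igor Dayen)
  • Supporting Efficient Streaming and Insertion of XML Data in RDBMS (April 2004; by Timo Böhme, Erhard Rahm)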

Once the XML data has been loaded, each child element is available as a property of its parent, in the form $element->fieldname.

Try this:

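$books = simplexml_load_file("xmlfile.xml"); // returns the root <books> element
if ($books === false) {
    throw new UnexpectedValueException('Could not load XML');
}
foreach ($books->book as $book) {
    // Attributes are read with array syntax; cast to string to get plain values.
    $attribute  = (string) $book['attribute'];  // 123
    $attribute2 = (string) $book['attribute2']; // 12345
    $name = (string) $book->basic_information->name; // fooobar
    $name_addition = (string) $book->basic_information->name['addition']; // fooobar
    // Repeated children can be iterated or indexed. Note that the sample XML
    // uses <book_genre> in the first book and <genres> in the second.
    $genres = $book->basic_information->book_genre->genre; // $genres[0] is "Action" etc.
    $languages = $book->basic_information->languages->language; // $languages[0] is "Deutsch" etc.
    $author = (string) $book->author_information->name; // Mr_Ed
    $author_addition = (string) $book->author_information->name['addition']; // fooabr
    //...
}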
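
A short, self-contained sketch of how SimpleXML exposes attributes and repeated children may help here. The "@attributes" key that shows up in print_r() output is a debugging representation; as far as I know, array access on the element expects the attribute name itself. The sample XML below is a shortened, made-up fragment.

```php
<?php
$book = simplexml_load_string(
    '<book attribute="123"><languages><language>Deutsch</language>'
    . '<language>Englisch</language></languages></book>'
);

// Array access with the attribute name returns the attribute node.
$attribute = (string) $book['attribute'];

// attributes() lists all attributes, keyed by name.
$all = [];
foreach ($book->attributes() as $name => $value) {
    $all[$name] = (string) $value;
}

// Repeated child elements can be iterated (or indexed with [0], [1], ...).
$languages = [];
foreach ($book->languages->language as $language) {
    $languages[] = (string) $language;
}

var_dump($attribute, $all, $languages);
```

The string casts matter when the values go into a database: without them you pass SimpleXMLElement objects around instead of plain strings.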