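I want to set up a script that imports an XML file into my database. My problem is that I don't know how to write the import in a smart way, so that the PHP script recognizes every piece of child information. Can anybody help me?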
<books>
    <book attribute="123" attribute2="12345">
        <basic_information>
            <name addition="fooobar">fooobar</name>
            <book_genre>
                <genre>Action</genre>
                <genre>Thriller</genre>
            </book_genre>
            <languages>
                <language>Deutsch</language>
                <language>Englisch</language>
                <language>Polnisch</language>
                <language>Russisch</language>
            </languages>
        </basic_information>
        <author_information>
            <name addition="fooabr">Mr_Ed</name>
        </author_information>
    </book>
    <book attribute="123" attribute2="12345">
        <basic_information>
            <name addition="fooobar">fooobar</name>
            <genres>
                <genre>Action</genre>
                <genre>Thriller</genre>
            </genres>
            <languages>
                <language>Deutsch</language>
                <language>Englisch</language>
                <language>Polnisch</language>
                <language>Russisch</language>
            </languages>
        </basic_information>
        <author_information>
            <name addition="fooabr">Mr_Ed</name>
        </author_information>
    </book>
</books>
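Although every XML file can represent a database on its own, there are normally two fundamental differences between XML and relational SQL databases.
The most obvious one is the schema. The XML you present in your question has no schema at all, whereas an SQL database has one by definition. Not only does your XML lack a schema, you also don't share anything about what it means. So the smartest thing to do is to ignore any schema here entirely.
To give you an example of how the XML from your question can be turned into a database table: you can create a table consisting of two columns, Path and Value, and then decide to put all attributes and leaf text nodes in there:
+-------------------------------------------------------------+--------+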
|path |value |
+-------------------------------------------------------------+--------+
|/books/book[1]/@attribute |123 |
+-------------------------------------------------------------+--------+
|/books/book[1]/@attribute2 |12345 |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/name/@addition |fooobar |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/name/text() |fooobar |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/book_genre/genre[1]/text() |Action |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/book_genre/genre[2]/text() |Thriller|
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/languages/language[1]/text()|Deutsch |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/languages/language[2]/text()|Englisch|
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/languages/language[3]/text()|Polnisch|
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/languages/language[4]/text()|Russisch|
+-------------------------------------------------------------+--------+
|/books/book[1]/author_information/name/@addition |fooabr |
+-------------------------------------------------------------+--------+
|/books/book[1]/author_information/name/text() |Mr_Ed |
+-------------------------------------------------------------+--------+
|/books/book[2]/@attribute |123 |
+-------------------------------------------------------------+--------+
|/books/book[2]/@attribute2 |12345 |
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/name/@addition |fooobar |
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/name/text() |fooobar |
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/genres/genre[1]/text() |Action |
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/genres/genre[2]/text() |Thriller|
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/languages/language[1]/text()|Deutsch |
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/languages/language[2]/text()|Englisch|
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/languages/language[3]/text()|Polnisch|
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/languages/language[4]/text()|Russisch|
+-------------------------------------------------------------+--------+
|/books/book[2]/author_information/name/@addition |fooabr |
+-------------------------------------------------------------+--------+
|/books/book[2]/author_information/name/text() |Mr_Ed |
+-------------------------------------------------------------+--------+
Creating such a transformation is pretty straightforward with an XML parser that supports XPath queries, such as the DOM extension in PHP:
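// $buffer holds the XML document as a string
$doc = new DOMDocument();
$result = $doc->loadXML($buffer);
if (!$result) {
    throw new UnexpectedValueException('Could not load XML');
}
$xpath = new DOMXPath($doc);
// select every attribute plus the text of every element without child elements
$nodes = $xpath->query('(//@*|(.|.//*)[not(*)]/text())');
$table = [['path', 'value']]; // header row
foreach ($nodes as $node) {
    /** @var DOMNode $node */
    $path = $node->getNodePath();
    $value = $node->nodeValue;
    $table[] = [$path, $value];
}
// TextTable is a helper class of this answer (not shown) that renders the rows as text
echo new TextTable($table);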
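Since the goal is to get the data into a database, the path/value rows could be written with PDO. The following is only a minimal sketch and not part of the original answer: the function names xmlToPathValueRows and storePathValueRows and the table name xml_values are made up for illustration, and the column types may need adjusting for your database.

```php
<?php
/**
 * Collect a [path, value] pair for every attribute and leaf text node.
 * Same XPath query as in the snippet above.
 */
function xmlToPathValueRows(string $xml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new UnexpectedValueException('Could not load XML');
    }
    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('(//@*|(.|.//*)[not(*)]/text())') as $node) {
        $rows[] = [$node->getNodePath(), $node->nodeValue];
    }
    return $rows;
}

/**
 * Write the rows into a two-column table (hypothetical name: xml_values).
 * Assumes $pdo is configured to throw exceptions on errors.
 */
function storePathValueRows(PDO $pdo, array $rows): void
{
    $pdo->exec('CREATE TABLE IF NOT EXISTS xml_values (path TEXT NOT NULL, value TEXT)');
    $stmt = $pdo->prepare('INSERT INTO xml_values (path, value) VALUES (?, ?)');
    $pdo->beginTransaction();
    foreach ($rows as [$path, $value]) {
        // prepared statement: values are bound, never concatenated into SQL
        $stmt->execute([$path, $value]);
    }
    $pdo->commit();
}
```

For a quick try with an in-memory SQLite database, this could be called as storePathValueRows(new PDO('sqlite::memory:'), xmlToPathValueRows($buffer)).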
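But such data is hardly normalized yet. There are obviously duplicate values, and they look like the first target for further normalization, for example with a store that tracks the identity of each value: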
$values = new IdentityStore('value');
$table = [['path', $values->getKey()]];
foreach ($nodes as $node) {
    /** @var DOMNode $node */
    $path = $node->getNodePath();
    $value = $values->add($node->nodeValue);
    $table[] = [$path, $value];
}
echo new TextTable($table);
echo new TextTable($values);
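IdentityStore is a helper whose source is not shown in this answer. A hypothetical minimal version that would fit the calls above (add() returning an id, getKey() returning the name of the id column) might look like the sketch below. The iteration format with a header row is an assumption, chosen so that the store could be handed to a table renderer.

```php
<?php
/**
 * Hypothetical stand-in for the IdentityStore helper used above:
 * assigns each distinct value a stable numeric id, starting at 1.
 */
final class IdentityStore implements IteratorAggregate
{
    private string $name;

    /** @var array<int|string, int> value => id */
    private array $ids = [];

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    /** Name of the id column, e.g. "value_id" for the store "value". */
    public function getKey(): string
    {
        return $this->name . '_id';
    }

    /** Return the id of $value, registering it on first sight. */
    public function add(string $value): int
    {
        if (!isset($this->ids[$value])) {
            $this->ids[$value] = count($this->ids) + 1;
        }
        return $this->ids[$value];
    }

    /** Iterate as table rows: a header row, then [id, value] pairs. */
    public function getIterator(): Traversable
    {
        $rows = [[$this->getKey(), $this->name]];
        foreach ($this->ids as $value => $id) {
            // PHP turns numeric string keys into ints, so cast back
            $rows[] = [$id, (string) $value];
        }
        return new ArrayIterator($rows);
    }
}
```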
The values are then replaced by their ids:
+-------------------------------------------------------------+--------+
|path |value_id|
+-------------------------------------------------------------+--------+
|/books/book[1]/@attribute |1 |
+-------------------------------------------------------------+--------+
|/books/book[1]/@attribute2 |2 |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/name/@addition |3 |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/name/text() |3 |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/book_genre/genre[1]/text() |4 |
+-------------------------------------------------------------+--------+
...
And the values get a table of their own:
+--------+--------+
|value_id|value |
+--------+--------+
|1 |123 |
+--------+--------+
|2 |12345 |
+--------+--------+
|3 |fooobar |
+--------+--------+
|4 |Action |
+--------+--------+
|5 |Thriller|
+--------+--------+
|6 |Deutsch |
+--------+--------+
|7 |Englisch|
+--------+--------+
|8 |Polnisch|
+--------+--------+
|9 |Russisch|
+--------+--------+
|10 |fooabr |
+--------+--------+
|11 |Mr_Ed |
+--------+--------+
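On its own this does not look very helpful. Even though the values are normalized now, it is perhaps more interesting to map the paths rather than the values.
A path contains encoded table names. Each square bracket denotes a record in the table named by the path in front of it. If that table sits inside a record of a preceding table, that constitutes a relation.
That would be an interesting approach as well: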
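The rule just described can be sketched as a small function that splits a node path at its square brackets. This is an illustration only; the function name splitNodePath and the shape of the returned array are assumptions, not something taken from the PathTables class used in this answer.

```php
<?php
/**
 * Split a node path such as
 *   /books/book[1]/basic_information/languages/language[4]/text()
 * into the tables it passes through and the remaining column name.
 * Every step ending in [n] closes a table name; n is the record number.
 */
function splitNodePath(string $path): array
{
    $tables = [];
    $pending = []; // steps collected since the last bracketed step
    foreach (explode('/', ltrim($path, '/')) as $step) {
        if (preg_match('/^(.+)\[(\d+)\]$/', $step, $m)) {
            $pending[] = $m[1];
            $tables[] = ['table' => implode('_', $pending), 'record' => (int) $m[2]];
            $pending = [];
        } else {
            $pending[] = $step;
        }
    }
    // whatever follows the last bracket names the column
    return ['tables' => $tables, 'column' => implode('/', $pending)];
}
```

For the path in the comment, this yields the tables books_book (record 1) and basic_information_languages_language (record 4) with the column text(). A table nested behind another one is what would become a relation.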
$tables = new PathTables();
foreach ($nodes as $node) {
    /** @var DOMNode $node */
    $path = $node->getNodePath();
    $tables->add($path, $node->nodeValue);
}
echo $tables;
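However, the values have not been normalized here, and without a schema the code cannot know whether it is grouping values or not. Note the comma-separated values, which reveal the shortcomings: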
=== books_book ===
+-------+----------+-----------+--------------------------------+-----------------------------+-------------------------------------------+------------------------------------------------+---------------------------------+------------------------------+---------------------------------------+
|book_id|@attribute|@attribute2|basic_information/name/@addition|basic_information/name/text()|basic_information_book_genre_genre.genre_id|basic_information_languages_language.language_id|author_information/name/@addition|author_information/name/text()|basic_information_genres_genre.genre_id|
+-------+----------+-----------+--------------------------------+-----------------------------+-------------------------------------------+------------------------------------------------+---------------------------------+------------------------------+---------------------------------------+
|1 |123 |12345 |fooobar |fooobar |1,2 |1,2,3,4 |fooabr |Mr_Ed | |
+-------+----------+-----------+--------------------------------+-----------------------------+-------------------------------------------+------------------------------------------------+---------------------------------+------------------------------+---------------------------------------+
|2 |123 |12345 |fooobar |fooobar | |1,2,3,4 |fooabr |Mr_Ed |1,2 |
+-------+----------+-----------+--------------------------------+-----------------------------+-------------------------------------------+------------------------------------------------+---------------------------------+------------------------------+---------------------------------------+
=== basic_information_book_genre_genre ===
+--------+--------+
|genre_id|text() |
+--------+--------+
|1 |Action |
+--------+--------+
|2 |Thriller|
+--------+--------+
=== basic_information_languages_language ===
+-----------+-----------------+
|language_id|text() |
+-----------+-----------------+
|1 |Deutsch,Deutsch |
+-----------+-----------------+
|2 |Englisch,Englisch|
+-----------+-----------------+
|3 |Polnisch,Polnisch|
+-----------+-----------------+
|4 |Russisch,Russisch|
+-----------+-----------------+
=== basic_information_genres_genre ===
+--------+--------+
|genre_id|text() |
+--------+--------+
|1 |Action |
+--------+--------+
|2 |Thriller|
+--------+--------+
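So in any case you run into the problem of the missing schema. With a schema for both the XML document and the SQL database, you can easily map between the two using XPath expressions that define the mapping.
Without one, it gets overly complex. Changes in the XML will change the schema of the SQL database, and conversion errors might go unnoticed, so the only straightforward approach is to map XPath paths to values.
How to normalize further in a useful way would surely be interesting, but I would say that is better suited to a computer science class than to a Q&A website. I found two other references, one focused on database technologies, the other on mapping XML into SQL structures while streaming:
- Storing XML in Relational Databases (June 2001; by Igor Dayen)
- Supporting Efficient Streaming and Insertion of XML Data in RDBMS (April 2004; by Timo Böhme, Erhard Rahm)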
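If a schema can be agreed on, a mapping defined by XPath expressions might look like the following sketch. The column names and the idea that both book_genre and genres hold the genre list are assumptions read off the sample XML, not a schema given in the question.

```php
<?php
// Hypothetical mapping: column name => XPath relative to each <book>.
const BOOK_COLUMNS = [
    'attribute'  => 'string(@attribute)',
    'attribute2' => 'string(@attribute2)',
    'name'       => 'string(basic_information/name)',
    'author'     => 'string(author_information/name)',
];

// Multi-valued fields, which would go into child tables.
const BOOK_LISTS = [
    'genre'    => 'basic_information/*[self::book_genre or self::genres]/genre',
    'language' => 'basic_information/languages/language',
];

/** Turn the XML into one array per book, following the mapping above. */
function mapBooks(string $xml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new UnexpectedValueException('Could not load XML');
    }
    $xpath = new DOMXPath($doc);
    $books = [];
    foreach ($xpath->query('/books/book') as $bookNode) {
        $row = [];
        foreach (BOOK_COLUMNS as $column => $expression) {
            // string(...) makes evaluate() return a plain PHP string
            $row[$column] = $xpath->evaluate($expression, $bookNode);
        }
        foreach (BOOK_LISTS as $column => $expression) {
            $row[$column] = [];
            foreach ($xpath->query($expression, $bookNode) as $item) {
                $row[$column][] = $item->nodeValue;
            }
        }
        $books[] = $row;
    }
    return $books;
}
```

Each returned row then corresponds to one record of a books table, and each list entry to one record of a child table that references the book.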
Once the XML data has been loaded, every element is accessible as a property, in the form $object->fieldname.
Try this:
// simplexml_load_file() returns the root element, here <books>
$books = simplexml_load_file("xmlfile.xml");
foreach ($books->book as $book) {
    // attributes are read with array syntax on the element itself
    $attribute = (string) $book["attribute"]; // 123
    $attribute2 = (string) $book["attribute2"]; // 12345
    $name = (string) $book->basic_information->name; // fooobar
    $name_addition = (string) $book->basic_information->name["addition"]; // fooobar
    // repeated children are iterable; note that the second book uses <genres> instead of <book_genre>
    $genres = $book->basic_information->book_genre->genre; // (string) $genres[0] is "Action" etc.
    $languages = $book->basic_information->languages->language; // (string) $languages[0] is "Deutsch" etc.
    $author = (string) $book->author_information->name; // Mr_Ed
    $author_addition = (string) $book->author_information->name["addition"]; // fooabr
    //...
}