Avoid duplicates by associating to inserted records with CakePHP saveMany

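I'm trying to make use of CakePHP's saveMany feature (with associated data), but it creates duplicate records. I think this is because the find() query does not find the author, as the transaction has not yet been committed to the database.

This means that if, for example, the spreadsheet contains two authors with the same username, CakePHP will not associate the second with the first, but will create two author records. I've made up some code for this post: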

/*
 * Foobar user (not in database) entered twice, whereas Existing user 
 * (in database) is associated
 */
$spreadsheet_rows = array(
    array(
      'title' => 'New post',
      'author_username' => 'foobar',
      'content' => 'New post'
    ),
    array(
      'title' => 'Another new post',
      'author_username' => 'foobar',
      'content' => 'Another new post'
    ),
    array(
      'title' => 'Third post',
      'author_username' => 'Existing user',
      'content' => 'Third post'
    ),
    array(
      'title' => 'Fourth post', // author_id in this case would be NULL
      'content' => 'Fourth post'
    ),
);

$posts = array();
foreach ($spreadsheet_rows as $row) {
    /*
     * This query doesn't pick up the authors
     * entered automatically (see comment 2.)
     * within the db transaction by CakePHP,
     * so creates duplicate author names
     */
    $author = $this->Author->find('first', array('conditions' => array('Author.username' => $row['author_username'])));
    $post = array(
        'title' => $row['title'],
        'content' => $row['content'],
    );
    /*
     * Associate post to existing author
     */
    if (!empty($author)) {
        $post['author_id'] = $author['Author']['id'];
    } else {
        /*
         * 2. CakePHP creates and automatically
         * associates new author record if author_username is not blank
         * (author_id is NULL in db if blank)
         */
        if (!empty($row['author_username'])) {
            $post['Author']['username'] = $row['author_username'];
        }
    }
    $posts[] = $post;
}

$this->Post->saveMany($posts, array('deep' => true));

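Is there any way to achieve this while keeping the transaction?

Update

Your new requirement of saving posts that have no associated author changes things a lot. As mentioned in the comments, CakePHP's model save methods cannot save data from different models in one call unless the models are associated, so if you need to do this in a transaction, you will have to handle it manually.

Save authors and their posts instead of posts and their authors

I'd suggest saving the data the other way around, that is, saving authors and their associated posts. That way you can easily take care of duplicate users by grouping their data by username.

CakePHP will then create new authors only when necessary, and will automatically add the appropriate foreign keys to the posts.

The data should then be formatted like this: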

Array
(
    [0] => Array
        (
            [username] => foobar
            [Post] => Array
                (
                    [0] => Array
                        (
                            [title] => New post
                        )
                    [1] => Array
                        (
                            [title] => Another new post
                        )
                )
        )
    [1] => Array
        (
            [id] => 1
            [Post] => Array
                (
                    [0] => Array
                        (
                            [title] => Third post
                        )
                )
        )
)

You would then save it via the Author model:

$this->Author->saveMany($data, array('deep' => true));

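Store non-associated posts separately and use transactions manually

There is no way around this if you want to use the CakePHP ORM; just imagine what a raw SQL query would have to look like to handle all that logic.

So simply split this into two saves, and use DboSource::begin()/commit()/rollback() manually to wrap it all up.

An example

Here's a simple example based on your data, updated for your new requirements: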

$spreadsheet_rows = array(
    array(
      'title' => 'New post',
      'author_username' => 'foobar',
      'content' => 'New post'
    ),
    array(
      'title' => 'Another new post',
      'author_username' => 'foobar',
      'content' => 'Another new post'
    ),
    array(
      'title' => 'Third post',
      'author_username' => 'Existing user',
      'content' => 'Third post'
    ),
    array(
      'title' => 'Fourth post',
      'content' => 'Fourth post'
    ),
    array(
      'title' => 'Fifth post',
      'content' => 'Fifth post'
    ),
);
$authors = array();
$posts = array();
foreach ($spreadsheet_rows as $row) {
    // store non-author associated posts separately
    if (!isset($row['author_username'])) {
        $posts[] = $row;
    } else {
        $username = $row['author_username'];
        // prepare an author only once per username
        if (!isset($authors[$username])) {
            $author = $this->Author->find('first', array(
                'conditions' => array(
                    'Author.username' => $row['author_username']
                )
            ));
            // if the author already exists use its id, otherwise
            // use the username so that a new author is being created
            if (!empty($author)) {
                $authors[$username] = array(
                    'id' => $author['Author']['id']
                );
            } else {
                $authors[$username] = array(
                    'username' => $username
                );
            }
            $authors[$username]['Post'] = array();
        }
        // group posts under their respective authors
        $authors[$username]['Post'][] = array(
            'title' => $row['title'],
            'content' => $row['content'],
        );
    }
}
// convert the string (username) indices into numeric ones
$authors = Hash::extract($authors, '{s}');
// manually wrap both saves in a transaction.
//
// might require additional table locking as
// CakePHP issues SELECT queries in between.
//
// also this example requires both tables to use
// the default connection
$ds = ConnectionManager::getDataSource('default');
$ds->begin();
try {
    $result =
        $this->Author->saveMany($authors, array('deep' => true)) &&
        $this->Post->saveMany($posts);
    if ($result && $ds->commit() !== false) {
        // success, yay
    } else {
        // failure, buhu
        $ds->rollback();
    }
} catch (Exception $e) {
    // failed hard, ouch
    $ds->rollback();
    throw $e;
}
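
As a variation on the grouping loop above, the author lookup can be separated from the grouping, so that the grouping becomes a plain function that can be tested without a database. This is only a sketch: groupRowsByAuthor() is a hypothetical helper, not part of CakePHP, and it assumes that a username => id map of the existing authors has already been fetched (how that map is obtained is left out here).

```php
<?php
/**
 * Hypothetical helper (not a CakePHP API): groups spreadsheet rows by author.
 *
 * $existingIds is assumed to map username => author id for authors that
 * are already in the database.
 *
 * Returns array($authors, $orphans): $authors is a numerically indexed list
 * in the "author with nested Post" format, $orphans holds the posts that
 * have no author_username.
 */
function groupRowsByAuthor(array $rows, array $existingIds)
{
    $authors = array();
    $orphans = array();

    foreach ($rows as $row) {
        $post = array(
            'title' => $row['title'],
            'content' => $row['content'],
        );

        // posts without an author are collected separately
        if (empty($row['author_username'])) {
            $orphans[] = $post;
            continue;
        }

        $username = $row['author_username'];

        // prepare each author only once: reuse the id of an existing
        // author, otherwise pass the username so a new one is created
        if (!isset($authors[$username])) {
            $authors[$username] = isset($existingIds[$username])
                ? array('id' => $existingIds[$username])
                : array('username' => $username);
            $authors[$username]['Post'] = array();
        }

        $authors[$username]['Post'][] = $post;
    }

    // drop the username keys so the list is numerically indexed
    return array(array_values($authors), $orphans);
}
```

The two returned arrays would then be passed to $this->Author->saveMany($authors, array('deep' => true)) and $this->Post->saveMany($orphans) inside the same manual transaction as in the example above.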

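You need to use saveAll, which is a mix between saveMany and saveAssociated (you will need to do both of them here). You also need to change the structure of each post.

Here is an example of the structure you will need to create inside the loop.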

<?php
  $posts = array();
  //This is a post for a row with a new author
  $post = array (
    'Post' => array ('title' => 'My Title', 'content' => 'This is the content'),
    'Author' => array ('username' => 'new_author')
  );
  $posts[] = $post;
  //This is a post for a row with an existing author
  $post = array (
    'Post' => array ('title' => 'My Second Title', 'content' => 'This is another content'),
    'Author' => array ('id' => 1)
  );
  $posts[] = $post;
  //This is a post for a row with no author
  $post = array (
    'Post' => array ('title' => 'My Third Title', 'content' => 'This is one more content')
  );
  $posts[] = $post;

  $this->Post->saveAll($posts, array ('deep' => true));
?>

Following ndm's suggestion of "using transactions manually", this piece of code (written in a unit test!) seems to do the trick:

public function testAdd() {
    $this->generate('Articles', array());
    $this->controller->loadModel('Article');
    $this->controller->loadModel('Author');
    $csv_data = array(
        array(
            'Article' => array(
                'title' => 'title'
            )),
        array(
            'Article' => array(
                'title' => 'title'
            ),
            'Author' => array(
                'name' => 'foobar'
            ),
        ),
        array(
            'Article' => array(
                'title' => 'title2'
            ),
            'Author' => array(
                'name' => 'foobar'
            )
        ),
        /* array( */
        /*     'Article' => array( */
        /*         'title' => '' */
        /*     ), */
        /*     'Author' => array( */
        /*         'name' => '' // this breaks our validation */
        /*     ) */
        /* ), */
    );
    $db = $this->controller->Article->getDataSource();
    $db->begin();
    /*
     * We want to inform the user of _all_ validation messages, not one at a time
     */
    $validation_errors = array();
    /*
     * Do this by row count, so that user can look through their CSV file
     */
    $row_count = 1;
    foreach ($csv_data as &$row) {
        /*
         * If author already exists, don't create new record, but associate to existing
         */
        if (!empty($row['Author'])) {                
            $author = $this->controller->Author->find('first', 
                array(
                    'conditions' => array(
                        'name' => $row['Author']['name']
                    )
                ));
            if (!empty($author)) {
                $row['Author']['id'] = $author['Author']['id'];
            }
        }
        $this->controller->Article->saveAssociated($row, array('validate' => true));
        if (!empty($this->controller->Article->validationErrors)) {
            $validation_errors[$row_count] = $this->controller->Article->validationErrors;
        }            
        $row_count++;
    }

    if (empty($validation_errors)) {
        $db->commit();            
    } else {
        $db->rollback();
        debug($validation_errors);
    }
    debug($this->controller->Article->find('all'));
}