如何优化laravel上的大数据处理


How to optimise handle of big data on laravel?

我的任务是:"获取交易表,按交易日期分组行并计算状态。这些操作将形成统计信息,并在页面上呈现"。

这是我这个统计生成的方法

public static function getStatistics(Website $website = null)
{
    if($website == null) return [];
    $query = 'DB::table('transactions')->where("website_id", $website->id)->orderBy("dt", "desc")->get();
    $transitions = collect(static::convertDate($query))->groupBy("dt");
    $statistics = collect();
    dd($transitions);
    foreach ($transitions as $date => $trans) {
        $subscriptions = $trans->where("status", 'subscribe')->count();
        $unsubscriptions = $trans->where("status", 'unsubscribe')->count();
        $prolongations = $trans->where("status", 'rebilling')->count();
        $redirections = $trans->where("status", 'redirect_to_lp')->count();
        $conversion = $redirections == 0 ? 0 : ((float) ($subscriptions / $redirections));
        $earnings = $trans->sum("pay");
        $statistics->push((object)[
            "date" => $date,
            "subscriptions" => $subscriptions,
            'unsubscriptions' => $unsubscriptions,
            'prolongations' => $prolongations,
            'redirections' => $redirections,
            'conversion' => round($conversion, 2),
            'earnings' => $earnings,
        ]);
    }
    return $statistics;
}

如果事务行数低于100000,那都是wright。但是,如果计数在150-200k以上,nginx抛出502坏网关。你能给我什么建议?我对大数据处理没有任何经验。可能是,我的恶作剧有根本错误吗?

大数据从来都不容易,但我建议使用Laravel chunk而不是get

https://laravel.com/docs/5.1/eloquent(ctrl+f"::chunk")

::chunk所做的是一次选择n行,并允许您一点一点地处理它们。这很方便,因为它允许您将更新流式传输到浏览器,但在约150k的结果范围内,我建议您查找如何将这项工作推送到后台进程中,而不是根据请求进行处理。

经过几天对这个问题的研究,我找到了正确的答案:

不要使用PHP处理原始数据。最好使用SQL!

在我的例子中,我们使用的是PostgreSQL。

下面,我将编写对我有用的sql查询,也许它会帮助其他人。

WITH
        cte_range(dt) AS
        (
            SELECT
                generate_series('2016-04-01 00:00:00'::timestamp with time zone, '{$date} 00:00:00'::timestamp with time zone, INTERVAL '1 day')
        ),
        cte_data AS
        (
            SELECT
                date_trunc('day', dt) AS dt,
                COUNT(*) FILTER (WHERE status = 'subscribe') AS count_subscribes,
                COUNT(*) FILTER (WHERE status = 'unsubscribe') AS count_unsubscribes,
                COUNT(*) FILTER (WHERE status = 'rebilling') AS count_rebillings,
                COUNT(*) FILTER (WHERE status = 'redirect_to_lp') AS count_redirects_to_lp,
                SUM(pay) AS earnings,
                CASE
                    WHEN COUNT(*) FILTER (WHERE status = 'redirect_to_lp') > 0 THEN 100.0 * COUNT(*) FILTER (WHERE status = 'subscribe')::float / COUNT(*) FILTER (WHERE status = 'redirect_to_lp')::float
                    ELSE 0
                END
                AS conversion_percent
            FROM
                transactions
            WHERE
                website_id = {$website->id}
            GROUP BY
                date_trunc('day', dt)
        )
        SELECT
            to_char(cte_range.dt, 'YYYY-MM-DD') AS day,
            COALESCE(cte_data.count_subscribes, 0) AS count_subscribe,
            COALESCE(cte_data.count_unsubscribes, 0) AS count_unsubscribes,
            COALESCE(cte_data.count_rebillings, 0) AS count_rebillings,
            COALESCE(cte_data.count_redirects_to_lp, 0) AS count_redirects_to_lp,
            COALESCE(cte_data.conversion_percent, 0) AS conversion_percent,
            COALESCE(cte_data.earnings, 0) AS earnings
        FROM
            cte_range
        LEFT JOIN
            cte_data
            ON cte_data.dt = cte_range.dt
        ORDER BY
            cte_range.dt DESC