What should be the MongoDB query to find the number of occurrences?

Sample records in the collection:

(Document 1)

[{
   "_id": ObjectId("567941aaf0058ed6755ab3dc"),
   "hash_count": NumberInt(7),
   "time": [
     NumberInt(1450787170),
     NumberInt(1450787292),
     NumberInt(1450787307),
     NumberInt(1450787333),
     NumberInt(1450787615) 
  ],
   "word": "batman" 
},

(Document 2)

   {
       "_id": ObjectId("567941aaf0058ed6755ab3dc"),
       "hash_count": NumberInt(7),
   "time": [
     NumberInt(1450787170),
     NumberInt(1450787292),
     NumberInt(1450787307),
     NumberInt(1450787333),
     NumberInt(1450787354),
     NumberInt(1450787526),
     NumberInt(1450787615) 
  ],
   "word": "apple" 
}]

The times are stored as shown above, and I want to find the number of time entries between 1450787307 and 1450787615 (both inclusive).

Expected answer:

apple=5
batman=3 

What should the query be?

I ran this command:

{
aggregate : "hashtags",       
pipeline:
[
{$match:{"time":{$gte:NumberInt(1450787307), $lte:NumberInt(1450787615)}}},
{$unwind:"$time"},
{$match:{"time":{$gte:NumberInt(1450787307), $lte:NumberInt(1450787615)}}},
{$group:{"_id":"$word","count":{$sum:1}}}
]
}

It gave this result:

Response from server:
{
   "result": [
  ],
   "ok": 1 
}

Since you are using an older version of MongoDB, you cannot take advantage of the array aggregation operators introduced in 3.2.

You have to aggregate as follows:

db.collection.aggregate([
{$match:{"time":{$gte:NumberInt(1450787307), $lte:NumberInt(1450787615)}}},
{$unwind:"$time"},
{$match:{"time":{$gte:NumberInt(1450787307), $lte:NumberInt(1450787615)}}},
{$group:{"_id":"$word","count":{$sum:1}}}
])

Translated to PHP:

// NumberInt() is a mongo shell helper, not a PHP function, so plain PHP
// integers are used here; MongoDB compares numeric types by value.
$result = $c->aggregate([
[ '$match' => [ 'time' => [ '$gte' => 1450787307, 
                            '$lte' => 1450787615 ] ] ],
[ '$unwind' => '$time' ],
[ '$match' => [ 'time' => [ '$gte' => 1450787307, 
                            '$lte' => 1450787615 ] ] ],
[ '$group' => [ '_id' => '$word', 'count' => [ '$sum' => 1 ] ] ]
]);
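
To see why the $match / $unwind / $match / $group pipeline above produces the expected counts, here is a minimal plain-JavaScript sketch that mimics each stage in memory. It is only an illustration, not MongoDB code: `countByWord` is a made-up helper, and the sample data is copied from the question with plain numbers in place of NumberInt.

```javascript
// Sample data copied from the question (plain numbers instead of NumberInt).
const docs = [
  { word: "batman", time: [1450787170, 1450787292, 1450787307, 1450787333, 1450787615] },
  { word: "apple",  time: [1450787170, 1450787292, 1450787307, 1450787333,
                           1450787354, 1450787526, 1450787615] }
];

// Hypothetical helper: mimics $match -> $unwind -> $match -> $group in memory.
function countByWord(docs, lo, hi) {
  const counts = {};
  docs
    // First $match: on an array field (without $elemMatch) a document passes if
    // some element is >= lo and some element (possibly another one) is <= hi.
    .filter((d) => d.time.some((t) => t >= lo) && d.time.some((t) => t <= hi))
    // $unwind: one record per array element.
    .flatMap((d) => d.time.map((t) => ({ word: d.word, time: t })))
    // Second $match: now each record holds a single time value.
    .filter((r) => r.time >= lo && r.time <= hi)
    // $group with $sum: 1, keyed by word.
    .forEach((r) => { counts[r.word] = (counts[r.word] || 0) + 1; });
  return counts;
}

console.log(countByWord(docs, 1450787307, 1450787615));
// prints { batman: 3, apple: 5 }
```

The first filter only discards whole documents early; the per-element check happens after the unwind, which is why the pipeline needs the same $match twice.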

In version 3.2, you can use a combination of $filter and $size to get the same result with less expensive operations.

db.collection.aggregate([
{$match:{"time":{$gte:NumberInt(1450787307), 
                 $lte:NumberInt(1450787615)}}},
{$project:{"_id":0,"word":1,
           "count":{$size:{$filter:
                               {"input":"$time",
                                "as":"t",
                                "cond":{$and:[
                                     {$gte:["$$t",NumberInt(1450787307)]},
                                     {$lte:["$$t",NumberInt(1450787615)]}]}
                                }
                           }
                    }
}}
])
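
As a rough picture of what the $filter + $size projection computes, the sketch below does the same thing per document in plain JavaScript. `countPerDocument` is a made-up name for illustration, and the data is the sample from the question with plain numbers.

```javascript
// Sample data copied from the question (plain numbers instead of NumberInt).
const docs = [
  { word: "batman", time: [1450787170, 1450787292, 1450787307, 1450787333, 1450787615] },
  { word: "apple",  time: [1450787170, 1450787292, 1450787307, 1450787333,
                           1450787354, 1450787526, 1450787615] }
];

// Hypothetical helper: for each document, keep the in-range times ($filter)
// and report how many are left ($size).
function countPerDocument(docs, lo, hi) {
  return docs.map((d) => ({
    word: d.word,
    count: d.time.filter((t) => t >= lo && t <= hi).length
  }));
}

console.log(countPerDocument(docs, 1450787307, 1450787615));
// prints [ { word: 'batman', count: 3 }, { word: 'apple', count: 5 } ]
```

Note one difference from the $unwind version: this yields one result per document rather than one per word, so if the same word were stored in several documents the counts would not be merged unless a $group stage is added afterwards.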

Well, after many attempts I arrived at this answer, and it is correct for 1450787615 as the lower bound and 1450855155 as the upper bound:

db.hashtags.aggregate([
    {
        "$match": {
            "time": {
                "$gte": 1450787615, "$lte": 1450855155  
            }
        }
    },
    { "$unwind": "$time" },
    {
        "$match": {
            "time": {
                "$gte": 1450787615, "$lte": 1450855155  
            }
        }
    },
    {
        "$group": {
            "_id": "$word",
            "count": {
                "$sum": 1
            }
        }
    }
])

The result looks like this:

{
    "result" : [ 
        {
            "_id" : "batman",
            "count" : 3
        }, 
        {
            "_id" : "dear",
            "count" : 1
        }, 
        {
            "_id" : "ghost",
            "count" : 1
        }
    ],
    "ok" : 1
}

db.collection.find({time:{$gt: 1450787307, $lt: 1450787615}}); 

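This first gives you a cursor over all the documents that match your given time range. Once you have it, you can iterate over the cursor, print out each name, and use some looping logic to find the number of occurrences for each one. I have only used MongoDB a little, so there may be a more efficient way to do this.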
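
The looping logic described above could look roughly like the sketch below. `tallyFromCursor` is a hypothetical helper, not part of MongoDB; it relies only on `forEach`, which both a mongo shell cursor and a plain array provide, so an array stands in for the cursor here. It uses inclusive bounds, since the expected counts in the question include both endpoints.

```javascript
// Hypothetical helper: walks a cursor-like object and tallies, per word,
// how many timestamps fall inside [lo, hi]. Written in ES5 style so it
// should also be usable in older mongo shells.
function tallyFromCursor(cursor, lo, hi) {
  var counts = {};
  cursor.forEach(function (doc) {
    var n = 0;
    for (var i = 0; i < doc.time.length; i++) {
      if (doc.time[i] >= lo && doc.time[i] <= hi) {
        n += 1;
      }
    }
    // Skip documents that matched the query but have no element in range.
    if (n > 0) {
      counts[doc.word] = (counts[doc.word] || 0) + n;
    }
  });
  return counts;
}

// In the mongo shell this might be called as (collection name assumed):
//   tallyFromCursor(db.collection.find({time: {$gte: lo, $lte: hi}}), lo, hi)

// Stand-in for the cursor: the sample documents from the question.
var sample = [
  { word: "batman", time: [1450787170, 1450787292, 1450787307, 1450787333, 1450787615] },
  { word: "apple",  time: [1450787170, 1450787292, 1450787307, 1450787333,
                           1450787354, 1450787526, 1450787615] }
];

console.log(tallyFromCursor(sample, 1450787307, 1450787615));
// prints { batman: 3, apple: 5 }
```

This does the counting on the client, so every matching document is sent over the wire; the aggregation pipeline keeps that work on the server.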

Reference: https://docs.mongodb.org/v3.0/reference/method/db.collection.find/