data.table is a powerful tool for exploring data, but how fast is it? Here we provide a performance test for summing by groups.
Code:
```r
N = 1e4
```
With a small sample size, tapply is the most efficient tool for summing by groups. With a large sample size, data.table and dplyr's summarise are more efficient.
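For reference, here is a minimal sketch of the three one-key approaches compared above, run on simulated data. The column names `g` and `x`, the 100 groups, and the use of `system.time()` for timing are assumptions for illustration, not the author's benchmark setup. A single run at this size gives only rough timings.

```r
library(data.table)
library(dplyr)

set.seed(1)
N = 1e4                                   # number of rows
DF = data.frame(
  g = sample(1:100, N, replace = TRUE),   # hypothetical grouping column, 100 groups
  x = rnorm(N)                            # hypothetical values to sum
)
DT = as.data.table(DF)

# base R: a 1-d array of per-group sums, named and ordered by group
t1 = system.time(s1 <- tapply(DF$x, DF$g, sum))

# data.table: one row per group; keyby also sorts the result by g
t2 = system.time(s2 <- DT[, .(total = sum(x)), keyby = .(g)])

# dplyr: group_by() + summarise(), result sorted by g
t3 = system.time(s3 <- DF %>% group_by(g) %>% summarise(total = sum(x)))

# elapsed wall-clock seconds for each approach
c(tapply = t1[["elapsed"]], data.table = t2[["elapsed"]], dplyr = t3[["elapsed"]])
```

All three return the same sums; they differ only in the shape of the result (a named array versus a data.table or tibble with a `total` column).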
Next, we benchmark the performance of summing by two groups. Code:
```r
N = 1e4
```
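The two-key version can be sketched in the same way. As before, the data are simulated, and the key columns `g1` and `g2` (with 10 and 20 levels) are hypothetical choices for illustration.

```r
library(data.table)
library(dplyr)

set.seed(1)
N = 1e4
DF = data.frame(
  g1 = sample(1:10, N, replace = TRUE),   # hypothetical first key, 10 levels
  g2 = sample(1:20, N, replace = TRUE),   # hypothetical second key, 20 levels
  x  = rnorm(N)
)
DT = as.data.table(DF)

# base R: a list of two factors gives a 10 x 20 matrix (rows g1, columns g2)
t1 = system.time(s1 <- tapply(DF$x, list(DF$g1, DF$g2), sum))

# data.table: one row per observed (g1, g2) pair, sorted by g1 then g2
t2 = system.time(s2 <- DT[, .(total = sum(x)), keyby = .(g1, g2)])

# dplyr: .groups = "drop" returns an ungrouped tibble, sorted by g1 then g2
t3 = system.time(
  s3 <- DF %>% group_by(g1, g2) %>% summarise(total = sum(x), .groups = "drop")
)

c(tapply = t1[["elapsed"]], data.table = t2[["elapsed"]], dplyr = t3[["elapsed"]])
```

Note the difference in output shape: tapply builds a dense matrix over every combination of levels (empty cells become `NA`), while data.table and dplyr return a long table containing only the pairs that actually occur.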
When summing by two groups, data.table is much more efficient at large sample sizes. We also tried Rcpp for summing by groups. Code:
```r
library(Rcpp)
```
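A grouped sum in Rcpp might look like the following sketch, compiled inline with `cppFunction()`. The function `group_sum_cpp` is a hypothetical helper written for illustration, not the author's code. It assumes the groups are already encoded as integers 1..`ngroups`, which allows a single pass that accumulates into a preallocated vector.

```r
library(Rcpp)

# Hypothetical helper: one pass over the data, adding x[i] into out[g[i] - 1].
# Assumes g holds integer group codes in 1..ngroups.
cppFunction('
NumericVector group_sum_cpp(IntegerVector g, NumericVector x, int ngroups) {
  if (g.size() != x.size()) stop("g and x must have the same length");
  NumericVector out(ngroups);             // accumulators start at 0
  for (R_xlen_t i = 0; i < x.size(); ++i) {
    int k = g[i];
    if (k < 1 || k > ngroups) stop("group code out of range");  // also catches NA
    out[k - 1] += x[i];                   // R codes are 1-based, C++ is 0-based
  }
  return out;
}')

set.seed(1)
N = 1e4
g = sample(1:100, N, replace = TRUE)      # hypothetical integer group codes
x = rnorm(N)

s_cpp  = group_sum_cpp(g, x, 100L)
s_base = as.vector(tapply(x, g, sum))     # reference result from base R
```

For arbitrary keys, the codes can come from a factor, e.g. `f = factor(key); group_sum_cpp(as.integer(f), x, nlevels(f))`. Having to do this encoding by hand is part of why data.table is more convenient to use.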
We can see that data.table performs comparably to Rcpp while being more convenient to use.
My environment is Ubuntu 14.04 with R 3.1.1, compiled with the Intel C++ and Fortran compilers and linked against MKL. My CPU is an Intel Core i7-3770K @ 4.3 GHz.