data.table is a powerful tool for exploring data. But how fast is it? Here we provide a performance test for summing by groups.
Code:
N = 1e4
With a small sample size, tapply is the most efficient tool for summing by groups. With a large sample size, data.table and summarise in dplyr are more efficient.
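As a minimal sketch of the three approaches compared above, the following computes the same per-group sums with tapply, data.table, and summarise from dplyr. The data frame, its column names (g, x), the seed, and the result names are assumptions made for illustration, not the data used in the benchmark, and no timings are shown.

```r
library(data.table)
library(dplyr)

set.seed(1)                                  # assumed seed, for reproducibility only
N <- 1e4
dat <- data.frame(
  g = sample(letters, N, replace = TRUE),    # hypothetical grouping column
  x = rnorm(N)                               # hypothetical value column
)
dt <- as.data.table(dat)

# Base R: an array of per-group sums, ordered by group label
sum_tapply <- tapply(dat$x, dat$g, sum)

# data.table: one row per group, sorted by g because of keyby
sum_dt <- dt[, .(total = sum(x)), keyby = g]

# dplyr: one row per group
sum_dplyr <- dat %>% group_by(g) %>% summarise(total = sum(x))

# The three results should agree up to floating-point rounding
stopifnot(isTRUE(all.equal(as.vector(sum_tapply), sum_dt$total)))
stopifnot(isTRUE(all.equal(sum_dt$total, sum_dplyr$total)))
```

To compare speed, each of the three calls can be wrapped in system.time() or passed to a benchmarking package, and N can be increased to see how the ranking changes.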
Next, we benchmark the performance of summing by two groups. Code:
N = 1e4
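Summing by two groups can be sketched as follows. This is only an illustration under assumed names: the columns g1, g2, and x, the number of levels, and the seed are hypothetical and are not taken from the benchmark itself.

```r
library(data.table)

set.seed(1)                                      # assumed seed
N <- 1e4
dt <- data.table(
  g1 = sample(letters[1:5], N, replace = TRUE),  # hypothetical first grouping column
  g2 = sample(LETTERS[1:4], N, replace = TRUE),  # hypothetical second grouping column
  x  = rnorm(N)
)

# data.table: sum x within each (g1, g2) pair, one row per pair
sum_dt2 <- dt[, .(total = sum(x)), keyby = .(g1, g2)]

# Base R: tapply with a list of two groupings returns a g1-by-g2 matrix
sum_tapply2 <- tapply(dt$x, list(dt$g1, dt$g2), sum)

# Look up each data.table row in the tapply matrix to confirm both agree
lookup <- sum_tapply2[cbind(sum_dt2$g1, sum_dt2$g2)]
stopifnot(isTRUE(all.equal(lookup, sum_dt2$total)))
```

Note the difference in output shape: tapply gives a matrix with one cell per pair, while data.table gives a long table with one row per pair, which is usually easier to join or filter afterwards.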
When summing by two groups, data.table is much more efficient at large sample sizes. We also try an Rcpp implementation of summing by groups. Code:
library(Rcpp)
We can see that data.table is comparable to Rcpp in performance and more convenient to use.
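To give a sense of what a hand-written Rcpp version involves, here is a minimal sketch of a group sum compiled with cppFunction. The function name group_sum, its signature, and the test data are hypothetical and are not the code benchmarked above; it assumes the groups have already been converted to integer codes from 1 to n_groups and that a C++ compiler is available.

```r
library(Rcpp)

# Hypothetical helper: accumulate x into one slot per group code
cppFunction('
NumericVector group_sum(NumericVector x, IntegerVector g, int n_groups) {
  // g holds 1-based group codes in 1..n_groups (an assumption of this sketch)
  NumericVector out(n_groups);
  for (int i = 0; i < x.size(); ++i) {
    out[g[i] - 1] += x[i];
  }
  return out;
}')

set.seed(1)                                   # assumed seed
N <- 1e4
f <- factor(sample(letters, N, replace = TRUE))
x <- rnorm(N)

# Factor codes are already integers in 1..nlevels(f)
sum_rcpp <- group_sum(x, as.integer(f), nlevels(f))

# Check against base R
sum_base <- as.vector(tapply(x, f, sum))
stopifnot(isTRUE(all.equal(sum_rcpp, sum_base)))
```

The data.table equivalent is a single expression of the form dt[, sum(x), by = g], with no compilation step and no manual handling of group codes.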
My environment is Ubuntu 14.04 with R 3.1.1, compiled with the Intel C++ and Fortran compilers and linked against MKL. My CPU is a 3770K at 4.3 GHz.