Ching-Chuan Chen's Blogger

Statistics, Machine Learning and Programming

0%

I try to write a lasso algorithm by Rcpp attributes.

Reference:
Friedman, J., Hastie, T. and Tibshirani, R. (2008) Regularization Paths for Generalized Linear Models via Coordinate Descent, http://www.stanford.edu/~hastie/Papers/glmnet.pdf, Journal of Statistical Software, Vol. 33(1), 1-22 Feb 2010, http://www.jstatsoft.org/v33/i01/

Code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
library(Rcpp)
library(RcppArmadillo)
sourceCpp(code = '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <ctime>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
List lasso_fit_f(NumericMatrix Xr, NumericVector yr, int n_penalty = 100){
int n = Xr.nrow(), p = Xr.ncol();
mat X(Xr.begin(), n, p, false);
colvec y(yr.begin(), n, false);
double z0 = mean(y), penalty;
colvec xydot = (y.t() * X).t();
mat xxdot(p, p);
xxdot = X.t() * X;
double penalty_max_log = log(max(xydot) / n);
colvec penalties = exp(linspace<colvec>(penalty_max_log, penalty_max_log+log(0.05), n_penalty));
mat coef_m(p, n_penalty);
int MaxIter = 1e5;
colvec coef(p), coef_new(p);
for(int k = 1; k < n_penalty; k++)
{
coef = coef_m.col(k-1);
penalty = penalties(k);
for(int i = 0; i < MaxIter; i++)
{
coef_new = (xydot - xxdot * coef) / n + coef;
coef_new(find(abs(coef_new) < penalty)).zeros();
coef_new(find(coef_new > penalty)) -= penalty;
coef_new(find(coef_new < -penalty)) += penalty;
if( as_scalar((coef - coef_new).t() * (coef - coef_new)) / k < 1e-7)
break;
coef = coef_new;
}
coef_m.col(k) = coef;
}
return List::create(Named("intercept") = z0,
Named("coefficients") = coef_m,
Named("penalties") = penalties);
}')

d = 1000; N = 10000
x = matrix(rnorm(N*d), N)
x[,3] = x[,1] - 2*x[,2] + rnorm(N)
x[,30] = x[,10] - 2*x[,20] + rnorm(N)
y = cbind(1, x) %*% c(-1, 2, -0.3, 0.7, sample(10**seq(-10, 1, length = N), d-3)) + rnorm(N, 0, 2)
x = scale(x)

t1 = Sys.time()
a = lasso_fit_f(x, y)
t_cpp = Sys.time() - t1

library(glmnet)
t1 = Sys.time()
fit = glmnet(x,y, lambda = a$penalties)
t_glmnet = Sys.time() - t1
c(t_glmnet, t_cpp)
# [1] 0.7171087 0.2032089

It works well!!

My environment is ubuntu 14.04, R 3.1.1 compiled by intel c++, fortran compiler with MKL. My CPU is 3770K@4.3GHz.

Recently, I went to the 23th STSC, I got some information about the new API of Rcpp, Rcpp attributes. I had tried some examples and it worked well. Here I demonstrate some examples.

First example: call the pnorm function in Rcpp:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
require(Rcpp)
sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
DataFrame mypnorm(NumericVector x){
int n = x.size();
NumericVector y1(n), y2(n), y3(n);
for (int i=0; i<n; i++){
y1[i] = ::Rf_pnorm5(x[i], 0.0, 1.0, 1, 0);
y2[i] = R::pnorm(x[i], 0.0, 1.0, 1, 0);
}
y3 = pnorm(x);
return DataFrame::create(
Named("R") = y1,
Named("Rf_") = y2,
Named("sugar") = y3);
}')
mypnorm(runif(10, -3, 3))

Rcpp attributes allows user to write Rcpp in a simple way. User does not need to learn about how to write R extension. Just write a cpp script and add the line // [[Rcpp::export]], then user can use the function in R.

Next two example is about the two extension packages of Rcpp, RcppArmadillo and RcppEigen. The two packages provide Rcpp to link the C++ linear algebra libraries, armadillo and Eigen.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
library(Rcpp)
library(RcppArmadillo)
library(RcppEigen)
sourceCpp(code = '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;

// [[Rcpp::export]]
List fastLm_RcppArma(NumericVector yr, NumericMatrix Xr) {
int n = Xr.nrow(), k = Xr.ncol();
arma::mat X(Xr.begin(), n, k, false);
arma::colvec y(yr.begin(), yr.size(), false);
arma::colvec coef = arma::solve(X, y);
arma::colvec resid = y - X*coef;
double sig2 = arma::as_scalar(arma::trans(resid)*resid/(n-k));
arma::colvec stderrest = arma::sqrt(sig2 * arma::diagvec( arma::inv(arma::trans(X)*X)));
return List::create(
Named("coefficients") = coef,
Named("stderr") = stderrest);
}')

sourceCpp(code = '
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>
using namespace Rcpp;
using Eigen::Map;
using Eigen::MatrixXd;
using Eigen::VectorXd;

// [[Rcpp::export]]
List fastLm_RcppEigen(NumericVector yr, NumericMatrix Xr) {
const Map<MatrixXd> X(as<Map<MatrixXd> >(Xr));
const Map<VectorXd> y(as<Map<VectorXd> >(yr));
int n = Xr.nrow(), k = Xr.ncol();
VectorXd coef = (X.transpose() * X).llt().solve(X.transpose() * y.col(0));
VectorXd resid = y - X*coef;
double sig2 = resid.squaredNorm() / (n - k);
VectorXd stderrest = (sig2 * ((X.transpose() * X).inverse()).diagonal()).array().sqrt();
return List::create(
Named("coefficients") = coef,
Named("stderr") = stderrest);
}')
N = 20000
p = 1000
X = matrix(rnorm(N*p), ncol = p)
y = X %*% 10**(sample(seq(-5, 3, length = N+p), p)) + rnorm(N)

system.time(fastLm_RcppArma(y, X))
# user system elapsed
# 5.49 0.06 1.46
system.time(fastLm_RcppEigen(y, X))
# user system elapsed
# 9.32 0.06 8.96
system.time(lm(y~X - 1))
# user system elapsed
# 145.13 14.76 91.84

The cpp functions are faster 63 times than R function lm. My environment is ubuntu 14.04, R 3.1.1 compiled by intel c++, fortran compiler with MKL. My CPU is 3770K@4.3GHz. I think that Rcpp is the package which is the worthiest to learn if you want to use R to do statistical computing or machine learning. Rcpp attributes had changed the way to source C++ code in R, it let Rcpp is more convenient and more powerful.

This post is used to record some tips I can’t categorize in ubuntu.

i. automatically load some shell scripts
In my system ubuntu 14.04, I can find the file .bashrc in my home directory.
Since I want ubuntu to load intel complier and mkl parameter automatically, all I need to do is to add the two lines in the end of that file: (mint 17: gedit /etc/bash.bashrc)

1
2
source /opt/intel/composer_xe_2015/mkl/bin/mklvars.sh intel64
source /opt/intel/composer_xe_2015/bin/compilervars.sh intel64

Then I success!!

ii. cannot install ubuntu or Mint
With the options - acpi=off nolapic noapic, I finally install ubuntu successfully.

iii. cannot boot without nolapic, however, it only recognize one cpu with nolapic
I solved this problem by Dual core recognized as single core because of nolapic?.
I edited the grub file with following commands:

1
2
sudo bash
gedit /etc/default/grub

And replace nolapic with pci=assign-busses apicmaintimer idle=poll reboot=cold,hard, the grub file would be contain this two lines:

1
2
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash acpi_osi=linux"
GRUB_CMDLINE_LINUX="noapic pci=assign-busses apicmaintimer idle=poll reboot=cold,hard"

Then use following command to update grub. And the problem is fixed.

1
sudo update-grub

iv. to get the permission of ntfs disks, you can edit the fstab in /etc as following:

1
sudo gedit /etc/fstab

And you can find the uuid by using the command ls -l /dev/disk/by-uuid. To add the disk and set the permission in the file fstab like this:

1
2
3
4
UUID=1c712d26-7f9d-4efc-b796-65bee366c8aa / ext4    noatime,nodiratime,discard,errors=remount-ro 0       1
UUID=9298D0AB98D08EDB /media/Windows ntfs defaults,uid=1000,gid=1000,umask=002 0 0
UUID=08C2997EC29970A4 /media/Download ntfs defaults,uid=1000,gid=1000,umask=002 0 0
UUID=01CD524F3352C990 /media/Files ntfs defaults,uid=1000,gid=1000,umask=002 0 0

Then you can access your ntfs disk and set an alias for each disk.

v. use grub comstomer to edit the boot order. Installation:

1
2
3
sudo add-apt-repository ppa:danielrichter2007/grub-customizer
sudo apt-get update
sudo apt-get install grub-customizer

vi. Install font InconsolataDownload Here and unity tweak tool (sudo apt-get install unity-tweak-tool).

vii. Install the chinese input fcitx and language Chinese Traditional.

1
sudo apt-get install fcitx fcitx-chewing fcitx-config-gtk fcitx-frontend-all fcitx-module-cloudpinyin fcitx-ui-classic fcitx-frontend-qt4 fcitx-frontend-qt5 fcitx-frontend-gtk2 fcitx-frontend-gtk3

viii. Install ruby, jekyll and git.

1
2
3
4
5
6
sudo apt-get install software-properties-common
sudo apt-add-repository ppa:brightbox/ruby-ng
sudo apt-get update
sudo apt-get install ruby2.2 ruby2.2-dev git python-pip python3-pip
sudo gem install jekyll
sudo pip install pygments

(To be continued.)

這篇要來敘述怎麼在linux中,利用Intel C++ compiler以及Intel MKL編譯numpy以及scipy這兩個python的套件,以下是參考連結:

  1. https://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl
  2. http://songuke.blogspot.tw/2012/02/compile-numpy-and-scipy-with-intel-math.html
  3. Numpy使用MKL库提升计算性能
  4. Numpy fails with python-dbg

首先,先取得編譯環境以及root權限以方便進行編譯的工作,另外還有一些需要的套件要安裝,命令如下:

1
2
3
sudo apt-get install python-setuptools python-pip python-dev cython
# python 3
sudo apt-get install python3-setuptools python3-pip python3-dev cython3

接著切換到Downloads目錄(這你可以自己調整)並下載numpy以及scipy的原始碼,命令如下:

1
2
3
cd; cd Downloads
git clone https://github.com/numpy/numpy.git
git clone https://github.com/scipy/scipy.git

接著在numpy資料夾中新增一個site.cfg的檔案(此處以sublime text做編輯器),命令如下:

1
2
3
4
5
6
sudo -s
source /opt/intel/composer_xe_2013_sp1.3.174/mkl/bin/mklvars.sh intel64
source /opt/intel/composer_xe_2013_sp1.3.174/bin/compilervars.sh intel64
cd numpy
# rm -rf build # if there is a build folder
subl site.cfg

並且添加內容:

1
2
3
4
5
6
[mkl]
library_dirs = /opt/intel/composer_xe_2013_sp1.3.174/compiler/lib/intel64:/opt/intel/composer_xe_2013_sp1.3.174/mkl/lib/intel64
include_dirs = /opt/intel/composer_xe_2013_sp1.3.174/compiler/include:/opt/intel/composer_xe_2013_sp1.3.174/mkl/include
mkl_libs = mkl_def, mkl_intel_lp64, mkl_intel_thread, mkl_core
lapack_libs = mkl_lapack95_lp64
libraries = iomp5

接著修改編譯的參數,

1
subl numpy/distutils/intelccompiler.py

以下方文字分別取代取代文件中self.cc_exe='icc -fPIC'以及self.cc_exe='icc -m64 -fPIC'

1
2
self.cc_exe = 'icc -O3 -g -fPIC -fp-model strict -fomit-frame-pointer -openmp -xhost'
self.cc_exe = 'icc -m64 -O3 -g -fPIC -fp-model strict -fomit-frame-pointer -openmp -xhost'

最後運行這個指令就可以進行安裝了。

1
2
3
python setup.py config --compiler=intelem build_clib --compiler=intelem build_ext --compiler=intelem install
# python 3
python3 setup.py config --compiler=intelem build_clib --compiler=intelem build_ext --compiler=intelem install

如果出現No module named msvc9compiler,就把numpy/distutil/intelccompiler.py裡面有關msvc9compiler的code都註解掉就好了。

請先測試numpy是否正常,先安裝nose這個套件:

1
2
3
easy_install nose
# python3
easy_install3 nose

開啟python並運行(注意環境還是要source上方兩個檔案):

1
2
import numpy
numpy.test()

接著編譯scipy,把site.cfg從numpy複製到scipy的資料夾中:

1
2
3
4
5
cp site.cfg ../scipy/site.cfg
cd ../scipy
python setup.py config --compiler=intelem --fcompiler=intelem build_clib --compiler=intelem --fcompiler=intelem build_ext --compiler=intelem --fcompiler=intelem install
# python 3
python3 setup.py config --compiler=intelem --fcompiler=intelem build_clib --compiler=intelem --fcompiler=intelem build_ext --compiler=intelem --fcompiler=intelem install

開啟python測試scipy(注意環境還是要source上方兩個檔案):

1
2
import scipy
scipy.test()

我跑scipy的測試會失敗三個,看了一下別人的問答,他們認為應該不是太嚴重的錯,我也沒有再裡他了。最後如果中間有出錯,請記得移除掉你安裝套件的位置,假設你使用的python是2.7版就是執行下方指令,在加上tab補全剩下的檔名:

1
2
3
4
5
sudo rm -r /usr/local/lib/python2.7/dist-packages/numpy
sudo rm -r /usr/local/lib/python2.7/dist-packages/scipy
# python 3.4
sudo rm -r /usr/local/lib/python3.4/dist-packages/numpy
sudo rm -r /usr/local/lib/python3.4/dist-packages/scipy

Add $LD_LIBRARY_PATH in environment by using subl ~/.bashrc.

1
export LD_LIBRARY_PATH=/opt/intel/composer_xe_2013_sp1.3.174/compiler/lib/intel64:/opt/intel/composer_xe_2013_sp1.3.174/mkl/lib/intel64:$LD_LIBRARY_PATH

從上週就一直在嘗試如何把我部落格的程式碼都上色,弄了好多天才發現主要的癥結。中間參考太多網站,列幾個重要的。參考網站如下:

  1. Do I need to generate a css file …

  2. Add code highlight with Pygments

你要有syntax highlight的功能,要先安裝幾個重要的工具,第一個是要有Python,並且安裝其套件Pygments,Ruby要安裝Pygments.rb,版本0.5.4可能會出錯,我裝的是0.5.0,這個版本大多人都可以成功。接著需要調整_config.yml中的選項:

1
highlighter: pygments

另外,還有要生成highlight所需的css檔案,於cmd中鍵入

1
pygmentize -S default -f html > pygments.css

然後在themes中的default.html中加上下面這一行:

1
<link rel="stylesheet" type="text/css" href="/path/to/pygments.css">

這樣就成功了。

以下文章參考下列四個網址:

  1. 完美解决 Linux 下 Sublime Text 中文输入
  2. How do I make Sublime Text 3 the default text editor

linux上,中文輸入法一直是難題,應用程式沒辦法支援中文輸入是非常常見的事情,連sublime text也是。還不只如此,還有輸入法的戰爭,我在進入linux時,最一開使用的輸入法是gcin,然後去用hime,最後因為sublime text的解決方法只能只用fcitx,最後用了這個輸入法。

我看了不少文章提供各種sublime text輸入的問題,我覺得下列方法是最方便的:

  1. 把下列的程式碼存為sublime_imfix.c:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
/*
sublime-imfix.c
Use LD_PRELOAD to interpose some function to fix sublime input method support for linux.
By Cjacker Huang

gcc -shared -o libsublime-imfix.so sublime-imfix.c `pkg-config --libs --cflags gtk+-2.0` -fPIC
LD_PRELOAD=./libsublime-imfix.so subl
*/
#include <gtk/gtk.h>
#include <gdk/gdkx.h>
typedef GdkSegment GdkRegionBox;

struct _GdkRegion
{
long size;
long numRects;
GdkRegionBox *rects;
GdkRegionBox extents;
};

GtkIMContext *local_context;

void
gdk_region_get_clipbox (const GdkRegion *region,
GdkRectangle *rectangle)
{
g_return_if_fail (region != NULL);
g_return_if_fail (rectangle != NULL);

rectangle->x = region->extents.x1;
rectangle->y = region->extents.y1;
rectangle->width = region->extents.x2 - region->extents.x1;
rectangle->height = region->extents.y2 - region->extents.y1;
GdkRectangle rect;
rect.x = rectangle->x;
rect.y = rectangle->y;
rect.width = 0;
rect.height = rectangle->height;
//The caret width is 2;
//Maybe sometimes we will make a mistake, but for most of the time, it should be the caret.
if(rectangle->width == 2 && GTK_IS_IM_CONTEXT(local_context)) {
gtk_im_context_set_cursor_location(local_context, rectangle);
}
}

//this is needed, for example, if you input something in file dialog and return back the edit area
//context will lost, so here we set it again.

static GdkFilterReturn event_filter (GdkXEvent *xevent, GdkEvent *event, gpointer im_context)
{
XEvent *xev = (XEvent *)xevent;
if(xev->type == KeyRelease && GTK_IS_IM_CONTEXT(im_context)) {
GdkWindow * win = g_object_get_data(G_OBJECT(im_context),"window");
if(GDK_IS_WINDOW(win))
gtk_im_context_set_client_window(im_context, win);
}
return GDK_FILTER_CONTINUE;
}

void gtk_im_context_set_client_window (GtkIMContext *context,
GdkWindow *window)
{
GtkIMContextClass *klass;
g_return_if_fail (GTK_IS_IM_CONTEXT (context));
klass = GTK_IM_CONTEXT_GET_CLASS (context);
if (klass->set_client_window)
klass->set_client_window (context, window);

if(!GDK_IS_WINDOW (window))
return;
g_object_set_data(G_OBJECT(context),"window",window);
int width = gdk_window_get_width(window);
int height = gdk_window_get_height(window);
if(width != 0 && height !=0) {
gtk_im_context_focus_in(context);
local_context = context;
}
gdk_window_add_filter (window, event_filter, context);
}
  1. Ctrl+Alt+T打開你的Terminal視窗到你儲存上面檔案的地方,鍵入:
1
2
3
sudo apt-get install build-essential libgtk2.0-dev
gcc -shared -o libsublime-imfix.so sublime-imfix.c `pkg-config --libs --cflags gtk+-2.0` -fPIC
sudo mv libsublime-imfix.so /opt/sublime_text/

這樣就完成編譯,並且將檔案放置到安裝目錄了。

  1. 修改啟動部份
1
sudo subl /usr/share/applications/sublime-text.desktop

在每一個Exec=後面都加上下面的指令:

1
env LD_PRELOAD=/opt/sublime_text/libsublime-imfix.so

然後輸入

1
sudo subl /usr/bin/subl

更動內容為

1
2
3
#!/bin/sh
export LD_PRELOAD=/opt/sublime_text/libsublime-imfix.so
exec /opt/sublime_text/sublime_text "$@"
  1. 如果想要把sublime text更動為預設編輯器,先使用下列指令確定是否有安裝成功:
1
ls /usr/share/applications/sublime-text.desktop

接著打開linux的default列表:

1
sudo subl /usr/share/applications/defaults.list

按下Ctrl+H replace gedit with sublime-text。接著打開user的設定列表:

1
subl ~/.local/share/applications/mimeapps.list

修改或添加下列下列文字:

1
2
3
4
5
6
7
8
[Added Associations]
text/plain=ubuntu-software-center.desktop;shotwell.desktop;sublime-text.desktop;
text/x-chdr=shotwell-viewer.desktop;

[Default Applications]
text/plain=sublime-text.desktop
text/x-c++src=sublime-text.desktop
text/x-chdr=sublime-text.desktop

以下文章參考下列四個網址:

  1. Using Intel MKL with R
  2. Build R-3.0.1 with Intel C++ Compiler and Intel MKL on Linux
  3. Compiling R 3.0.1 with MKL support
  4. R Installation and Administraction

開始之前,先用Default R and R with Openblas來測試看看,I use testing script found in Simon Urbanek’s,Openblas部份參考這個網站For faster R use OpenBLAS instead: better than ATLAS, trivial to switch to on Ubuntu

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# to install package in /usr/lib/R/library
sudo chmod -R 774 /usr/lib/R
sudo chown -R celest.celest /usr/lib/R
sudo chmod -R 774 /usr/local/lib/R
sudo chown -R celest.celest /usr/local/lib/R
# install required package
R -e "install.packages('SuppDists', repos = 'http://cran.rstudio.com/')"
# run benchmark
R -e "source('http://r.research.att.com/benchmarks/R-benchmark-25.R')"
# install OpenBLAS
sudo apt-get install libopenblas-base libatlas3gf-base
# check the BLAS is replaced with OpenBLAS
sudo update-alternatives --config libblas.so.3
sudo update-alternatives --config liblapack.so.3
# run benchmark again
R -e "source('http://r.research.att.com/benchmarks/R-benchmark-25.R')"

測試結果如下:
Default R:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
   R Benchmark 2.5
===============
Number of times each test is run__________________________: 3

I. Matrix calculation
---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec): 0.812333333333334
2400x2400 normal distributed random matrix ^1000____ (sec): 0.474666666666667
Sorting of 7,000,000 random values__________________ (sec): 0.563333333333333
2800x2800 cross-product matrix (b = a' * a)_________ (sec): 8.99466666666667
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 4.42166666666667
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 1.26481956425649

II. Matrix functions
--------------------
FFT over 2,400,000 random values____________________ (sec): 0.396000000000001
Eigenvalues of a 640x640 random matrix______________ (sec): 0.718000000000001
Determinant of a 2500x2500 random matrix____________ (sec): 3.03633333333334
Cholesky decomposition of a 3000x3000 matrix________ (sec): 3.42433333333333
Inverse of a 1600x1600 random matrix________________ (sec): 2.56266666666666
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 1.77441555997029

III. Programmation
------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.543999999999992
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.294333333333337
Grand common divisors of 400,000 pairs (recursion)__ (sec): 0.686666666666658
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.49266666666666
Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.378999999999991
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.466584631384852


Total time for all 15 tests_________________________ (sec): 27.8006666666666
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.01548017027814
--- End of test ---

R with Openblas:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
   R Benchmark 2.5
===============
Number of times each test is run__________________________: 3

I. Matrix calculation
---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec): 0.755666666666667
2400x2400 normal distributed random matrix ^1000____ (sec): 0.473
Sorting of 7,000,000 random values__________________ (sec): 0.572
2800x2800 cross-product matrix (b = a' * a)_________ (sec): 0.411
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 0.213
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.480875883392325

II. Matrix functions
--------------------
FFT over 2,400,000 random values____________________ (sec): 0.372666666666666
Eigenvalues of a 640x640 random matrix______________ (sec): 0.895
Determinant of a 2500x2500 random matrix____________ (sec): 0.271333333333335
Cholesky decomposition of a 3000x3000 matrix________ (sec): 0.243333333333333
Inverse of a 1600x1600 random matrix________________ (sec): 0.328000000000001
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.321291459145567

III. Programmation
------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.522333333333335
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.259666666666668
Grand common divisors of 400,000 pairs (recursion)__ (sec): 0.674333333333337
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.499333333333335
Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.352999999999994
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.451548428599678


Total time for all 15 tests_________________________ (sec): 6.84366666666667
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.411666478563312
--- End of test ---

可以看到total time已經從27.8秒到6.8秒左右,改善幅度已經不少,接著來compile R:

  1. 取得R與其開發包,並安裝需要的套件,在terminal use following commands:
1
2
3
sudo add-apt-repository ppa:webupd8team/java && sudo apt-get update && sudo apt-get install oracle-java8-installer && sudo apt-get install oracle-java8-set-default
apt-cache search readline xorg-dev && sudo apt-get install libreadline6 libreadline6-dev texinfo texlive texlive-binaries texlive-latex-base xorg-dev tcl8.6-dev tk8.6-dev libtiff5 libtiff5-dev libjpeg-dev libpng12-dev libcairo2-dev libglu1-mesa-dev libgsl0-dev libicu-dev R-base R-base-dev libnlopt-dev libstdc++6 build-essential libcurl4-openssl-dev texlive-fonts-extra libxml2-dev aptitude
# sudo apt-get install texlive-latex-extra

有一個工具要另外安裝,方式如下:

1
2
3
4
wget http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.14.tar.gz
tar -xvzf libiconv-1.14.tar.gz
cd libiconv-1.14 && ./configure --prefix=/usr/local/libiconv
make && sudo make install

但是我在make過程中有出錯,我google之後找到的解法是修改srclib/stdio.in.h的698列:
原本的script:

1
_GL_WARN_ON_USE (gets, "gets is a security hole - use fgets instead");

修改後的scipt:

1
2
3
#if defined(__GLIBC__) && !defined(__UCLIBC__) && !__GLIBC_PREREQ(2, 16)
_GL_WARN_ON_USE (gets, "gets is a security hole - use fgets instead");
#endif

之後再重新make就成功了。

  1. 取得R source code:
1
2
wget http://cran.csie.ntu.edu.tw/src/base/R-3/R-3.2.3.tar.gz
tar -xvzf R-3.2.3.tar.gz
  1. 取得Intel C++ compiler and Intel MKL,你可以取得non-commercial license for this two software in intel website. 另外,64bit linux system不支援32 bits的compiler,安裝時記得取消掉IA32的安裝。

  2. compilitation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
sudo -s
source /opt/intel/composer_xe_2015/mkl/bin intel64
source /opt/intel/composer_xe_2015/bin/compilervars.sh intel64
MKL_path=/opt/intel/composer_xe_2015/mkl
ICC_path=/opt/intel/composer_xe_2015/compiler
export LD="xild"
export AR="xiar"
export CC="icc"
export CXX="icpc"
export CFLAGS="-wd188 -ip -std=gnu99 -g -O3 -openmp -parallel -xHost -ipo -fp-model precise -fp-model source"
export CXXFLAGS="-g -O3 -openmp -parallel -xHost -ipo -fp-model precise -fp-model source"
export F77=ifort
export FFLAGS="-g -O3 -openmp -parallel -xHost -ipo -fp-model source"
export FC=ifort
export FCFLAGS="-g -O3 -openmp -parallel -xHost -ipo -fp-model source"
export ICC_LIBS=$ICC_path/lib/intel64
export IFC_LIBS=$ICC_path/lib/intel64
export LDFLAGS="-L$ICC_LIBS -L$IFC_LIBS -L$MKL_path/lib/intel64 -L/usr/lib -L/usr/local/lib -openmp"
export SHLIB_CXXLD=icpc
export SHLIB_LDFLAGS="-shared -fPIC"
export SHLIB_CXXLDFLAGS="-shared -fPIC"
MKL="-L$MKL_path/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -ldl -lm"
./configure --with-blas="$MKL" --with-lapack --with-x --enable-memory-profiling --with-tcl-config=/usr/lib/tcl8.6/tclConfig.sh --with-tk-config=/usr/lib/tk8.6/tkConfig.sh --enable-R-shlib --enable-BLAS-shlib --enable-prebuilt-html

如果順利會出現下方的畫面:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
R is now configured for x86_64-pc-linux-gnu

Source directory: .
Installation directory: /usr/local

C compiler: icc -wd188 -ip -std=gnu99 -g -O3 -openmp -parallel -xHost -ipo -fp-model precise -fp-model source
Fortran 77 compiler: ifort -g -O3 -openmp -parallel -xHost -ipo -fp-model source

C++ compiler: icpc -g -O3 -openmp -parallel -xHost -ipo -fp-model precise -fp-model source
C++ 11 compiler: icpc -std=c++11 -g -O3 -openmp -parallel -xHost -ipo -fp-model precise -fp-model source
Fortran 90/95 compiler: ifort -g -O3 -openmp -parallel -xHost -ipo -fp-model source
Obj-C compiler:

Interfaces supported: X11, tcltk
External libraries: readline, BLAS(MKL), zlib, bzlib, lzma, PCRE, curl
Additional capabilities: PNG, JPEG, TIFF, NLS, cairo, ICU
Options enabled: shared R library, shared BLAS, R profiling, memory profiling, static HTML

Capabilities skipped:
Options not enabled:

Recommended packages: yes

出現上方畫面就可以開始make跟install了:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
make && make check
# removing R before installation
rm /usr/lib/libR.so
rm -r /usr/lib/R
rm -r /usr/bin/Rscript
rm -r /usr/local/lib/R
make docs
make install
chown -R celest.celest /usr/local/lib/R
chmod -R 775 /usr/local/lib/R
# to add mkl and intel c compiler into path
echo 'source /opt/intel/composer_xe_2015/mkl/bin/mklvars.sh intel64' >> /etc/bash.bashrc
echo 'source /opt/intel/composer_xe_2015/bin/compilervars.sh intel64' >> /etc/bash.bashrc
exit
# install required package
R -e "install.packages('SuppDists', repos = 'http://cran.rstudio.com/')"
# run benchmark
R -e "source('http://r.research.att.com/benchmarks/R-benchmark-25.R')"
# to run rstudio-server, you have two options, first:
# echo 'rsession-which-r=/usr/local/bin/R' >> /etc/rstudio/rserver.conf
# second , please use option configure with --prefix=/usr

然後他就會幫你把R安裝於usr/local/lib/R中。

  1. 測試結果
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
   R Benchmark 2.5
===============
Number of times each test is run__________________________: 3

I. Matrix calculation
---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec): 0.683
2400x2400 normal distributed random matrix ^1000____ (sec): 0.259666666666667
Sorting of 7,000,000 random values__________________ (sec): 0.560333333333333
2800x2800 cross-product matrix (b = a' * a)_________ (sec): 0.44
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 0.193666666666666
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.400041560496478

II. Matrix functions
--------------------
FFT over 2,400,000 random values____________________ (sec): 0.393666666666667
Eigenvalues of a 640x640 random matrix______________ (sec): 0.326999999999999
Determinant of a 2500x2500 random matrix____________ (sec): 0.215666666666666
Cholesky decomposition of a 3000x3000 matrix________ (sec): 0.183999999999999
Inverse of a 1600x1600 random matrix________________ (sec): 0.182333333333332
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.234990082575285

III. Programmation
------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.274333333333333
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.221333333333333
Grand common divisors of 400,000 pairs (recursion)__ (sec): 0.275666666666667
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.255
Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.338999999999999
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.268164327390206


Total time for all 15 tests_________________________ (sec): 4.80466666666666
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.293214347493761
--- End of test ---

最後只需要用到4.8秒就可以完成了,可是complitation過程是滿麻煩的,雖然參考了多個網站,可是參數的設定都不太一樣,linux又有權限的限制,而且就算編譯成功,Rcpp這個套件不見得能夠成功,因此花了很久才終於編譯成功,並且能夠直接開啟,只是要利用到c, cpp or fortran時還是需要source compilervars.sh才能夠運行,而且我安裝了三四十個套件都沒有問題了。最後,如果沒有特別要求速度下,其實直接用OpenBLAS就可以省下很多麻煩。另外,我做了一個小小的測試於Rcpp上,速度有不少的提昇(因為用intel C++ compiler,大概增加5~10倍),測試結果就不放上來了。以上資訊供大家參考,轉載請註明來源,謝謝。

最後附上測試環境: My environment is mint 17.3, R 3.2.3 compiled by Intel c++, fortran compiler with Intel MKL. My CPU is 3770K@4.4GHz.

To use the html help page and change the default language of R to english, you can do that:

1
2
echo 'options("help_type"="html")' > ~/.Rprofile
echo 'LANGUAGE="en"' > ~/.Renviron

如果要讓Rstudio Server裡面成功啟動並且可以使用icpc,請在/usr/lib/rstudio-server/R/ServerOptions.R裡面加入下方:

1
Sys.setenv(PATH = "/opt/intel/composer_xe_2015.1.133/bin/intel64:/opt/intel/composer_xe_2015.1.133/debugger/gdb/intel64_mic/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/db/bin:/usr/lib/jvm/java-8-oracle/jre/bin:$PATH")

java config and install some useful packages:

1
2
3
4
5
6
R CMD javareconf
install.packages(c('devtools', 'testthat'))
devtools::install_github(c('klutometis/roxygen', 'hadley/assertthat', 'RcppCore/Rcpp', 'hadley/devtools', 'hadley/testthat', 'hadley/lazyeval'))
devtools::install_github(c('smbache/magrittr', 'Rdatatable/data.table', 'hadley/reshape', 'hadley/plyr', 'hadley/dplyr'))
devtools::install_github(c('RcppCore/RcppArmadillo', 'RcppCore/RcppEigen', 'RcppCore/RcppParallel'))
devtools::install_github(c('hadley/tidyr', 'hadley/purrr', 'yihui/knitr'))