RHadoop, from Revolution Analytics, provides a series of packages that connect R and Hadoop. rhdfs lets users manipulate HDFS from R, rmr2 and plyrmr let users run MapReduce jobs in R, and rhbase lets users access data in HBase.
Building a Hadoop environment in Mint 17
Hadoop is one of the most popular tools for dealing with big data. I set up a Hadoop environment in Mint 17. Mint 17 is based on Ubuntu 14.04, so the following steps also work in Ubuntu 14.04.
Installation of HBase and Hive in Mint 17
The second post in the series on building a Hadoop environment in Mint 17. HBase supports the storage of big tables in HDFS. Hive lets users query data stored in Hadoop with SQL-like commands.
Combinations of protein in Rcpp
Suppose there is a protein sequence like A B1/B2 C1/C2 K/D E F1/F2. K does not connect to the next point, so the sequence is cut at K. Therefore, the combinations of proteins are the following:
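As a rough sketch of how such combinations could be enumerated, the plain C++ below (standard library only, so it could later be exported through Rcpp) treats each position as a list of alternatives and stops a path as soon as it picks a terminal alternative such as K. The function name `expandPaths`, the `terminals` set, and the space-separated output are assumptions made for illustration, not the code from the post.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Enumerate every path through `positions`, where each position holds its
// alternatives (e.g. {"B1", "B2"}). A path ends early when it picks an
// alternative listed in `terminals` (assumed here to be {"K"}).
std::vector<std::string> expandPaths(
    const std::vector<std::vector<std::string>>& positions,
    const std::set<std::string>& terminals) {
  std::vector<std::string> open{""};  // paths that can still grow
  std::vector<std::string> finished;  // paths already cut at a terminal
  for (const auto& alternatives : positions) {
    std::vector<std::string> next;
    for (const auto& prefix : open) {
      for (const auto& alt : alternatives) {
        std::string path = prefix.empty() ? alt : prefix + " " + alt;
        if (terminals.count(alt)) {
          finished.push_back(path);  // cut here, do not extend further
        } else {
          next.push_back(path);
        }
      }
    }
    open = std::move(next);
  }
  // Paths that never hit a terminal run to the end of the sequence.
  finished.insert(finished.end(), open.begin(), open.end());
  return finished;
}
```

Under this reading, calling `expandPaths` on the sequence above with K as the only terminal gives 12 paths: 4 that stop at K and 8 that continue through D and E to F1 or F2.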
The length of runs of equal values in a vector
We often need to count the lengths of runs of equal values in a vector. rle is the built-in function in R for this problem, and it is built on diff and which. But it is not so fast, so I wrote another version of rle in Rcpp.
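The idea of run-length encoding can be sketched in a single pass. The version below is plain C++ over an integer vector, not the post's Rcpp implementation; the name `runLengths` and the pair-based return type are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Run-length encode `x`: returns (value, run length) pairs in order of
// appearance, e.g. {1, 1, 2} becomes {(1, 2), (2, 1)}.
std::vector<std::pair<int, std::size_t>> runLengths(const std::vector<int>& x) {
  std::vector<std::pair<int, std::size_t>> runs;
  for (int value : x) {
    if (!runs.empty() && runs.back().first == value) {
      ++runs.back().second;  // same value as the current run: extend it
    } else {
      runs.emplace_back(value, std::size_t{1});  // new value: start a new run
    }
  }
  return runs;
}
```

A single loop like this avoids the intermediate vectors that a diff-and-which approach allocates, which is one plausible reason a compiled version can be faster.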
Drawing planes in R
A simple note on drawing two planes in a 3D plot.
Regular expressions in R
Several examples of regular expressions in R.
Splitting characters in Rcpp
We usually need to process raw data ourselves, and character data is the most common type of raw data. I demonstrate an example that simply splits character strings.
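A minimal sketch of string splitting in plain C++ follows. It assumes the fields are separated by a single delimiter character, which may differ from the post's actual example, and the name `splitString` is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split `text` on a single delimiter character. Empty fields between
// consecutive delimiters are kept; an empty field after a trailing
// delimiter is not produced.
std::vector<std::string> splitString(const std::string& text, char delimiter) {
  std::vector<std::string> fields;
  std::istringstream stream(text);
  std::string field;
  while (std::getline(stream, field, delimiter)) {
    fields.push_back(field);
  }
  return fields;
}
```

For instance, `splitString("a,b,,c", ',')` returns four fields, the third of which is empty.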
Processing string in R (Rcpp and rJava)
Here is an example of processing strings in R.
Before we use rJava, we need a class file first. regex_java.java is shown below and can be compiled with the command javac regex_java.java.
Computing the transition matrix for multi-state individual
We have repeated-measures data and want to take the average every 3 periods. Here is the code to do it.
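Averaging every 3 periods could look like the sketch below, written in plain C++. It assumes the measurements for one individual are stored in time order and that the blocks do not overlap; the name `blockMeans` and the handling of a shorter final block are illustrative choices, not taken from the post.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Average consecutive, non-overlapping blocks of `width` values.
// A shorter final block is averaged over the values it actually holds.
std::vector<double> blockMeans(const std::vector<double>& values,
                               std::size_t width) {
  std::vector<double> means;
  if (width == 0) return means;  // nothing sensible to average
  for (std::size_t start = 0; start < values.size(); start += width) {
    std::size_t end = std::min(start + width, values.size());
    double sum = 0.0;
    for (std::size_t i = start; i < end; ++i) sum += values[i];
    means.push_back(sum / static_cast<double>(end - start));
  }
  return means;
}
```

With `width` set to 3, seven measurements yield three means: two over full blocks and one over the single remaining value.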