Ching-Chuan Chen's Blogger

Statistics, Machine Learning and Programming

RHadoop is a collection of R packages from Revolution Analytics for connecting R and Hadoop. rhdfs lets you manipulate HDFS from R, rmr2 and plyrmr let you run MapReduce jobs in R, and rhbase lets you access data in HBase.

Read more »

Hadoop is one of the most popular tools for dealing with big data. I set up a Hadoop environment on Linux Mint 17, which is based on Ubuntu 14.04, so the following steps also work on Ubuntu 14.04.

Read more »

Suppose a protein sequence is written as A B1/B2 C1/C2 K/D E F1/F2, where a slash separates the alternatives at one position. K does not connect to the next position, so any chain that reaches K is cut there. The possible combinations are therefore as follows:
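As a rough illustration of the cutting rule, the sketch below enumerates the chains in plain R. It assumes that a slash separates alternatives at one position and that a chain ending in K is never extended. The helper name `expand_chains` is hypothetical and is not taken from the post.

```r
# Alternatives at each position, split on spaces and then on "/"
s <- "A B1/B2 C1/C2 K/D E F1/F2"
positions <- strsplit(strsplit(s, " ", fixed = TRUE)[[1]], "/", fixed = TRUE)
terminal <- "K"  # assumption: a chain that reaches K stops there

# Hypothetical helper: grow chains position by position,
# setting aside any chain that has just been cut at a terminal element
expand_chains <- function(positions, terminal) {
  open <- ""               # partial chains that can still be extended
  done <- character(0)     # chains that were cut at a terminal element
  for (alts in positions) {
    grid <- expand.grid(prefix = open, nxt = alts, stringsAsFactors = FALSE)
    chain <- trimws(paste(grid$prefix, grid$nxt))
    is_cut <- grid$nxt %in% terminal
    done <- c(done, chain[is_cut])
    open <- chain[!is_cut]
  }
  c(done, open)
}

combos <- expand_chains(positions, terminal)
print(combos)
```

Under these assumptions there are 12 chains: 4 that stop at K and 8 that pass through D and continue to F1 or F2.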

Read more »

We often need to count the lengths of runs of equal values. rle is the built-in R function for this problem, and it is implemented with diff and which. Because it is not very fast, I wrote another version of rle in Rcpp.
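For readers who have not used rle, here is a minimal sketch in plain R. It is not the Rcpp version the post describes. The function `rle_sketch` is a hypothetical name that mirrors the diff and which approach for vectors without missing values.

```r
# Run-length encoding with base R
x <- c(1, 1, 2, 2, 2, 3, 1, 1)
r <- rle(x)
print(r$lengths)  # 2 3 1 2
print(r$values)   # 1 2 3 1

# Hypothetical sketch of the same idea using which() and diff()
# (assumes x contains no NA values)
rle_sketch <- function(x) {
  n <- length(x)
  if (n == 0L) return(list(lengths = integer(0), values = x))
  # a run ends wherever the next value differs, and at the last element
  ends <- c(which(x[-1L] != x[-n]), n)
  list(lengths = diff(c(0L, ends)), values = x[ends])
}

s <- rle_sketch(x)
```

Each run length is the difference between consecutive run-end positions, which is why diff appears in the computation.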

Read more »

We usually need to process raw data ourselves, and character data is the most common type of raw data. I demonstrate an example of simply splitting strings.
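A minimal sketch of this kind of splitting in base R follows. The records and column names are made up for illustration and are not the data used in the post.

```r
# Hypothetical raw records with comma-separated fields
raw <- c("id1,2015-01-03,3.5", "id2,2015-01-04,4.0")

# Split each record on a literal comma
fields <- strsplit(raw, ",", fixed = TRUE)

# Stack the pieces into a character matrix, one row per record
mat <- do.call(rbind, fields)

# Convert each column to a suitable type
df <- data.frame(
  id = mat[, 1],
  date = as.Date(mat[, 2]),
  value = as.numeric(mat[, 3]),
  stringsAsFactors = FALSE
)
print(df)
```

Using `fixed = TRUE` treats the separator as a literal string rather than a regular expression, which is usually what is wanted for simple delimiters.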

Read more »

Here is an example of processing strings in R.

Before using rJava, we need a class file. The source file regex_java.java is shown below, and it can be compiled with the command javac regex_java.java.

Read more »