Ching-Chuan Chen's Blogger

Statistics, Machine Learning and Programming

0%

SparkR初探

雖然順序有點反了,先介紹了sparklyr

這裡就簡單show一下怎麼用SparkR,並且去拉Cassandra的table

R跟rstudio server就不贅述了

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# 讓R能夠找到SparkR套件
Sys.setenv(SPARK_HOME = "/usr/local/bigdata/spark")
.libPaths(c(.libPaths(), paste0(Sys.getenv("SPARK_HOME"), "/R/lib")))

library(SparkR)
library(pipeR)
library(stringr)

# 讀取spark設定
spark_settings <- readLines(paste0(Sys.getenv("SPARK_HOME"), "/conf/spark-defaults.conf")) %>>%
`[`(!str_detect(., "^#")) %>>% `[`(nchar(.) > 0) %>>% str_split("\\s") %>>% {
`names<-`(lapply(., `[`, 2), sapply(., `[`, 1))
} %>>% c(list(spark.cassandra.connection.host =
"192.168.0.121,192.168.0.122,192.168.0.123"))

# 開spark session
sc <- sparkR.session(master = spark_master,
sparkJars = spark_settings$spark.jars %>>% str_split(",") %>>% `[[`(1),
sparkConfig = spark_settings)

# 讀入Cassandra表
cass_tbl <- read.df(NULL, source = "org.apache.spark.sql.cassandra",
keyspace = "test", table = "kv2")

# 拉回local
cass_tbl_r <- as.data.frame(cass_tbl)