Ching-Chuan Chen's Blogger

Statistics, Machine Learning and Programming

0%

Using caret to tune the model parameters

看到陳景翔老師在社團PO了caret這個套件(原文),興起就去玩了一下

這套件東西滿多的,一時半會也沒辦法全部看懂

不過就以repeated cv方式tuning的方式,大概不會跳脫下面示範的範疇

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
library(caret)
checkInstall("xgboost") # check xgboost is installed!

numCV <- 3L
cvFold <- 10L
ratioTrain <- 0.6
# 30 tuning model and 1 final model
numSeeds <- 4L * numCV
# 4 is (tuneLength + 1L), where tuneLength is the parameter of train with default value 3
set.seed(1)
seeds <- replicate(numCV * cvFold + 1L, sample.int(2^31-1, numSeeds), simplify = FALSE)

# split training / testing set
trIndex <- createDataPartition(iris$Species, 1L, ratioTrain, FALSE)
irisTrain <- iris[trIndex, ]
irisTest <- iris[setdiff(1L:nrow(iris), trIndex), ]

# 10-fold and repeat 3 times with seeds
ctrl <- trainControl("repeatedcv", cvFold, numCV, seeds = seeds)

#### auto-tuning ####
xgFit1 <- train(Species ~ ., data = irisTrain, trControl = ctrl,
method = "xgbTree", num_class = nlevels(iris$Species))
xgFit1

# training result (auto-tuning)
confusionMatrix(iris.Train$Species, predict(xgFit1, iris.Train))
# testing result (auto-tuning)
confusionMatrix(iris.Test$Species, predict(xgFit1, iris.Test))

#### specific grid ####
# check the model parameters
modelLookup("xgbTree")
# expand training grid
trGrid <- expand.grid(nrounds = c(50, 100), max_depth = c(10, 12), eta = c(0.1, 0.2),
gamma = 0, colsample_bytree = c(0.5, 0.6), min_child_weight = 1)
# start to train model
xgFit2 <- train(Species ~ ., data = iris.Train,
trControl = ctrl, method = "xgbTree",
tuneGrid = trGrid, num_class = nlevels(iris$Species))
xgFit2

# training result (auto-tuning)
confusionMatrix(iris.Train$Species, predict(xgFit2, iris.Train))
# testing result (auto-tuning)
confusionMatrix(iris.Test$Species, predict(xgFit2, iris.Test))