Machine Learning With R
- 5. Machine Learning
Black-box, algorithmic approach to producing predictions or
classifications from data
A computer program is said to learn from
experience E with respect to some task T and some
performance measure P, if its performance on T, as
measured by P, improves with experience E
Tom Mitchell (1998)
- 7. Why Use R?
1. Statistical analysis on the fly
2. Mathematical functions and graphics modules built in
3. FREE! & Open Source!
- 10. Topics of Machine Learning
Supervised Learning
Regression
Classification
Unsupervised Learning
Dimension Reduction
Clustering
- 11. Regression
Predict one set of numbers given another set of numbers
Given the number of friends x, predict how many
goods I will receive on each Facebook post
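A minimal sketch of this task in R, assuming made-up data: the `friends` and `getgoods` values below are synthetic and generated only for illustration, and a straight-line fit with `lm` stands in for whatever model is eventually chosen.

```r
# Synthetic data for illustration only (not the data set used in these slides)
set.seed(1)
friends  <- round(runif(50, min = 50, max = 1000))
getgoods <- 5 + 0.05 * friends + rnorm(50, sd = 3)

# Fit a straight line: getgoods = intercept + slope * friends
fit_line <- lm(getgoods ~ friends)
coef(fit_line)

# Predict for a hypothetical new user with 400 friends
new_user  <- data.frame(friends = 400)
predicted <- predict(fit_line, newdata = new_user)
predicted
```

The fitted slope estimates how many extra goods each additional friend brings on average, under the assumption that the relationship is roughly linear.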
- 12. Scatter Plot
dataset <- read.csv('fbgood.txt', header=TRUE, sep='\t', row.names=1)
x <- dataset$friends
y <- dataset$getgoods
plot(x,y)
- 14. 2nd order polynomial fit
plot(x,y)
polyfit2 <- lm(y ~ poly(x, 2))
lines(sort(x), fitted(polyfit2)[order(x)], col = 2, lwd = 3)
- 15. 3rd order polynomial fit
plot(x,y)
polyfit3 <- lm(y ~ poly(x, 3))
lines(sort(x), fitted(polyfit3)[order(x)], col = 2, lwd = 3)
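One way to judge whether the extra cubic term is worth keeping is a nested-model F-test. The sketch below assumes synthetic data with a truly quadratic relationship (`x_sim` and `y_sim` are made up for illustration); `anova` and `AIC` are standard functions in R's stats package.

```r
# Synthetic quadratic data, for illustration only
set.seed(42)
x_sim <- runif(100, 0, 10)
y_sim <- 2 + 1.5 * x_sim - 0.2 * x_sim^2 + rnorm(100, sd = 1)

fit2 <- lm(y_sim ~ poly(x_sim, 2))
fit3 <- lm(y_sim ~ poly(x_sim, 3))

# F-test: does the cubic term explain significantly more variance?
cmp <- anova(fit2, fit3)
print(cmp)

# AIC penalises extra parameters; lower is better
aic <- AIC(fit2, fit3)
print(aic)
```

A higher-order fit never has a larger residual sum of squares on the training data, so a comparison that accounts for the extra parameter is a safer guide than the plot alone.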
- 17. Classification
Identifying to which of a set of categories a new observation belongs,
on the basis of a training set of data
Given the features of a bank customer, predict whether
the client will subscribe to a term deposit
- 19. Classify Data With LibSVM
library(e1071)
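dataset <- read.csv('bank.csv', header=TRUE, sep=';', stringsAsFactors=TRUE)
# split.data() is not in base R or e1071, so a random 70/30 split is done by hand
idx <- sample(nrow(dataset), round(0.7 * nrow(dataset)))
train <- dataset[idx, ]
test <- dataset[-idx, ]
# the label y is assumed to be the last column
model <- svm(y ~ ., data = train, probability = TRUE)
pred <- predict(model, test[, -ncol(test)], probability = TRUE)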
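A quick first check on a classifier is a confusion matrix of predicted against actual labels. The sketch below uses made-up label vectors rather than real model output; with a fitted model, the held-out labels and the predicted classes would take their place.

```r
# Made-up labels for illustration only
actual    <- factor(c("no", "no", "yes", "no", "yes", "no", "yes", "no"),
                    levels = c("no", "yes"))
predicted <- factor(c("no", "no", "yes", "yes", "no", "no", "yes", "no"),
                    levels = c("no", "yes"))

# Rows: predicted class, columns: actual class
confusion <- table(predicted = predicted, actual = actual)
confusion

# Accuracy: share of observations on the diagonal
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy
```

Accuracy alone can mislead when one class is rare, which is one reason to also look at a threshold-free measure such as the ROC curve.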
- 21. Using ROC for assessment
library(ROCR)
pred.prob <- attr(pred, "probabilities")
# Column 2 is assumed to hold the positive class; check colnames(pred.prob)
pred.to.roc <- pred.prob[, 2]
pred.rocr <- prediction(pred.to.roc, as.factor(test[, ncol(test)]))
perf.rocr <- performance(pred.rocr, measure = "auc")
perf.tpr.rocr <- performance(pred.rocr, "tpr", "fpr")
plot(perf.tpr.rocr, colorize = TRUE, main = paste("AUC:", perf.rocr@y.values[[1]]))
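The AUC can be cross-checked by hand: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal base-R sketch, using made-up scores and labels (`auc_by_rank` is a hypothetical helper, not part of ROCR):

```r
# Hypothetical helper: AUC from the rank-sum (Mann-Whitney) statistic
auc_by_rank <- function(scores, labels) {
  # labels: logical vector, TRUE for the positive class
  r <- rank(scores)                 # average ranks handle ties
  n_pos <- sum(labels)
  n_neg <- sum(!labels)
  (sum(r[labels]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up scores and labels for illustration only
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1)
labels <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)

auc <- auc_by_rank(scores, labels)
auc
```

An AUC of 0.5 corresponds to random ranking and 1 to perfect separation of the two classes.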
- 23. Support Vector Machines and Kernel Methods
e1071 - LIBSVM
kernlab - SVM, RVM and other kernel learning algorithms
klaR - SVMlight
rdetools - Model selection and prediction
- 24. Dimension Reduction
Seeks linear combinations of the columns of X with maximal variance
Compute a new index that measures the economy
of each city/county in Taiwan
- 25. Economic Index of Taiwan
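County/City
Sales of profit-seeking enterprises
Economic development expenditure as a share of annual expenditure
Average disposable income per income recipient
Source: CommonWealth Magazine (天下雜誌), 2012 Happy City Survey, Issue 505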
- 26. Component Bar Plot
dataset <- read.csv('eco_index.csv', header=TRUE, sep=',', row.names=1)
pc.cr <- princomp(dataset, cor = TRUE)
plot(pc.cr)
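To see how the components turn into a single index, the sketch below runs the same kind of analysis on synthetic data (the three indicator columns and their values are invented for illustration) and takes the scores on the first component as the composite index.

```r
# Synthetic indicators driven by one shared latent factor, for illustration only
set.seed(7)
n <- 20
base <- rnorm(n)
indicators <- data.frame(
  sales    = base + rnorm(n, sd = 0.3),
  spending = base + rnorm(n, sd = 0.5),
  income   = base + rnorm(n, sd = 0.4)
)

# cor = TRUE standardises the columns before extracting components
pc <- princomp(indicators, cor = TRUE)

# Proportion of total variance captured by each component
var_share <- pc$sdev^2 / sum(pc$sdev^2)
var_share

# Scores on the first component serve as a one-number index per row
index <- pc$scores[, 1]
head(sort(index, decreasing = TRUE))
```

The sign of a principal component is arbitrary, so it is worth checking the loadings (`pc$loadings`) to confirm that a higher index really means a stronger economy before ranking.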
- 33. Determining Clusters
mydata <- read.csv('costumer_segment.txt', header=TRUE, sep='\t')
mydata <- scale(mydata)
d <- dist(mydata, method = "euclidean")
fit <- hclust(d, method="ward.D")  # "ward" was renamed "ward.D" in R 3.1.0
plot(fit)
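Once the dendrogram suggests a number of clusters, `cutree` turns the tree into cluster labels. A minimal sketch on synthetic two-dimensional data (the points and the choice of three clusters are assumptions for illustration; `ward.D2`, available in R 3.1.0 and later, is used here because it applies Ward's criterion to unsquared Euclidean distances):

```r
set.seed(3)
# Three synthetic groups of 20 points each, centred at (0,0), (5,5) and (10,0)
pts <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2),
  cbind(rnorm(20, mean = 10, sd = 0.5), rnorm(20, mean = 0, sd = 0.5))
)

d_sim <- dist(scale(pts), method = "euclidean")
tree  <- hclust(d_sim, method = "ward.D2")

# Cut the tree into k = 3 clusters and count the members of each
groups <- cutree(tree, k = 3)
table(groups)
```

Calling `rect.hclust(tree, k = 3)` after `plot(tree)` draws the chosen clusters on the dendrogram.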
- 38. Machine Learning Diagnostics
1. Get more training examples
2. Try smaller sets of features
3. Try getting additional features
4. Try adding polynomial features
5. Try increasing/decreasing model parameters
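Which of these remedies helps depends on whether the model underfits or overfits, and comparing training error with validation error is a common way to tell. A minimal sketch on synthetic data (the data, the 70/30 split, and the polynomial degrees tried are all assumptions for illustration):

```r
# Synthetic data for illustration only
set.seed(11)
n <- 200
dat <- data.frame(x = runif(n, -3, 3))
dat$y <- sin(dat$x) + rnorm(n, sd = 0.3)

# Random 70/30 split into training and validation sets
idx <- sample(n, round(0.7 * n))
train_sim <- dat[idx, ]
valid_sim <- dat[-idx, ]

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Fit polynomials of increasing degree and record both errors
degrees <- 1:8
errors <- t(sapply(degrees, function(deg) {
  fit <- lm(y ~ poly(x, deg), data = train_sim)
  c(degree = deg,
    train  = rmse(train_sim$y, predict(fit, newdata = train_sim)),
    valid  = rmse(valid_sim$y, predict(fit, newdata = valid_sim)))
}))
errors
```

High error on both sets points to underfitting, where additional or polynomial features may help. A low training error with a much higher validation error points to overfitting, where more training examples or fewer features may help.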