Questions tagged [random-forest]
In learning algorithms and statistical classification, a random forest is an ensemble classifier that consists of many decision trees. It outputs the class that is the mode of the classes output by the individual trees, in other words, the class with the highest frequency.
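The voting rule in the description above can be sketched in a few lines. This is only an illustration, assuming each tree has already produced a class label; the function name `majority_vote` and the sample labels are hypothetical and not taken from any library.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most frequent class label among per-tree predictions."""
    # most_common(1) gives [(label, count)] for the top label;
    # on a tie, the label that appeared first in the input wins.
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical outputs of five individual trees for one sample.
votes = ["X", "Y", "X", "X", "Y"]
winner = majority_vote(votes)
print(winner)  # X
```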
random-forest
3,695
questions
211
votes
25
answers
169k
views
How to extract the decision rules from scikit-learn decision-tree?
Can I extract the underlying decision rules (or 'decision paths') from a trained decision tree as a textual list?
Something like:
if A>0.4 then if B<0.2 then if C>0.8 then class='X'
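The kind of output asked for here can be sketched with a plain recursive walk. The example below does not use scikit-learn's internal tree structure; it assumes a hypothetical nested-dict tree (keys `feature`, `threshold`, `left`, `right`, `label`) purely to show the traversal idea.

```python
def extract_rules(node, conditions=()):
    """Yield one "if ... then class=..." string per leaf of a nested-dict tree."""
    if "label" in node:  # leaf node: emit the accumulated path
        prefix = " and ".join(conditions) or "True"
        yield f"if {prefix} then class='{node['label']}'"
        return
    feat, thr = node["feature"], node["threshold"]
    # Left branch holds samples with feature <= threshold, right branch the rest.
    yield from extract_rules(node["left"], conditions + (f"{feat}<={thr}",))
    yield from extract_rules(node["right"], conditions + (f"{feat}>{thr}",))


# Hypothetical tree, hand-written for illustration only.
tree = {
    "feature": "A", "threshold": 0.4,
    "left": {"label": "Y"},
    "right": {
        "feature": "B", "threshold": 0.2,
        "left": {"label": "X"},
        "right": {"label": "Y"},
    },
}

rules = list(extract_rules(tree))
for rule in rules:
    print(rule)
```

For a fitted scikit-learn tree, `sklearn.tree.export_text` produces a comparable textual dump without a hand-written traversal.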
149
votes
7
answers
86k
views
How are feature_importances in RandomForestClassifier determined?
I have a classification task with a time-series as the data input, where each attribute (n=23) represents a specific point in time. Besides the absolute classification result I would like to find out, ...
105
votes
6
answers
97k
views
Do I need to normalize (or scale) data for randomForest (R package)? [closed]
I am doing a regression task - do I need to normalize (or scale) data for randomForest (R package)? And is it necessary to also scale the target values?
And if so - I want to use the scale function from caret ...
104
votes
3
answers
42k
views
RandomForestClassifier vs ExtraTreesClassifier in scikit learn
Can anyone explain the difference between the RandomForestClassifier and ExtraTreesClassifier in scikit-learn? I've spent a good bit of time reading the paper:
P. Geurts, D. Ernst., and L. Wehenkel, ...
92
votes
8
answers
227k
views
RandomForestClassifier.fit(): ValueError: could not convert string to float
Given is a simple CSV file:
A,B,C
Hello,Hi,0
Hola,Bueno,1
Obviously the real dataset is far more complex than this, but this one reproduces the error. I'm attempting to build a random forest ...
86
votes
3
answers
97k
views
How to use random forests in R with missing values?
library(randomForest)
rf.model <- randomForest(WIN ~ ., data = learn)
I would like to fit a random forest model, but I get this error:
Error in na.fail.default(list(WIN = c(2L, 1L, 1L, 2L, 1L, 2L,...
84
votes
6
answers
109k
views
Can sklearn random forest directly handle categorical features?
Say I have a categorical feature, color, which takes the values
['red', 'blue', 'green', 'orange'],
and I want to use it to predict something in a random forest. If I one-hot encode it (i.e. I ...
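For reference, one-hot encoding of the `color` feature mentioned above can be sketched in plain Python. This is a minimal illustration, not the scikit-learn encoder; the helper name `one_hot` and the two sample values are hypothetical.

```python
def one_hot(values, categories):
    """Map each value to a 0/1 vector with a single 1 at its category's position."""
    return [[1 if value == cat else 0 for cat in categories] for value in values]


categories = ['red', 'blue', 'green', 'orange']
# Two hypothetical samples of the color feature.
encoded = one_hot(['blue', 'orange'], categories)
print(encoded)  # [[0, 1, 0, 0], [0, 0, 0, 1]]
```

Each category becomes its own binary column, so a tree can only split on one color at a time.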
73
votes
2
answers
72k
views
What is out of bag error in Random Forests? [closed]
What is out of bag error in Random Forests?
Is it the optimal parameter for finding the right number of trees in a Random Forest?
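As background for the question, each tree in a random forest is usually fit on a bootstrap sample of the rows, and the rows a tree never saw are its "out-of-bag" rows, which can then be used to estimate error. The sketch below only shows how the in-bag and out-of-bag sets arise; the function name `bootstrap_split`, the seed, and the sample size are assumptions made for illustration.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw a bootstrap sample and return (in_bag_indices, out_of_bag_indices)."""
    # Sample n_samples row indices with replacement.
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    # Rows that were never drawn are out-of-bag for this tree.
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob


rng = random.Random(0)  # fixed seed so the run is repeatable
in_bag, oob = bootstrap_split(1000, rng)
oob_fraction = len(oob) / 1000
# For large n the expected out-of-bag fraction approaches 1/e, roughly 0.37.
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```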
58
votes
2
answers
160k
views
How to get Best Estimator on GridSearchCV (Random Forest Classifier Scikit)
I'm running GridSearch CV to optimize the parameters of a classifier in scikit. Once I'm done, I'd like to know which parameters were chosen as the best.
Whenever I do so I get a AttributeError: '...
54
votes
2
answers
72k
views
How do I solve overfitting in random forest of Python sklearn?
I am using the RandomForestClassifier implemented in the Python sklearn package to build a binary classification model. Below are the results of cross-validation:
Fold 1 : Train: 164 Test: 40
Train ...
52
votes
8
answers
132k
views
Random Forest Feature Importance Chart using Python
I am working with RandomForestRegressor in Python and I want to create a chart that will illustrate the ranking of feature importance. This is the code I used:
from sklearn.ensemble import ...
48
votes
7
answers
41k
views
multioutput regression by xgboost
Is it possible to train a model with xgboost that has multiple continuous outputs (multi-regression)?
What would be the objective of training such a model?
Thanks in advance for any suggestions
46
votes
3
answers
51k
views
R Random Forests Variable Importance
I am trying to use the random forests package for classification in R.
The Variable Importance Measures listed are:
mean raw importance score of variable x for class 0
mean raw importance score of ...
44
votes
4
answers
48k
views
How to tune parameters in Random Forest, using Scikit Learn?
class sklearn.ensemble.RandomForestClassifier(n_estimators=10,
criterion='gini',
max_depth=None,
...
43
votes
5
answers
96k
views
setting values for ntree and mtry for random forest regression model
I'm using R package randomForest to do a regression on some biological data. My training data size is 38772 X 201.
I just wondered: what would be a good value for the number of trees ntree and the ...
42
votes
4
answers
105k
views
random forest tuning - tree depth and number of trees
I have a basic question about tuning a random forest classifier. Is there any relation between the number of trees and the tree depth? Is it necessary that the tree depth should be smaller than the ...
40
votes
3
answers
124k
views
Got continuous is not supported error in RandomForestRegressor
I'm just trying to do a simple RandomForestRegressor example. But while testing the accuracy I get this error
/Users/noppanit/anaconda/lib/python2.7/site-packages/sklearn/metrics/classification.pyc
...
40
votes
3
answers
56k
views
Understanding max_features parameter in RandomForestRegressor
While constructing each tree in the random forest using bootstrapped samples, for each terminal node, we select m variables at random from p variables to find the best split (p is the total number of ...
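The "select m variables at random from p" step described above can be sketched as follows. This is a simplified illustration, assuming the draw is made without replacement each time a node is split; the helper name `pick_candidate_features` and the values of `p` and `m` are hypothetical.

```python
import random


def pick_candidate_features(n_features, max_features, rng):
    """Draw max_features distinct feature indices out of n_features."""
    # rng.sample draws without replacement, so no index repeats.
    return sorted(rng.sample(range(n_features), max_features))


rng = random.Random(42)  # fixed seed so the run is repeatable
p, m = 10, 3  # p total features, m candidates considered per split
candidates = pick_candidate_features(p, m, rng)
print(candidates)
```

Only the features in `candidates` would be searched for the best split at that node; a fresh draw is made for the next node.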
37
votes
6
answers
136k
views
Plot trees for a Random Forest in Python with Scikit-Learn
I want to plot a decision tree of a random forest. So I wrote the following code:
clf = RandomForestClassifier(n_estimators=100)
import pydotplus
import six
from sklearn import tree
dotfile = six....
36
votes
5
answers
47k
views
Save python random forest model to file
In R, after running "random forest" model, I can use save.image("***.RData") to store the model. Afterwards, I can just load the model to do predictions directly.
Can you do a similar thing in python?...
35
votes
4
answers
39k
views
Unbalanced classification using RandomForestClassifier in sklearn
I have a dataset where the classes are unbalanced. The classes are either '1' or '0' where the ratio of class '1':'0' is 5:1. How do you calculate the prediction error for each class and the ...
35
votes
5
answers
27k
views
Is there easy way to grid search without cross validation in python?
There is an absolutely helpful class, GridSearchCV, in scikit-learn to do grid search and cross validation, but I don't want to do cross validation. I want to do grid search without cross validation and ...
34
votes
8
answers
92k
views
How can I use the row.names attribute to order the rows of my dataframe in R?
I created a random forest and predicted the classes of my test set, which are living happily in a dataframe:
row.names class
564028 1
275747 1
601137 0
922930 1
481988 1
....
33
votes
1
answer
20k
views
How do you access tree depth in Python's scikit-learn?
I'm using scikit-learn to create a Random Forest. However, I want to find the individual depths of each tree. It seems like a simple attribute to have but according to the documentation, (http://...
32
votes
4
answers
58k
views
Random forest output interpretation
I have run a random forest for my data and got the output in the form of a matrix.
What are the rules it applied to classify?
P.S. I want a profile of the customer as output,
e.g. Person from New ...
31
votes
2
answers
52k
views
Numpy Array Get row index searching by a row
I am new to numpy and I am implementing clustering with random forest in python. My question is:
How could I find the index of the exact row in an array? For example
[[ 0. 5. 2.]
[ 0. 0. 3.]
[...
31
votes
3
answers
102k
views
Using the predict_proba() function of RandomForestClassifier in the safe and right way
I'm using Scikit-learn. Sometimes I need to have the probabilities of labels/classes instead of the labels/classes themselves. Instead of having Spam/Not Spam as labels of emails, I wish to have only ...
30
votes
3
answers
50k
views
Random Forest with GridSearchCV - Error on param_grid
I'm trying to create a Random Forest model with GridSearchCV but am getting an error pertaining to param_grid: "ValueError: Invalid parameter max_features for estimator Pipeline. Check the list of ...
27
votes
9
answers
43k
views
r random forest error - type of predictors in new data do not match
I am trying to use the quantile regression forest function in R (quantregForest), which is built on the randomForest package. I am getting a type mismatch error and I can't quite figure out why.
I train the ...
27
votes
3
answers
26k
views
how to use classwt in randomForest of R?
I have a highly imbalanced data set with target class instances in the following ratio 60000:1000:1000:50 (i.e. a total of 4 classes). I want to use randomForest for making predictions of the target ...
26
votes
3
answers
36k
views
How to change datatype of multiple columns in pandas
I'm trying to run a Random Forest on a pandas dataframe. I know there are no nulls or infinities in the dataframe but continually get a ValueError when I fit the model. Presumably this is because I ...
26
votes
2
answers
37k
views
How to extract feature importances from an Sklearn pipeline
I've built a pipeline in Scikit-Learn with two steps: one to construct features, and the second is a RandomForestClassifier.
While I can save that pipeline, look at various steps and the various ...
26
votes
4
answers
11k
views
How to set seed for random simulations with foreach and doMC packages?
I need to do some simulations and for debugging purposes I want to use set.seed to get the same result. Here is the example of what I am trying to do:
library(foreach)
library(doMC)
registerDoMC(2)
...
25
votes
2
answers
69k
views
How to perform random forest/cross validation in R
I'm unable to find a way of performing cross validation on a regression random forest model that I'm trying to produce.
So I have a dataset containing 1664 explanatory variables (different chemical ...
25
votes
4
answers
21k
views
What does the value of 'leaf' in the following xgboost model tree diagram mean?
I am guessing that it is conditional probability given that the above (tree branch) condition exists. However, I am not clear on it.
If you want to read more about the data used or how do we get ...
24
votes
3
answers
36k
views
Variable importance with ranger
I trained a random forest using caret + ranger.
fit <- train(
y ~ x1 + x2
,data = total_set
,method = "ranger"
,trControl = trainControl(method="cv", number = 5, allowParallel = ...
23
votes
4
answers
21k
views
Suggestions for speeding up Random Forests
I'm doing some work with the randomForest package and while it works well, it can be time-consuming. Does anyone have any suggestions for speeding things up? I'm using a Windows 7 box w/ a dual core AMD ...
23
votes
1
answer
46k
views
Using randomForest package in R, how to get probabilities from classification model?
TL;DR :
Is there something I can flag in the original randomForest call to avoid having to re-run the predict function to get predicted categorical probabilities, instead of just the likely category?
...
23
votes
1
answer
9k
views
Why is training a random forest regressor with MAE criterion so slow compared to MSE?
When training on even small applications (<50K rows, <50 columns), using the mean absolute error criterion for sklearn's RandomForestRegressor is nearly 10x slower than using mean squared error. To ...
22
votes
2
answers
21k
views
How to cross validate RandomForest model?
I want to evaluate a random forest being trained on some data. Is there any utility in Apache Spark to do the same or do I have to perform cross validation manually?
21
votes
2
answers
13k
views
Combining random forest models in scikit learn
I have two RandomForestClassifier models, and I would like to combine them into one meta model. They were both trained using similar, but different, data. How can I do this?
rf1 #this is my first ...
21
votes
1
answer
5k
views
Why is Random Forest with a single tree much better than a Decision Tree classifier?
I apply the decision tree classifier and the random forest classifier to my data with the following code:
def decision_tree(train_X, train_Y, test_X, test_Y):
clf = tree.DecisionTreeClassifier()
...
21
votes
1
answer
15k
views
What does the parameter 'classwt' in RandomForest function in RandomForest package in R stand for?
The help page for randomForest::randomForest() says:
"classwt - Priors of the classes. Need not add up to one. Ignored for regression."
Could setting the classwt parameter help when you have heavy ...
20
votes
3
answers
91k
views
Plot Feature Importance with feature names
In R there are pre-built functions to plot the feature importance of a Random Forest model. But in Python such a method seems to be missing. I am searching for a method in matplotlib.
model.feature_importances ...
20
votes
1
answer
18k
views
OpenCV - Random Forest Example
Does anyone have some example using Random Forests with the 2.3.1 API Mat and not the cvMat?
Basically I have a Matrix Mat data that consists of 1000 rows with 16x16x3 elements and a Matrix Mat ...
19
votes
4
answers
33k
views
How to improve randomForest performance?
I have a training set of size 38 MB (12 attributes with 420000 rows). I am running the R snippet below to train the model using randomForest. This is taking hours for me.
rf.model <- randomForest(...
19
votes
1
answer
9k
views
Subscript out of bounds (Caret variable importance for randomForest) [duplicate]
I have trained a model in R:
require(caret)
require(randomForest)
myControl = trainControl(method='cv',number=5,repeats=2,returnResamp='none')
model2 = train(increaseInAssessedLevel~., data=trainData,...
19
votes
4
answers
28k
views
Difference between varImp (caret) and importance (randomForest) for Random Forest
I do not understand what the difference is between the varImp function (caret package) and the importance function (randomForest package) for a Random Forest model:
I computed a simple RF classification ...
19
votes
4
answers
34k
views
Random Forest with classes that are very unbalanced
I am using random forests in a big data problem, which has a very unbalanced response class, so I read the documentation and I found the following parameters:
strata
sampsize
The documentation for ...
19
votes
4
answers
12k
views
How do I output the regression prediction from each tree in a Random Forest in Python scikit-learn?
Is there a way to get the predictions from every tree in a random forest in addition to the combined prediction? I would like to output all of the predictions in a list and not view the entire ...