Questions tagged [random-forest]

In machine learning and statistical classification, a random forest is an ensemble classifier that consists of many decision trees. It outputs the class that is the mode of the classes output by the individual trees, in other words, the class predicted by the most trees.
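
The voting step in the description above can be sketched in a few lines of plain Python. This is a minimal illustration only, not how any particular library implements it; the function name `majority_vote` and the example labels are assumptions made for the sketch.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most frequent class label among per-tree predictions."""
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    # most_common(1) returns [(label, count)] for the top label; on a tie,
    # the label that was encountered first wins.
    label, _count = Counter(tree_predictions).most_common(1)[0]
    return label


# Hypothetical outputs of five individual trees for one sample.
votes = ["X", "Y", "X", "Z", "X"]
forest_prediction = majority_vote(votes)
```

Here three of the five hypothetical trees vote for the same class, so that class is the forest's output.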

211 votes
25 answers
169k views

How to extract the decision rules from scikit-learn decision-tree?

Can I extract the underlying decision rules (or 'decision paths') from a trained decision tree as a textual list? Something like: if A>0.4 then if B<0.2 then if C>0.8 then class='X'
Dror Hilman's user avatar
  • 7,167
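
The kind of textual rule list asked about above can be sketched on a toy tree. This is an illustration under stated assumptions, not scikit-learn's API: the nested-dict layout (`feature`, `threshold`, `left`, `right`, `label`) and the function `tree_to_rules` are hypothetical, chosen only to show the traversal.

```python
def tree_to_rules(node, conditions=()):
    """Yield one textual rule per leaf of a nested-dict decision tree."""
    if "label" in node:
        # Leaf: join the conditions collected on the path from the root.
        prefix = " ".join(f"if {c} then" for c in conditions)
        yield f"{prefix} class='{node['label']}'".strip()
        return
    feat, thr = node["feature"], node["threshold"]
    # Left branch takes feature <= threshold, right branch takes the rest.
    yield from tree_to_rules(node["left"], conditions + (f"{feat}<={thr}",))
    yield from tree_to_rules(node["right"], conditions + (f"{feat}>{thr}",))


# Hypothetical toy tree, not taken from any trained model.
toy_tree = {
    "feature": "A", "threshold": 0.4,
    "left": {"label": "Y"},
    "right": {
        "feature": "B", "threshold": 0.2,
        "left": {"label": "X"},
        "right": {"label": "Y"},
    },
}

rules = list(tree_to_rules(toy_tree))
```

Each leaf produces one rule, so the toy tree above yields three rules. A real fitted tree would need its own accessor for nodes and thresholds.
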
149 votes
7 answers
86k views

How are feature_importances in RandomForestClassifier determined?

I have a classification task with a time-series as the data input, where each attribute (n=23) represents a specific point in time. Besides the absolute classification result I would like to find out, ...
user2244670's user avatar
  • 1,491
105 votes
6 answers
97k views

Do I need to normalize (or scale) data for randomForest (R package)? [closed]

I am doing a regression task. Do I need to normalize (or scale) data for randomForest (R package)? And is it necessary to also scale the target values? And if I want to use the scale function from caret ...
gutompf's user avatar
  • 1,335
104 votes
3 answers
42k views

RandomForestClassifier vs ExtraTreesClassifier in scikit learn

Can anyone explain the difference between the RandomForestClassifier and ExtraTreesClassifier in scikit learn? I've spent a good bit of time reading the paper: P. Geurts, D. Ernst, and L. Wehenkel, ...
denson's user avatar
  • 2,406
92 votes
8 answers
227k views

RandomForestClassifier.fit(): ValueError: could not convert string to float

Given is a simple CSV file: A,B,C Hello,Hi,0 Hola,Bueno,1 Obviously the real dataset is far more complex than this, but this one reproduces the error. I'm attempting to build a random forest ...
nilkn's user avatar
  • 965
86 votes
3 answers
97k views

How to use random forests in R with missing values?

library(randomForest) rf.model <- randomForest(WIN ~ ., data = learn) I would like to fit a random forest model, but I get this error: Error in na.fail.default(list(WIN = c(2L, 1L, 1L, 2L, 1L, 2L,...
Borut Flis's user avatar
84 votes
6 answers
109k views

Can sklearn random forest directly handle categorical features?

Say I have a categorical feature, color, which takes the values ['red', 'blue', 'green', 'orange'], and I want to use it to predict something in a random forest. If I one-hot encode it (i.e. I ...
tkunk's user avatar
  • 1,418
73 votes
2 answers
72k views

What is out of bag error in Random Forests? [closed]

What is out of bag error in Random Forests? Is it the optimal parameter for finding the right number of trees in a Random Forest?
csalive's user avatar
  • 871
58 votes
2 answers
160k views

How to get Best Estimator on GridSearchCV (Random Forest Classifier Scikit)

I'm running GridSearchCV to optimize the parameters of a classifier in scikit. Once I'm done, I'd like to know which parameters were chosen as the best. Whenever I do so I get an AttributeError: '...
sapo_cosmico's user avatar
  • 6,384
54 votes
2 answers
72k views

How do I solve overfitting in random forest of Python sklearn?

I am using the RandomForestClassifier implemented in Python's sklearn package to build a binary classification model. Below are the cross-validation results: Fold 1 : Train: 164 Test: 40 Train ...
Munichong's user avatar
  • 3,931
52 votes
8 answers
132k views

Random Forest Feature Importance Chart using Python

I am working with RandomForestRegressor in python and I want to create a chart that will illustrate the ranking of feature importance. This is the code I used: from sklearn.ensemble import ...
user348547's user avatar
48 votes
7 answers
41k views

multioutput regression by xgboost

Is it possible to train a model by xgboost that has multiple continuous outputs (multi-regression)? What would be the objective of training such a model? Thanks in advance for any suggestions
user1782011's user avatar
46 votes
3 answers
51k views

R Random Forests Variable Importance

I am trying to use the random forests package for classification in R. The Variable Importance Measures listed are: mean raw importance score of variable x for class 0 mean raw importance score of ...
thirsty93's user avatar
  • 2,622
44 votes
4 answers
48k views

How to tune parameters in Random Forest, using Scikit Learn?

class sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, ...
O.rka's user avatar
  • 30.6k
43 votes
5 answers
96k views

setting values for ntree and mtry for random forest regression model

I'm using R package randomForest to do a regression on some biological data. My training data size is 38772 X 201. I just wondered---what would be a good value for the number of trees ntree and the ...
DOSMarter's user avatar
  • 1,493
42 votes
4 answers
105k views

random forest tuning - tree depth and number of trees

I have a basic question about tuning a random forest classifier. Is there any relation between the number of trees and the tree depth? Is it necessary that the tree depth should be smaller than the ...
Vysh's user avatar
  • 728
40 votes
3 answers
124k views

Got continuous is not supported error in RandomForestRegressor

I'm just trying to do a simple RandomForestRegressor example. But while testing the accuracy I get this error /Users/noppanit/anaconda/lib/python2.7/site-packages/sklearn/metrics/classification.pyc ...
toy's user avatar
  • 11.9k
40 votes
3 answers
56k views

Understanding max_features parameter in RandomForestRegressor

While constructing each tree in the random forest using bootstrapped samples, for each terminal node, we select m variables at random from p variables to find the best split (p is the total number of ...
csankar69's user avatar
  • 667
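
The random draw of m candidate variables out of p described in the excerpt above can be sketched with the standard library. This is a minimal illustration, not scikit-learn's implementation; the helper `candidate_features`, the seed, and the values p=9 and m=3 are assumptions made for the example.

```python
import random


def candidate_features(p, m, rng):
    """Pick m distinct feature indices out of p, uniformly at random."""
    if not 0 < m <= p:
        raise ValueError("m must satisfy 0 < m <= p")
    # sample() draws without replacement, so no feature is picked twice.
    return sorted(rng.sample(range(p), m))


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
p, m = 9, 3             # hypothetical: 9 features, 3 candidates per node

# A fresh subset is drawn for each node, as the excerpt describes.
draws = [candidate_features(p, m, rng) for _ in range(4)]
```

Only the features in a node's subset would be searched for the best split at that node. Setting m equal to p makes every feature a candidate each time.
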
37 votes
6 answers
136k views

Plot trees for a Random Forest in Python with Scikit-Learn

I want to plot a decision tree of a random forest. So I wrote the following code: clf = RandomForestClassifier(n_estimators=100) import pydotplus import six from sklearn import tree dotfile = six....
Zoya's user avatar
  • 1,195
36 votes
5 answers
47k views

Save python random forest model to file

In R, after running "random forest" model, I can use save.image("***.RData") to store the model. Afterwards, I can just load the model to do predictions directly. Can you do a similar thing in python?...
user3013706's user avatar
35 votes
4 answers
39k views

Unbalanced classification using RandomForestClassifier in sklearn

I have a dataset where the classes are unbalanced. The classes are either '1' or '0' where the ratio of class '1':'0' is 5:1. How do you calculate the prediction error for each class and the ...
mlo's user avatar
  • 657
35 votes
5 answers
27k views

Is there an easy way to grid search without cross validation in python?

There is a very helpful class, GridSearchCV, in scikit-learn to do grid search and cross validation, but I don't want to do cross validation. I want to do grid search without cross validation and ...
ykensuke9's user avatar
  • 734
34 votes
8 answers
92k views

How can I use the row.names attribute to order the rows of my dataframe in R?

I created a random forest and predicted the classes of my test set, which are living happily in a dataframe: row.names class 564028 1 275747 1 601137 0 922930 1 481988 1 ....
tumultous_rooster's user avatar
33 votes
1 answer
20k views

How do you access tree depth in Python's scikit-learn?

I'm using scikit-learn to create a Random Forest. However, I want to find the individual depths of each tree. It seems like a simple attribute to have but according to the documentation, (http://...
iltp38's user avatar
  • 519
32 votes
4 answers
58k views

Random forest output interpretation

I have run a random forest for my data and got the output in the form of a matrix. What are the rules it applied to classify? P.S. I want a profile of the customer as output, e.g. Person from New ...
user2061730's user avatar
31 votes
2 answers
52k views

Numpy Array Get row index searching by a row

I am new to numpy and I am implementing clustering with random forest in python. My question is: How could I find the index of the exact row in an array? For example [[ 0. 5. 2.] [ 0. 0. 3.] [...
user2801023's user avatar
31 votes
3 answers
102k views

Using the predict_proba() function of RandomForestClassifier in the safe and right way

I'm using Scikit-learn. Sometimes I need to have the probabilities of labels/classes instead of the labels/classes themselves. Instead of having Spam/Not Spam as labels of emails, I wish to have only ...
Clinical's user avatar
  • 503
30 votes
3 answers
50k views

Random Forest with GridSearchCV - Error on param_grid

I'm trying to create a Random Forest model with GridSearchCV but am getting an error pertaining to param_grid: "ValueError: Invalid parameter max_features for estimator Pipeline. Check the list of ...
OAK's user avatar
  • 3,074
27 votes
9 answers
43k views

r random forest error - type of predictors in new data do not match

I am trying to use the quantile regression forest function in R (quantregForest), which is built on the Random Forest package. I am getting a type mismatch error and I can't quite figure out why. I train the ...
Gizem's user avatar
  • 371
27 votes
3 answers
26k views

how to use classwt in randomForest of R?

I have a highly imbalanced data set with target class instances in the following ratio 60000:1000:1000:50 (i.e. a total of 4 classes). I want to use randomForest for making predictions of the target ...
StrikeR's user avatar
  • 1,608
26 votes
3 answers
36k views

How to change datatype of multiple columns in pandas

I'm trying to run a Random Forest on a pandas dataframe. I know there are no nulls or infinities in the dataframe but continually get a ValueError when I fit the model. Presumably this is because I ...
MK.'s user avatar
  • 261
26 votes
2 answers
37k views

How to extract feature importances from an Sklearn pipeline

I've built a pipeline in Scikit-Learn with two steps: one to construct features, and the second is a RandomForestClassifier. While I can save that pipeline, look at various steps and the various ...
elksie5000's user avatar
  • 7,488
26 votes
4 answers
11k views

How to set seed for random simulations with foreach and doMC packages?

I need to do some simulations and for debugging purposes I want to use set.seed to get the same result. Here is the example of what I am trying to do: library(foreach) library(doMC) registerDoMC(2) ...
mpiktas's user avatar
  • 11.4k
25 votes
2 answers
69k views

How to perform random forest/cross validation in R

I'm unable to find a way of performing cross validation on a regression random forest model that I'm trying to produce. So I have a dataset containing 1664 explanatory variables (different chemical ...
user2062207's user avatar
25 votes
4 answers
21k views

What does the value of 'leaf' in the following xgboost model tree diagram mean?

I am guessing that it is conditional probability given that the above (tree branch) condition exists. However, I am not clear on it. If you want to read more about the data used or how do we get ...
dsl1990's user avatar
  • 1,207
24 votes
3 answers
36k views

Variable importance with ranger

I trained a random forest using caret + ranger. fit <- train( y ~ x1 + x2 ,data = total_set ,method = "ranger" ,trControl = trainControl(method="cv", number = 5, allowParallel = ...
François M.'s user avatar
  • 4,199
23 votes
4 answers
21k views

Suggestions for speeding up Random Forests

I'm doing some work with the randomForest package and while it works well, it can be time-consuming. Does anyone have any suggestions for speeding things up? I'm using a Windows 7 box w/ a dual core AMD ...
screechOwl's user avatar
  • 27.7k
23 votes
1 answer
46k views

Using randomForest package in R, how to get probabilities from classification model?

TL;DR : Is there something I can flag in the original randomForest call to avoid having to re-run the predict function to get predicted categorical probabilities, instead of just the likely category? ...
Mike Williamson's user avatar
23 votes
1 answer
9k views

Why is training a random forest regressor with MAE criterion so slow compared to MSE?

When training on even small applications (<50K rows, <50 columns), using the mean absolute error criterion for sklearn's RandomForestRegressor is nearly 10x slower than using mean squared error. To ...
kevins_1's user avatar
  • 1,286
22 votes
2 answers
21k views

How to cross validate RandomForest model?

I want to evaluate a random forest being trained on some data. Is there any utility in Apache Spark to do the same or do I have to perform cross validation manually?
ashishsjsu's user avatar
21 votes
2 answers
13k views

Combining random forest models in scikit learn

I have two RandomForestClassifier models, and I would like to combine them into one meta model. They were both trained using similar, but different, data. How can I do this? rf1 #this is my first ...
mgoldwasser's user avatar
  • 14.9k
21 votes
1 answer
5k views

Why is Random Forest with a single tree much better than a Decision Tree classifier?

I apply the decision tree classifier and the random forest classifier to my data with the following code: def decision_tree(train_X, train_Y, test_X, test_Y): clf = tree.DecisionTreeClassifier() ...
hallow_me's user avatar
  • 1,213
21 votes
1 answer
15k views

What does the parameter 'classwt' in RandomForest function in RandomForest package in R stand for?

The help page for randomForest::randomForest() says: "classwt - Priors of the classes. Need not add up to one. Ignored for regression." Could setting the classwt parameter help when you have heavy ...
Qbik's user avatar
  • 5,995
20 votes
3 answers
91k views

Plot Feature Importance with feature names

In R there are pre-built functions to plot the feature importance of a Random Forest model. But in python such a method seems to be missing. I searched for a method in matplotlib. model.feature_importances ...
add-semi-colons's user avatar
20 votes
1 answer
18k views

OpenCV - Random Forest Example

Does anyone have some example using Random Forests with the 2.3.1 API Mat and not the cvMat? Basically I have a Matrix Mat data that consists of 1000 rows with 16x16x3 elements and a Matrix Mat ...
Poul K. Sørensen's user avatar
19 votes
4 answers
33k views

How to improve randomForest performance?

I have a training set of size 38 MB (12 attributes with 420000 rows). I am running the below R snippet, to train the model using randomForest. This is taking hours for me. rf.model <- randomForest(...
user3497321's user avatar
19 votes
1 answer
9k views

Subscript out of bounds (Caret variable importance for randomForest) [duplicate]

I have trained a model in R: require(caret) require(randomForest) myControl = trainControl(method='cv',number=5,repeats=2,returnResamp='none') model2 = train(increaseInAssessedLevel~., data=trainData,...
Jakub Langr's user avatar
19 votes
4 answers
28k views

Difference between varImp (caret) and importance (randomForest) for Random Forest

I do not understand what the difference is between the varImp function (caret package) and the importance function (randomForest package) for a Random Forest model: I computed a simple RF classification ...
Rafa OR's user avatar
  • 349
19 votes
4 answers
34k views

Random Forest with classes that are very unbalanced

I am using random forests in a big data problem, which has a very unbalanced response class, so I read the documentation and I found the following parameters: strata sampsize The documentation for ...
nanounanue's user avatar
  • 8,112
19 votes
4 answers
12k views

How do I output the regression prediction from each tree in a Random Forest in Python scikit-learn?

Is there a way to get the predictions from every tree in a random forest in addition to the combined prediction? I would like to output all of the predictions in a list and not view the entire ...
chunky's user avatar
  • 371
