
StratifiedShuffleSplit: ValueError: The least populated class in y has only 1 member, which is too few.

I'm using the StratifiedShuffleSplit cross-validator to predict house prices in the Boston dataset. When I run the sample code below:

def fit_model_S(labels, features,step,

Solution 1:

The Boston Housing data is a dataset for a regression problem, and you are using StratifiedShuffleSplit to divide it into train and test sets. As the docs put it, StratifiedShuffleSplit:

This cross-validation object is a merge of StratifiedKFold and ShuffleSplit, which returns stratified randomized folds. The folds are made by preserving the percentage of samples for each class.
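To illustrate what "preserving the percentage of samples for each class" means, here is a minimal sketch on hypothetical toy labels (a two-class y with an 80/20 ratio, not the Boston data). It only shows the intended use of StratifiedShuffleSplit on discrete class labels:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical toy data: 80 samples labelled 0 and 20 labelled 1 (an 80/20 ratio).
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

sss = StratifiedShuffleSplit(n_splits=3, test_size=0.25, random_state=0)

test_counts = []
for train_idx, test_idx in sss.split(X, y):
    # Count how many samples of each class landed in this test fold.
    test_counts.append(np.bincount(y[test_idx]))

for counts in test_counts:
    # Each test fold keeps the 80/20 ratio: 20 samples of class 0, 5 of class 1.
    print(counts)
```

With 100 samples and test_size=0.25, every test fold holds 25 samples split 20/5, the same proportions as y as a whole.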

Note the last sentence: "preserving the percentage of samples for each class". StratifiedShuffleSplit therefore treats each distinct y value as a separate class.

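That cannot work here, because your y is a regression target (continuous numerical data). Most price values occur only once, so many of these "classes" contain a single sample, while stratification needs at least two members per class. That is exactly what the error message reports.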
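The failure can be reproduced without the original code. This is a minimal sketch on synthetic data (an assumption: a random continuous target stands in for the house prices, since load_boston is no longer shipped with recent scikit-learn releases):

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic stand-in for a regression problem (not the Boston data):
# 50 samples with a continuous target, so every y value is unique.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.uniform(10.0, 50.0, size=50)

# StratifiedShuffleSplit treats every distinct y value as its own class.
values, counts = np.unique(y, return_counts=True)
print(counts.min())  # 1: each "class" has a single member

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
try:
    # split() is lazy, so the check runs when the first split is requested.
    next(sss.split(X, y))
    error_message = None
except ValueError as exc:
    # Raised because at least one class has fewer than 2 members.
    error_message = str(exc)

print(error_message)
```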

Use ShuffleSplit or train_test_split to divide your data instead; neither of them stratifies on y. For more details on cross-validation, see: http://scikit-learn.org/stable/modules/cross_validation.html#cross-validation
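A minimal sketch of both alternatives, again on synthetic regression data (an assumption standing in for the Boston set). LinearRegression and cross_val_score are used only to show that a ShuffleSplit object can be passed wherever a cv splitter is expected; the original fit_model_S code is truncated, so how it uses the splitter is not known:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score, train_test_split

# Synthetic regression data (not the Boston set): continuous target y.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.uniform(10.0, 50.0, size=50)

# ShuffleSplit draws random train/test indices without looking at y's values.
ss = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
fold_sizes = [(len(tr), len(te)) for tr, te in ss.split(X)]
print(fold_sizes)  # five (40, 10) pairs

# The same splitter can be used as the cv argument of scikit-learn helpers.
scores = cross_val_score(LinearRegression(), X, y, cv=ss)
print(scores)  # one R^2 score per split

# train_test_split makes a single random split, also without stratification.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)  # (40, 3) (10, 3)
```

Because neither splitter groups samples by their y value, a target in which every value is unique is not a problem.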
