
I am implementing an example from the O'Reilly book "Introduction to Machine Learning with Python", using Python 2.7 and sklearn 0.16.

The code I am using:

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
param_grid = {"logisticregression_C": [0.001, 0.01, 0.1, 1, 10, 100], "tfidfvectorizer_ngram_range": [(1,1), (1,2), (1,3)]}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)
print("Best cross-validation score: {:.2f}".format(grid.best_score_))

The error being returned boils down to:

ValueError: Invalid parameter logisticregression_C for estimator Pipeline

Is this an error related to using make_pipeline from version 0.16? What is causing this error?

4 Answers


There should be two underscores between the estimator name and its parameter in a Pipeline: logisticregression__C. Do the same for tfidfvectorizer.

It is mentioned in the user guide here: https://scikit-learn.org/stable/modules/compose.html#nested-parameters.

See the example at https://scikit-learn.org/stable/auto_examples/compose/plot_compare_reduction.html#sphx-glr-auto-examples-compose-plot-compare-reduction-py
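
For reference, a minimal sketch of the question's grid with the corrected double-underscore keys, reusing the pipe defined in the question:

param_grid = {
    "logisticregression__C": [0.001, 0.01, 0.1, 1, 10, 100],
    "tfidfvectorizer__ngram_range": [(1, 1), (1, 2), (1, 3)],
}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)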

  • I wish I could upvote more than once. The __ did the trick. Thank you
    – seralouk
    Commented Jun 9, 2017 at 8:27
  • file not found in this link: http://scikit-learn.org/stable/auto_examples/plot_compare_reduction.html#sphx-glr-auto-examples-plot-compare-reduction-py
    – labyrinth
    Commented Jun 15, 2020 at 22:16
  • This link is broken. Therefore it provides NO answer.
    – mccurcio
    Commented Dec 29, 2022 at 22:52
  • 1
    @mccurcio Updated the link. Even without the example links, the answer is adequate in itself. Commented Dec 30, 2022 at 9:52
  • @mccurcio Yes I agree. The former comment was just in response to the "NO answer" part. Commented Jan 2, 2023 at 7:10

More generally, when using a Pipeline inside a GridSearchCV, the keys of the parameter grid must start with the name you gave the step when defining the pipeline. For example:

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Pay attention to the name of the second step, i.e. 'model'
# ('preprocess' is any transformer you have already defined)
pipeline = Pipeline(steps=[
     ('preprocess', preprocess),
     ('model', Lasso())
])

# Define the parameter grid to be used in GridSearch
param_grid = {'model__alpha': np.arange(0, 1, 0.05)}

search = GridSearchCV(pipeline, param_grid)
search.fit(X_train, y_train)

In the pipeline, we used the name model for the estimator step, so any hyperparameter for the Lasso regression must be given with the prefix model__ in the grid search. The keys in the grid depend on the names you chose in the pipeline. In plain-old GridSearchCV without a pipeline, the grid would be given like this:

param_grid = {'alpha': np.arange(0, 1, 0.05)}
search = GridSearchCV(Lasso(), param_grid)

You can find out more about GridSearch from this post.
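
As a quick follow-up, a minimal sketch of reading off the winning configuration from the search object fitted above:

# After search.fit(X_train, y_train):
print(search.best_params_)  # the winning grid entry, keyed by 'model__alpha'
print(search.best_score_)   # mean cross-validated score of that entry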


Note that if you are using a pipeline with a voting classifier and a column selector, you will need multiple layers of names:

from mlxtend.feature_selection import ColumnSelector  # assuming mlxtend's ColumnSelector is meant here
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

pipe1 = make_pipeline(ColumnSelector(cols=(0, 1)),
                      LogisticRegression())
pipe2 = make_pipeline(ColumnSelector(cols=(1, 2, 3)),
                      SVC())
votingClassifier = VotingClassifier(estimators=[
        ('p1', pipe1), ('p2', pipe2)])

You will need a param grid that looks like the following:

param_grid = { 
        'p2__svc__kernel': ['rbf', 'poly'],
        'p2__svc__gamma': ['scale', 'auto'],
    }

p2 is the name given to the pipe in the VotingClassifier, svc is the default name make_pipeline assigns to the SVC step inside that pipe, and the third element is the parameter you want to modify.
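
A minimal usage sketch, assuming the votingClassifier and param_grid defined above and some existing X_train/y_train:

from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(votingClassifier, param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)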


You can always call model.get_params().keys() (if you are using a bare model) or pipeline.get_params().keys() (if you are using a pipeline) to list the parameter keys you can adjust.
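
For example, with a pipeline like the one in the question, the valid keys can be listed directly (a minimal sketch):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
# Prints keys such as 'logisticregression__C' and 'tfidfvectorizer__ngram_range'
print(sorted(pipe.get_params().keys()))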

  • This is the only solution that helped me solve the same problem. In my case, I had to replace max_depth with selectfrommodel__estimator__max_depth, found via pipeline.get_params().keys()
    – user164863
    Commented Jan 9, 2023 at 19:05
