
I am implementing an example from the O'Reilly book "Introduction to Machine Learning with Python", using Python 2.7 and sklearn 0.16.

The code I am using:

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
param_grid = {"logisticregression_C": [0.001, 0.01, 0.1, 1, 10, 100], "tfidfvectorizer_ngram_range": [(1,1), (1,2), (1,3)]}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)
print("Best cross-validation score: {:.2f}".format(grid.best_score_))

The error being returned boils down to:

ValueError: Invalid parameter logisticregression_C for estimator Pipeline

Is this an error related to using make_pipeline from version 0.16? What is causing this error?

4 Answers


There should be two underscores between the estimator name and its parameter in a Pipeline: logisticregression__C. Do the same for tfidfvectorizer.

It is mentioned in the user guide here: https://scikit-learn.org/stable/modules/compose.html#nested-parameters.

See the example at https://scikit-learn.org/stable/auto_examples/compose/plot_compare_reduction.html#sphx-glr-auto-examples-compose-plot-compare-reduction-py
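
For reference, a minimal sketch of the question's grid with the corrected double-underscore keys, reusing the pipe defined in the question:

param_grid = {
    "logisticregression__C": [0.001, 0.01, 0.1, 1, 10, 100],
    "tfidfvectorizer__ngram_range": [(1, 1), (1, 2), (1, 3)],
}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)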

  • I wish I could upvote more than once. The __ did the trick. Thank you
    – seralouk
    Commented Jun 9, 2017 at 8:27
  • file not found in this link: http://scikit-learn.org/stable/auto_examples/plot_compare_reduction.html#sphx-glr-auto-examples-plot-compare-reduction-py
    – labyrinth
    Commented Jun 15, 2020 at 22:16
  • This link is broken. Therefore it provides NO answer.
    – mccurcio
    Commented Dec 29, 2022 at 22:52
  • 1
    @mccurcio Updated the link. Even without the example links, the answer is adequate in itself. Commented Dec 30, 2022 at 9:52
  • @mccurcio Yes I agree. The former comment was just in response to the "NO answer" part. Commented Jan 2, 2023 at 7:10

More generally, when using a Pipeline inside a GridSearchCV, the keys of the parameter grid must start with the name you gave the step when defining the pipeline. For example:

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Pay attention to the name of the second step, i.e. 'model'
# ('preprocess' is any transformer you have already defined)
pipeline = Pipeline(steps=[
     ('preprocess', preprocess),
     ('model', Lasso())
])

# Define the parameter grid to be used in GridSearch
param_grid = {'model__alpha': np.arange(0, 1, 0.05)}

search = GridSearchCV(pipeline, param_grid)
search.fit(X_train, y_train)

In the pipeline, we used the name model for the estimator step, so any hyperparameter for the Lasso regression must be given with the prefix model__ in the grid search. The keys in the grid depend on the names you chose in the pipeline. In plain-old GridSearchCV without a pipeline, the grid would be given like this:

param_grid = {'alpha': np.arange(0, 1, 0.05)}
search = GridSearchCV(Lasso(), param_grid)

You can find out more about GridSearch from this post.
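
As a quick follow-up, a minimal sketch of reading off the winning configuration from the search object fitted above:

# After search.fit(X_train, y_train):
print(search.best_params_)  # the winning grid entry, keyed by 'model__alpha'
print(search.best_score_)   # mean cross-validated score of that entry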


Note that if you are using a pipeline with a voting classifier and a column selector, you will need multiple layers of names:

from mlxtend.feature_selection import ColumnSelector  # assuming mlxtend's ColumnSelector is meant here
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

pipe1 = make_pipeline(ColumnSelector(cols=(0, 1)),
                      LogisticRegression())
pipe2 = make_pipeline(ColumnSelector(cols=(1, 2, 3)),
                      SVC())
votingClassifier = VotingClassifier(estimators=[
        ('p1', pipe1), ('p2', pipe2)])

You will need a param grid that looks like the following:

param_grid = { 
        'p2__svc__kernel': ['rbf', 'poly'],
        'p2__svc__gamma': ['scale', 'auto'],
    }

p2 is the name given to the pipe in the VotingClassifier, svc is the default name make_pipeline assigns to the SVC step inside that pipe, and the third element is the parameter you want to modify.
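
A minimal usage sketch, assuming the votingClassifier and param_grid defined above and some existing X_train/y_train:

from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(votingClassifier, param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)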


You can always call model.get_params().keys() (if you are using a bare model) or pipeline.get_params().keys() (if you are using a pipeline) to list the parameter keys you can adjust.
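
For example, with a pipeline like the one in the question, the valid keys can be listed directly (a minimal sketch):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
# Prints keys such as 'logisticregression__C' and 'tfidfvectorizer__ngram_range'
print(sorted(pipe.get_params().keys()))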

  • This is the only solution that helped me solve the same problem. In my case, I had to replace max_depth with selectfrommodel__estimator__max_depth, found via pipeline.get_params().keys()
    – user164863
    Commented Jan 9, 2023 at 19:05
