43

I've fit a Pipeline object with RandomizedSearchCV

pipe_sgd = Pipeline([('scl', StandardScaler()),
                    ('clf', SGDClassifier(n_jobs=-1))])

param_dist_sgd = {'clf__loss': ['log'],
                 'clf__penalty': [None, 'l1', 'l2', 'elasticnet'],
                 'clf__alpha': np.linspace(0.15, 0.35),
                 'clf__n_iter': [3, 5, 7]}

sgd_randomized_pipe = RandomizedSearchCV(estimator = pipe_sgd, 
                                         param_distributions=param_dist_sgd, 
                                         cv=3, n_iter=30, n_jobs=-1)

sgd_randomized_pipe.fit(X_train, y_train)

I want to access the coef_ attribute of the best_estimator_ but I'm unable to do that. I've tried accessing coef_ with the code below.

sgd_randomized_pipe.best_estimator_.coef_

However I get the following AttributeError...

AttributeError: 'Pipeline' object has no attribute 'coef_'

The scikit-learn docs say that coef_ is an attribute of SGDClassifier, which is the class of the classifier inside my best_estimator_.

What am I doing wrong?

4 Answers

48

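The best_estimator_ is the whole fitted Pipeline, not the classifier, so you need to pull the step out of it first. You can always retrieve a step by the name you assigned when building the pipeline, using the named_steps dict.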

scaler = sgd_randomized_pipe.best_estimator_.named_steps['scl']
classifier = sgd_randomized_pipe.best_estimator_.named_steps['clf']

and then access any attribute, such as coef_ or intercept_, that is available on the corresponding fitted estimator.

This is the formal attribute exposed by the Pipeline as specified in the documentation:

named_steps : dict

Read-only attribute to access any step parameter by user given name. Keys are step names and values are steps parameters.
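
To make this concrete, here is a minimal, self-contained sketch of the same pattern. It is not the asker's exact setup: the synthetic data from make_classification, the small alpha grid, and the default SGDClassifier loss are all assumptions chosen so the snippet runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary data standing in for X_train / y_train (assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

pipe = Pipeline([("scl", StandardScaler()),
                 ("clf", SGDClassifier(random_state=0))])

search = RandomizedSearchCV(
    estimator=pipe,
    param_distributions={"clf__alpha": np.linspace(0.15, 0.35, 10)},
    n_iter=5, cv=3, random_state=0,
)
search.fit(X, y)

# best_estimator_ is the refitted Pipeline, which has no coef_ of its own.
best_pipe = search.best_estimator_

# Pull the fitted classifier out by the name given in the Pipeline.
clf = best_pipe.named_steps["clf"]
print(clf.coef_.shape)       # one row of weights per feature for a binary problem
print(clf.intercept_.shape)
```

Note that best_estimator_ only exists when the search is run with refit=True, which is the default.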

14

I think this should work:

sgd_randomized_pipe.best_estimator_.named_steps['clf'].coef_
4

I've found one way to do this is by chained indexing with the steps attribute...

sgd_randomized_pipe.best_estimator_.steps[1][1].coef_

Is this best practice, or is there another way?

2
  • 2
    The named_steps approach described above is preferred
    – MCMZL
    Commented May 3, 2018 at 9:47
  • This worked well when using make_pipeline with many different classifiers!
    – RK1
    Commented Dec 20, 2021 at 12:05
1

In short, scikit-learn gives you two ways to access the estimators chained together in a Pipeline: by index or by name. (Each way comes in two flavours: directly on the Pipeline object, or indirectly through one of its attributes.)


Firstly, as the User Guide of sklearn points out,

The Pipeline is built using a list of (key, value) pairs (i.e. steps), where the key is a string containing the name you want to give this step and value is an estimator object.

This indicates that:

  1. a pipeline is built from one or more estimator objects, in order (just like a list)

    >>> from sklearn.pipeline import Pipeline
    >>> from sklearn.svm import SVC
    >>> from sklearn.decomposition import PCA
    >>> estimators = [('reduce_dim', PCA()), ('clf', SVC())]
    >>> pipe = Pipeline(estimators)
    >>> pipe
    Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC())])
    
  2. and each estimator object has a name, either assigned by the user (with the key) or set automatically (e.g. by the make_pipeline utility function, which uses the lowercased class name)

    >>> from sklearn.pipeline import make_pipeline
    >>> pipe = make_pipeline(PCA(), SVC())
    >>> pipe
    Pipeline(steps=[('pca', PCA()), ('svc', SVC())])
    

So finally, we can access the estimators in a Pipeline either

  1. by indexing the Pipeline:
    • directly through the Pipeline object (just like a list)
      >>> pipe[0]
      PCA()
      >>> pipe[1]
      SVC()
      
    • indirectly through the steps attribute (actually a list of tuples)
      >>> pipe.steps
      [('pca', PCA()), ('svc', SVC())]
      >>> pipe.steps[0][1]
      PCA()
      >>> pipe.steps[1][1]
      SVC()
      
  2. or by the name of steps/estimators:
    • directly through the Pipeline object (just like a dict or namedtuple)
      >>> pipe["pca"]
      PCA()
      >>> pipe["svc"]
      SVC()
      
    • indirectly through the named_steps attribute (actually a subclass of dict)
      >>> pipe.named_steps
      {'pca': PCA(), 'svc': SVC()}
      >>> pipe.named_steps["pca"]
      PCA()
      >>> pipe.named_steps["svc"]
      SVC()
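
As a quick sanity check, the four access styles listed above should all hand back the very same estimator object, not copies. The sketch below reuses the make_pipeline example; the variable names are only illustrative.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

pipe = make_pipeline(PCA(), SVC())

by_index = pipe[1]                        # direct, by position
by_steps = pipe.steps[1][1]               # indirect, via the steps list
by_name = pipe["svc"]                     # direct, by step name
by_named_steps = pipe.named_steps["svc"]  # indirect, via named_steps

# Identity comparison: every route returns the same SVC instance.
same = (by_index is by_steps) and (by_steps is by_name) and (by_name is by_named_steps)
print(same)
```

Because they are the same object, fitted attributes read through any one of them (coef_ on a linear model, for instance) are identical too. Which style to use is mostly a readability choice; names tend to survive reordering of steps better than indices.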
      

From here on, I hope we can play around with pipelines like skilled plumbers.
