return coefficients from Pipeline object in sklearn

Question

I've fit a Pipeline object with RandomizedSearchCV

pipe_sgd = Pipeline([('scl', StandardScaler()),
                    ('clf', SGDClassifier(n_jobs=-1))])

param_dist_sgd = {'clf__loss': ['log'],
                 'clf__penalty': [None, 'l1', 'l2', 'elasticnet'],
                 'clf__alpha': np.linspace(0.15, 0.35),
                 'clf__n_iter': [3, 5, 7]}

sgd_randomized_pipe = RandomizedSearchCV(estimator = pipe_sgd, 
                                         param_distributions=param_dist_sgd, 
                                         cv=3, n_iter=30, n_jobs=-1)

sgd_randomized_pipe.fit(X_train, y_train)

I want to access the coef_ attribute of the best_estimator_ but I'm unable to do that. I've tried accessing coef_ with the code below.

sgd_randomized_pipe.best_estimator_.coef_

However I get the following AttributeError...

AttributeError: 'Pipeline' object has no attribute 'coef_'

The scikit-learn docs say that coef_ is an attribute of SGDClassifier, which is the class of my base_estimator_.

What am I doing wrong?

Community · Accepted Answer · 2020-06-20 09:12:55Z

You can always retrieve the steps by the names you assigned when building the pipeline, using the named_steps dict.

scaler = sgd_randomized_pipe.best_estimator_.named_steps['scl']
classifier = sgd_randomized_pipe.best_estimator_.named_steps['clf']

and then access any attribute, such as coef_ or intercept_, that is available on the corresponding fitted estimator.

This is the formal attribute exposed by the Pipeline as specified in the documentation:

named_steps : dict

Read-only attribute to access any step parameter by user given name. Keys are step names and values are steps parameters.
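As a rough end-to-end sketch of the above: the data here is synthetic (make_classification stands in for the asker's X_train and y_train), the search space is trimmed, and the variable name search is just for illustration. The loss and n_iter entries from the question are left out because their accepted values differ between scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_train / y_train (an assumption, not the asker's data).
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=0)

pipe_sgd = Pipeline([('scl', StandardScaler()),
                     ('clf', SGDClassifier(random_state=0))])

# Trimmed search space; step parameters are addressed as '<step name>__<param>'.
param_dist_sgd = {'clf__penalty': ['l1', 'l2', 'elasticnet'],
                  'clf__alpha': np.linspace(0.15, 0.35, 5)}

search = RandomizedSearchCV(pipe_sgd, param_distributions=param_dist_sgd,
                            cv=3, n_iter=5, random_state=0)
search.fit(X_train, y_train)

# best_estimator_ is the refitted Pipeline, not the classifier itself,
# so the fitted steps have to be pulled out of it by name.
scaler = search.best_estimator_.named_steps['scl']
classifier = search.best_estimator_.named_steps['clf']

print(classifier.coef_.shape)       # one row of coefficients for a binary problem
print(classifier.intercept_.shape)
print(scaler.mean_.shape)           # the scaler's fitted attributes are reachable too
```

Note that the coefficients refer to the standardized features, since the classifier only ever sees the output of the scaler step.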

Roozbeh Bakhshi · Accepted Answer · 2018-10-21 01:31:32Z


I think this should work:

sgd_randomized_pipe.best_estimator_.named_steps['clf'].coef_


spies006 · Accepted Answer · 2017-05-08 20:08:58Z


I've found one way to do this is by chained indexing with the steps attribute...

sgd_randomized_pipe.best_estimator_.steps[1][1].coef_

Is this best practice, or is there another way?



The named_steps method described above is preferred
– MCMZL
Commented May 3, 2018 at 9:47
This worked well when using make_pipeline with many different classifiers!
– RK1
Commented Dec 20, 2021 at 12:05
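
To illustrate the make_pipeline comment, here is a small sketch assuming synthetic data and two arbitrarily chosen linear classifiers (neither comes from the original thread). Since make_pipeline generates the step names, the key changes with the classifier, while the last position in steps always holds the final estimator.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only so the example runs on its own.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

names = []
for clf in (LogisticRegression(), SGDClassifier(random_state=0)):
    pipe = make_pipeline(StandardScaler(), clf).fit(X, y)
    # make_pipeline names each step after its lowercased class name,
    # so the key differs per classifier; the position does not.
    name, final_estimator = pipe.steps[-1]
    names.append(name)
    print(name, final_estimator.coef_.shape)
```

With hand-assigned names such as 'clf', the named_steps lookup stays the more readable choice.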


YaOzI · Accepted Answer · 2022-10-28 14:46:54Z

In short, scikit-learn offers two ways to access the estimators chained together in a Pipeline: by index or by name. (Each way again comes in two flavours, direct and indirect.)

Firstly, as the User Guide of sklearn points out,

The Pipeline is built using a list of (key, value) pairs (i.e. steps), where the key is a string containing the name you want to give this step and value is an estimator object.

This indicates that:

a pipeline is built from one or more estimator objects, in order (just like a list)

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.svm import SVC
>>> from sklearn.decomposition import PCA
>>> estimators = [('reduce_dim', PCA()), ('clf', SVC())]
>>> pipe = Pipeline(estimators)
>>> pipe
Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC())])

and each estimator object has a name, either assigned by the user (with the key) or set automatically (e.g. by the make_pipeline utility function)

>>> from sklearn.pipeline import make_pipeline
>>> pipe = make_pipeline(PCA(), SVC())
>>> pipe
Pipeline(steps=[('pca', PCA()), ('svc', SVC())])

So finally, we can access the estimators in a Pipeline either

by indexing the Pipeline:

directly through the Pipeline object (just like a list)
```
>>> pipe[0]
PCA()
>>> pipe[1]
SVC()
```

indirectly through the steps attribute (actually a list of tuples)

>>> pipe.steps
[('pca', PCA()), ('svc', SVC())]
>>> pipe.steps[0][1]
PCA()
>>> pipe.steps[1][1]
SVC()

or by the name of steps/estimators:

directly through the Pipeline object (just like a dict or namedtuple)
```
>>> pipe["pca"]
PCA()
>>> pipe["svc"]
SVC()
```

indirectly through the named_steps attribute (actually a subclass of dict)

>>> pipe.named_steps
{'pca': PCA(), 'svc': SVC()}
>>> pipe.named_steps["pca"]
PCA()
>>> pipe.named_steps["svc"]
SVC()
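
As a quick sanity check, the sketch below (reusing the make_pipeline example from above; the variable names are only illustrative) suggests that all four access styles hand back the very same estimator object rather than copies:

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

pipe = make_pipeline(PCA(), SVC())

by_index = pipe[1]                        # direct, by position
by_steps = pipe.steps[1][1]               # indirect, via the list of (name, estimator) tuples
by_name = pipe["svc"]                     # direct, by step name
by_named_steps = pipe.named_steps["svc"]  # indirect, via named_steps

# Chained identity check: every route should yield the same object.
same = by_index is by_steps is by_name is by_named_steps
print(same)
```

So whichever style you pick, fitted attributes such as coef_ are read from the same underlying estimator. Indexing the Pipeline object directly needs a reasonably recent scikit-learn; steps and named_steps also work on older releases.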

From here on, I hope we can play around with pipelines like a skilled plumber.
