How do I access the Scrapy settings in settings.py from the item pipeline? The documentation says settings can be accessed through the crawler in extensions, but I don't see how to access the crawler in the pipelines.
4 Answers
UPDATE (2021-05-04)
Please note that this answer is now ~7 years old, so its validity can no longer be ensured. In addition, it uses Python 2.
The way to access your Scrapy settings (as defined in settings.py) from within your_spider.py is simple. All other answers are way too complicated. The reason for this is the very poor maintenance of the Scrapy documentation, combined with many recent updates and changes. Neither the "Settings" documentation section "How to access settings" nor the "Settings API" bothers to give a workable example. Here's an example of how to get your current USER_AGENT string.

Just add the following lines to your_spider.py:
```python
# To get your settings from (settings.py):
from scrapy.utils.project import get_project_settings
...
class YourSpider(BaseSpider):
    ...
    def parse(self, response):
        ...
        settings = get_project_settings()
        print "Your USER_AGENT is:\n%s" % (settings.get('USER_AGENT'))
        ...
```
As you can see, there's no need to use @classmethod or to re-define the from_crawler() or __init__() methods. Hope this helps.

PS. I'm still not sure why using from scrapy.settings import Settings doesn't work the same way, since it would be the more obvious choice of import.
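Under Python 3 the print statement in the snippet above becomes a function call. A minimal sketch of the same lookup follows; it uses a local stand-in for get_project_settings() so it runs outside a Scrapy project, and the USER_AGENT value shown is purely illustrative:

```python
# Stand-in for scrapy.utils.project.get_project_settings(); the real
# function returns a Settings object that also supports .get().
def get_project_settings():
    return {"USER_AGENT": "Mozilla/5.0 (example)"}

settings = get_project_settings()
# Python 3: print is a function, not a statement
print("Your USER_AGENT is:\n%s" % settings.get("USER_AGENT"))
```

In a real project you would keep the original import (`from scrapy.utils.project import get_project_settings`) and delete the stand-in.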
- Despite the documentation suggesting the method used by @avaleske, I still prefer this way because it works and is faster to understand. Commented Aug 1, 2014 at 13:56
- This method did not recognize settings that were overridden from the command line. Use @avaleske's answer if you want this functionality. – t-mart, Commented Apr 18, 2015 at 3:17
Ok, so the documentation at http://doc.scrapy.org/en/latest/topics/extensions.html says that
The main entry point for a Scrapy extension (this also includes middlewares and pipelines) is the from_crawler class method which receives a Crawler instance which is the main object controlling the Scrapy crawler. Through that object you can access settings, signals, stats, and also control the crawler behaviour, if your extension needs to such thing.
So your pipeline can define a from_crawler class method to get the settings:
```python
@classmethod
def from_crawler(cls, crawler):
    settings = crawler.settings
    my_setting = settings.get("MY_SETTING")
    return cls(my_setting)
```
The crawler engine then calls the pipeline's __init__ function with my_setting, like so:

```python
def __init__(self, my_setting):
    self.my_setting = my_setting
```
And other methods can access it with self.my_setting, as expected.
Alternatively, in the from_crawler() method you can pass the crawler.settings object to __init__() and then access settings from the pipeline as needed, instead of pulling them all out in the constructor.
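A sketch of that alternative: from_crawler() forwards the whole crawler.settings object to __init__(), and individual keys are read lazily. The pipeline and setting names here are illustrative, and a SimpleNamespace stands in for the Crawler object that Scrapy's engine would supply (a plain dict mimics settings.get()):

```python
from types import SimpleNamespace

class SettingsAwarePipeline:
    """Illustrative pipeline that keeps the whole settings object."""

    def __init__(self, settings):
        # Keep the settings object; read individual keys where needed.
        self.settings = settings

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this with the real Crawler; forward its settings.
        return cls(crawler.settings)

    def process_item(self, item, spider):
        # Look up a key at use time rather than in the constructor.
        item["my_setting"] = self.settings.get("MY_SETTING")
        return item

# Stand-in for the Crawler object Scrapy would pass to from_crawler():
fake_crawler = SimpleNamespace(settings={"MY_SETTING": "on"})
pipeline = SettingsAwarePipeline.from_crawler(fake_crawler)
item = pipeline.process_item({}, spider=None)
```

In a real project Scrapy instantiates the pipeline via from_crawler() itself; the stand-in exists only so the sketch runs standalone.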
- That sounds awfully complicated. Isn't there an easier way to do this, or perhaps a better explanation? Could you not use from scrapy.settings import Settings? Commented Jan 5, 2014 at 2:54
- @user1147688 I'd use this method, because it conforms to the dependency-injection-based internal API of Scrapy. Your suggestion may work, but it doesn't look like there's any guarantee that it will continue to in the future, as internal APIs may be moved around. – deceze ♦, Commented Jun 23, 2014 at 13:56
- @avaleske, this works great, but do you know how we can use this to set a setting? For example, say in some other function I wanted to change one of the settings values, like download_delay. Can we do that? Commented May 28, 2015 at 3:21
- This is very confusing. Could someone explain what code goes into what file? Commented Jul 20, 2015 at 21:41
- I get (False, <twisted.python.failure.Failure builtins.AttributeError: 'FilesDownloadPipeline' object has no attribute 'crawler'>) after I've added the above code in an item pipeline, class FilesDownloadPipeline(FilesPipeline). Commented Jan 11, 2017 at 10:55
The correct answer is: it depends where in the pipeline you wish to access the settings.
avaleske answered as if you wanted access to the settings outside of your pipeline's process_item method, but it's very likely that this is where you'll want the setting, and in that case there is a much easier way, since the spider instance itself gets passed in as an argument:
```python
class PipelineX(object):
    def process_item(self, item, spider):
        wanted_setting = spider.settings.get('WANTED_SETTING')
        ...
        return item
```
- Great answer. For my project it made more sense to put the logic into the open_spider method, as I only use the value when the spider first loads. Commented Jun 17, 2015 at 18:51
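Following that comment, here is a sketch of reading the setting once in open_spider() and reusing it, instead of looking it up on every process_item() call. The class and setting names are illustrative, and a SimpleNamespace stands in for the real spider, whose settings attribute supports .get():

```python
from types import SimpleNamespace

class CachedSettingPipeline:
    """Illustrative pipeline: read the setting once at spider start."""

    def open_spider(self, spider):
        # Called once when the spider opens; cache the value here.
        self.wanted_setting = spider.settings.get("WANTED_SETTING")

    def process_item(self, item, spider):
        # Reuse the cached value rather than re-reading settings per item.
        item["wanted"] = self.wanted_setting
        return item

# Stand-in for the spider Scrapy would pass in:
spider = SimpleNamespace(settings={"WANTED_SETTING": 42})
pipe = CachedSettingPipeline()
pipe.open_spider(spider)
processed = pipe.process_item({}, spider)
```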
The project structure is quite flat, so why not:

```python
# pipeline.py
from myproject import settings
```
- And then you're going to change the myproject import every time you start a new one. Commented Jul 2, 2019 at 7:54