
This is my first question on Stack Overflow. I recently wanted to use linked-in-scraper, so I downloaded it and ran "scrapy crawl linkedin.com", and got the error message below. For reference, I use Anaconda 2.3.0 and Python 2.7.11. All the related packages, including scrapy and six, were updated with pip before running the program.

Traceback (most recent call last):
  File "/Users/byeongsuyu/anaconda/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/cmdline.py", line 108, in execute
    settings = get_project_settings()
  File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/utils/project.py", line 60, in get_project_settings
    settings.setmodule(settings_module_path, priority='project')
  File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 285, in setmodule
    self.set(key, getattr(module, key), priority)
  File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 260, in set
    self.attributes[name].set(value, priority)
  File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 55, in set
    value = BaseSettings(value, priority=priority)
  File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 91, in __init__
    self.update(values, priority)
  File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 317, in update
    for name, value in six.iteritems(values):
  File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/six.py", line 599, in iteritems
    return d.iteritems(**kw)
AttributeError: 'list' object has no attribute 'iteritems'

I understand that this error arises because d is a list rather than a dict. And since the error is raised inside Scrapy's code, it may be a problem in the scrapy package or the six package. How can I fix this error?
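The failure can be reproduced outside Scrapy with a minimal sketch (the settings value below is hypothetical): calling iteritems() on a list raises the same AttributeError seen at the bottom of the traceback.

```python
# Hypothetical settings value: a list, where six.iteritems() expects a dict.
settings_value = ['linkedIn.pipelines.LinkedinPipeline']

try:
    # six.iteritems(d) ultimately calls d.iteritems() on Python 2,
    # which only exists on dicts, not on lists.
    settings_value.iteritems()
except AttributeError as e:
    print(e)  # 'list' object has no attribute 'iteritems'
```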

EDIT: This is the code from scrapy.cfg:

  # Automatically created by: scrapy startproject
  #
  # For more information about the [deploy] section see:
  # http://doc.scrapy.org/topics/scrapyd.html
  [settings]
  default = linkedIn.settings

  [deploy]
  #url = http://localhost:6800/
  project = linkedIn
  • Do you have a config file for Scrapy? It looks like it is expecting to read a dictionary but finds a list instead. Commented May 25, 2016 at 17:47
  • @ValentinLorentz Yes, I added the code above, but I think it contains no additional information for this issue. Also, the programmer who wrote this code says it works well on Ubuntu with Python 2.7.6.
    – user124697
    Commented May 26, 2016 at 2:12

2 Answers


This is caused by the linked-in scraper's settings:

ITEM_PIPELINES = ['linkedIn.pipelines.LinkedinPipeline']

However, ITEM_PIPELINES is supposed to be a dict, according to the docs:

To activate an Item Pipeline component you must add its class to the ITEM_PIPELINES setting, like in the following example:

ITEM_PIPELINES = {
    'myproject.pipelines.PricePipeline': 300,
    'myproject.pipelines.JsonWriterPipeline': 800,
}

The integer values you assign to classes in this setting determine the order in which they run: items go through from lower valued to higher valued classes. It’s customary to define these numbers in the 0-1000 range.

According to this question, it used to be a list, which explains why this scraper uses one. So you will have to either ask the developer of the scraper to update their code, or set ITEM_PIPELINES yourself.
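If you fix it yourself, the change in the scraper's settings.py would look like the sketch below (the order value 300 is an arbitrary choice; any integer, customarily in the 0-1000 range, works since there is only one pipeline here):

```python
# linkedIn/settings.py

# Before (old list form, no longer accepted by Scrapy):
# ITEM_PIPELINES = ['linkedIn.pipelines.LinkedinPipeline']

# After: a dict mapping each pipeline class path to an order value.
ITEM_PIPELINES = {
    'linkedIn.pipelines.LinkedinPipeline': 300,
}
```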


The short answer is that ITEM_PIPELINES should be a dictionary, not a list, with the pipeline class as the key and an integer value that determines the order in which the pipelines run: items go through lower-valued classes first. It's customary to define these numbers in the 0-1000 range, as explained by @Valentin Lorentz.
