
I am trying to download stock data from the NSE (National Stock Exchange of India) website using Python.

This is the code I am using:

import urllib
urllib.urlretrieve("https://www.nseindia.com/content/historical/DERIVATIVES/2016/JAN/fo01JAN2016bhav.csv.zip", "fo01JAN2016bhav.csv.zip")

But when I try to open the downloaded file, it says the compressed (zipped) file is invalid.

When I download it normally by pasting the link into the browser, the file that gets downloaded opens fine.

Link:

https://www.nseindia.com/content/historical/DERIVATIVES/2016/JAN/fo01JAN2016bhav.csv.zip

If I try using urllib2 instead, I get this:

f=urllib2.urlopen('https://www.nseindia.com/content/historical/DERIVATIVES/2016/JAN/fo01JAN2016bhav.csv.zip')

Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    f=urllib2.urlopen('https://www.nseindia.com/content/historical/DERIVATIVES/2016/JAN/fo01JAN2016bhav.csv.zip')
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden

How do I fix this?

It happens for this link only; I have tried downloading images from Imgur and the same code works fine.

Why am I getting the HTTP 403 error when I can access the link normally through my browser?

  • The site does some header validation. Setting a User-Agent and an Accept header seems to be sufficient (see the sketch after this comment).
    – user650881
    Commented Mar 28, 2017 at 20:30
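
A minimal sketch of what that comment suggests, using urllib2 with browser-like headers (the specific User-Agent and Accept values here are assumptions; NSE does not document exactly what it checks):

import urllib2

url = "https://www.nseindia.com/content/historical/DERIVATIVES/2016/JAN/fo01JAN2016bhav.csv.zip"

# Send browser-like headers; the default "Python-urllib" User-Agent appears to be rejected with 403.
req = urllib2.Request(url, headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "*/*",
})
response = urllib2.urlopen(req)

with open("fo01JAN2016bhav.csv.zip", "wb") as f:
    f.write(response.read())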

2 Answers


This link provides an example of what you want to do: https://stackoverflow.com/a/22776/6595777

Found another question regarding downloading zip files. Try this:

import os
import urllib2

url = "http://www.nseindia.com/content/historical/DERIVATIVES/2016/JAN/fo01JAN2016bhav.csv.zip"
download = urllib2.urlopen(url)
with open(os.path.basename(url), "wb") as f:
    f.write(download.read())

I don't have commenting permissions yet, so I'm posting as an answer. I can't browse to your link via https; http works, though. Have you tried changing the link in your script to http?

It is possible that your script is downloading the error page that I get when trying to use https (ERR_SSL_PROTOCOL_ERROR). That means what you download will have the file name you specify (ending in .zip), but it will actually be HTML, which would explain the error saying the zip file is invalid.
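
One quick way to tell whether the downloaded file is a real zip archive or a saved error page is to check it with the standard zipfile module (a small sketch under that assumption):

import zipfile

# is_zipfile() returns False for an HTML error page that was merely saved with a .zip name.
if zipfile.is_zipfile("fo01JAN2016bhav.csv.zip"):
    print("Looks like a valid zip archive")
else:
    # Peek at the first bytes to see what the server actually returned.
    with open("fo01JAN2016bhav.csv.zip", "rb") as f:
        print(f.read(200))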

  • Yes, I tried changing to http and it still does not work. Other links, like images, work perfectly fine; only this link does not. Commented Mar 28, 2017 at 19:52
  • The Python 2 error you got means you aren't allowed to access the link (403: Forbidden). I can access the http link, so I don't think it should be forbidden. Have you tried http for both urllib and urllib2? Commented Mar 28, 2017 at 20:01
  • I do not know why you're not able to get the link through https; I have tried it in incognito mode and in different browsers and it works. With urllib2 I get an HTTP 403 error for this link, any idea why? Yes, I have tried http with urllib2 as well but get the same error. Commented Mar 28, 2017 at 20:02
  • The 403 is probably generated by the server to disallow automated clients (such as your script) which identify themselves with 'python' somewhere in their User-Agent header (see the sketch after these comments).
    – MatsLindh
    Commented Mar 28, 2017 at 20:12
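
If that is the case, the original urllib.urlretrieve approach can also be made to work by overriding the User-Agent that urllib sends. A minimal Python 2 sketch (the browser-like User-Agent string is an assumption, not something NSE documents):

import urllib

# urllib's FancyURLopener sends its `version` attribute as the User-Agent header,
# so subclassing it lets the request identify as a browser instead of "Python-urllib".
class BrowserOpener(urllib.FancyURLopener):
    version = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

BrowserOpener().retrieve(
    "https://www.nseindia.com/content/historical/DERIVATIVES/2016/JAN/fo01JAN2016bhav.csv.zip",
    "fo01JAN2016bhav.csv.zip")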

I do not know why this happens with the urllib and urllib2 libraries, but when I used the requests library:

import requests

url = "https://www.nseindia.com/content/historical/DERIVATIVES/2016/JAN/fo01JAN2016bhav.csv.zip"
r = requests.get(url)
with open("code3.zip", "wb") as code:
    code.write(r.content)

it worked.

This might be an indirect solution to my problem.
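
Once the zip has been downloaded, the bhavcopy CSV inside it can be read with the standard zipfile module. A small sketch (the member file name inside the archive is an assumption; check namelist() for the real one):

import zipfile

with zipfile.ZipFile("code3.zip") as z:
    # The bhavcopy archive typically contains a single CSV member.
    print(z.namelist())
    # Assumed member name; adjust to whatever namelist() prints.
    with z.open("fo01JAN2016bhav.csv") as csv_file:
        print(csv_file.read()[:200])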
