Downloading a file from the internet with Python

Question

I'm trying to retrieve CSV data from a website through this link.

When downloaded manually, you get synop.201708.csv.gz, which is in fact a CSV wrongly named .gz; it weighs 2233 KB.

When running this code:

import urllib

file_date = '201708'
file_url = "https://donneespubliques.meteofrance.fr/donnees_libres/Txt/Synop/Archive/synop.{}.csv.gz".format(file_date)
output_file_name = "{}.csv.gz".format(file_date)

print "downloading {} to {}".format(file_url, output_file_name)
urllib.urlretrieve (file_url, output_file_name)

I'm getting a corrupted ~361 KB file.

Any ideas why?

What is the content of downloaded file? Trimmed data or actually some web page with warning about something? — Łukasz Rogalski, Commented Aug 18, 2017 at 14:54
From output_file_name = "{}.csv.gz".format(file_date) to output_file_name = "{}.csv".format(file_date) — Joao Vitorino, Commented Aug 18, 2017 at 14:59
@JoaoVitorino and how will changing the name of the output change the input being received? — Jon Clements, Commented Aug 18, 2017 at 15:00
@pvg wow that is CRAZY, my browser (chrome) is unzipping the file without telling me and is keeping it named .gz (that is why I thought that I was getting an unzipped file) — sliders_alpha, Commented Aug 18, 2017 at 15:04

3Doubloons · Accepted Answer · 2017-08-18 15:44:40Z

What seems to be happening is that the MétéoFrance site is misusing the Content-Encoding header. The website reports that it is serving you a gzip file (Content-Type: application/x-gzip) and that it is encoding it in gzip format for the transfer (Content-Encoding: x-gzip). It also says the response is an attachment, which should be saved under its normal name (Content-Disposition: attachment).

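In a vacuum, this would make sense (to a degree; compressing an already compressed file is mostly useless): the server serves a gzip file and compresses it again for transport. Upon receipt, your browser undoes the transport compression and saves the original gzip file. Here, however, the file was not actually compressed a second time, so when the browser undoes the advertised transport encoding it strips the only gzip layer there is and saves plain CSV under the .gz name. urllib.urlretrieve does not act on Content-Encoding, so it saves the bytes exactly as they were sent: a valid gzip archive of roughly 361 KB.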
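To see which of the two situations you are in, you can look at the first bytes of what was saved. The following is only a sketch, written for Python 3 rather than the Python 2.7 used in the question; the helper names and the sample CSV text are made up for illustration, and the "browser" and "urllib" variables merely simulate the behaviour described above with no network access.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def is_gzip(data):
    """Return True if the payload starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def strip_one_gzip_layer(data):
    """Undo a single gzip layer if present; otherwise return data unchanged."""
    return gzip.decompress(data) if is_gzip(data) else data


# Made-up stand-in for the server's file: CSV text compressed exactly once.
csv_text = b"station;date;value\nA;20170801;1\n"
served = gzip.compress(csv_text)

# A client that ignores Content-Encoding stores the bytes as received.
saved_by_urllib = served
# A client that honours Content-Encoding removes the only gzip layer.
saved_by_browser = strip_one_gzip_layer(served)

print(is_gzip(saved_by_urllib))   # the archive is still compressed
print(is_gzip(saved_by_browser))  # plain CSV, whatever the file is named
```

Running the same check on the first two bytes of each file on disk should tell you whether a given download is still compressed, regardless of its extension.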

sliders_alpha · Accepted Answer · 2017-08-18 15:11:23Z

As pvg said:

The file downloaded by urllib.urlretrieve is a compressed archive and not a CSV file; everything is fine.

I thought that I was supposed to get a CSV named .gz because, when I downloaded it manually through my browser (Chrome), it unzipped the file without telling me and kept the .gz name on the unzipped file.
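If what you actually want is the CSV, the archive saved by urllib.urlretrieve can be unpacked afterwards. A minimal sketch, assuming Python 3 and the standard gzip and shutil modules; gunzip_file is a hypothetical helper name, not something from the question:

```python
import gzip
import shutil


def gunzip_file(src_path, dst_path):
    """Decompress the gzip archive at src_path into a new file at dst_path."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Copy in chunks so the whole file never has to sit in memory.
        shutil.copyfileobj(src, dst)
    return dst_path
```

With the names from the question, this would be called as gunzip_file("201708.csv.gz", "201708.csv") once the download has finished.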
