
While building a Flask website, I'm using an external JSON feed to populate a local MongoDB instance with content. The feed is parsed and its keys are mapped to keys in Mongo.

One of the available keys in the feed is called "img_url" and contains, guess what, a URL to an image.

Is there a way, in Python, to mimic a PHP-style cURL? I'd like to grab that key, download the image, store it somewhere locally while keeping the other associated keys, and save all of that as an entry in my db.

Here is my script so far:

    import json
    import sys
    import urllib2
    from datetime import datetime

    import pymongo
    import pytz

    from utils import slugify
    # from utils import logger

    client = pymongo.MongoClient()
    db = client.artlogic

    def fetch_artworks():
        # logger.debug("downloading artwork data from Artlogic")

        AL_artworks = []
        AL_artists = []
        url = "http://feeds.artlogic.net/artworks/artlogiconline/json/"

        while True:
            f = urllib2.urlopen(url)
            data = json.load(f)

            AL_artworks += data['rows']

            # logger.debug("retrieved page %s of %s of artwork data" % (data['feed_data']['page'], data['feed_data']['no_of_pages']))

            # Stop: we are at the last page
            if data['feed_data']['page'] == data['feed_data']['no_of_pages']:
                break

            url = data['feed_data']['next_page_link']

        # Now we have a list called 'AL_artworks' in which all the descriptions
        # are stored. We are going to put them into the mongoDB database,
        # making sure that if the artwork is already encoded (an object with the
        # same id is already in the database) we update the existing description
        # instead of inserting a new one ('upsert').

        # logger.debug("updating local mongodb database with %s entries" % len(AL_artworks))

        for artwork in AL_artworks:
            # Mongo does not like keys that have a dot in their name;
            # this property does not seem to be used anyway, so let us
            # delete it:
            if 'artworks.description2' in artwork:
                del artwork['artworks.description2']
            # upsert into the database:
            db.AL_artworks.update({"id": artwork['id']}, artwork, upsert=True)

            # artwork['artist_id'] is not functioning properly
            db.AL_artists.update({"artist": artwork['artist']},
                                 {"artist_sort": artwork['artist_sort'],
                                  "artist": artwork['artist'],
                                  "slug": slugify(artwork['artist'])},
                                 upsert=True)

        # db.meta.update({"subject": "artworks"}, {"updated": datetime.now(pytz.utc), "subject": "artworks"}, upsert=True)
        return AL_artworks

    if __name__ == "__main__":
        fetch_artworks()

1 Answer


First, you might like the requests library.
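For example, a minimal sketch of the requests approach (the helper names are mine, `img_url` is the feed key from the question, and requests must be installed separately):

```python
import os
import requests  # third-party: pip install requests

def local_name_for(img_url):
    # keep the last path component, dropping any query string;
    # fall back to a generic name for bare URLs
    return os.path.basename(img_url.split('?')[0]) or 'image'

def save_image(img_url, dest_dir):
    """Stream img_url to disk under dest_dir and return the local path."""
    dest = os.path.join(dest_dir, local_name_for(img_url))
    r = requests.get(img_url, stream=True)
    r.raise_for_status()
    with open(dest, 'wb') as fo:
        # stream=True plus iter_content avoids loading the whole image in memory
        for chunk in r.iter_content(8192):
            fo.write(chunk)
    return dest
```

The returned local path is what you would store in the Mongo document alongside the other keys.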

Otherwise, if you want to stick to the stdlib, it will be something along the lines of:

    import os
    import urllib2
    import uuid

    def fetchfile(url, dst):
        fi = urllib2.urlopen(url)
        with open(dst, 'wb') as fo:
            while True:
                chunk = fi.read(4096)
                if not chunk:
                    break
                fo.write(chunk)


    fetchfile(
        artwork['img_url'],
        os.path.join('/var/www/static', uuid.uuid1().hex)
    )

Add the appropriate exception handling (I can elaborate if you want, but I'm sure the documentation will be clear enough).
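As a sketch of what that handling might look like (the wrapper name and timeout are my additions; on Python 3 the same classes live in urllib.request / urllib.error):

```python
try:
    # Python 2, as used in the snippet above
    from urllib2 import urlopen, HTTPError, URLError
except ImportError:
    # Python 3 equivalents
    from urllib.request import urlopen
    from urllib.error import HTTPError, URLError

def fetchfile_safe(url, dst):
    """Like fetchfile(), but returns False instead of raising on network errors."""
    try:
        fi = urlopen(url, timeout=30)
    except (HTTPError, URLError) as e:
        # log and skip this image rather than aborting the whole feed run
        print('could not fetch %s: %s' % (url, e))
        return False
    with open(dst, 'wb') as fo:
        while True:
            chunk = fi.read(4096)
            if not chunk:
                break
            fo.write(chunk)
    return True
```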

You could put fetchfile() into a pool of async jobs to fetch many files at once.
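One stdlib way to do that, sketched with a thread pool from multiprocessing.dummy (the function names here are mine; `fetch` stands in for a downloader such as fetchfile()):

```python
from multiprocessing.dummy import Pool  # thread pool with the multiprocessing API

def fetch_all(fetch, pairs, workers=8):
    """Run fetch(url, dst) over many (url, dst) pairs concurrently."""
    pool = Pool(workers)
    try:
        # map blocks until every download has finished
        return pool.map(lambda pair: fetch(*pair), pairs)
    finally:
        pool.close()
        pool.join()
```

Threads are a good fit here because the work is I/O-bound: each worker spends most of its time waiting on the network, so the GIL is not a bottleneck.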
