
While building a Flask website, I'm using an external JSON feed to populate a local MongoDB instance with content. The feed is parsed and its keys are mapped to keys in Mongo.

One of the available keys in the feed is called "img_url" and contains, guess what, a URL to an image.

Is there a way, in Python, to mimic a PHP-style cURL? I'd like to grab that key, download the image, store it somewhere locally while keeping the other associated keys, and save all of that as an entry in my db.

Here is my script so far:

    import json
    import sys
    import urllib2
    from datetime import datetime

    import pymongo
    import pytz

    from utils import slugify
    # from utils import logger

    client = pymongo.MongoClient()
    db = client.artlogic

    def fetch_artworks():
        # logger.debug("downloading artwork data from Artlogic")

        AL_artworks = []
        AL_artists = []
        url = "http://feeds.artlogic.net/artworks/artlogiconline/json/"

        while True:
            f = urllib2.urlopen(url)
            data = json.load(f)

            AL_artworks += data['rows']

            # logger.debug("retrieved page %s of %s of artwork data" % (data['feed_data']['page'], data['feed_data']['no_of_pages']))

            # Stop: we are at the last page
            if data['feed_data']['page'] == data['feed_data']['no_of_pages']:
                break

            url = data['feed_data']['next_page_link']

        # Now we have a list called 'AL_artworks' in which all the descriptions
        # are stored. We are going to put them into the mongoDB database,
        # making sure that if the artwork is already encoded (an object with the
        # same id is already in the database) we update the existing description
        # instead of inserting a new one ('upsert').

        # logger.debug("updating local mongodb database with %s entries" % len(AL_artworks))

        for artwork in AL_artworks:
            # Mongo does not like keys that have a dot in their name;
            # this property does not seem to be used anyway, so let us
            # delete it:
            if 'artworks.description2' in artwork:
                del artwork['artworks.description2']
            # upsert into the database:
            db.AL_artworks.update({"id": artwork['id']}, artwork, upsert=True)

            # artwork['artist_id'] is not functioning properly
            db.AL_artists.update({"artist": artwork['artist']},
                                 {"artist_sort": artwork['artist_sort'],
                                  "artist": artwork['artist'],
                                  "slug": slugify(artwork['artist'])},
                                 upsert=True)

        # db.meta.update({"subject": "artworks"}, {"updated": datetime.now(pytz.utc), "subject": "artworks"}, upsert=True)
        return AL_artworks

    if __name__ == "__main__":
        fetch_artworks()

1 Answer


First, you might like the requests library.
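For example, a minimal sketch of the requests approach (the helper names are mine, `img_url` is the feed key from the question, and requests must be installed separately):

```python
import os
import requests  # third-party: pip install requests

def local_name_for(img_url):
    # keep the last path component, dropping any query string;
    # fall back to a generic name for bare URLs
    return os.path.basename(img_url.split('?')[0]) or 'image'

def save_image(img_url, dest_dir):
    """Stream img_url to disk under dest_dir and return the local path."""
    dest = os.path.join(dest_dir, local_name_for(img_url))
    r = requests.get(img_url, stream=True)
    r.raise_for_status()
    with open(dest, 'wb') as fo:
        # stream=True plus iter_content avoids loading the whole image in memory
        for chunk in r.iter_content(8192):
            fo.write(chunk)
    return dest
```

The returned local path is what you would store in the Mongo document alongside the other keys.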

Otherwise, if you want to stick to the stdlib, it will be something along the lines of:

    import os
    import urllib2
    import uuid

    def fetchfile(url, dst):
        fi = urllib2.urlopen(url)
        with open(dst, 'wb') as fo:
            while True:
                chunk = fi.read(4096)
                if not chunk:
                    break
                fo.write(chunk)


    fetchfile(
        artwork['img_url'],
        os.path.join('/var/www/static', uuid.uuid1().hex)
    )

Add the appropriate exception handling (I can elaborate if you want, but I'm sure the documentation will be clear enough).
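As a sketch of what that handling might look like (the wrapper name and timeout are my additions; on Python 3 the same classes live in urllib.request / urllib.error):

```python
try:
    # Python 2, as used in the snippet above
    from urllib2 import urlopen, HTTPError, URLError
except ImportError:
    # Python 3 equivalents
    from urllib.request import urlopen
    from urllib.error import HTTPError, URLError

def fetchfile_safe(url, dst):
    """Like fetchfile(), but returns False instead of raising on network errors."""
    try:
        fi = urlopen(url, timeout=30)
    except (HTTPError, URLError) as e:
        # log and skip this image rather than aborting the whole feed run
        print('could not fetch %s: %s' % (url, e))
        return False
    with open(dst, 'wb') as fo:
        while True:
            chunk = fi.read(4096)
            if not chunk:
                break
            fo.write(chunk)
    return True
```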

You could put fetchfile() into a pool of async jobs to fetch many files at once.
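One stdlib way to do that, sketched with a thread pool from multiprocessing.dummy (the function names here are mine; `fetch` stands in for a downloader such as fetchfile()):

```python
from multiprocessing.dummy import Pool  # thread pool with the multiprocessing API

def fetch_all(fetch, pairs, workers=8):
    """Run fetch(url, dst) over many (url, dst) pairs concurrently."""
    pool = Pool(workers)
    try:
        # map blocks until every download has finished
        return pool.map(lambda pair: fetch(*pair), pairs)
    finally:
        pool.close()
        pool.join()
```

Threads are a good fit here because the work is I/O-bound: each worker spends most of its time waiting on the network, so the GIL is not a bottleneck.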
