
I'm new to Python, and I've been getting a huge amount of help from the Stack Overflow community in migrating my shell script to Python. But once again I'm struggling with how to implement threading: since this script iterates over X results, it would be faster to run them concurrently. For example, if the script returns 120 servers to process, I would like to run 5 at a time and keep the rest in a queue.

The method that I want to run in a thread comes after the condition below (I marked it with comments):

if checkServer.checkit(host,port):

Below is the content of the extract_adapter.py file:

import psycopg2
import urllib2
import base64
import sys
import re
import lxml.html as LH
import checkServer

def extractAdapter(env,family,iserver,login,password,prefix,proxyUser,proxyPass,proxyHost,service):

    print "Starting on \t"+iserver

    proxy_auth = "http://"+proxyUser+":"+proxyPass+"@"+proxyHost
    proxy_handler = urllib2.ProxyHandler({"http": proxy_auth})

    opener = urllib2.build_opener(proxy_handler)
    urllib2.install_opener(opener)
    request = urllib2.Request("http://"+iserver+"/invoke/listRegisteredAdapters")
    base64string = base64.encodestring('%s:%s' % (login, password)).replace('\n', '')
    request.add_header("Authorization", "Basic %s" % base64string)
    response = urllib2.urlopen(request)
    html = response.read()

    doc = LH.fromstring(html)
    tds = (td.text_content() for td in doc.xpath("//td[not(*)]"))

    for adapterType, adapterDescription in zip(*[tds]*2):

        proxy_auth = "http://"+proxyUser+":"+proxyPass+"@"+proxyHost
        proxy_handler = urllib2.ProxyHandler({"http": proxy_auth})
        opener = urllib2.build_opener(proxy_handler)
        urllib2.install_opener(opener)
        request = urllib2.Request("http://"+iserver+service+""+adapterType)
        base64string = base64.encodestring('%s:%s' % (login, password)).replace('\n', '')
        request.add_header("Authorization", "Basic %s" % base64string)
        response = urllib2.urlopen(request)
        html2 = response.read()

        doc = LH.fromstring(html2)
        tds = (td.text_content() for td in doc.xpath("//td[not(*)]"))

        for connectionAlias,packageName,connectionFactoryType,mcfDisplayName,connectionState,hasError in zip(*[tds]*6):

            cur.execute("INSERT INTO wip.info_adapter (env,family,iserver,prefix,package,adapter_type,connection_name,status) values (%s,%s,%s,%s,%s,%s,%s,%s)",
            (env,family,iserver,prefix,packageName,adapterType,connectionAlias,connectionState))
            con.commit()

################################################################################

def extract(env):
    global cur,con
    con = None
    try:

        con = psycopg2.connect(database='xx', user='xx',password='xxx',host='localhost')
        cur = con.cursor()
        qry=" random non important query"

        cur.execute(qry)
        data = cur.fetchall()

        for result in data:

            family   = result[0]
            prefix   = result[1]
            iserver  = result[2]
            version  = result[3]
            login    = result[4]
            password = result[5]
            service  = result[6]
            proxyHost = result[7]
            proxyUser = result[8]
            proxyPass = result[9]

            parts=iserver.split(":")
            host=parts[0]
            port=parts[1]

            if checkServer.checkit(host,port):
            ## THE THREAD IS SUPPOSED TO START HERE

                if version == '7' or version == '8':

                    extractAdapter(env,family,iserver,login,password,prefix,proxyUser,proxyPass,proxyHost,service)

                elif version == '60' or version == '61':
                    print "Version 6.0 and 6.1 not supported yet"
            else:
                print iserver+" is offline"
            ## AND END HERE

    except psycopg2.DatabaseError, e:
        print 'Error %s' % e
        sys.exit(1)

    finally:

        if con:
            con.close()

And this is how I call the extract method in runme.py:

import extract_adapter_thread
from datetime import datetime

startTime = datetime.now()
print"------------------------------"
extract_adapter_thread.extract('TEST')
print"------------------------------"
print(datetime.now()-startTime)

By the way, the code is working just fine, no errors.

  • Is checkServer.checkit a fast or slow operation, and is it re-entrant? Also, you'd be calling cur.execute from multiple threads simultaneously, is that ok?
    – Useless
    Commented Aug 21, 2012 at 17:54
  • Hi, it's fast: approx. 1 sec for each server. Yes, I would be calling cur.execute for each one.
    – thclpr
    Commented Aug 22, 2012 at 9:15
  • I'm asking if calling execute from multiple threads simultaneously is safe. The cursor object would currently be shared by all threads, and I don't know how psycopg2 handles that.
    – Useless
    Commented Aug 22, 2012 at 12:43
  • No, for that I would create an individual cursor for each.
    – thclpr
    Commented Aug 23, 2012 at 15:20
  • 1
    @weefwefwqg3 It's 2017 now and I'm really ashamed of that thing that I wrote. Really, really ashamed.
    – thclpr
    Commented Nov 1, 2017 at 22:31

3 Answers


Threading will block very heavily within Python on non-IO-bound problems because of the Global Interpreter Lock. Thus you're probably better off doing multiprocessing, which comes with a Queue class (see this SO link for an example of using a multiprocessing queue).

This should let you work with many separate processes simultaneously (like batching your 5 jobs at a time out of 120). Note that the overhead of a process is higher than that of a thread, so for small tasks you'll pay a price for using multiprocessing over threading. Your tasks sound large enough to warrant such costs, though.
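
For reference, here is a minimal sketch of how the "5 at a time" batching could look with a multiprocessing.Pool (the pool keeps its own internal queue of pending jobs). The process_server function is a hypothetical wrapper around the per-server work from extract(), not code from the question; each process would also need to open its own psycopg2 connection, since connections and cursors cannot be shared across processes:

from multiprocessing import Pool

def process_server(row):
    # hypothetical worker: row is one tuple from cur.fetchall() in extract()
    family, prefix, iserver = row[0], row[1], row[2]
    host, port = iserver.split(":")
    # ... call checkServer.checkit(host, port) and extractAdapter(...) here,
    # using a psycopg2 connection created inside this function
    return iserver

if __name__ == "__main__":
    rows = []                        # e.g. the result of the query in extract()
    pool = Pool(processes=5)         # at most 5 servers handled at once
    for done in pool.imap_unordered(process_server, rows):
        print "finished %s" % done
    pool.close()
    pool.join()

Since you mentioned Windows in the comments: the if __name__ == "__main__" guard is required there for multiprocessing to work.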


If everything is thread-safe, you could use the threading module:

import threading
from datetime import datetime

import extract_adapter_thread

starttime = datetime.now()
print "-"*10
code = threading.Thread(target=extract_adapter_thread.extract, args=['TEST'])
code.daemon = True
code.start()
# note: without code.join() this prints the time it took to start the thread,
# not the time extract() itself ran
print "-"*10
print(datetime.now()-starttime)
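
To keep it to 5 servers at a time with plain threads, one possible pattern is a fixed pool of worker threads pulling from a Queue. This is only a rough sketch; handle_server is a hypothetical stand-in for the per-server logic (checkServer.checkit plus extractAdapter), and each thread would need its own database cursor, as discussed in the comments:

import threading
import Queue  # the module is named "queue" in Python 3

def handle_server(iserver):
    # hypothetical wrapper around checkServer.checkit() + extractAdapter()
    print "processing %s" % iserver

def worker(q):
    while True:
        iserver = q.get()
        if iserver is None:          # sentinel value: no more work
            break
        try:
            handle_server(iserver)
        finally:
            q.task_done()

q = Queue.Queue()
threads = [threading.Thread(target=worker, args=(q,)) for _ in range(5)]
for t in threads:
    t.start()

for iserver in ["server1:5555", "server2:5555"]:   # e.g. the servers from the query
    q.put(iserver)

q.join()                  # wait until every queued server has been processed
for _ in threads:
    q.put(None)           # tell each worker thread to exit
for t in threads:
    t.join()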
  • Is it possible to apply the solution to the extractAdapter(env,family,iserver,login,password,prefix,proxyUser,proxyPass,proxyHost,service) call, since this method runs for each server?
    – thclpr
    Commented Aug 22, 2012 at 9:18

I really don't know if this will help much, but here is a snippet of code that I had on my hard drive, and well... here it goes. It's a basic thing to see the difference between pinging some IPs in parallel and sequentially (it requires Linux, though). It's very simple and not a direct answer to your specific problem, but... since you said that you are new to Python, it may give you some ideas.

#!/usr/bin/env python

import datetime
import subprocess
import threading

ipsToPing = [
    "google.com",
    "stackoverflow.com",
    "yahoo.com",
    "terra.es", 
]

def nonThreadedPinger():
    start = datetime.datetime.now()
    for ipToPing in ipsToPing:
        print "Not-threaded ping to %s" % ipToPing
        subprocess.call(["/bin/ping", "-c", "3", "-W", "1.0", ipToPing], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    end = datetime.datetime.now()
    print ("Non threaded ping of %s ips took: %s." % (len(ipsToPing), end-start))

def _threadedPingerAux(ipToPing):
    print "Threaded ping to %s" % ipToPing
    subprocess.call(["/bin/ping", "-c", "3", "-W", "1.0", ipToPing], stdout=subprocess.PIPE, stderr=subprocess.PIPE)

def threadedPinger():
    retval = dict.fromkeys(ipsToPing, -1)
    threads = list()
    start = datetime.datetime.now()
    for ipToPing in ipsToPing:
        thread = threading.Thread(target=_threadedPingerAux, args=[ipToPing])
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    end = datetime.datetime.now()
    print ("Treaded ping of %s ips took: %s" % (len(ipsToPing), end-start))


if __name__ == "__main__":
    threadedPinger()
    nonThreadedPinger()
  • The script will run under a Windows environment only, but thanks for the suggestion =)
    – thclpr
    Commented Aug 21, 2012 at 18:15
