Website Monitoring with Distributed Messages/Tasks Processing (AMQP & RabbitMQ) on Django
About Me
●   Rahmat Ramadhan Irianto
●   Software developer at Void-Labs & Defpy-Labs, an open source software developer team
●   Student at STMIK Dipanegara Makassar (2010), an Indonesian university
●   Lives in Makassar, Indonesia
●   Writes Python apps every day
What Is Website Monitoring?

●   Website monitoring provides page-change monitoring
    and notification services to internet users worldwide.
    It keeps a change log for each monitored page and
    alerts the user by email when it detects a change
    in the page text.
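
The change log and change detection described above could be sketched with the standard library alone. This is only an illustration, not the code used in the project: the helper names `fingerprint` and `detect_change` and the sample page texts are assumptions made for this example.

```python
import difflib
import hashlib


def fingerprint(text):
    """Return a stable digest of the page text (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_change(old_text, new_text):
    """Return unified-diff lines, or an empty list when nothing changed."""
    # Cheap check first: identical digests mean identical text.
    if fingerprint(old_text) == fingerprint(new_text):
        return []
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    ))


# Sample snapshots of the same page at two points in time (made-up data).
old_page = "Welcome\nPrice: 10 USD\nContact us"
new_page = "Welcome\nPrice: 12 USD\nContact us"
change_log = detect_change(old_page, new_page)
```

A non-empty `change_log` is what would trigger the email alert; the diff lines themselves can serve as the body of the report.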
What Is It Useful For?
●   Website monitoring can watch almost any page on the internet and alert
    you by email when it detects a change.
●   It can be a good fit for a business intelligence strategy: track your
    competitors and get timely alerts when they change their websites, or
    watch for developments on your customers' websites.
●   Monitor the press release pages of companies you are invested in. Keep
    track of their current executives. Be alerted to changes on their home pages.
●   Companies on the web change their privacy policies or terms and conditions
    without notice. Website monitoring can alert you to these changes.
●   Monitor the job listing pages of companies where you would like to
    work. When they post a new listing, we will email you.
●   Keep up to date with the news. Monitor the news pages of your favorite
    sites. When they update, you'll get an email alert.
●   And much more
Inspired by changedetection: http://www.changedetection.com
What Powers Website Monitoring?




http://goo.gl/hCf34
Python!
(Powerful, efficient, flexible; an ideal language that is effective for
OOP, with elegant syntax, a rich set of libraries, etc.)
www.python.org
http://goo.gl/sSqHh
http://goo.gl/YXnA9




Django!
(Django is a high-level Python Web framework that
encourages rapid development and clean, pragmatic design.)
https://www.djangoproject.com/
MongoDB
(Flexible, powerful, fast, and easy to use)




http://www.mongodb.org

                                    http://goo.gl/NZQ18
RabbitMQ
(A powerful, fast, reliable and highly available message queuing system.
An open source queueing option that is great for building and
managing scalable applications.)



http://www.rabbitmq.com
                                      http://goo.gl/Pvd9Q
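Workflow Website-Monitoring
[Diagram: workflow of the website monitoring system. Labels recovered from the figure, roughly in flow order:]
●   Request: Ajax POST or POST API
●   If POST API: REST API -> save data
●   If Ajax POST: Myview -> save data
●   Publish task -> message queue -> create worker -> worker
●   Worker: process task -> scrape page -> save result
●   If the page changed: alert email, report diff
●   Storage: MongoDB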
Let's Talk About




          http://goo.gl/m8QUH
Why MongoDB?
●   Combines great features of document databases, key-value
    stores, and relational databases.
●   How great?
●     Fast
●     Smart
●     Scalable
●     Schema-less
●     Dynamic queries
●     Easy to use, etc.
What Are We Going to Need?

Python + MongoDB = PyMongo
http://pypi.python.org/pypi/pymongo/
How To?
from pymongo import Connection  # legacy client class; PyMongo 3+ replaces it with MongoClient

# Open one connection and reuse it for every collection.
db = Connection().website_monitor
collection_user = db.user
collection_monitor = db.monitor
collection_task = db.task

INSERT
# Inside a Django view: request, url, status, pk and task_id come from the view.
from datetime import datetime
from django.utils.encoding import smart_str

monitor = {
    'username': smart_str(request.user),
    'user_id': request.user.id,
    'url': url,
    'datetime': datetime.utcnow(),
    'status': status,
    'hit': 0,
    'fail_hit': 0,
    'period': int(request.POST.get('period')),
    'email': collection_user.find_one({'name': str(request.user)})['email'],
    'pk': pk,
    'last_checking': None,
    'task_id': task_id,
}
collection_monitor.insert(monitor)
UPDATE
# Refresh the stored profile and session details for a user.
collection_user.update(
    {'name': data_user['id']},
    {'$set': {
        'email': data_user['email'],
        'firstname': smart_str(data_user['first_name']),
        'lastname': smart_str(data_user['last_name']),
        'ip': request.META.get('REMOTE_ADDR', 'unknown'),
        'login': datetime.now(),
        'user_agent': request.META.get('HTTP_USER_AGENT', 'unknown'),
        'session': request.META.get('XDG_SESSION_COOKIE', 'unknown'),
        'session_fb': session_key,
        'ts': datetime.now(),
        'authkey': authkey,
    }}
)



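REMOVE
# Once a URL has three stored snapshots, drop the oldest one.
# Assumption: every snapshot document carries a 'datetime' field.
if collection_content.find({'url': i['url']}).count() == 3:
    oldest = collection_content.find({'url': i['url']}).sort('datetime', 1)[0]
    collection_content.remove({'_id': oldest['_id']})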
Why We Must Use Distributed Computing

Distributed computing is a method of solving a computational
problem by dividing it into many tasks that run simultaneously
on many hardware or software systems. (Wikipedia)
What Is a Message Queue?
Message queues are:
●   Communication buffers
●   Between independent sender & receiver processes
●   Asynchronous
  • Time of sending is not necessarily the same as time of receiving
  • In the context of web applications:
     o Sender: web application servers
     o Receiver: background worker processes
     o Queue items: tasks that the web server doesn’t
       have the time or resources to do
How Does It Work?
Say a web application server has a task it doesn’t have time to do:
• It puts the task in the message queue
• Other web servers can access the same queue(s) and put tasks there
• Workers are greedy and they all watch the queues for tasks
• Workers asynchronously pick up the first available task on the queue
  when they are ready
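
The steps above can be imitated in a single process with the standard library, which may help before bringing in a real broker. This is a minimal sketch, not the project's code: threads stand in for worker processes, `queue.Queue` stands in for RabbitMQ, and the URLs and worker names are made up.

```python
import queue
import threading

task_queue = queue.Queue()      # stands in for the message broker
results = []                    # (worker name, task) pairs, for inspection
results_lock = threading.Lock()


def worker(name):
    """Greedy worker: take the next available task until told to stop."""
    while True:
        task = task_queue.get()
        if task is None:            # sentinel: no more work for this worker
            task_queue.task_done()
            break
        with results_lock:
            results.append((name, task))
        task_queue.task_done()


workers = [threading.Thread(target=worker, args=("worker-%d" % n,))
           for n in range(3)]
for thread in workers:
    thread.start()

# The "web server" side: put tasks on the queue and return immediately.
urls = ["http://example.com/a", "http://example.com/b",
        "http://example.com/c", "http://example.com/d"]
for url in urls:
    task_queue.put(url)

# One sentinel per worker, then wait for everything to drain.
for _ in workers:
    task_queue.put(None)
task_queue.join()
for thread in workers:
    thread.join()
```

Which worker handles which URL is not deterministic; the only guarantee is that every task is picked up exactly once.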
What Is It Useful For?

• Message queues are useful in certain situations
• General guidelines:
  o Does your web application take more than a few seconds to
    generate a response?
  o Are you using a lot of cron jobs to process data in the background?
  o Do you wish you could distribute the processing of the data
    generated by your application among many servers?
What Do We Need to Make a Message Queue?
AMQP & RabbitMQ
Why Choose AMQP & RabbitMQ?
1. RabbitMQ is free to use.
2. The documentation is decent.
3. There is decent clustering support, even though we never needed clustering.
4. We didn’t want to lose queues or messages upon broker crash or restart.
5. We develop applications using Python/Django, and setting up an AMQP
   backend using carrot was easy.
Now Let's Talk About RabbitMQ
RabbitMQ?

RabbitMQ is an Erlang-based open source application that serves as a
message broker, or message-oriented middleware.
RabbitMQ implements the Advanced Message Queuing Protocol (AMQP),
an application-layer protocol.
AMQP provides an interoperable, vendor-neutral standard protocol for
regulating the exchange of messages in enterprise-scale systems.
Why Use RabbitMQ?
● We need it for...
●  Running tasks / processes in the background
●  Asynchronous task processing
●  Scheduling systems, etc.
So... What Keeps the Rabbit Focused?
Carrot!
Carrot is an AMQP messaging queue framework. AMQP is the
Advanced Message Queuing Protocol, an open standard protocol
for message orientation, queuing, routing, reliability and security.

●   An easy way to connect to RabbitMQ.
●   An easy way to pull stuff out of the queue.
●   An easy way to throw stuff into the queue.


 https://github.com/ask/carrot/
Concepts
●   Publishers (Publishers send messages to an exchange.)
●   Exchanges (Messages are sent to exchanges. Exchanges are named and can be
    configured to use one of several routing algorithms. The exchange routes the
    messages to consumers by matching the routing key in the message with the routing
    key the consumer provides when binding to the exchange.)
●   Consumers (Consumers declare a queue, bind it to an exchange and receive
    messages from it.)
●   Queues (Queues receive messages sent to exchanges. The queues are declared by
    consumers.)
●   Routing keys (Every message has a routing key. The interpretation of the routing
    key depends on the exchange type. There are four default exchange types defined by
    the AMQP standard, and vendors can define custom types, so see your vendor's
    manual for details.)
●   Exchange types defined by AMQP/0.8 (the three covered here):
●     Direct exchange (Matches if the routing key property of the message and the
    routing_key attribute of the consumer are identical.)
●     Fan-out exchange (Always matches, even if the binding does not have a routing
    key.)
●     Topic exchange (Matches the routing key property of the message by a primitive
    pattern matching scheme.)
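
The three exchange types listed above can be made concrete with a small in-memory model. This is a sketch for intuition only; it does not talk to a broker and is not carrot's API. The function names, queue names and binding keys are assumptions for the example. The topic rules follow the AMQP convention that `*` matches exactly one dot-separated word and `#` matches zero or more words.

```python
def topic_matches(pattern, routing_key):
    """AMQP-style topic match: '*' is one word, '#' is zero or more words."""
    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            # '#' may swallow zero or more of the remaining words.
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if p[0] == "*" or p[0] == k[0]:
            return match(p[1:], k[1:])
        return False
    return match(pattern.split("."), routing_key.split("."))


def route(exchange_type, bindings, routing_key):
    """Return the queue names that would receive the message.

    bindings is a list of (queue_name, binding_key) pairs.
    """
    if exchange_type == "fanout":
        return [queue for queue, _ in bindings]
    if exchange_type == "direct":
        return [queue for queue, key in bindings if key == routing_key]
    if exchange_type == "topic":
        return [queue for queue, key in bindings
                if topic_matches(key, routing_key)]
    raise ValueError("unknown exchange type: %r" % exchange_type)


# Hypothetical bindings for illustration.
bindings = [("checker", "monitor.check"),
            ("audit", "monitor.#"),
            ("mail", "*.alert")]
```

With these bindings, a direct exchange delivers `monitor.check` only to `checker`, a fan-out exchange delivers to all three queues, and a topic exchange delivers `monitor.check` to `checker` and `audit`.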
Creating a Connection in Django

settings.py
RABBITMQ_HOST = 'localhost'
RABBITMQ_PORT = 5672
RABBITMQ_USER = 'guest'
RABBITMQ_PASS = 'guest'
RABBITMQ_VHOST = '/'

views.py
from carrot.messaging import Publisher, Consumer
from carrot.connection import AMQPConnection
from django.conf import settings

# One broker connection, shared by the publisher and the consumer.
conn_for_carrot = AMQPConnection(hostname=settings.RABBITMQ_HOST,
                                 port=settings.RABBITMQ_PORT,
                                 userid=settings.RABBITMQ_USER,
                                 password=settings.RABBITMQ_PASS,
                                 vhost=settings.RABBITMQ_VHOST)
Publisher
# Publish a 'check' task for a worker to pick up.
publisher = Publisher(connection=conn_for_carrot,
                      exchange='website_monitoring_exchange',
                      exchange_type='direct')
publisher.send({'msg': {'do': 'check',
                        'task_id': task_id}})

# Variant: derive the task id from task_id plus the submitted URL
# (request.PUT is what the REST API handler receives).
import hashlib

publisher = Publisher(connection=conn_for_carrot,
                      exchange='website_monitoring_exchange',
                      exchange_type='direct')
publisher.send({'msg': {'do': 'check',
                        'task_id': hashlib.md5(str(task_id) + request.PUT.get('url')).hexdigest()}})
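
The second publisher call builds its task id by hashing a value together with the URL. A standalone sketch of that idea on Python 3 might look like the following, where `hashlib.md5` needs bytes rather than text. The helper name `make_task_id` and the sample values are assumptions for illustration.

```python
import hashlib


def make_task_id(seed, url):
    """Derive a repeatable 32-character task id from a seed and a URL."""
    raw = "%s%s" % (seed, url)
    # md5 is used here only as a compact identifier, not for security.
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


task_id = make_task_id(42, "http://example.com/")
```

Because the id depends only on its inputs, publishing the same seed and URL twice yields the same task id, which makes duplicate submissions easy to spot.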
Consumer
import subprocess

def monitoring_check():
    def call(message_data, message):
        if message_data['msg']['do'] == 'check':
            print '[+] receiving message'
            message.ack()
            task_id = message_data['msg']['task_id']
            # Run the scraper for this task in a separate process.
            get_pid = subprocess.Popen(['python', 'scraper.py', task_id])
            pid = get_pid.pid
            collection_task.update({'task_id': task_id},
                                   {'$set': {'status': 'RUNNING', 'pid': pid}})
            print '[Starting PID:%s]' % pid
            get_pid.wait()
        else:
            # Acknowledge and drop messages we do not handle.
            message.ack()

    queuename = 'website_monitoring_checker'
    consumer = Consumer(connection=conn_for_carrot, queue=queuename,
                        exchange='website_monitoring_exchange',
                        exchange_type='direct')
    consumer.register_callback(call)
    try:
        print '[queue:%s]consume..' % queuename
        consumer.wait()
    except Exception, err:
        print err
Cooking Soup with BeautifulSoup?

import re
from BeautifulSoup import BeautifulSoup

def visible(element):
    # Skip text that lives inside tags the browser does not render.
    if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
        return False
    # Skip HTML comments and non-breaking-space entities.
    if (re.search('<!--', str(element)) or re.search('-->', str(element))
            or re.search('&nbsp;', str(element))):
        return False
    return True

monitor = collection_monitor.find_one({'pk': pk})

# Two stored snapshots of the monitored URL.
contents = [collection_content.find({'url': str(monitor['url'])})[1],
            collection_content.find({'url': str(monitor['url'])})[0]]

for i in contents:  # loop implied by the use of `i` on the slide
    texts = BeautifulSoup(BeautifulSoup(i['content']).prettify()).findAll(text=True)
    data = {'content': ' '.join(filter(visible, texts)),
            'datetime': i['datetime']}
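
For readers without BeautifulSoup 3, the same "visible text only" idea can be sketched with the standard library's `html.parser` on Python 3. This swaps in a different technique from the slide, and the class name, function name and sample HTML are assumptions for the example.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes that are not inside style/script/head/title."""

    HIDDEN_TAGS = {"style", "script", "head", "title"}

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0   # > 0 while inside a hidden tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._hidden_depth == 0:
            self.chunks.append(text)

    # HTML comments arrive via handle_comment, which is a no-op by default,
    # so they never reach self.chunks.


def visible_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = ("<html><head><title>T</title><style>p{color:red}</style></head>"
          "<body><!-- note --><h1>Hello</h1><script>var x = 1;</script>"
          "<p>World</p></body></html>")
```

Comparing the `visible_text` of two snapshots, rather than the raw HTML, avoids alerts for changes that only touch scripts or styles.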
Alert by Email!

import smtplib

def sending_email(to, sub, msg):
    try:
        gmail_user = 'romanticdevil.jimmy@gmail.com'
        gmail_pwd = '***************'
        smtpserver = smtplib.SMTP("smtp.gmail.com", 587)
        smtpserver.ehlo()
        smtpserver.starttls()
        smtpserver.ehlo()  # identify again over the encrypted connection
        smtpserver.login(gmail_user, gmail_pwd)
        # Headers first, then a blank line, then the body.
        header = ('To:' + to + '\n' +
                  'From: Website-Monitoring <' + gmail_user + '>\n' +
                  'Subject: %s\n\n' % sub)
        msg = header + msg
        smtpserver.sendmail(gmail_user, to, msg)
        smtpserver.close()
    except Exception, err:
        print err
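
Building the headers by string concatenation is easy to get wrong, so on Python 3 the alert could instead be assembled with `email.message.EmailMessage`. This is a sketch under assumptions: the function name `build_alert`, the subject wording and the example addresses are made up, and nothing is sent here.

```python
from email.message import EmailMessage


def build_alert(sender, recipient, url, diff_lines):
    """Assemble (but do not send) a change-alert email."""
    message = EmailMessage()
    message["From"] = "Website-Monitoring <%s>" % sender
    message["To"] = recipient
    message["Subject"] = "Change detected: %s" % url
    # The body is the diff report, one change per line.
    message.set_content("\n".join(diff_lines))
    return message


alert = build_alert("monitor@example.com", "user@example.com",
                    "http://example.com/",
                    ["-Price: 10 USD", "+Price: 12 USD"])
```

The resulting object can be handed to `smtplib.SMTP.send_message` after `starttls()` and `login()`, which takes care of the header and body formatting.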
Task Scheduling / Checking?
import sys
import time

# main() and log() are defined elsewhere in the scraper script.
task_id = sys.argv[1]
print task_id
raw_delay = collection_task.find_one({'task_id': task_id})['schedule']
print raw_delay
# The schedule is stored in hours; convert it to seconds.
if raw_delay == "1":
    delay = 60 * 60
elif raw_delay == "12":
    delay = 720 * 60
else:
    delay = 1440 * 60
while True:
    try:
        print '[+] Starting task: %s' % task_id
        log(task_id, 'INFO', 'starting session')
        main()
    except Exception, err:
        log(task_id, 'exception', err)
        print err
        collection_task.update({'task_id': task_id}, {'$set': {'status': 'STOPPED', 'pid': None}})
        log(task_id, 'INFO', 'updating database [status:STOPPED]')
    else:
        collection_task.update({'task_id': task_id}, {'$set': {'status': 'SLEEP', 'pid': None}})
        log(task_id, 'INFO', 'updating database [status:SLEEP] for %s sec' % delay)
        time.sleep(delay)
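
The if/elif chain that turns the stored schedule into a sleep interval could also be written as a lookup table. A small sketch, assuming (as the code above suggests) that the schedule is stored as a string of hours and that anything unrecognised falls back to 24 hours; the names `DELAYS`, `DEFAULT_DELAY` and `delay_for` are made up for this example.

```python
# Seconds to sleep between checks, keyed by the stored schedule value.
DELAYS = {
    "1": 60 * 60,          # every hour
    "12": 12 * 60 * 60,    # every 12 hours
}
DEFAULT_DELAY = 24 * 60 * 60  # every 24 hours


def delay_for(schedule):
    """Return the sleep interval in seconds for a schedule value."""
    return DELAYS.get(str(schedule), DEFAULT_DELAY)
```

Adding a new interval then means adding one dictionary entry instead of another branch.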
Django-Piston
(A mini-framework for Django, yet powerful for creating RESTful APIs)
https://bitbucket.org/jespern/django-piston/wiki/Home



●    Ties into Django's internal mechanisms.
●    Supports OAuth out of the box (as well as Basic/Digest or custom auth.)
●    Doesn't require tying to models, allowing arbitrary resources.
●    Speaks JSON, YAML, Python Pickle & XML (and HATEOAS.)
●    Ships with a convenient reusable library in Python
●    Respects and encourages proper use of HTTP (status codes, ...)
●    Has built in (optional) form validation (via Django), throttling, etc.
●    Supports streaming, with a small memory footprint.
●    Stays out of your way.
How To?
Include in urls.py:
url(r'^api/', include('api.urls')),

Include in settings.py:

INSTALLED_APPS = (
    ….......
    'api',
)

Create a folder named api/ in the project directory, containing these files:
api/
    handlers.py
    __init__.py
    urls.py
REST API's urls.py

from django.conf.urls.defaults import *
from piston.resource import Resource
from piston.authentication import HttpBasicAuthentication
from api.handlers import *

auth = HttpBasicAuthentication(realm="website-monitoring")
ad = { 'authentication': auth }

main = Resource(handler=Main, **ad)
monitor = Resource(handler=Monitor, **ad)

urlpatterns = patterns('',
  url(r'^(?P<obj_id>[^/]+)/$', main),
  url(r'^monitor/(?P<obj_id>[^/]+)/$', monitor),
)
REST API's handlers.py
from piston.handler import BaseHandler
from piston.utils import rc

class Main(BaseHandler):
    allowed_methods = ('GET',)  # trailing comma makes this a tuple

    def read(self, request, obj_id):
        # Look the id up among users first, then among monitors.
        data = collection_user.find_one({'pk': obj_id})
        if data:
            return data
        data = collection_monitor.find_one({'pk': obj_id})
        if data:
            return data

class Monitor(BaseHandler):
    allowed_methods = ('GET', 'PUT', 'DELETE')
    fields = ('url', 'status', 'hit', 'fail_hit', 'year', 'month', 'day',
              'hour', 'email', 'period', 'diff')

    def read(self, request, obj_id):
        try:
            if obj_id == 'all':
                data = list(collection_monitor.find({'username': str(request.user)}))
            elif obj_id == "status_running":
                data = list(collection_monitor.find({'status': 'running'}))
            # ….........
        except Exception, err:
            return rc.BAD_REQUEST
        return data

    def update(self, request, obj_id):
        try:
            if obj_id == 'create':
                # Collect the URLs this user already monitors.
                url_list = []
                for i in collection_monitor.find({'username': str(request.user)}):
                    url_list.append(i['url'])
                if request.PUT.get('url') in url_list:
                    print '[+] Url is exist '
                    print '[+] Data will be Update '
            else:
                raise Exception
        except Exception, err:
            print err
            return rc.BAD_REQUEST
        # …......................

    def delete(self, request, obj_id):
        try:
            if obj_id == 'all':
                # A single remove() deletes every monitor owned by the user.
                collection_monitor.remove({'username': str(request.user)})
            else:
                if collection_monitor.find_one({'pk': obj_id}):
                    collection_monitor.remove({'pk': obj_id})
        except Exception, err:
            print err
            return rc.FORBIDDEN
        else:
            print 'deleted'
            return rc.DELETED
Facebook Integration?
●   Just for lazy people
●   You don't have to fill in the registration form: just log in
    to your Facebook account, then click, click, click.
●   Good for business marketing
●   Easy to integrate, etc.
●   Download:
●    git clone http://github.com/dickeytk/django_facebook_oauth.git
Questions?
●   Twitter: @jimmyromanticde
●   Facebook: https://www.facebook.com/jimmy.romantic.devil
●   Email: romanticdevil.jimmy@gmail.com
●   Bitbucket: https://bitbucket.org/jimmyromanticdevil/
●   Blog: http://jimmyromanticdevil.wordpress.com
References
●   http://www.python.org
●   https://www.djangoproject.com
●   http://www.mongodb.org
●   http://www.rabbitmq.com
●   http://pypi.python.org/pypi/pymongo
●   https://github.com/ask/carrot/
●   https://bitbucket.org/jespern/django-piston/wiki/Home
●   http://github.com/dickeytk/django_facebook_oauth.git
●   "Life in a Queue", Tareque Hossain
●   Google "Message Queue"
Thank You ! :)
