Website Monitoring with Distributed Messages/Tasks Processing (AMQP & RabbitMQ) on Django
- 2. About me?
● Rahmat Ramadhan Irianto
● Software Developer at Void-Labs & Defpy-Labs
● Void-Labs & Defpy-Labs is an open source software developer team
● A student at the Indonesian university STMIK
Dipanegara Makassar (2010)
● Lives in Makassar, Indonesia
● Writes Python apps every day
- 3. What is Website-Monitoring ?
● Website monitoring provides page change monitoring
and notification services to internet users worldwide.
Website monitoring creates a change log for the
page and alerts the user by email when it detects a change
in the page text.
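The change detection described above can be sketched with Python's standard `difflib` module. This is a minimal illustration only, not the project's actual code: the function names `page_diff` and `has_changed` and the sample snapshots are assumptions made for this example.

```python
import difflib


def page_diff(old_text, new_text):
    """Return a unified diff (list of lines) between two page snapshots."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    ))


def has_changed(old_text, new_text):
    """A page counts as changed when the diff is non-empty."""
    return bool(page_diff(old_text, new_text))


# Hypothetical snapshots of the visible text of one page.
old_page = "Welcome\nPrice: 10 USD"
new_page = "Welcome\nPrice: 12 USD"

if has_changed(old_page, new_page):
    # This diff is what a change log entry or an alert email could carry.
    print("\n".join(page_diff(old_page, new_page)))
```

Comparing extracted text rather than raw HTML keeps markup-only edits from triggering an alert.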
- 4. What is it useful for ?
● Website monitoring can watch almost any page on the internet and alert
you by email when it detects a change.
● Website monitoring can be a good fit for a business intelligence
strategy. Track your competitors and get timely alerts when they change
their websites, or watch for developments on your customers'
websites.
● Monitor the press release pages of companies you have invested in. Keep
track of their current executives. Be alerted to changes on their home pages.
● Companies on the web may change their privacy policies or terms and
conditions without notice. Website monitoring can alert you to
these changes.
● Monitor the job listing pages of companies where you would like to
work. When they post a new listing, we will email you.
● Keep your news up to date. Monitor the news pages of your favorite news sites. When
they update, you'll get an email alert.
● And much more.
Inspired by changedetection: http://www.changedetection.com
- 6. Python !
http://goo.gl/sSqHh
( Powerful, efficient, flexible, an ideal language, effective for
OOP, elegant syntax, rich in libraries, etc. )
www.python.org
- 7. Django !
http://goo.gl/YXnA9
( Django is a high-level Python Web framework that
encourages rapid development and clean, pragmatic
design, etc. )
https://www.djangoproject.com/
- 8. MongoDB
( Flexible, powerful, fast,
and easy to use )
http://www.mongodb.org
http://goo.gl/NZQ18
- 9. RabbitMQ
( A powerful, fast, reliable & highly available
message queuing system. An open source
queueing option that is great for building and
managing scalable applications )
http://www.rabbitmq.com
http://goo.gl/Pvd9Q
- 11. Architecture (flow diagram)
[Flow diagram. Labels: request (Ajax POST / API POST); if Ajax POST: Myview; if API POST: REST API; save data; publish task; message queue; create worker; worker; process task; scrape page; save result; if page changed: save data, alert email, report diff; MongoDB]
- 13. Why MongoDB ?
● Combines great features of document databases, key-value
stores, and relational databases.
● How great ?
● Fast
● Smart
● Scalable
● Schema-less
● Dynamic queries
● Easy to use, etc.
- 14. What are we going to need ?
Python + MongoDB = PyMongo
http://pypi.python.org/pypi/pymongo/
- 15. How to ?
from datetime import datetime

import pymongo
from django.utils.encoding import smart_str

# Share one connection; each attribute is a collection in the
# website_monitor database. (Later PyMongo releases replace
# Connection with MongoClient.)
connection = pymongo.Connection()
db = connection.website_monitor
collection_user = db.user
collection_monitor = db.monitor
collection_task = db.task

INSERT
# url, status, pk and task_id are set earlier in the view.
monitor = {'username': smart_str(request.user),
           'user_id': request.user.id,
           'url': url,
           'datetime': datetime.utcnow(),
           'status': status,
           'hit': 0,
           'fail_hit': 0,
           'period': int(request.POST.get('period')),
           'email': collection_user.find_one({'name': str(request.user)})['email'],
           'pk': pk,
           'last_checking': None,
           'task_id': task_id,
           }
collection_monitor.insert(monitor)
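- 16. UPDATE
# Refresh the stored profile and session details of the logged-in user.
collection_user.update(
    {'name': data_user['id']},
    {'$set': {
        'email': data_user['email'],
        'firstname': smart_str(data_user['first_name']),
        'lastname': smart_str(data_user['last_name']),
        'ip': request.META.get('REMOTE_ADDR', 'unknown'),
        'login': datetime.now(),
        'user_agent': request.META.get('HTTP_USER_AGENT', 'unknown'),
        'session': request.META.get('XDG_SESSION_COOKIE', 'unknown'),
        'session_fb': session_key,
        'ts': datetime.now(),
        'authkey': authkey,
    }}
)
REMOVE
# Once a URL has three stored snapshots, drop the oldest one
# (presumably the intent of the original, which indexed the URL string).
snapshots = collection_content.find({'url': i['url']})
if snapshots.count() == 3:
    oldest = snapshots.sort('datetime', 1)[0]
    collection_content.remove({'_id': oldest['_id']})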
- 17. Why must we use Distributed Computing ?
Distributed computing
is a method of solving a computational
problem by dividing the problem into
many tasks that run simultaneously on
many hardware or software systems
(Wikipedia)
- 18. What is a Message Queue ?
Message queues are:
• Communication buffers
• Between independent sender & receiver processes
• Asynchronous
  o Time of sending is not necessarily the same as time of receiving
• In the context of web applications:
  o Sender: web application servers
  o Receiver: background worker processes
  o Queue items: tasks that the web server doesn’t
    have the time/resources to do
- 19. How does it work ?
Say a web application server has a task it
doesn’t have time to do
• It puts the task in the message queue
• Other web servers can access the same
queue(s) and put tasks there
• Workers are greedy and they all watch the
queues for tasks
• Workers asynchronously pick up the first
available task on the queue when they are ready
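The sender/worker pattern above can be sketched in a single process with Python's standard `queue` and `threading` modules. This only illustrates the idea: an in-process `queue.Queue` stands in for the RabbitMQ broker, and the worker names and URLs are made up for the example.

```python
import queue
import threading

task_queue = queue.Queue()      # stands in for the broker's queue
results = []                    # (worker name, url) pairs, for inspection
results_lock = threading.Lock()


def worker(name):
    """Greedy worker: take the next available task until told to stop."""
    while True:
        task = task_queue.get()
        if task is None:        # sentinel value: no more work
            task_queue.task_done()
            break
        with results_lock:
            results.append((name, task["url"]))
        task_queue.task_done()


workers = [threading.Thread(target=worker, args=("worker-%d" % n,))
           for n in range(2)]
for thread in workers:
    thread.start()

# The "web server" side: enqueue tasks and return immediately.
urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
for url in urls:
    task_queue.put({"do": "check", "url": url})

# Shut down: one sentinel per worker, then wait for everything to finish.
for _ in workers:
    task_queue.put(None)
task_queue.join()
for thread in workers:
    thread.join()

print(sorted(url for _, url in results))
```

Which worker handles which task is not deterministic; the sender only relies on every task being picked up eventually.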
- 20. What is it useful for ?
• Message queues are useful in certain
situations
• General guidelines:
  o Does your web application take more than
    a few seconds to generate a response?
  o Are you using a lot of cron jobs to process
    data in the background?
  o Do you wish you could distribute the
    processing of the data generated by your
    application among many servers?
- 23. Why Choose AMQP & RabbitMQ ?
1. RabbitMQ is free to use
2. The documentation is decent
3. There is decent clustering support, even though we
never needed clustering
4. We didn’t want to lose queues or messages upon
broker crash/restart
5. We develop applications using Python/Django, and
setting up an AMQP backend using carrot was
easy
- 25. RabbitMQ ?
RabbitMQ is an Erlang-based open source
application that serves as a message broker or
message-oriented middleware.
RabbitMQ implements the Advanced
Message Queuing Protocol (AMQP), an
application layer protocol.
AMQP provides an interoperable standard
protocol across vendors for regulating the
exchange of messages in enterprise-scale
systems.
- 26. Why Use RabbitMQ ?
● We need it for:
● Running tasks / processes in the
background
● Asynchronous task processing
● Scheduling systems, etc.
- 28. Carrot !
Carrot is an AMQP messaging
queue framework. AMQP is the
Advanced Message Queuing
Protocol, an open standard
protocol for message orientation,
queuing, routing, reliability and
security.
Easy way to connect to
RabbitMQ.
Easy way to pull stuff out of the
queue.
Easy way to throw stuff into the
queue.
https://github.com/ask/carrot/
- 29. Concepts ?
● Publishers (Publishers send messages to an exchange.)
● Exchanges (Messages are sent to exchanges. Exchanges are named and can be
configured to use one of several routing algorithms. The exchange routes the
messages to consumers by matching the routing key in the message with the routing
key the consumer provides when binding to the exchange.)
● Consumers (Consumers declare a queue, bind it to an exchange and receive
messages from it.)
● Queues (Queues receive messages sent to exchanges. The queues are declared by
consumers.)
● Routing keys (Every message has a routing key. The interpretation of the routing
key depends on the exchange type. There are four default exchange types defined by
the AMQP standard, and vendors can define custom types, so see your vendor's
manual for details.)
● Exchange types defined by AMQP/0.8:
● Direct exchange (Matches if the routing key property of the message and the
routing_key attribute of the consumer are identical.)
● Fan-out exchange (Always matches, even if the binding does not have a routing
key.)
● Topic exchange (Matches the routing key property of the message by a primitive
pattern matching scheme.)
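The three matching rules can be sketched in plain Python. This is a toy model of the routing decision, not carrot or RabbitMQ code: the function names are hypothetical, and the topic wildcards follow the usual AMQP convention (`*` matches exactly one dot-separated word, `#` matches zero or more words).

```python
def topic_matches(pattern, routing_key):
    """Match a dot-separated routing key against a topic binding pattern."""
    p_words = pattern.split(".")
    k_words = routing_key.split(".")

    def match(i, j):
        if i == len(p_words):
            return j == len(k_words)
        if p_words[i] == "#":
            # '#' may absorb zero or more of the remaining words.
            return any(match(i + 1, n) for n in range(j, len(k_words) + 1))
        if j == len(k_words):
            return False
        if p_words[i] == "*" or p_words[i] == k_words[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


def routes(exchange_type, binding_key, routing_key):
    """Decide whether a message reaches a queue bound with binding_key."""
    if exchange_type == "direct":
        return binding_key == routing_key      # keys must be identical
    if exchange_type == "fanout":
        return True                            # routing key is ignored
    if exchange_type == "topic":
        return topic_matches(binding_key, routing_key)
    raise ValueError("unsupported exchange type: %r" % exchange_type)


print(routes("direct", "check", "check"))              # identical keys
print(routes("fanout", "", "anything"))                # always delivered
print(routes("topic", "monitor.*", "monitor.check"))   # one-word wildcard
```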
- 30. Creating a Connection in Django
settings.py
RABBITMQ_HOST = 'localhost'
RABBITMQ_PORT = 5672
RABBITMQ_USER = 'guest'
RABBITMQ_PASS = 'guest'
RABBITMQ_VHOST = '/'
views.py
from carrot.messaging import Publisher, Consumer
from carrot.connection import AMQPConnection
from django.conf import settings

# One broker connection, configured from the Django settings above.
conn_for_carrot = AMQPConnection(hostname=settings.RABBITMQ_HOST,
                                 port=settings.RABBITMQ_PORT,
                                 userid=settings.RABBITMQ_USER,
                                 password=settings.RABBITMQ_PASS,
                                 vhost=settings.RABBITMQ_VHOST)
- 31. Publisher
import hashlib

# From the web view: publish a 'check' task for an existing task_id.
publisher = Publisher(connection=conn_for_carrot,
                      exchange='website_monitoring_exchange',
                      exchange_type='direct')
publisher.send({'msg': {'do': 'check',
                        'task_id': task_id,
                        }
                })

# From the REST API handler: derive the task id from task_id and the URL.
publisher = Publisher(connection=conn_for_carrot,
                      exchange='website_monitoring_exchange',
                      exchange_type='direct')
publisher.send({'msg': {'do': 'check',
                        'task_id': hashlib.md5(str(task_id)
                                               + request.PUT.get('url')).hexdigest(),
                        }
                })
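The message body and the hashed task id can be tried out without a broker. The sketch below uses only the standard library; `make_task_id` and `build_message` are hypothetical helper names, and JSON is assumed here as the wire format purely for illustration.

```python
import hashlib
import json


def make_task_id(task_id, url):
    """Derive a stable 32-character id from a task id and the monitored URL."""
    raw = "%s%s" % (task_id, url)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def build_message(task_id):
    """Build the same message shape the publisher sends."""
    return {"msg": {"do": "check", "task_id": task_id}}


hashed_id = make_task_id(42, "http://example.com/")
body = json.dumps(build_message(hashed_id))   # what would go on the wire
print(body)

# The consumer side reverses the serialization before dispatching on 'do'.
received = json.loads(body)
print(received["msg"]["do"], received["msg"]["task_id"] == hashed_id)
```

MD5 serves here only as a compact, repeatable identifier, not as a security measure.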
- 32. Consumer
import subprocess

def monitoring_check():
    def call(message_data, message):
        if message_data['msg']['do'] == 'check':
            print('[+] receiving message')
            message.ack()
            task_id = message_data['msg']['task_id']
            # Run the scraper as a child process and record its PID.
            get_pid = subprocess.Popen(['python', 'scraper.py', task_id])
            pid = get_pid.pid
            collection_task.update({'task_id': task_id},
                                   {'$set': {'status': 'RUNNING', 'pid': pid}})
            print('[Starting PID:%s]' % pid)
            get_pid.wait()
        else:
            # Unknown commands are acknowledged and dropped.
            message.ack()

    queuename = 'website_monitoring_checker'
    consumer = Consumer(connection=conn_for_carrot, queue=queuename,
                        exchange='website_monitoring_exchange',
                        exchange_type='direct')
    consumer.register_callback(call)
    try:
        print('[queue:%s]consume..' % queuename)
        consumer.wait()
    except Exception as err:
        print(err)
- 33. Cooking soup with BeautifulSoup ?
import re
from BeautifulSoup import BeautifulSoup

def visible(element):
    # Text inside these tags is never shown to the reader.
    if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
        return False
    # Skip HTML comments and non-breaking-space entities.
    if (re.search('<!--', str(element)) or re.search('-->', str(element)) or
            re.search('&nbsp;', str(element))):
        return False
    return True

monitor = collection_monitor.find_one({'pk': pk})
query = {'url': str(monitor['url'])}
# The two stored snapshots of the page to compare.
contents = [collection_content.find(query)[1], collection_content.find(query)[0]]
for i in contents:
    texts = BeautifulSoup(BeautifulSoup(i['content']).prettify()).findAll(text=True)
    data = {'content': ' '.join(filter(visible, texts)),
            'datetime': i['datetime'],
            }
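The same "keep only visible text" idea can be sketched with the standard library's `html.parser`, for readers without BeautifulSoup installed. This is an alternative illustration, not the slide's method: the class name, the set of hidden tags, and the sample HTML are assumptions.

```python
from html.parser import HTMLParser

# Text inside these tags is not shown to a reader of the page.
HIDDEN_TAGS = {"style", "script", "head", "title"}


class VisibleTextParser(HTMLParser):
    """Collect text nodes that sit outside hidden tags and comments."""

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._hidden_depth == 0:
            self.chunks.append(text)

    # Comments arrive through handle_comment, which is a no-op by default,
    # so they never reach self.chunks.


def visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = ("<html><head><title>T</title><style>p{}</style></head>"
          "<body><!-- note --><p>Hello</p><script>var x=1;</script>"
          "<p>World</p></body></html>")
print(visible_text(sample))
```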
- 34. Alert by email !
import smtplib

def sending_email(to, sub, msg):
    try:
        gmail_user = 'romanticdevil.jimmy@gmail.com'
        gmail_pwd = '***************'
        smtpserver = smtplib.SMTP("smtp.gmail.com", 587)
        smtpserver.ehlo()
        smtpserver.starttls()
        smtpserver.ehlo()  # greet the server again over the encrypted channel
        smtpserver.login(gmail_user, gmail_pwd)
        # Headers end with a blank line before the message body.
        header = ('To:' + to + '\n' +
                  'From: Website-Monitoring <' + gmail_user + '>\n' +
                  'Subject: %s\n\n' % sub)
        msg = header + msg
        smtpserver.sendmail(gmail_user, to, msg)
        smtpserver.close()
    except Exception as err:
        print(err)
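Building the headers by string concatenation is easy to get wrong, so here is a small sketch that lets the standard `email.message.EmailMessage` class do the formatting. The helper name `build_alert` and the example addresses are assumptions, and nothing is sent over the network.

```python
from email.message import EmailMessage


def build_alert(sender, to, subject, body):
    """Build an alert email with correctly formatted headers."""
    msg = EmailMessage()
    msg["From"] = "Website-Monitoring <%s>" % sender
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


alert = build_alert(
    "monitor@example.com",           # hypothetical sender address
    "user@example.com",              # hypothetical recipient
    "Page changed",
    "http://example.com/ changed since the last check.",
)
print(alert.as_string())
```

A message built this way can be handed to `smtplib.SMTP.send_message()` after the usual `starttls()` and `login()` calls.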
- 35. Task / Scheduling Checking ?
import sys
import time

task_id = sys.argv[1]
print(task_id)
raw_delay = collection_task.find_one({'task_id': task_id})['schedule']
print(raw_delay)
# The schedule is stored in hours; the delay is in seconds.
if raw_delay == "1":
    delay = 60 * 60
elif raw_delay == "12":
    delay = 720 * 60
else:
    delay = 1440 * 60
while True:
    try:
        print('[+] Starting task: %s' % sys.argv[1])
        log(task_id, 'INFO', 'starting session')
        main()
    except Exception as err:
        log(task_id, 'exception', err)
        print(err)
        collection_task.update({'task_id': task_id},
                               {'$set': {'status': 'STOPPED', 'pid': None}})
        log(task_id, 'INFO', 'updating database [status:STOPPED]')
    else:
        collection_task.update({'task_id': task_id},
                               {'$set': {'status': 'SLEEP', 'pid': None}})
        log(task_id, 'INFO', 'updating database [status:SLEEP] for %s sec' % delay)
    # Wait one full period before the next check.
    time.sleep(delay)
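The schedule-to-delay mapping can be isolated into a small, testable function. This is a sketch under the same rules as the script above (1 hour, 12 hours, otherwise 24 hours); the function and constant names are made up for the example.

```python
SECONDS_PER_HOUR = 60 * 60
DEFAULT_HOURS = 24                     # fallback for any other schedule value
KNOWN_SCHEDULES = {"1": 1, "12": 12}   # stored schedule value -> hours


def delay_seconds(raw_schedule):
    """Translate the stored schedule value into a sleep time in seconds."""
    hours = KNOWN_SCHEDULES.get(str(raw_schedule), DEFAULT_HOURS)
    return hours * SECONDS_PER_HOUR


for value in ("1", "12", "24"):
    print(value, delay_seconds(value))
```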
- 36. Django-Piston
( A mini-framework for Django, but powerful for creating RESTful APIs )
https://bitbucket.org/jespern/django-piston/wiki/Home
● Ties into Django's internal mechanisms.
● Supports OAuth out of the box (as well as Basic/Digest or custom auth.)
● Doesn't require tying to models, allowing arbitrary resources.
● Speaks JSON, YAML, Python Pickle & XML (and HATEOAS.)
● Ships with a convenient reusable library in Python
● Respects and encourages proper use of HTTP (status codes, ...)
● Has built in (optional) form validation (via Django), throttling, etc.
● Supports streaming, with a small memory footprint.
● Stays out of your way.
- 37. How to ?
Include in urls.py
url(r'^api/', include('api.urls')),
Include in settings.py
INSTALLED_APPS = (
    ….......
    'api',
)
Create a folder named api/ in the project
directory with these files:
-api/
-----handlers.py
-----__init__.py
-----urls.py
- 38. REST API's urls.py
from django.conf.urls.defaults import *
from piston.resource import Resource
from piston.authentication import HttpBasicAuthentication
from api.handlers import *

auth = HttpBasicAuthentication(realm="website-monitoring")
ad = {'authentication': auth}
main = Resource(handler=Main, **ad)
monitor = Resource(handler=Monitor, **ad)
urlpatterns = patterns('',
    url(r'^(?P<obj_id>[^/]+)/$', main),
    url(r'^monitor/(?P<obj_id>[^/]+)/$', monitor),
)
- 39. REST API's handlers.py
from piston.handler import BaseHandler

class Main(BaseHandler):
    allowed_methods = ('GET',)  # trailing comma makes this a tuple

    def read(self, request, obj_id):
        # Look the id up among users first, then among monitors.
        data = collection_user.find_one({'pk': obj_id})
        if data:
            return data
        data = collection_monitor.find_one({'pk': obj_id})
        if data:
            return data
- 40.
from piston.utils import rc

class Monitor(BaseHandler):
    allowed_methods = ('GET', 'PUT', 'DELETE')
    fields = ('url', 'status', 'hit', 'fail_hit', 'year', 'month', 'day',
              'hour', 'email', 'period', 'diff')

    def read(self, request, obj_id):
        try:
            if obj_id == 'all':
                data = list(collection_monitor.find({'username': str(request.user)}))
            elif obj_id == "status_running":
                data = list(collection_monitor.find({'status': 'running'}))
            ….........
        except Exception as err:
            return rc.BAD_REQUEST
        return data

    def update(self, request, obj_id):
        try:
            if obj_id == 'create':
                # Collect the URLs this user already monitors.
                url_list = []
                for i in collection_monitor.find({'username': str(request.user)}):
                    url_list.append(i['url'])
                if request.PUT.get('url') in url_list:
                    print('[+] URL already exists')
                    print('[+] Data will be updated')
            else:
                raise Exception
        except Exception as err:
            print(err)
            return rc.BAD_REQUEST
        …......................
- 41.
    def delete(self, request, obj_id):
        try:
            if obj_id == 'all':
                # A single remove() deletes every monitor owned by this user.
                collection_monitor.remove({'username': str(request.user)})
            else:
                if collection_monitor.find_one({'pk': obj_id}):
                    collection_monitor.remove({'pk': obj_id})
        except Exception as err:
            print(err)
            return rc.FORBIDDEN
        else:
            print('deleted')
            return rc.DELETED
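A client for these handlers needs an HTTP Basic `Authorization` header and the `api/monitor/<obj_id>/` URL pattern from urls.py. The sketch below only constructs the request with the standard library and sends nothing; the host, credentials, and the helper name `api_request` are assumptions for illustration.

```python
import base64
import urllib.request


def api_request(base_url, obj_id, method, username, password):
    """Build (but do not send) an authenticated request to the monitor API."""
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    url = "%s/api/monitor/%s/" % (base_url.rstrip("/"), obj_id)
    request = urllib.request.Request(url, method=method)
    request.add_header("Authorization", "Basic " + token)
    return request


# Hypothetical local development server and account.
list_all = api_request("http://localhost:8000", "all", "GET", "alice", "secret")
delete_all = api_request("http://localhost:8000", "all", "DELETE", "alice", "secret")

print(list_all.get_method(), list_all.full_url)
print(delete_all.get_method(), delete_all.full_url)
```

Passing either object to `urllib.request.urlopen()` would perform the call against a running server.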
- 42. Facebook Integration ?
● Just for lazy people
● You don't have to fill in the registration form: just log
in to your Facebook account, then click, click & click.
● Good for business marketing
● Easy to integrate, etc.
● Download :
● git clone
http://github.com/dickeytk/django_facebook_oauth.git
- 43. Question ?
● Twitter :@jimmyromanticde
● Facebook: https://www.facebook.com/jimmy.romantic.devil
● Email : romanticdevil.jimmy@gmail.com
● Bitbucket:
https://bitbucket.org/jimmyromanticdevil/
● Blog : http://jimmyromanticdevil.wordpress.com
- 44. References
http://www.python.org
https://www.djangoproject.com
http://www.mongodb.org
http://www.rabbitmq.com
http://pypi.python.org/pypi/pymongo
https://github.com/ask/carrot/
https://bitbucket.org/jespern/django-piston/wiki/Home
http://github.com/dickeytk/django_facebook_oauth.git
“Life in a Queue”, Tareque Hossain
Google “Message Queue”