
In our project we use Django and Django REST Framework as the main application to query data from the database and send it to the frontend. Those endpoints are very fast, as they should be. However, we need to implement some sort of ETL module that would get data from Excel files or clients' CRMs, and that would take longer to run. In the end, though, the models are saved in the same tables that DRF uses to serve the frontend.

The idea was to create a separate application (as a microservice), maybe in Flask or FastAPI or even another tech stack (not Python), that would only do the ETL work: loading data and saving it to the database. It could then use heavier tools like PySpark or Airflow if needed. Django would send requests to this microservice through a message queue (Celery + RabbitMQ) so that the main app is not busy with this work at all. I know I can use a task queue within a single Django project too, but this way I would have more control over how the ETL looks (even to the point of going with a different language/framework).

The thing is, when thinking about it, I am not sure this is worth it, because I would need to write models and schemas in the Flask app that mirror the models I already created in Django, since the ETL saves the data that is later used by the Django app. That doubles the work up front and the maintenance later, since every model change has to be made in both apps. The two apps would also share the database and all its tables, which in my experience can be quite a burden.

I'm not sure what the consensus is in such a situation. Do you think I should pursue the microservice approach, or just implement the ETL as a new app in my Django project and put all its tasks in a task queue?

  • Django would then send a request to this microservice for what?
    – Laiv
    Commented Aug 4, 2021 at 7:29
  • To start the ETL process initiated by the user on frontend (on demand data load).
    – Alex T
    Commented Aug 4, 2021 at 7:48
  • Why don't you send the file directly to the ETL?
    – Laiv
    Commented Aug 4, 2021 at 7:55
  • I don't know how much time you've spent using or researching FastAPI but, aside from the asyncio performance benefits, it leverages Pydantic in a lot of interesting and useful ways. Pydantic, however, can be used in any app running on a recent version of Python. These Pydantic models solve a lot of problems with very little effort and you can share their definitions across multiple apps.
    – JimmyJames
    Commented Aug 4, 2021 at 17:45
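
The shared-definition idea from the last comment can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the standard library's `dataclasses` as a stand-in for Pydantic so that it runs without extra dependencies, and `CustomerRecord`, `from_source_row`, and `to_model_kwargs` are hypothetical names, not part of the project described in the question.

```python
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class CustomerRecord:
    """Hypothetical shared contract, kept in one package that both apps import."""

    external_id: str
    name: str
    email: str

    def __post_init__(self):
        # Light validation; Pydantic would do this (and much more) declaratively.
        for field in fields(self):
            if not isinstance(getattr(self, field.name), str):
                raise TypeError(f"{field.name} must be a string")
        if "@" not in self.email:
            raise ValueError("email must contain '@'")


def from_source_row(row):
    """ETL side: normalise one raw row (assumed keys: id, name, email)."""
    return CustomerRecord(
        external_id=str(row["id"]),
        name=row["name"].strip(),
        email=row["email"].strip().lower(),
    )


def to_model_kwargs(record):
    """Web side: a plain dict that could be passed on as model keyword arguments."""
    return asdict(record)
```

With something like this, the ETL code and the Django app would depend on a single definition of the record shape instead of two hand-maintained copies; the Django model itself would still live only in the Django project.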

1 Answer


Using a microservice architecture is mostly about two things:

  1. independent development
  2. lean scaling (with as little software on each new server as possible)

Independent development is likely not relevant if your team is small.
Lean scaling is likely not relevant if your codebase is modest.

If both of these are true in your case, go the straightforward way:

  • integrate the functionality into the existing codebase
  • and perhaps deploy it onto an ETL-only server eventually,
  • but before you do, remember that "premature optimization is the root of all evil" (Donald Knuth) and therefore you should "make it work first before you make it work fast" (general programming proverb).
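
As a rough sketch of what "integrate now, maybe split off later" could look like: keep the ETL behind a narrow job message and plain functions, and let a background worker run it. The example below is an assumption-laden illustration, not a prescription: it uses the standard library's `queue` and `threading` as a stand-in for Celery + RabbitMQ, a list as a stand-in for the database table, and hypothetical names (`EtlJob`, `run_etl`, `demo`).

```python
import csv
import io
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class EtlJob:
    """Hypothetical job message: the only thing the web side hands over."""

    job_id: int
    csv_text: str


def extract(csv_text):
    # Parse the uploaded CSV text into a list of dict rows.
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    # Normalise names and emails; drop rows that have no email.
    return [
        {"name": row["name"].strip().title(), "email": row["email"].strip().lower()}
        for row in rows
        if (row.get("email") or "").strip()
    ]


def load(rows, table):
    # Stand-in for saving through the ORM; returns the number of rows written.
    table.extend(rows)
    return len(rows)


def run_etl(job, table):
    return load(transform(extract(job.csv_text)), table)


def worker(jobs, table, results):
    # Background worker loop; a real setup would use a task-queue worker instead.
    while True:
        job = jobs.get()
        if job is None:  # sentinel: stop the worker
            break
        results[job.job_id] = run_etl(job, table)


def demo():
    jobs, table, results = queue.Queue(), [], {}
    thread = threading.Thread(target=worker, args=(jobs, table, results))
    thread.start()
    jobs.put(EtlJob(1, "name,email\n ada lovelace ,ADA@Example.com\nno mail,\n"))
    jobs.put(None)
    thread.join()
    return table, results
```

Because the web side only ever builds an `EtlJob` and the worker only ever calls `run_etl`, moving the worker onto an ETL-only server later would mostly be a deployment change rather than a rewrite.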
