
[Diagram: horizontal ETL/ELT implementation (blue arrows) vs. vertical slices (green arrows)]

What are the pros and cons of using an agile/iterative approach to develop ETL/ELT (Extract Transform Load or Extract Load Transform) systems for data warehouses, data lakes, or lakehouses?

I often find that many business analysts and project managers plan to ingest all the data first, then build the semantic layer and the other horizontal layers, and only then build reports and go to the business. I have drawn the diagram in an endeavour to show the difference between vertical-slice planning and horizontal ETL/ELT implementation.

Can ETL development be an area where agile approaches bring value by reducing the risk of rework? Should we ingest all sources first (blue arrows), or should we prioritise implementing one vertical slice (green arrows)?
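To make the blue-versus-green contrast concrete, here is a toy sketch in Python that orders the same work both ways. All names (`Report`, `horizontal_plan`, `vertical_plan`, the sample reports and sources) are hypothetical. It assumes, very simplistically, that each source needs exactly one ingestion task and one semantic-modelling task, that each report is one task, and that all tasks cost the same. It only counts how many tasks pass before the first report reaches the business.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    name: str
    sources: tuple[str, ...]  # source systems this report depends on


def horizontal_plan(reports: list[Report]) -> list[str]:
    """Blue arrows: ingest every source, model every source, then build reports."""
    sources = sorted({s for r in reports for s in r.sources})
    tasks = [f"ingest:{s}" for s in sources]
    tasks += [f"model:{s}" for s in sources]
    tasks += [f"report:{r.name}" for r in reports]
    return tasks


def vertical_plan(reports: list[Report]) -> list[str]:
    """Green arrows: deliver one report end to end before starting the next."""
    tasks: list[str] = []
    done: set[str] = set()
    for r in reports:
        for s in r.sources:
            if s not in done:  # reuse layers already built by earlier slices
                tasks += [f"ingest:{s}", f"model:{s}"]
                done.add(s)
        tasks.append(f"report:{r.name}")
    return tasks


def tasks_until_first_report(plan: list[str]) -> int:
    """1-based position of the first report task (assumes one exists)."""
    return next(i for i, t in enumerate(plan, 1) if t.startswith("report:"))


# Made-up backlog, ordered by business priority (simplifying assumption).
REPORTS = [
    Report("sales", ("crm", "erp")),
    Report("hr", ("hris",)),
    Report("finance", ("erp", "ledger")),
]

if __name__ == "__main__":
    for label, plan in (("horizontal", horizontal_plan(REPORTS)),
                        ("vertical", vertical_plan(REPORTS))):
        print(label, len(plan), "tasks, first report after",
              tasks_until_first_report(plan))
```

With this made-up backlog, both plans contain the same 11 tasks, but the vertical plan delivers a first report after 5 tasks instead of 9. Feedback on the semantic model therefore arrives earlier, which is where the reduced rework risk would come from. Real tasks do not all cost the same, so this shows only the ordering effect.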

  • Could someone explain the reasons for the downvotes? Is there any way to see which users downvoted? I searched for similar questions but could not find an answer. I feel that different opinions and experiences here can be valuable input. Commented Apr 20, 2021 at 5:40
  • It’s not me :-) The question is very interesting, but it is formulated as a subjective question, which is probably the reason for the downvotes; there are already 3 close votes. Could you close this one and reformulate it in a more objective way (e.g. is it suitable, what are the pros and cons, ...), calling not for opinions/experience but for objective arguments?
    – Christophe
    Commented Apr 20, 2021 at 7:09
  • Thanks, Christophe. I am afraid Stack Exchange does not like closing questions even if they have been downvoted, so I am editing instead. To be honest, I don't understand what is wrong with sharing experience and opinions. I (and hopefully I am not completely alone) really want to know everyone's experience. Isn't that how science progresses, after all? ... But I will ask for pros and cons if that helps. Commented Apr 20, 2021 at 8:15
  • @EugeneLycenok: not all downvotes are objective, and not all downvotes have a valid reason. One reason you could be getting downvotes is the formatting. The question starts with a badly cropped screenshot of a diagram, which looks messy, followed by a paragraph that reads like a wall of text. Then there is a typo in the title of the question. Small things like that, or the fact that you don't explain what an ETL (or an ELT) is, can encourage other users to downvote it. Commented Apr 20, 2021 at 9:36
  • Agile != Not Planning. Commented Apr 20, 2021 at 13:43



Setting up an ETL is no different from any other development task; therefore, proceed in an Agile way unless there are serious reasons not to.

Note that agility doesn't mean you don't need to think or talk about what will be added to the ETL in the future. If you know that you'll need to import data from three sources, it may be useful to first check what those sources are, how the information inside them is organized, what information they contain, etc. Sometimes this gives you an opportunity to save time.

Here's an example. Imagine that you have two sources. You start with the first one, which has a list of employees with some basic information. For every employee, you need to show their country, but the source doesn't provide it, so you need to deduce it from the email address, the city, or some other information. The second source, however, contains much more information about the employees, including the country. If you knew that, you would simply wait until you integrate the second source to populate the country field. Time saved.
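A minimal sketch of this trade-off in Python, assuming hypothetical dict-based sources (`BASIC`, `DETAILED`) and toy lookup tables. None of these names come from the answer. `guess_country` is the heuristic you would have to write if you only had the first source. `country_for` shows that once the second source is integrated, its explicit country simply takes precedence.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical lookup tables for the fallback heuristic; real mappings would be
# far larger and would still only produce a guess.
CITY_TO_COUNTRY = {"Paris": "France", "Berlin": "Germany"}
TLD_TO_COUNTRY = {"fr": "France", "de": "Germany"}


def guess_country(email: str, city: Optional[str]) -> Optional[str]:
    """Heuristic needed when only the first (basic) source is available."""
    if city is not None and city in CITY_TO_COUNTRY:
        return CITY_TO_COUNTRY[city]
    tld = email.rsplit(".", 1)[-1].lower()  # "anna@example.fr" -> "fr"
    return TLD_TO_COUNTRY.get(tld)


def country_for(emp_id: int, basic: dict, detailed: dict) -> Optional[str]:
    """Prefer the explicit country from the detailed source, else guess."""
    record = detailed.get(emp_id, {})
    if record.get("country"):
        return record["country"]
    row = basic[emp_id]
    return guess_country(row["email"], row.get("city"))


# Made-up sample data standing in for the two sources.
BASIC = {
    1: {"email": "anna@example.fr", "city": "Lyon"},
    2: {"email": "bob@example.com", "city": "Berlin"},
    3: {"email": "chloe@example.fr", "city": "Lyon"},
}
DETAILED = {1: {"country": "Belgium"}}

if __name__ == "__main__":
    for emp_id in BASIC:
        print(emp_id, country_for(emp_id, BASIC, DETAILED))
```

In this made-up data, the heuristic alone would label employee 1 as France, while the second source says Belgium. So `guess_country` and its lookup tables are both work you could skip and a source of wrong values, if you had known the second source was coming.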

Be careful, however, not to take for granted that all the sources will necessarily be used, or that none of them will change. Business requirements are often volatile, and more often than not, stakeholders imagine that an ETL will handle exabytes of data, use a hundred sources, and do magic. Keep YAGNI in mind, and try to stick to the sources you are very likely to use in the next sprints. This way, you avoid rework caused by expecting a given source to be used, only for the business to decide otherwise a few sprints later.
