
What is the best way to vertically combine two large tables of the same structure? Each table is about 2 million rows. Is there any performance advantage to doing it in M rather than in DAX?

M approach

BigTable_M  = Table.Combine( {Table1, Table2} )

DAX approach

BigTable_DAX = UNION ( 'Table1', 'Table2' )

I have a feeling that the M approach loads the tables twice: each primary source (Table1 and Table2) separately, and then both tables again while loading rows into BigTable_M. Is there any reason to suffer this double load for better performance later?

According to this article, it seems that M is faster:

https://www.sqlbi.com/articles/comparing-dax-calculated-columns-with-power-query-computed-columns/

2 Answers


Best practice would be to do it in M/Power Query first, before loading the data into the data model. You always want the data model to be quick and responsive, with as little calculation overhead as possible. I always recommend working from the lowest level: if you can do it in the source, do it there; if you can't do it there, do it in Power Query; and only as a last resort do it in the DAX/Power Pivot part. This works well if you are working with a database, as you let the technology designed for the heavy lifting/shifting of data do that work, rather than doing it all in Power BI.
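
For instance, with a SQL Server source, the whole union can be pushed down to the database with a native query. This is only a rough sketch; the server, database and table names below are placeholders, not anything from the question:

let
    // Placeholder server and database names
    Source = Sql.Database("myserver", "mydb"),
    // The UNION ALL runs on the database engine, so only the combined
    // result set is streamed into Power BI
    Combined = Value.NativeQuery(
        Source,
        "SELECT * FROM dbo.Table1 UNION ALL SELECT * FROM dbo.Table2"
    )
in
    Combined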

If you are working with files, then it would be best to do it in the Power Query part where possible, and again keep the Power Pivot engine as quick as possible.

When consulting on clients' data models, both Power BI and Analysis Services, most of the trouble comes from doing stuff in the data model rather than before it gets there: for example, data type transformations, string replacement, iterative calculations, ranking and so on, which are best done long before the data hits the model.
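
As a rough illustration of the kind of transformations that belong in Power Query rather than in DAX calculated columns (the column names here are made up):

let
    // Fix data types once, at load time, instead of in the model
    Typed = Table.TransformColumnTypes(Table1, {{"Amount", type number}, {"OrderDate", type date}}),
    // Replace a placeholder string with a proper null before the data reaches the model
    Cleaned = Table.ReplaceValue(Typed, "N/A", null, Replacer.ReplaceValue, {"Region"})
in
    Cleaned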

  • I would be grateful for a pointer to documentation stating that M is faster for that purpose than DAX, or alternatively for personal experience of measuring the performance with DAX Studio. Commented Apr 25, 2020 at 17:18

Doing it in the query editor, you can choose to only load the combined table into your data model while having Table1 and Table2 exist merely as staging tables. That should address your concern about loading the tables twice.
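
A minimal sketch of that setup, assuming Table1 and Table2 already exist as separate queries (the load setting itself is a right-click option on each query, not something expressed in M):

// Table1 and Table2: right-click each query in the Queries pane and untick "Enable load"
// BigTable_M is then the only query loaded into the data model
let
    Combined = Table.Combine( {Table1, Table2} )
in
    Combined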

I'd expect combining them in M would result in better compression (though the difference might not be very much). Combining in M would also allow for query folding in some situations.
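
For example, if both tables come from the same relational source, Table.Combine can often fold into a single UNION ALL on the server. A sketch with placeholder names; whether it folds can be checked by right-clicking the step and choosing View Native Query:

let
    // Placeholder server and database names
    Source = Sql.Database("myserver", "mydb"),
    T1 = Source{[Schema = "dbo", Item = "Table1"]}[Data],
    T2 = Source{[Schema = "dbo", Item = "Table2"]}[Data],
    // Against a foldable source this step may translate to UNION ALL on the server
    Combined = Table.Combine({T1, T2})
in
    Combined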

  • How do I set this up? Should I disable "Enable load" for Table1 and Table2? Should I hide them in the data model? Should I keep "Include in report refresh" ticked for them? Commented Apr 30, 2020 at 15:22
  • Uncheck "Enable load" for Table1 and Table2. Then they won't load to the Data Model so you won't have to worry about hiding them. The big table still needs to load to the model and you need to keep "Include in report refresh" checked for Table1 and Table2 if you intend for the big table to be refreshable. Commented Apr 30, 2020 at 15:27
  • I have a terrible feeling that the M approach reads the data from the server twice. Could that be so? I expected that BigTable_M = Table.Combine( {Table1, Table2} ) would be done "on the fly", since both source tables are already pulled from the server. During the refresh it shows BigTable_M loading xxx rows from the server, which amuses me. Commented Jul 14, 2020 at 15:45
