I came to ask this question after coding in a script component for some time, testing mainly how to loop over the columns of the Row object, see Looping Through Columns in SSIS Script Component - Stack Overflow and the comment below it that asks me to open a new question. I also checked the even older question on the same topic, Apply row transformation for multiple input columns in Script Transformation, and learned from it how to loop through the columns. Looping is therefore not the question here.
I am migrating some SQL Agent jobs to SSIS. The Agent jobs run SQL with many nested SELECTs or CTEs. My aim is to take each given SQL query and split it by nesting level so that each level is visible at first sight when looking at the SSIS Data Flow Task. Reading it step by step is easier than reading one large nested SQL query.
I am not the first to ask questions about staying inside the DFT and changing the data source, see SSIS data flow to update source table rows after copying to destination, though that question is about changing the source table itself while this question is about putting changes on top of the source table and making that the new output within the DFT.
Take this pattern:
select abc.*, abc.column1 + abc.column2 as column3 from (select * from xyz) abc
Or with a CTE:
with abc as (
select * from xyz
)
select abc.*, abc.column1 + abc.column2 as column3 from abc
The two are equivalent, and I want to split them into two steps in the Data Flow Task (DFT), one after another, so that each step ideally holds just one level of the nested query:
OLE DB Source "DataSourceAbc":
select * from xyz
Script component:
create table ##tmpDataSourceAbc (...); -- filled with the rows coming from the source
select abc.*, abc.column1 + abc.column2 as column3 from ##tmpDataSourceAbc abc;
Often it will just be about adding a few columns, changing columns a bit, or doing a lookup or the like, and for this, SSIS or third parties give you tools like the Derived Column or Lookup components. But I want to avoid rewriting the SQL code that I have at hand into the SSIS language with its own data types. Call me lazy or afraid, but I would like to take over the SQL code as it is instead of turning it into SSIS code.
What is more, I would like to take the built-in tools that allow SQL and avoid script components where I can:
- That is why I take the "OLE DB Source" at the beginning to fetch data with SQL from the server instead of just selecting into a temporary table inside the script component.
- And what does not work is to make a temp table in a Data Source and pass that further downstream since it gets lost right after leaving the Data Source item, see Is it possible to use a temp table in a data flow source?.
And since I want to keep working on the output further downstream, I cannot just attach a Data Destination, since that item does not have an output arrow. Instead, I need a Script Component, which has both an input and an output arrow, so that it can pass all of the columns further downstream. I want to stay inside the DFT the whole time since the query that I put into SSIS is itself just one SQL data flow step in a big stored procedure of many steps.
Therefore, the plan is to make a script component that takes the data from an OLE DB Source (for example), creates a temporary table, and fills it with that data.
The aim after this, which is no longer part of the question, is to add a new column or change something on top of the temporary table; since it is a temporary table, that can be done with plain SQL inside the same DFT.
What I know but still do not want to do:
I know how to select data into a temporary table with SQL from C#, see Select Into Explanation / Temporary Tables. But that would mean having SQL fill the temporary table, while I want to fill it with the data that arrives from a component upstream in the data flow.
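For reference, that rejected approach would look roughly like this (a minimal sketch only; the connection string cs is a placeholder, and xyz is the example source table from above):

```csharp
using System.Data.SqlClient;

// Sketch: fills the global temp table from SQL, not from the data flow,
// which is exactly what I want to avoid. "cs" is a placeholder.
string cs = @"Data Source=.;Initial Catalog=MyDb;Integrated Security=SSPI;";
using (var conn = new SqlConnection(cs))
{
    conn.Open();
    using (var cmd = new SqlCommand("select * into ##tmpDataSourceAbc from xyz;", conn))
    {
        // The ## table lives only as long as this connection (or another session) holds it.
        cmd.ExecuteNonQuery();
    }
}
```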
I also know that I can make the temporary table in the control flow, see How to create a temporary table in SSIS control flow task and then use it in data flow task?. Yet, I want the temporary table to be made on the fly in the DFT right when I need it, like a nested query or CTE that is simply replaced by a temporary-table query. The package is also more readable if I keep an overview of the whole data flow in one DFT without going back to the Control Flow.
Thus, the plan is:
- A DataTable stores the data from the input rows with all its original columns.
- In the PreExecute method, dynamically create columns in the DataTable based on the input columns coming in from the Data Source.
- In the Input0_ProcessInputRow method, fill the DataTable with each input row.
- In the PostExecute method, insert all rows from the DataTable into the temporary table.
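A rough sketch of how I imagine these three methods (assuming the usual SSIS script-component API; the connectionString, the typeof(object) column type, and the SqlBulkCopy step in PostExecute are my own assumptions, and the mapping part in the middle is exactly what is missing):

```csharp
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

private DataTable dt;

public override void PreExecute()
{
    base.PreExecute();
    dt = new DataTable();
    // Build the DataTable schema dynamically from the input columns.
    IDTSInput100 input = ComponentMetaData.InputCollection[0];
    foreach (IDTSInputColumn100 col in input.InputColumnCollection)
    {
        dt.Columns.Add(col.Name, typeof(object));  // assumption: object as a catch-all type
    }
}

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    DataRow dr = dt.NewRow();
    // ??? -- the mapping from the Row object's columns to dr is what I am asking about
    dt.Rows.Add(dr);
}

public override void PostExecute()
{
    base.PostExecute();
    // Assumption: push the collected rows into the global temp table in one go.
    // The ## table must already exist on a connection that stays open.
    using (var conn = new SqlConnection(connectionString))  // placeholder connection string
    {
        conn.Open();
        using (var bulk = new SqlBulkCopy(conn))
        {
            bulk.DestinationTableName = "##tmpDataSourceAbc";
            bulk.WriteToServer(dt);
        }
    }
}
```

The caveat is that a ## temp table only lives as long as some session holds it, so the connection handling is part of what makes this tricky.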
And after this, which is not in the question anymore:
- Add a column in that temporary table.
- Change something in that temporary table.
- Save the temporary table to a steady table to save the work.
This would all be one script component that adds a column to the data source.
But I struggle with mapping the Row object columns to the DataTable columns. Thus, the question:
Has anybody found out how the columns of the Row object can be mapped to a DataTable that I take to fill the temporary table? Something like:
DataTable dt = new DataTable();
DataRow dr = dt.NewRow();
...
foreach (var c in columns){
dr[c] = c;
}
dt.Rows.Add(dr);
I get for this code: "Column 'myCol' does not belong to table ." Thus the c, which was a column name of the Row object, was not found in the DataRow object dr of the DataTable object dt.
The Row object columns do not, or not always, match the column names of the DataTable. How do I put all columns of the Row object into the DataTable for each row, so that in the end I can copy the full data from the DataTable to the temporary table?